
Remove new/assignment-approved from the ReqMgr2/WMStats list of active workflows #11263

Open

wants to merge 1 commit into master
Conversation

amaltaro
Contributor

Fixes #11246

Status

not-tested

Description

Changes the definition of ACTIVE workflows in ReqMgr2, which now excludes workflows in status new or assignment-approved. This will affect the following REST API (with this exact query string):
https://cmsweb-testbed.cern.ch/reqmgr2/data/request?status=ACTIVE

The same change now applies to the WMStatsServer DataCache CherryPy thread, which will no longer pull data for workflows in status new or assignment-approved.

Lastly, it will also affect WMStatsServer REST APIs such as requestcache, filtered_requests, protectedlfns, protectedlfns_final and globallocks, which will no longer provide data for requests in the two aforementioned statuses.
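
For illustration, the change boils down to dropping the two earliest statuses from whatever list backs the ACTIVE alias. The sketch below uses made-up names (ACTIVE_STATUSES, getActiveRequests) and an abbreviated status list; it is not the actual ReqMgr2/WMCore code:

# Hypothetical sketch of the new ACTIVE filtering; names and the status
# list are illustrative, not the actual ReqMgr2 implementation.
EXCLUDED_FROM_ACTIVE = {"new", "assignment-approved"}

# abbreviated list of non-terminal statuses, for illustration only
NON_TERMINAL_STATUSES = ["new", "assignment-approved", "assigned",
                         "staging", "staged", "acquired",
                         "running-open", "running-closed", "completed"]

ACTIVE_STATUSES = [st for st in NON_TERMINAL_STATUSES
                   if st not in EXCLUDED_FROM_ACTIVE]

def getActiveRequests(requests):
    """Return only the requests whose status is still considered ACTIVE."""
    return [req for req in requests if req.get("RequestStatus") in ACTIVE_STATUSES]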

Is it backward compatible (if not, which system it affects?)

NO

Related PRs

None

External dependencies / deployment changes

None

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 1 change in unstable tests
  • Python3 Pylint check: failed
    • 23 warnings
    • 83 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 6 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13560/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor Author

I have mixed feelings about this development!

WMStatsServer actually provides a nice request (and job detail) caching mechanism, meant to reduce traffic going to the backend database (CouchDB). With this development, our cache becomes smaller, which is good and bad at the same time.

A better development would be to actually change the data structure within the DataCache object, likely moving from a giant dictionary to a list of dictionaries. In addition to that, we should separate the job details into their own REST endpoint (and object in the DataCache); see the sketch after the list below. The impact of this, though, goes a bit deeper:

  • clients consuming requestcache would have to be updated
  • WMStats CouchApp would have to be updated
  • and maybe something in Unified and/or the I&T tools for autoACDC and etc.
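
A rough sketch of what that restructuring could look like, with illustrative class and attribute names (not the current WMStatsServer DataCache API):

import threading

class DataCache(object):
    """Hypothetical cache keeping request summaries and job details apart."""

    def __init__(self):
        self.lock = threading.Lock()
        # list of per-request dictionaries instead of one giant dict keyed by request name
        self.requestInfo = []
        # job details kept in a separate object, served by their own REST endpoint
        self.jobDetails = {}

    def setRequestInfo(self, requests):
        with self.lock:
            self.requestInfo = requests

    def getRequestInfo(self):
        with self.lock:
            return self.requestInfo

    def setJobDetails(self, detailsByRequest):
        with self.lock:
            self.jobDetails = detailsByRequest

    def getJobDetails(self, requestName):
        with self.lock:
            return self.jobDetails.get(requestName, {})

Clients of requestcache would then iterate over the list of request dictionaries, while job detail consumers would hit the dedicated endpoint only when they actually need that data.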

@vkuznet
Contributor

vkuznet commented Aug 31, 2022

To reduce processing overhead with large JSON, either a single dict or a list of dicts, you should consider using the application/ndjson MIME type, which provides dicts without an enclosing list, e.g.

{dict1}
{dict2}
...
{dictN}

In other words, it is a list of dicts without the list brackets, where every dict is separated by a newline. This format reduces RAM utilization on the client side to the size of one dict. By contrast, a list of dicts still requires allocating RAM for all N dicts until the JSON parser finishes its job. I already provided support for application/ndjson in DBS, and you should follow that trend. Please note that such a data format should only be returned if the client asks for it, i.e. the client should have a choice between json (list of dicts) and ndjson (dicts separated by newlines). The nice thing about this format is that the client can read the response as a stream: each row represents a single dict, so the JSON parser only reads one dict at a time and RAM usage is bounded by the size of a single dict. Once the back-end server can provide this format, clients can take advantage of it to improve their processing pipeline and reduce resource utilization.
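
To make the client-side difference concrete, here is a minimal sketch with the requests library; the endpoint URL and the exact Accept header value are assumptions for illustration, not an existing WMStatsServer contract:

import json
import requests

URL = "https://cmsweb.cern.ch/wmstatsserver/data/requestcache"  # hypothetical endpoint

# Plain JSON: the full list of dicts is parsed and held in memory at once
resp = requests.get(URL, headers={"Accept": "application/json"})
allDocs = resp.json()               # RAM ~ size of the whole list

# NDJSON: one dict per line, parsed one at a time from the stream
resp = requests.get(URL, headers={"Accept": "application/ndjson"}, stream=True)
for line in resp.iter_lines():
    if line:
        doc = json.loads(line)      # RAM ~ size of a single dict
        # process doc here; it can be garbage-collected before the next line is read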

@amaltaro
Contributor Author

Yes, this mime type would make the system more robust and make it cheaper for clients to retrieve data.

Note, though, that it won't affect the server, because there we need to cache all the data that can be served to clients, so the memory footprint would still not be small.

@vkuznet
Contributor

vkuznet commented Aug 31, 2022

Alan, it really depends on the server architecture and implementation. If the server fetches data from the back-end (CouchDB or a data cache) as a stream and streams it back to the client, you may reduce its RAM overhead. An example is DBSReader: it reads data from ORACLE as a stream and passes it to the client, so its RAM usage stays very low, O(100MB), regardless of the amount of data requested by the client.
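
A sketch of that server-side pattern, with hypothetical helper names (fetchRows, streamRequests) rather than the actual DBSReader or WMStats code:

import json

def streamRequests(fetchRows):
    """Yield one NDJSON line per back-end row, so the server never holds
    the full result set in memory."""
    for row in fetchRows():          # fetchRows() is assumed to return a lazy iterator
        yield json.dumps(row) + "\n"

# In a CherryPy handler, returning this generator with response.stream = True
# and Content-Type: application/ndjson writes rows to the client as they are
# read from the back-end, keeping RAM usage roughly constant.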

@amaltaro
Contributor Author

amaltaro commented Sep 9, 2022

test this please

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 3 new failures
  • Python3 Pylint check: failed
    • 23 warnings
    • 83 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 6 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13575/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Can one of the admins verify this patch?
