-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Find by jobid and output aggregate filter data #1322
base: master
Are you sure you want to change the base?
Conversation
9952079
to
ed2e53b
Compare
0ca9f9b
to
3ddb74b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just gave it a brief look-through, how are you feeling about getting this in by Tuesday which is I think the deadline for getting the RPM into the next TOSS version?
I'd lie to have @trws feedback before merging this PR. |
Problem: Partial cancellation is failing under rabbit-related circumstances, possibly orphaning job-related data on allocated vertices. Having a way to find all vertices with jobid-related data would be very useful in diagnosing the problem. Furthermore, it is likely the partial cancellation failures are related to aggregate filter (pruning filter) counts. Add capability to the evaulator to validate and evaluate by jobid, span, tag, reservation, allocation, and agfilter.
Problem: a jobid and aggregate filter boolean are now supported in the evaluator and validator. The evaluator needs support for returning the parsed jobid and agfilter bool to the traverser, so it can change behavior based on whether a jobid or agfilter boolean is present in the evaluation string. Add the capability to extract the jobid as a uint64_t and agfilter as a bool and return them to the client.
Problem: the evaluator unit tests do not support extraction and the new options. Add support in the tests and test cases for the new options.
Problem: the issues with partial cancel have exposed the need for printing the state of resource graph vertices based on the jobid and aggregate filter. Add traverser support for gathering agfilter utilization and total counts for all resources via updated function calls.
Problem: a planner_multi C API function that returns the number of resources utilized by a jobid by resource type is needed to write out aggregate filter counts. Add a new C API function to planner_multi that returns the number of resources utilized by a jobid by calling the planner C API function for the specified resource type index.
Problem: the writer needs support for outputting the aggregate filter counts by type. Add the capability to write out the aggregate filter state by passing the data to map2json.
Problem: after extracting the jobid and/or agfilter boolean with the evaluator, the traverser needs to fetch the aggregate filter counts by jobid (if the ID is present in the find string) or total utilization. Add conditional behavior to fetch appropriate aggregate filter data depending on the presence of jobid and the value of the agfilter boolean.
Problem: there is a typo in the planner C API function to reduce a span. Fix the typo.
Problem: the testsuite doesn't have tests for the new find behavior Add tests.
I updated the PR to include an alternative method of extracting This PR is no longer WIP and is ready for review. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1322 +/- ##
========================================
+ Coverage 75.3% 75.5% +0.2%
========================================
Files 111 111
Lines 16042 16253 +211
========================================
+ Hits 12081 12276 +195
- Misses 3961 3977 +16
|
Errors in partial cancel are causing inability to schedule resources in production. While evidence points to orphaned (i.e., not cleared) vertex data such as aggregate filters, it is difficult to investigate and debug without being able to inspect the state of the vertices.
The PR adds capability to print vertex-specific data by
jobid
and anagfilter
boolean. The state can be printed based on whether the jobid is in theallocations
,reservations
,job2span
, andtags
maps.This PR is WIP because I need to add support for all writers.