Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set output datasets to VALID in DBS3 before announcing a standard workflow #10394

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

haozturk
Copy link

@haozturk haozturk commented Mar 31, 2021

Fixes #9837

Status

DBS API calls are tested locally in a WMAgent env. (1.4.3.patch2), however the rest of the changes is not tested

Description

This PR includes the migration of one feature from Unified to WMCore: Setting output datasets to VALID in DBS3 before announcing a standard workflow.

This feature is implemented as a part of the MSOutput pipeline. In this PR setDBSStatus is the first step of the pipeline (before making actual output data placement) and also note that this functionality works for both RelVal and non-RelVal workflows.

Check dmwm/deployment#1044 for the configuration of this feature. Note that enableDbsStatusChange is False, right now. Because, we need to coordinate for disabling the feature in Unified and enabling in WMCore. Soon, I'll come up a PR to disable it in Unified.

Is it backward compatible (if not, which system it affects?)

yes

Related PRs

Not for now - soon a related PR in https://github.com/CMSCompOps/WmAgentScripts will come.

External dependencies / deployment changes

DBS

@amaltaro @todor-ivanov As I mentioned, the feature is not tested properly, it would be great if you can help me testing it.

@cmsdmwmbot
Copy link

Jenkins results:

  • Unit tests: failed
    • 3 new failures
  • Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 11 warnings
    • 34 comments to review
  • Pylint py3k check: succeeded
    • 0 errors and warnings that should be fixed
    • 0 warnings
    • 0 comments to review
  • Pycodestyle check: succeeded
    • 13 comments to review
  • Python3 compatibility checks: succeeded
    • there are suggested fixes for newer python3 idioms

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/11576/artifact/artifacts/PullRequestReport.html

@amaltaro
Copy link
Contributor

@haozturk thanks for providing this new feature.

I have one general comment/request though, could you please move the getter/setter methods to our DBS3 wrapper module:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Services/DBS/DBS3Reader.py#L843

and use our wrapper instead of making direct calls with the DbsApi object? I'm sorry that it wasn't made explicit in the GH issue.

@cmsdmwmbot
Copy link

Jenkins results:

  • Unit tests: succeeded
  • Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 11 warnings
    • 39 comments to review
  • Pylint py3k check: succeeded
    • 0 errors and warnings that should be fixed
    • 0 warnings
    • 0 comments to review
  • Pycodestyle check: succeeded
    • 13 comments to review
  • Python3 compatibility checks: succeeded
    • there are suggested fixes for newer python3 idioms

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/11578/artifact/artifacts/PullRequestReport.html

@haozturk
Copy link
Author

haozturk commented Apr 8, 2021

@amaltaro should I move the setter function to https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Services/DBS/DBSWriterObjects.py as it looks like that DBS3Reader module is intended to include only readonly functions?

@amaltaro
Copy link
Contributor

amaltaro commented Apr 8, 2021

That DBSWriterObjects.py module has just a set of functions, not really a class implemented.

Given that DBS3Reader.py already has a couple of write functions, it would be fine to add it there as well. However, if we really want to to the right thing, then I'd suggest creating a DBS3Writer.py function that inherits from DBS3Reader, something like this:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Services/WMStats/WMStatsWriter.py#L61

@haozturk
Copy link
Author

haozturk commented Apr 8, 2021

Okay, I create a DBS3Writer class by inheriting from DBS3Reader. Thanks Alan

@cmsdmwmbot
Copy link

Jenkins results:

  • Unit tests: failed
    • 2 tests added
    • 1 changes in unstable tests
  • Pylint check: failed
    • 6 warnings and errors that must be fixed
    • 12 warnings
    • 83 comments to review
  • Pylint py3k check: succeeded
    • 0 errors and warnings that should be fixed
    • 0 warnings
    • 0 comments to review
  • Pycodestyle check: succeeded
    • 26 comments to review
  • Python3 compatibility checks: failed
    • fails python3 compatibility test
    • there are suggested fixes for newer python3 idioms

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/11612/artifact/artifacts/PullRequestReport.html

@haozturk
Copy link
Author

@amaltaro I've moved the DBS API calls to their dedicated classes with 2 new unit tests. Looks like there are problems with two unit tests:

WMCore_t.Services_t.DBS_t.DBSWriter_t.DBSWriterTest:testSetDBSStatus
WMCore_t.Services_t.LogDB_t.LogDB_t.LogDBTest:test_heartbeat

I pass all unit tests on my testing environment, can you check what's wrong with them, please?

Regarding the changes within MSOutput, as I mentioned earlier, I could not test them due to lack of unit test setup for MSOutput. Those changes requires testing in some way.

@amaltaro
Copy link
Contributor

test this please

@cmsdmwmbot
Copy link

Jenkins results:

  • Unit tests: failed
    • 2 tests added
  • Pylint check: failed
    • 6 warnings and errors that must be fixed
    • 12 warnings
    • 83 comments to review
  • Pylint py3k check: succeeded
    • 0 errors and warnings that should be fixed
    • 0 warnings
    • 0 comments to review
  • Pycodestyle check: succeeded
    • 26 comments to review
  • Python3 compatibility checks: failed
    • fails python3 compatibility test
    • there are suggested fixes for newer python3 idioms

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/11613/artifact/artifacts/PullRequestReport.html

@amaltaro
Copy link
Contributor

@haozturk thanks for providing these changes Hasan. I'm going to review this PR soon, but just wanted to point you to the unit test failure logs:
https://cmssdt.cern.ch/dmwm-jenkins/job/DMWM-WMCore-PR-test/11613/testReport/junit/WMCore_t.Services_t.DBS_t.DBSWriter_t/DBSWriterTest/testSetDBSStatus/

as you can see, the jenkins bot certificate has no privileges to write to DBS, which makes sense. So we should probably try to mock that write call as well.

@haozturk
Copy link
Author

Ah, thanks for pointing out that Alan, that makes sense. Let me revise that unit test.

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@haozturk Hasan, I made a non-exhaustive review of this PR, and I would suggest you to look at those such that we can have another clear review once those changes are in place. I hope that's fine with you.

Just a note/concern: this PR will break the MSOutputTemplate schema, so we need to be careful and make these changes backward compatible with the documents already in MongoDB.

@todor-ivanov might have other comments/requests/concerns as well.

msg += "It needs to be of type: MSOutputTemplate"
raise UnsupportedError(msg)

dbs3Writer = DBS3Writer(url=self.msConfig["dbsReadUrl"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to initialize it in the __init__ method instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fine now

dbsUpdateStatus = "done"
for dMap in workflow['OutputMap']:

if self.msConfig['enableDbsStatusChange']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I would remove this option (and the dry-run mode). When we have it implemented, I believe there is no reason to disable this feature in MSOutput, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it to make the switch from Unified to WMCore easy. So, once the PR is good to merge, we can merge it and put it to production with enableDbsStatusChange False and simply it will not take any action but to print some logs. Once we're ready to switch from Unified to WMCore, we can simply change this parameter to True without touching the source code. Does it make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could enable this DBS status change in MSOutput and keep both this service and Unified updating the DBS status for a day or two (I'm almost sure it's harmless and nothing fails). Then we have time to stop Unified.

At this stage, dry-run mode is just an over-complication of the code IMO.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me. This option is removed in the latest commit.

@@ -102,6 +102,7 @@ def docSchema(self):
'Copies': 1,
...}],
"TransferStatus": "pending"|"done,
"DBSUpdateStatus": "pending"|"done,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I would make it a boolean, thus something like:

"DBSStatusUpdated": False|True,

but then it would require more changes from your side. So I leave it up to you.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, this is applied in the last commit.

:return: DBS status of the given dataset
"""

allowedDbsStatuses = ["VALID", "INVALID", "PRODUCTION"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a dataset can have other statuses. If I'm not wrong, DELETED and DEPRECATED as well. However, I don't think we need to have a list of valid statuses, we just use whatever is provided by DBS.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also unsure whether it's necessary to put this extra check and honestly I did not have a strong reason to do so. So, it's okay to remove it to reduce the level of complexity.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As agreed, this extra check is also removed.


try:
self.dbs.updateDatasetType(dataset=dataset,
dataset_access_type=status)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation needs fixing. I would suggest to remove the blank line at 44 as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, we should fetch the output of this call and log the error in case it happens; or return it to the caller module.
By parsing the output of that call, we can also get rid of of the getDBSStatus implementation in the lines below.

Copy link
Author

@haozturk haozturk Apr 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought that but the function returns None if the call is successful and throws an exception o/w. I would prefer it to return True or False respectively. Since it does not do that, I handled it with getDBSStatus calls to make it more intuitive (according to me :) ) . We can also handle it:

  • Return True if updateDatasetType returns None
  • Return False o/w

Which one do you prefer?

Btw, we already log the error in the except clause, in case the API fails

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, too bad it returns None in case the call is successful..

Return True if updateDatasetType returns None
Return False o/w

This is my preference. Given that - if there is an exception - the error would likely be on the server side, I would suggest to change the log level from exception to error, such that we do not pollute the service logs with unnecessary information.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I applied your suggestions Alan, but setDBSStatus still returns True if the API call is successful, False otherwise. This looks quite intuitive to me. Do you agree?


if self.msConfig['enableDbsStatusChange']:

res = dbs3Writer.setDBSStatus(dataset=dMap["Dataset"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that my comments on the DBS3Writer module will impact this logic.

Another comment, I failed to see where this self.msConfig['dbsStatus'] gets set in this module. Perhaps I would suggest to define an object variable like:

self.dbsOutputStatus = "VALID"

and simply refer to it here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also made it configurable: https://github.com/dmwm/deployment/pull/1044/files#diff-3a773e3fbfb5ff69f5fd806e7a4ab54a7dfaaf1f080e7e96ac1df9e73589c2ffR115 It's also possible to set it within the module. Which one do you prefer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I missed the deployment changes. Thank for pointing it out.
I think it's fine to keep it in the service configuration file. Nonetheless, I would set a default value to "VALID" in this module, in case the configuration does not have it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since deployment has the value, I did not set a default value in this module to avoid duplicates. If you think, it's more suitable to keep the configuration in MSOutput module instead of deployment repository, that's also fine, I can change it if you'd like.

@cmsdmwmbot
Copy link

Jenkins results:

  • Unit tests: failed
    • 5 new failures
    • 2 tests added
  • Pylint check: failed
    • 6 warnings and errors that must be fixed
    • 12 warnings
    • 80 comments to review
  • Pylint py3k check: succeeded
    • 0 errors and warnings that should be fixed
    • 0 warnings
    • 0 comments to review
  • Pycodestyle check: succeeded
    • 25 comments to review
  • Python3 compatibility checks: failed
    • fails python3 compatibility test
    • there are suggested fixes for newer python3 idioms

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/11714/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Unit tests: failed
    • 2 tests added
  • Pylint check: failed
    • 6 warnings and errors that must be fixed
    • 12 warnings
    • 80 comments to review
  • Pylint py3k check: succeeded
    • 0 errors and warnings that should be fixed
    • 0 warnings
    • 0 comments to review
  • Pycodestyle check: succeeded
    • 25 comments to review
  • Python3 compatibility checks: failed
    • fails python3 compatibility test
    • there are suggested fixes for newer python3 idioms

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/11717/artifact/artifacts/PullRequestReport.html

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hasan, these changes are looking good. However, I have made some comments along the code. I have a suggestion for this DBS status related schema, which I believe to be a simplified version of what is currently implemented.

I wonder why you changed the DBSMockData.json as well? Did we have the wrong parentage information in there? That information is fed directly from DBS, so I would be surprised if we had such issue. In addition to that, that json file has been updated this week, so you might want to rebase your PR to get the latest changes.

Thank you very much for providing these changes, Hasan. And apologies for the delay.

'OutputMap']),
Functor(self.docDump, pipeLine='PipelineNonRelVal'),
Functor(self.docCleaner)])

wfCounterTotal = 0
mQueryDict = {'TransferStatus': 'pending'}
mQueryDict = { "$or": [ { "TransferStatus": "pending" }, { "DBSUpdateStatus": False } ] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TransferStatus is meant to consider the overall status of a given workflow in MSOutput. In other words, TransferStatus will only be marked as done once ALL the necessary actions have already been taken and the workflow is ready to move to the next status (and move away of the MSOutput watch).

So I'd suggest to keep the original query, and make sure that its status remains pending as long as there are still actions to be taken on it.

status=self.msConfig['dbsStatus']["valid"])

if res:
dMap["DBSStatus"] = self.msConfig['dbsStatus']["valid"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood it right, you are defining a new property for the datasets dictionary to be stored in MongoDB, right?
In that case, I would suggest to drop it from the schema, the DBS status will be one and only one, for every single dataset. So, I'd keep only the top level key DBSUpdateStatus, which would contain the overall DBS status of the output datasets. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, do you know what happens if you set a VALID dataset to VALID once again? I believe DBS won't complain, right?

In addition to that, should we take into consideration datasets that have not been produced in the workflow, thus not being available in the DBS database?

for dMap in workflow['OutputMap']:

res = self.dbs3Writer.setDBSStatus(dataset=dMap["Dataset"],
status=self.msConfig['dbsStatus']["valid"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here I would simply replace self.msConfig['dbsStatus']["valid"] by VALID.

:return: DBS status of the given dataset
"""

response = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
response = None
dbsStatus = None

if response:
dbsStatus = response[0]['dataset_access_type']
self.logger.info("%s is %s", dataset, dbsStatus)
return dbsStatus
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here we could replace this if/else block by something like:

if response:
    dbsStatus = response[0]['dataset_access_type']
    self.logger.info("%s is %s", dataset, dbsStatus)
return dbsStatus

@@ -0,0 +1,53 @@
#!/usr/bin/env python
"""
_DBSReader_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong copy/pasting? :-D You might just remove it as well.

@cmsdmwmbot
Copy link

Jenkins results:

  • Python2 Unit tests: failed
    • 13 tests deleted
    • 5 tests added
  • Python3 Unit tests: failed
    • 243 new failures
    • 12 tests deleted
    • 5 tests added
    • 18 changes in unstable tests
  • Python2 Pylint check: failed
    • 189 warnings and errors that must be fixed
    • 102 warnings
    • 1708 comments to review
  • Python3 Pylint check: failed
    • 244 warnings and errors that must be fixed
    • 102 warnings
    • 1761 comments to review
  • Pylint py3k check: failed
    • 6 warnings
  • Pycodestyle check: succeeded
    • 894 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12114/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Can one of the admins verify this patch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MSOutput: set output datasets to VALID in DBS3 before announcing a standard workflow
3 participants