Tracking file provenance #3712

Open
wants to merge 68 commits into
base: master


astro-friedel

Description

This PR adds file provenance tracking to Parsl. The provenance information includes:

  • File name
  • Creation date
  • File size
  • md5 sum
  • What App created it (if the file did not already exist)
    • What arguments were given to the App
    • What environment was the App running in
  • What other Apps used the file
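As a rough illustration of the per-file fields listed above, here is a minimal standard-library sketch. It is not the code in this PR: `file_provenance_record` and its dictionary keys are hypothetical names chosen for the example, and it uses the last-modified time as a stand-in for the creation date, since a true creation time is not available portably.

```python
import hashlib
import os
from datetime import datetime, timezone


def file_provenance_record(path, chunk_size=1 << 20):
    """Collect basic provenance fields for one file (hypothetical helper)."""
    md5 = hashlib.md5()
    # Hash the file in chunks so large files are never loaded into memory at once.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)

    stat = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        # Assumption: last-modified time approximates the creation date.
        "timestamp": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "size_bytes": stat.st_size,
        "md5sum": md5.hexdigest(),
    }
```

The App-related fields (creating App, its arguments and environment, and consuming Apps) depend on Parsl's monitoring messages and are left out of this sketch.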

This feature uses the existing monitoring infrastructure to capture the requisite information and store it in a database. The file provenance information can also be accessed via the parsl-visualizer interface. A new keyword argument for the MonitoringHub, capture_file_provenance, enables or disables the provenance tracking. It defaults to False, which leaves provenance tracking turned off.
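A minimal sketch of how the new keyword might be enabled in a configuration. It assumes a Parsl installation that includes this PR's branch. The function name `make_provenance_config`, the executor label, and the `hub_address` value are illustrative choices, not part of the PR.

```python
def make_provenance_config():
    """Build a Parsl Config with file provenance tracking enabled (sketch)."""
    # Imports are local so that defining this function does not require Parsl.
    from parsl.config import Config
    from parsl.executors import HighThroughputExecutor
    from parsl.monitoring.monitoring import MonitoringHub

    return Config(
        executors=[HighThroughputExecutor(label="htex_local")],
        monitoring=MonitoringHub(
            hub_address="localhost",
            # Keyword added by this PR; it defaults to False.
            capture_file_provenance=True,
        ),
    )
```

The resulting object would be passed to `parsl.load(...)` as with any other configuration. On a Parsl version without this PR, the unknown keyword is expected to raise a `TypeError`.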

While this type of file information can be tracked manually for smaller workflows, larger workflows require an automated solution, like this one.

Changed Behaviour

In general, users will not see any change in Parsl's behavior, because provenance tracking is turned off by default. Enabling it only gathers additional information in the background. The only change users should outwardly see is the workflow slowdown typically seen when using the monitoring framework.

Fixes

This fixes #3711

Type of change

  • New feature
  • Update to human readable text: Documentation/error messages/comments

initial code to handle file related monitoring messages
@astro-friedel
Author

I don't know why the tests are failing. They pass fine on my own machine.

Development

Successfully merging this pull request may close these issues.

Tracking file provenance
2 participants