Tracking file provenance #3712

Open
wants to merge 68 commits into
base: master
Commits
30f54ca
initial version of dynamic file list classes
astro-friedel May 13, 2024
69d8f02
integrated dynamic file into output file handling
astro-friedel May 21, 2024
882e3ba
data flow kernel changes to accommodate dynamic file lists
astro-friedel Jun 7, 2024
ce369aa
Merge remote-tracking branch 'upstream/master' into fixing_dynamic_fi…
astro-friedel Jun 7, 2024
7138adc
Auto stash before checking out "HEAD"
astro-friedel Jun 7, 2024
5bff70f
creation of file tale in the monitoring
astro-friedel Jun 7, 2024
6025691
added initial file provenance data in database
astro-friedel Jun 14, 2024
efc3b14
fixed error where uuid's were not strings
astro-friedel Jun 17, 2024
222166a
fixed typos in names
astro-friedel Jun 17, 2024
92597f6
initial working version
astro-friedel Jun 18, 2024
8b922d9
Merge branch 'fixing_dynamic_file_inputs_and_outputs' into trackingFi…
astro-friedel Jun 27, 2024
632890b
added flask-wtf to monitoring requirements for form processing
astro-friedel Jun 27, 2024
17e5c43
added file size and md5sum tracking for files
astro-friedel Jun 27, 2024
d8df5fe
fixed issue with clean_copy in dynamic files
astro-friedel Jun 27, 2024
b16cad6
added initial provenance interface to flask pages
astro-friedel Jun 27, 2024
0275b28
indentation fix
astro-friedel Jul 1, 2024
3a1238b
fixed database code for provenance tracking
astro-friedel Jul 1, 2024
bb013fe
added environment tracking to monitoring
astro-friedel Jul 9, 2024
bc8247a
Merge remote-tracking branch 'upstream/master' into trackingFileProve…
astro-friedel Jul 31, 2024
45af5f9
added file provenance tracking as an option to monitoring framework
astro-friedel Jul 31, 2024
cd99828
better reporting on environment
astro-friedel Jul 31, 2024
558d170
ensure that files are tagged with the task id that generated them, no…
astro-friedel Jul 31, 2024
05caec8
get the task reporting the environment correctly
astro-friedel Jul 31, 2024
8f212ba
only provide file link if files were actually used in the workflow
astro-friedel Jul 31, 2024
3ade95a
only provide file link if there were files
astro-friedel Jul 31, 2024
7501cc3
properly report environment with file details
astro-friedel Jul 31, 2024
66238e5
properly format and report files
astro-friedel Jul 31, 2024
00ffa6f
make header responsive to url
astro-friedel Jul 31, 2024
da73f91
fix bug in file size reporting
astro-friedel Jul 31, 2024
76b8008
documentation on file provenance
astro-friedel Jul 31, 2024
93b17b0
fix bug in format
astro-friedel Jul 31, 2024
1e004a6
get the correct timestamp for the file
astro-friedel Sep 17, 2024
8dde82c
remove unneeded prints
astro-friedel Sep 17, 2024
cb550ee
auto determine file size, md5sum, timestamp if possible
astro-friedel Sep 17, 2024
5ebd009
refactor variable
astro-friedel Sep 17, 2024
baf2332
make sure dfk is propagated from dynamic file list to children
astro-friedel Sep 17, 2024
79211bc
documentation and annotation cleanup
astro-friedel Sep 17, 2024
825842f
cleanup
astro-friedel Sep 17, 2024
117e66d
Merge remote-tracking branch 'upstream/master' into trackingFileProve…
astro-friedel Sep 17, 2024
8c9a2a0
backed out DynamicFile stuff so that this branch is pure file tracking
astro-friedel Nov 12, 2024
eff8ab6
Merge branch 'master' into trackingFileProvenance
astro-friedel Nov 12, 2024
5ca48cf
Merge branch 'master' into trackingFileProvenance
astro-friedel Nov 27, 2024
9a05b2c
reorganized to group similar codes together
astro-friedel Nov 27, 2024
14aac2b
fixed message format
astro-friedel Nov 27, 2024
585fd03
fixed some typos
astro-friedel Nov 27, 2024
19f7747
updates to include misc info table
astro-friedel Nov 27, 2024
27f6391
updated docs
astro-friedel Nov 27, 2024
97ade30
fixed bug for remote files
astro-friedel Nov 27, 2024
33be080
test for provenance framework
astro-friedel Nov 27, 2024
07c2e45
flake8 fixes
astro-friedel Nov 27, 2024
97108e1
fixed missing line in docs
astro-friedel Nov 27, 2024
a837f08
removed extraneous ignores
astro-friedel Dec 3, 2024
6bef04f
reverted removal of trailing white spaces
astro-friedel Dec 3, 2024
5057d19
fixes per review comments
astro-friedel Dec 3, 2024
89d5e0a
ensure that md5sum is only calculated when file provenance tracking i…
astro-friedel Dec 3, 2024
c653cbc
fixes based on review comments
astro-friedel Dec 3, 2024
7efebad
added dfk as a required parameter to DataFuture
astro-friedel Dec 3, 2024
d6e7e5b
make sure file md5sum is only calculated
astro-friedel Dec 3, 2024
1fcdbc6
added full path and parsing for path for file database entries
astro-friedel Dec 3, 2024
b443cbb
fixed typos and tests
astro-friedel Dec 3, 2024
69cfc7b
put back required SECRET_KEY so that the file search form works
astro-friedel Dec 3, 2024
0316cf9
isort fixes
astro-friedel Dec 3, 2024
af51f0e
Merge branch 'Parsl:master' into trackingFileProvenance
astro-friedel Dec 3, 2024
9ed699d
removed unneeded import
astro-friedel Dec 3, 2024
ce609cc
mypy fixes
astro-friedel Dec 3, 2024
d646aaa
Merge remote-tracking branch 'upstream/master'
astro-friedel Dec 10, 2024
53f323d
fixed incorrect variable name
astro-friedel Dec 10, 2024
9444f42
Merge branch 'master' into trackingFileProvenance
astro-friedel Dec 10, 2024
added file provenance tracking as an option to monitoring framework
astro-friedel committed Jul 31, 2024
commit 45af5f9642075c17a55647447d817f827ba0db1b
10 changes: 5 additions & 5 deletions parsl/dataflow/dflow.py
@@ -236,13 +236,13 @@ def __exit__(self, exc_type, exc_value, traceback) -> None:
             raise InternalConsistencyError(f"Exit case for {mode} should be unreachable, validated by typeguard on Config()")
 
     def _send_task_log_info(self, task_record: TaskRecord) -> None:
-        if self.monitoring:
+        if self.monitoring and self.monitoring.capture_file_provenance:
             task_log_info = self._create_task_log_info(task_record)
             self.monitoring.send(MessageType.TASK_INFO, task_log_info)
 
     def _send_file_log_info(self, file: Union[File, DataFuture, DynamicFileList.DynamicFile],
                             task_record: TaskRecord) -> None:
-        if self.monitoring:
+        if self.monitoring and self.monitoring.capture_file_provenance:
             file_log_info = self._create_file_log_info(file, task_record)
             self.monitoring.send(MessageType.FILE_INFO, file_log_info)
 
@@ -280,20 +280,20 @@ def _create_env_log_info(self, environ: ParslExecutor) -> Dict[str, Any]:
         return env_log_info
 
     def _register_env(self, environ: ParslExecutor) -> None:
-        if self.monitoring:
+        if self.monitoring and self.monitoring.capture_file_provenance:
             environ_info = self._create_env_log_info(environ)
             self.monitoring.send(MessageType.ENVIRONMENT_INFO, environ_info)
 
     def register_as_input(self, f: Union(DynamicFileList.DynamicFile, File, DataFuture),
                           task_record: TaskRecord):
-        if self.monitoring:
+        if self.monitoring and self.monitoring.capture_file_provenance:
             self._send_file_log_info(f, task_record)
             file_input_info = self._create_file_io_info(f, task_record)
             self.monitoring.send(MessageType.INPUT_FILE, file_input_info)
 
     def register_as_output(self, f: Union(DynamicFileList.DynamicFile, File, DataFuture),
                            task_record: TaskRecord):
-        if self.monitoring:
+        if self.monitoring and self.monitoring.capture_file_provenance:
             self._send_file_log_info(f, task_record)
             file_output_info = self._create_file_io_info(f, task_record)
             self.monitoring.send(MessageType.OUTPUT_FILE, file_output_info)
8 changes: 7 additions & 1 deletion parsl/monitoring/monitoring.py
@@ -47,7 +47,8 @@ def __init__(self,
                  logdir: Optional[str] = None,
                  monitoring_debug: bool = False,
                  resource_monitoring_enabled: bool = True,
-                 resource_monitoring_interval: float = 30): # in seconds
+                 resource_monitoring_interval: float = 30,
+                 capture_file_provenance: bool = False): # in seconds
         """
         Parameters
         ----------
@@ -86,6 +87,9 @@ def __init__(self,
              If set to 0, only start and end information will be logged, and no periodic monitoring will
              be made.
              Default: 30 seconds
+        capture_file_provenance : bool
+             Set this field to True to enable logging of file provenance information.
+             Default: False
         """
 
         if _db_manager_excepts:
@@ -105,6 +109,8 @@ def __init__(self,
         self.resource_monitoring_enabled = resource_monitoring_enabled
         self.resource_monitoring_interval = resource_monitoring_interval
 
+        self.capture_file_provenance = capture_file_provenance
+
     def start(self, run_id: str, dfk_run_dir: str, config_run_dir: Union[str, os.PathLike]) -> None:
 
         logger.debug("Starting MonitoringHub")
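
The guard this commit adds in dflow.py can be illustrated with a small standalone sketch. It does not import Parsl: `FakeMonitoringHub` and `FakeDFK` are hypothetical stand-ins invented for illustration, and only the `capture_file_provenance` flag, its `False` default, and the `if self.monitoring and self.monitoring.capture_file_provenance` check are taken from the diff. The sketch also flattens the two-step call in `register_as_input` into one method for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any, Dict, List, Optional, Tuple


class MessageType(Enum):
    # Stand-in for two of the message kinds named in the diff.
    FILE_INFO = auto()
    INPUT_FILE = auto()


@dataclass
class FakeMonitoringHub:
    # Hypothetical stand-in for MonitoringHub; only the new flag is modelled.
    capture_file_provenance: bool = False  # same default as in the diff
    sent: List[Tuple[MessageType, Dict[str, Any]]] = field(default_factory=list)

    def send(self, kind: MessageType, payload: Dict[str, Any]) -> None:
        # Record the message instead of forwarding it to a database.
        self.sent.append((kind, payload))


class FakeDFK:
    # Hypothetical stand-in for the data flow kernel.
    def __init__(self, monitoring: Optional[FakeMonitoringHub]) -> None:
        self.monitoring = monitoring

    def register_as_input(self, path: str, task_id: int) -> None:
        # Same guard as the diff: a hub must exist AND the flag must be on.
        if self.monitoring and self.monitoring.capture_file_provenance:
            self.monitoring.send(MessageType.FILE_INFO, {"path": path})
            self.monitoring.send(MessageType.INPUT_FILE,
                                 {"path": path, "task_id": task_id})


if __name__ == "__main__":
    hub = FakeMonitoringHub(capture_file_provenance=True)
    FakeDFK(hub).register_as_input("data/input.txt", task_id=0)
    for kind, payload in hub.sent:
        print(kind.name, payload)
```

With the flag left at its default, or with no hub configured at all, `register_as_input` sends nothing, so workflows that do not opt in pay no provenance overhead.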