-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactored Snakemake workflow and generate example output #42
Conversation
Updated the dockerfile to use a two stage process. Snakemake image is used as the runtime environment. Supports / Partially addresses #17
Updated snakemake workflow to work with Dockerfile. Also fixes module not found error by exposing pythonpath. Partially addresses #17
- Temporary fix for ValueError with multiple context URLs (#34) by returning empty context for URLs that don't match allowed_urls_pattern.
FromAsCasing - FROM and AS should be in the same case.
- Added get_max_id_from_files method. - If --workflow-ids argument is not provided, created_files.json should contain all workflows downloaded. get_max_id_from_files() finds the maximum ID in the sourced workflows. - Part of wider effort to address #17
Removes list_expected_files() method.
- Added progress bar and removed line-by-line logging for easier evaluation of download and processing progress. - Addresses #33
- Added preliminary module create_ro_crate to create an RO crate for a given file. - Addresses #37
- Added a rule in Snakemake workflow to create an RO crate. - Addresses #37
Addresses #37
- Added poetry lock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not knowing much about snakemake I could be wrong but should this be (git) ignored?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think I misunderstood the purpose of the ro-crate-metadata folder
@@ -57,6 +53,12 @@ def info(self): | |||
def geturl(self): | |||
return url | |||
|
|||
if not allowed_urls_re.match(url): | |||
return Response(json.dumps({"@context": {}})) | |||
# raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this might be redundant now? (edit: the comment I mean)
@@ -32,7 +35,7 @@ def merge_all_files( | |||
|
|||
for i, fn in enumerate(filenames): | |||
with open(fn, "r") as f: | |||
print(f"Processing {fn}, {i}/{len(filenames)}") | |||
update_progress_bar(i + 1, len(filenames)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idle comment but could something like https://github.com/tqdm/tqdm work for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looking good to me. I've made a few comments but they are not substantial - approving!
This PR primarily addresses: #37, #34, #33, #28, #17
The snakemake workflow consists of 3 main rules:
source_ro_crates
sources the RO crates and checks their outputmerge_files
creates the merged.ttl
filecreate_ro_crate
creates thero-crate-metadata.json
fileThe example
.ttl
file consists of all versions of all publicly accessible RO crates on WorkflowHub.