Add Workflow Run RO-crate format #39

Open
wants to merge 30 commits into
base: master

@famosab famosab commented Dec 18, 2024

We worked on a first version of the plugin which is able to render valid RO-crates for any workflow run.

Happy to receive feedback to get this finished up :)

Continues #19 and #33.

famosab and others added 16 commits November 18, 2024 15:45
add encodingFormat for nextflow.config
feat: add wrroc to valid formats
* fx: make getIntermediateOutputFiles work again

* Fix bugs

fixes #16
fixes #17

---------

Co-authored-by: fbartusch <[email protected]>
* feat: add README to create

* feat: ignore vscode

* fix: make getIntermediateOutputFiles work again (#18) (#19)

* fx: make getIntermediateOutputFiles work again

* Fix bugs

fixes #16
fixes #17

---------

Co-authored-by: fbartusch <[email protected]>

* feat: add README to json

* feat: check first if readme exists

* Add readme to hasPart

Signed-off-by: fbartusch <[email protected]>

---------

Signed-off-by: fbartusch <[email protected]>
Co-authored-by: fbartusch <[email protected]>
* Add getEncodingFormat function that return the encoding format for a file
* handle YAML files manually

Signed-off-by: fbartusch <[email protected]>
* main workflow complies (more or less) with ComputationalWorkflow profile version 1.0
  (if set in manifest add license, url, version, description, ...)
* Correct value vor ActionStatus

Signed-off-by: fbartusch <[email protected]>
* start with metaYaml imports

* merge dev-wrroc into metaYaml (#23)

* add encodingFormat for nextflow.config

* add encodingFormat for main.nf

* feat: add wrroc to valid formats

* fix: make getIntermediateOutputFiles work again (#18)

* fx: make getIntermediateOutputFiles work again

* Fix bugs

fixes #16
fixes #17

---------

Co-authored-by: fbartusch <[email protected]>

* feat: add README to crate (#14)

* feat: add README to create

* feat: ignore vscode

* fix: make getIntermediateOutputFiles work again (#18) (#19)

* fx: make getIntermediateOutputFiles work again

* Fix bugs

fixes #16
fixes #17

---------

Co-authored-by: fbartusch <[email protected]>

* feat: add README to json

* feat: check first if readme exists

* Add readme to hasPart

Signed-off-by: fbartusch <[email protected]>

---------

Signed-off-by: fbartusch <[email protected]>
Co-authored-by: fbartusch <[email protected]>

---------

Signed-off-by: fbartusch <[email protected]>
Co-authored-by: fbartusch <[email protected]>

* WIP

* only add from meta if meta exists

* remove usage from ext args

* add module name to id

---------

Signed-off-by: fbartusch <[email protected]>
Co-authored-by: fbartusch <[email protected]>

famosab commented Dec 18, 2024

@bentsherman maybe you can have a look here :)


simleo commented Dec 18, 2024

Have you got an example RO-Crate generated with this version of the plugin?


famosab commented Dec 18, 2024

ro-crate-metadata.json
This was created using the plugin and this pipeline: https://github.com/famosab/wrrocmetatest

bentsherman (Member) commented

Is this superseding #33 now?

@bentsherman bentsherman changed the base branch from master to workflow-run-crate December 18, 2024 15:46
@bentsherman bentsherman changed the base branch from workflow-run-crate to master December 18, 2024 15:47

simleo commented Dec 19, 2024

ro-crate-metadata.json

I ran runcrate report (https://github.com/ResearchObject/runcrate) on a directory containing that file and this is what I got:

action: #7d8bdcb2-6ea3-4132-b134-61b40ef98b8d
  instrument: main.nf (['File', 'SoftwareSourceCode', 'ComputationalWorkflow', 'HowTo'])
  started: 2024-12-18T11:33:47.967336+01:00
  ended: 2024-12-18T11:33:50.017405+01:00
  inputs:
    Users/famke/04_other/BioHackathon24/testdata/read2.fq.gz
    Users/famke/04_other/BioHackathon24/testdata/read1.fq.gz
    work/b2/0766652cd477129c14781bb6d8b148/test_1.fastp.fastq.gz
    work/b2/0766652cd477129c14781bb6d8b148/test_2.fastp.fastq.gz
    testsheet.csv <- #input
    None <- #genome
    s3://ngi-igenomes/igenomes/ <- #igenomes_base
    True <- #igenomes_ignore
    results <- #outdir
    copy <- #publish_dir_mode
    None <- #email
    None <- #email_on_fail
    False <- #plaintext_email
    False <- #monochrome_logs
    False <- #help
    False <- #help_full
    False <- #show_hidden
    None <- #version
    https://raw.githubusercontent.com/nf-core/test-datasets/ <- #pipelines_testdata_base_path
    None <- #config_profile_name
    None <- #config_profile_description
    master <- #custom_config_version
    https://raw.githubusercontent.com/nf-core/configs/master <- #custom_config_base
    None <- #config_profile_contact
    None <- #config_profile_url
    True <- #validate_params
    None <- #genomes
  outputs:
    fastp/test_2.fastp.fastq.gz
    fastp/test.fastp.html
    fastp/test.fastp.log
    fastp/test.fastp.json
    fastp/test_1.fastp.fastq.gz
    megahit/test.contigs.fa.gz
    megahit/intermediate_contigs/k51.contigs.fa.gz
    megahit/intermediate_contigs/k51.final.contigs.fa.gz
    megahit/intermediate_contigs/k71.contigs.fa.gz
    megahit/intermediate_contigs/k51.addi.fa.gz
    megahit/intermediate_contigs/k71.final.contigs.fa.gz
    megahit/intermediate_contigs/k71.addi.fa.gz
    megahit/test.log
    megahit/intermediate_contigs/k51.local.fa.gz
    work/b2/0766652cd477129c14781bb6d8b148/versions.yml
    work/34/f6dbb451e056d59cd528618daaa6cc/versions.yml

action: #b20766652cd477129c14781bb6d8b148
  step: main.nf#main/FAMOSAB_WRROCMETATEST:WRROCMETATEST:FASTP
  instrument: #Script_fd7c4fa8fd93ef0e@7337bd2e (SoftwareApplication)
  inputs:
    Users/famke/04_other/BioHackathon24/testdata/read1.fq.gz
    Users/famke/04_other/BioHackathon24/testdata/read2.fq.gz
  outputs:
    fastp/test_1.fastp.fastq.gz
    fastp/test_2.fastp.fastq.gz
    fastp/test.fastp.json
    fastp/test.fastp.html
    fastp/test.fastp.log
    work/b2/0766652cd477129c14781bb6d8b148/versions.yml

action: #34f6dbb451e056d59cd528618daaa6cc
  step: main.nf#main/FAMOSAB_WRROCMETATEST:WRROCMETATEST:MEGAHIT
  instrument: #Script_55473827b5576695@174cb0d8 (SoftwareApplication)
  inputs:
    work/b2/0766652cd477129c14781bb6d8b148/test_1.fastp.fastq.gz
    work/b2/0766652cd477129c14781bb6d8b148/test_2.fastp.fastq.gz
  outputs:
    megahit/test.contigs.fa.gz
    megahit/intermediate_contigs/k51.contigs.fa.gz
    megahit/intermediate_contigs/k51.final.contigs.fa.gz
    megahit/intermediate_contigs/k71.contigs.fa.gz
    megahit/intermediate_contigs/k71.final.contigs.fa.gz
    megahit/intermediate_contigs/k51.addi.fa.gz
    megahit/intermediate_contigs/k71.addi.fa.gz
    megahit/intermediate_contigs/k51.local.fa.gz
    megahit/intermediate_contigs/k51.final.contigs.fa.gz
    megahit/intermediate_contigs/k71.final.contigs.fa.gz
    megahit/test.log
    work/34/f6dbb451e056d59cd528618daaa6cc/versions.yml

Since I don't have the original crate I'm wondering, do all relative paths correspond to existing files in the crate, e.g., is there a Users/famke/04_other/BioHackathon24/testdata/read2.fq.gz in the crate?

> This was created using the plugin and this pipeline: https://github.com/famosab/wrrocmetatest

I tried to run the pipeline following the instructions at the repo above. I had to manually change the plugin's version (to 1.1.0-DEV) after installing it; otherwise Nextflow downloads nf-prov@<version> and replaces the installed one. I got this error (?) message:

Unknown parentDir: /home/simleo/repos/wrrocmetatest/read1.fq.gz
Unknown parentDir: /home/simleo/repos/wrrocmetatest/read2.fq.gz
Unexpected input file: /home/simleo/repos/wrrocmetatest/read1.fq.gz
Unexpected input file: /home/simleo/repos/wrrocmetatest/read2.fq.gz

Despite that, the run went on and I got an RO-Crate, but without the read{1,2}.fq.gz input files. I have Nextflow 23.10.0.


famosab commented Dec 19, 2024

@simleo I will try to address the issues you ran into :)

> Since I don't have the original crate I'm wondering, do all relative paths correspond to existing files in the crate, e.g., is there a Users/famke/04_other/BioHackathon24/testdata/read2.fq.gz in the crate?

Yes, this path is copied as-is into the folder that contains the JSON file. That might not be the most elegant solution, though, since the input files could also be put in a folder called input or something similar. What do you think?

[screenshot of the crate directory structure]

> I tried to run the pipeline following the instructions at the repo above. I had to manually change the plugin's version (to 1.1.0-DEV) after installing it; otherwise Nextflow downloads nf-prov@<version> and replaces the installed one. I got this error (?) message:

Yes, the installation of the plugin needs to be fixed. I always run make install from the folder where I worked on the plugin and then use the appropriate version; I think it is 1.3.0 in our case.

Seems like something is off with your inputs. Did you download the files as described in the README of the pipeline?


simleo commented Dec 19, 2024

> Yes, this path is copied as-is into the folder that contains the JSON file. That might not be the most elegant solution, though, since the input files could also be put in a folder called input or something similar. What do you think?

It's not that important; what matters is that there are no name clashes between files.

> Seems like something is off with your inputs. Did you download the files as described in the README of the pipeline?

Yes, and I put them in the root directory of the wrrocmetatest repo as cloned on my machine. My testsheet.csv is:

sample,fastq_1,fastq_2
test,/home/simleo/repos/wrrocmetatest/read1.fq.gz,/home/simleo/repos/wrrocmetatest/read2.fq.gz

The command I ran is:

nextflow run main.nf -profile docker --input testsheet.csv --outdir results -c testdata.config


famosab commented Dec 20, 2024

> I have Nextflow 23.10.0.

Then I think you might need to update to a more recent Nextflow version as I only tested this for versions above 24.04.

But you should be able to test the plugin with any pipeline - maybe running an nf-core pipeline that is better maintained is more reliable!


famosab commented Jan 7, 2025

@bentsherman I would appreciate any feedback on this so that we can get it finished this month. As soon as people start using the plugin, I expect more requests for changes, improvements, and updates to come in.
@simleo Are there things that are missing from your point of view?


simleo commented Jan 7, 2025

ro-crate-metadata.json

The metadata file looks OK. One thing I suggest changing is PropertyValue instances corresponding to parameters that haven't been specified: better not add them at all rather than add them with a "value": null.
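
A minimal sketch of this suggestion, written in Python purely for illustration (the plugin itself is Groovy): given the `@graph` of a metadata file, it drops every `PropertyValue` whose `value` is null, together with any references to it. The function name and the example identifiers are hypothetical, and the sketch assumes that `@type` is a plain string rather than a list.

```python
def drop_null_property_values(graph):
    """Return a copy of an RO-Crate @graph without null-valued PropertyValue entities."""
    # Identifiers of PropertyValue entities whose value was never set.
    dropped = {
        entity["@id"]
        for entity in graph
        if entity.get("@type") == "PropertyValue" and entity.get("value") is None
    }

    cleaned = []
    for entity in graph:
        if entity["@id"] in dropped:
            continue
        entity = dict(entity)  # shallow copy so the input graph is left untouched
        for key, val in entity.items():
            # Remove dangling {"@id": ...} references from list-valued properties.
            if isinstance(val, list):
                entity[key] = [
                    item for item in val
                    if not (isinstance(item, dict) and item.get("@id") in dropped)
                ]
        cleaned.append(entity)
    return cleaned


# Hypothetical excerpt: one parameter was set, one was not.
example_graph = [
    {"@id": "#run", "@type": "CreateAction",
     "object": [{"@id": "#param-input"}, {"@id": "#param-email"}]},
    {"@id": "#param-input", "@type": "PropertyValue", "name": "input", "value": "testsheet.csv"},
    {"@id": "#param-email", "@type": "PropertyValue", "name": "email", "value": None},
]
cleaned_graph = drop_null_property_values(example_graph)
```

Filtering the references as well as the entities keeps the action's `object` list from pointing at identifiers that no longer exist in the graph.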

Regarding the crate structure, it looks fine from your screenshot, though some files added to the crate are not listed in the metadata (e.g. README.txt): this is not an error, but listing them in the metadata with a name and/or description (or avoiding adding them to the crate if they are not essential) would improve the crate (this could be left to a future release).

However, I still cannot reproduce your result after upgrading Nextflow to 24.10.3. The output directory is missing several files and directories including ro-crate-metadata.json. The command line output shows this:

Unknown parentDir: /home/simleo/repos/wrrocmetatest/read2.fq.gz
Unknown parentDir: /home/simleo/repos/wrrocmetatest/read1.fq.gz

(with respect to Nextflow 23.10.0 the Unexpected input file messages have disappeared, but not the above ones) and there's a java.lang.IllegalArgumentException: 'other' is different type of Path in the .nextflow.log.


famosab commented Jan 8, 2025

Can you try changing the config file to something like the following? The main change is the updated plugin version.

plugins {
    id 'nf-prov@<version>'
}

prov {
    enabled = true
    formats {
        wrroc {
            file = "${params.outdir}/ro-crate-metadata.json"
            overwrite = true
            agent {
                name = "John Doe"
                orcid = "https://orcid.org/0000-0000-0000-0000"
            }
            license = "https://spdx.org/licenses/MIT"
            profile = "provenance_run_crate"
        }
    }
}


simleo commented Jan 9, 2025

Thanks @famosab, I finally managed to install the right version of the plugin. The run finished with no errors and no warnings, and I got a valid RO-Crate.

It's looking pretty good already, but some things need to be fixed. I think the most important one is test_{1,2}.fastp.fastq.gz being listed among the workflow inputs: these are intermediate files, produced by FASTP and consumed by MEGAHIT, so they should not be listed among workflow inputs. Similarly, intermediate files should not be listed among workflow outputs (in fact it's not clear to me which files are the actual workflow outputs: all of MEGAHIT's outputs?). Other remarks:

  • It would be nice to have the testsheet.csv file in the crate. Now only its name is listed, represented as the value of a PropertyValue: I guess this is due to the fact that the workflow somehow sees it as a string rather than a file.

  • The dependencies of main.nf, direct and indirect (e.g. workflows/wrrocmetatest.nf, subworkflows/nf-core/utils_nfcore_pipeline/main.nf, modules/nf-core/fastp/main.nf, etc.) should be included in the crate as files.
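
The distinction between workflow inputs and intermediate files can be sketched with set arithmetic over the per-task inputs and outputs. This is an illustrative Python sketch, not the plugin's code. It assumes each file has one consistent identifier across tasks, whereas in the report above MEGAHIT's inputs use `work/...` paths and FASTP's outputs use published `fastp/...` paths, so they would need normalizing first. Treating "produced but never consumed" as the workflow outputs is also only an assumption, since which files count as the actual outputs is an open question here.

```python
def classify_files(tasks):
    """Split files into workflow inputs, intermediates, and terminal outputs."""
    consumed = set()
    produced = set()
    for task in tasks:
        consumed.update(task["inputs"])
        produced.update(task["outputs"])
    return {
        # Consumed by some task but produced by none: true workflow inputs.
        "inputs": sorted(consumed - produced),
        # Produced by one task and consumed by another: intermediate files.
        "intermediates": sorted(consumed & produced),
        # Produced and never consumed again (an approximation of workflow outputs).
        "terminal_outputs": sorted(produced - consumed),
    }


# File names taken from the run discussed above, with directories stripped.
tasks = [
    {"name": "FASTP",
     "inputs": ["read1.fq.gz", "read2.fq.gz"],
     "outputs": ["test_1.fastp.fastq.gz", "test_2.fastp.fastq.gz", "test.fastp.json"]},
    {"name": "MEGAHIT",
     "inputs": ["test_1.fastp.fastq.gz", "test_2.fastp.fastq.gz"],
     "outputs": ["test.contigs.fa.gz"]},
]
result = classify_files(tasks)
```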

@bentsherman bentsherman changed the base branch from workflow-run-crate to master January 10, 2025 20:43
This was referenced Jan 10, 2025
@bentsherman bentsherman changed the title Finalize addition of Workflow Run RO-crate format Add Workflow Run RO-crate format Jan 10, 2025
bentsherman (Member) commented

Taking a look this afternoon. Expect some minor edits soon

Signed-off-by: Ben Sherman <[email protected]>

@bentsherman bentsherman left a comment

I did some minor cleanup so far. The render() function is pretty long, so I'm going to see if I can move some code into helper functions to make it easier to read at a high-level. Please refrain from making edits for now as I work through the code. I left some comments/questions in the meantime.

Comment on lines 101 to 102
// Copy workflow input files into RO-Crate
workflowInputMapping.each { source, dest ->
Member:

It looks like you are copying the intermediate files into the RO-crate? I don't think this is feasible in general. I think it would be better to save only a record of the task inputs/outputs with a checksum. That is what the BCO does at least.
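
As a rough illustration of "a record with a checksum", the Python sketch below hashes a file in chunks and returns a small metadata record instead of copying the file. The function name, the `sha256` property name, and the example identifier are all hypothetical; neither the plugin nor the RO-Crate profile is claimed to use this exact shape.

```python
import hashlib
import os
import tempfile


def checksum_record(path, identifier):
    """Describe a file by identifier and SHA-256 digest without copying it."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large intermediate files are not loaded into memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return {"@id": identifier, "sha256": digest.hexdigest()}


# Usage with a throwaway file holding dummy content.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "example.txt")
    with open(sample, "wb") as handle:
        handle.write(b"example\n")
    record = checksum_record(sample, "work/example.txt")
```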

Reply:

Recording object (inputs) and result (outputs) for CreateAction is respectively a MAY and a SHOULD in WRROC, so it's not strictly required. Also, workflow inputs and outputs are more important than intermediate files. However, if intermediate inputs/outputs are available, it is appropriate to include them in the RO-Crate, so that it provides a more complete representation of the run. If this is not feasible in general, could it be controlled by a boolean option (e.g. include_intermediates) in the configuration file?

Member:

I have kept the object and result metadata without copying the actual intermediate files. This way you can still, for example, construct the provenance graph. I will consider adding a flag to copy the intermediates in a future iteration.

Comment on lines 203 to 204
// Copy workflow into crate directory
Files.copy(scriptFile, crateRootDir.resolve(scriptFile.getFileName()), StandardCopyOption.REPLACE_EXISTING)
Member:

@simleo you mentioned copying all of the pipeline code into the crate, but is this really necessary? If the crate specifies a git repository, revision, and main script path, the user can reproduce the pipeline code at any time.

Reply:

> If the crate specifies a git repository, revision, and main script path

This would be a nice alternative, though including the code is safer since a git repository can be deleted.

Member:

The use case I'm thinking of is using a standard pipeline, e.g. nf-core/rnaseq v3.12, which may have been previously "verified" by a regulatory body. In this case it would be better to include an immutable reference rather than the actual files, because then you don't have to check that those files correspond to a "verified" pipeline. But this is another topic on which I would like to see more discussion from the community. I will consider a flag to copy the pipeline in a future iteration.

Member:

@simleo Is there an appropriate entity for recording the git url + commit hash? For example the BCO has an "scm extension" that looks like this:

"scm_extension": {
    "scm_repository": "https://github.com/nextflow-io/rnaseq-nf",
    "scm_type": "git",
    "scm_commit": "1815dc2a18bb2c2a8e4c7915260d77bb04ec8c91",
    "scm_path": "main.nf",
    "scm_preview": "https://github.com/nextflow-io/rnaseq-nf/tree/1815dc2a18bb2c2a8e4c7915260d77bb04ec8c91/main.nf"
}

Member:

This can be added to the ComputationalWorkflow using codeRepository and version. It would be nice to also record the URL or path of the main script, since it is the entrypoint for executing the workflow.

Reply:

The ComputationalWorkflow could have a "url" property pointing to the exact revision, e.g.:

   "url": "https://github.com/famosab/wrrocmetatest/blob/e8665c295b12132aa886f3abd0fc34a71b08d4a6/main.nf"
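
A small Python sketch of how such a revision-pinned URL could be assembled from the repository URL, commit hash, and main script path. The helper name is hypothetical, and the `/blob/<commit>/<path>` layout is the GitHub convention shown in the example above, so other hosting services would need a different pattern.

```python
def pinned_script_url(repository, commit, script_path="main.nf"):
    """Build a GitHub-style URL pointing at a script at an exact commit."""
    repo = repository.rstrip("/")
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]  # clone URLs often carry a .git suffix
    return f"{repo}/blob/{commit}/{script_path.lstrip('/')}"


workflow_url = pinned_script_url(
    "https://github.com/famosab/wrrocmetatest",
    "e8665c295b12132aa886f3abd0fc34a71b08d4a6",
)
```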


bentsherman commented Jan 14, 2025

> It would be nice to have the testsheet.csv file in the crate. Now only its name is listed, represented as the value of a PropertyValue: I guess this is due to the fact that the workflow somehow sees it as a string rather than a file.

Agreed. We'll have to use either the parameter JSON schema or (in the future) static types to determine which params are files and thereby should be included in the crate.

EDIT: I'll probably table this for a future iteration since the parameter schema is a whole can of worms.

Signed-off-by: Ben Sherman <[email protected]>
@bentsherman bentsherman linked an issue Jan 14, 2025 that may be closed by this pull request
Signed-off-by: Ben Sherman <[email protected]>
…epository URL + commit hash

Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
bentsherman (Member) commented

Okay I think I am done with cleanup and refactoring. You can see my changes from the commits, but to summarize:

  • try to organize the code
  • try to use canonical / readable ids for all entities
  • record intermediate files but don't copy them into the RO-crate
  • use git URL + commit hash instead of copying the pipeline scripts
  • use parameter schema to identify input files and copy them into RO-crate (i.e. samplesheet)

Another round of testing would be good to make sure I didn't break anything. I have tested with the rnaseq-nf toy pipeline which is pretty basic.


simleo commented Jan 15, 2025

I updated and reinstalled the plugin, then I ran https://github.com/famosab/wrrocmetatest again, but the run failed. There are no error messages on the console, but the results dir does not contain ro-crate-metadata.json, and in .nextflow.log there's a java.nio.file.NoSuchFileException: /home/simleo/repos/wrrocmetatest/s3:/ngi-igenomes/igenomes.

@fbartusch fbartusch left a comment

Thanks for improving the quality of the code so much. I tested it with the nf-core demo pipeline with the test profile and found some problems.

I have some scripts for running the plugin on most nf-core pipelines. Maybe I find more problems in the next days.

}

// -- copy input files from params to crate
params.each { name, value ->

I tested this with the nf-core demo pipeline and the test profile.
The test profile uses a samplesheet hosted on GitHub, which causes the plugin to fail with:

java.nio.file.NoSuchFileException: <pipeline directory>/https:/raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv
[...]
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2471)
	at nextflow.prov.WrrocRenderer.render(WrrocRenderer.groovy:199)
	at nextflow.prov.ProvObserver.onFlowComplete(ProvObserver.groovy:121)
[...]

A quick-and-dirty fix could look like this:

params.each { name, value ->
    final schema = paramSchema[name] ?: [:]
    final type = getParameterType(name, value, schema)
    if( type == "File" || type == "Directory" ) {
        final valueString = value.toString()
        final path = Path.of(valueString)
        if( path.isFile() ) {
            final source = path.toAbsolutePath()
            // don't copy params.outdir into itself...
            if( source == crateDir )
                return
            source.copyTo(crateDir)
        }
        else {
            // Check whether the file parameter is a URL
            try {
                final url = new URL(valueString)
                url.toURI()  // performs more checks than the URL constructor

                // Download the file into the crate; the stream is closed automatically
                final fileName = valueString.substring(valueString.lastIndexOf('/') + 1)
                url.withInputStream { stream ->
                    Files.copy(stream, crateDir.resolve(fileName), StandardCopyOption.REPLACE_EXISTING)
                }
            }
            catch( Exception e ) {
                // Not a downloadable URL (e.g. s3://): report it instead of failing silently
                println "Could not copy parameter '${name}' into the crate: ${valueString}"
            }
        }
    }
}

The same problem with:

 igenomes_base              = 's3://ngi-igenomes/igenomes/'

The current code treats it as a File/Directory in the Nextflow sense and fails to copy it.
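
One way to tell such values apart before attempting a local copy is to look at the URI scheme. The Python sketch below is illustrative only: the helper name is hypothetical, and the rule "any scheme longer than one character other than `file` is remote" is an assumption (the length check keeps Windows drive letters such as `C:` from being mistaken for a scheme).

```python
from urllib.parse import urlparse


def is_remote_location(value):
    """Return True if a parameter value looks like a remote URI rather than a local path."""
    scheme = urlparse(str(value)).scheme
    # An empty scheme means a plain path; one letter is most likely a drive letter.
    return len(scheme) > 1 and scheme != "file"


remote_flags = {
    value: is_remote_location(value)
    for value in [
        "s3://ngi-igenomes/igenomes/",
        "https://raw.githubusercontent.com/nf-core/test-datasets/",
        "/home/simleo/repos/wrrocmetatest/read1.fq.gz",
        "testsheet.csv",
    ]
}
```

With a check like this, remote values could be recorded by reference (or downloaded explicitly) instead of being resolved against the pipeline directory.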

case CharSequence:
return "Text"
default:
return null

Unfortunately the additionalType is a MUST for RO-Crates. The resulting crate for the nf-core demo pipeline is invalid, according to the rocrate-validator:

"severity": "REQUIRED",
"message": "FormalParameter MUST have an additionalType",
"violatingEntity": "./main.nf#param#genomes",

This is the only problematic formal parameter in my test, and it should originate from here.

Could you test for Map and List values and return "Text" as the type? These nested structures are converted to JSON strings in the metadata file.
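
The requested behaviour can be sketched as follows, in Python for illustration only. The function name is hypothetical, and apart from "Text" the type names are assumed schema.org-style names rather than the plugin's confirmed mapping. Maps and lists fall through to "Text" because they end up serialized as JSON strings.

```python
import json


def additional_type(value):
    """Map a parameter value to an additionalType name, or None if unknown."""
    if isinstance(value, bool):  # checked before int, since bool is a subclass of int
        return "Boolean"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, (str, dict, list)):
        return "Text"  # nested structures are stored as JSON strings
    return None


# Hypothetical nested parameter, similar in shape to a genomes map.
genomes = {"example_genome": {"fasta": "genome.fa"}}
genomes_type = additional_type(genomes)
genomes_value = json.dumps(genomes)
```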

@@ -0,0 +1,820 @@
/*

The RO-Crate has to contain the main workflow file main.nf.
We have to copy at least this one file into the crate; otherwise the rocrate-validator considers the crate invalid:

"severity": "REQUIRED",
"message": "Main Workflow main.nf not found in crate",

fbartusch commented

> I updated and reinstalled the plugin, then I ran https://github.com/famosab/wrrocmetatest again, but the run failed. There are no error messages on the console, but the results dir does not contain ro-crate-metadata.json, and in .nextflow.log there's a java.nio.file.NoSuchFileException: /home/simleo/repos/wrrocmetatest/s3:/ngi-igenomes/igenomes.

I think that could be the same issue I described in my review.


simleo commented Jan 16, 2025

> • record intermediate files but don't copy them into the RO-crate

This makes the RO-Crate invalid, though currently the validator does not check for it: see crs4/rocrate-validator#60

Discussing this at the WRROC meeting. @elichad used File with a randomly-generated UUID hash-prepended local identifier in a previous crate. But that could not have been a data entity because those have to be in the crate payload as discussed above. Does using the type File automatically make an entity a data entity? @stain

See also: https://www.researchobject.org/ro-crate/specification/1.1/contextual-entities.html#contextual-vs-data-entities


elichad commented Jan 16, 2025

The crate that @simleo references is available here: https://zenodo.org/records/12987289. In the metadata of that crate (v1.0.2), entities like #vaccine-effectiveness-aragon-126e7f43-fd41-45f0-b305-91f33b798ada represent workflow input files that couldn't be included in the crate for privacy reasons.

From what I can tell, this doesn't violate the RO-Crate 1.1 spec, as I don't think there's a direct statement that anything with @type: File is considered to be a data entity (which would need to follow File Data Entity requirements for the @id). But it is maybe a bit of a grey area/confusion point in how we describe/define data vs contextual entities.

Successfully merging this pull request may close these issues.

Add RO crate format
5 participants