Initial draft of a design document for the Zenodo like DOI per dandiset #2012
- If minting a DOI fails, we need to raise an exception to inform developers about the issue but proceed with the creation of the dandiset.
- *minimal metadata* entered during creation request (title, description, license)
- DLP URL `https://dandiarchive.org/dandiset/{dandiset.id}`
- For an embargoed dandiset, **do not** specify any metadata besides the DLP URL.
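A minimal sketch of the creation-time rules above, not the archive's actual implementation. `Dandiset`, `build_draft_doi_payload`, `create_dandiset`, and the injected `mint_doi` callable are all hypothetical names, and logging the exception with its traceback is only one possible reading of "raise an exception to inform developers" while still proceeding with creation.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# DLP URL pattern quoted in the design document.
DLP_URL_TEMPLATE = "https://dandiarchive.org/dandiset/{dandiset_id}"


@dataclass
class Dandiset:
    """Hypothetical stand-in for the archive's dandiset model."""
    id: str
    title: str
    description: str
    license: str
    embargoed: bool = False
    doi: Optional[str] = None


def build_draft_doi_payload(dandiset: Dandiset) -> dict:
    """Return the metadata to send when minting the Draft DOI."""
    payload = {"url": DLP_URL_TEMPLATE.format(dandiset_id=dandiset.id)}
    if not dandiset.embargoed:
        # Only public dandisets expose the minimal creation metadata.
        payload.update(
            title=dandiset.title,
            description=dandiset.description,
            license=dandiset.license,
        )
    return payload


def create_dandiset(dandiset: Dandiset, mint_doi: Callable[[dict], str]) -> Dandiset:
    """Create the dandiset; a DOI minting failure is reported but not fatal."""
    try:
        dandiset.doi = mint_doi(build_draft_doi_payload(dandiset))
    except Exception:
        # Assumption: surfacing the failure to developers via the error log.
        logger.exception("Failed to mint Draft DOI for dandiset %s", dandiset.id)
    return dandiset
```

Passing `mint_doi` in as a callable keeps the sketch independent of any particular DataCite client and makes the failure path easy to exercise in tests.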
A thought: perhaps DOI generation could happen only once a dataset is unembargoed, i.e. embargoed datasets cannot be pointed to by a DOI (even if we get reviewer view-only access). An owner would have to unembargo it.
For implementation, DOI generation would happen at creation for public dandisets and at unembargo for embargoed dandisets.
We could. But one of the goals here is to get away from using "fake DOIs" (#1709), which this would "prevent".
In the current design (might change) we would make the overall dandiset DOI `Findable` only upon initial publication. As an embargoed dataset would never be published, its DOI would remain `Draft`, thus not available to users, and thus IMHO there is no harm. I think we could even keep updating it with metadata etc. That IMHO would simplify the logic and make "embargoed" less special (thus easier to code/troubleshoot etc).
> In current design (might change) we would make overall dandiset DOI Findable only upon initial publication.

Given demand from zarr users to get DOIs for dandisets with zarrs, we had better make them `Findable` as early as possible in the life cycle of those dandisets. ref: dandi/helpdesk#165 (reply in thread). So I think we should proceed that way: make them `Findable` as soon as datacite validation passes. Also inform the user about datacite model issues as part of the validation.
…ublished" draft version of dandiset
- For `Draft DOI` (dandiset was not published yet), there is no validation, try to update datacite metadata record while keeping the same target URL
  - **Question to clear up**: what happens to Draft DOI if metadata record is invalid? Does it fail to update altogether? Does it update only the fields it knows about?
- For `Findable DOI` (dandiset was published at least once), we do not update anything since DLP points to that published version.
- **TODO: figure out how to annotate Draft version, so it always says that it is a draft version and thus potentially not used for citation if that could be avoided**
To facilitate citation of dandisets which are not yet published (the immediate use case being dandisets with zarrs, which we cannot publish yet since we cannot guarantee their versions), we could try migrating the `Draft` DOI to `Findable` upon every modification of metadata. If that fails, keep the prior state (`Draft`). If the DOI is already `Findable` and the update fails, we are doomed to keep the prior record until edits bring the metadata to a "good state".
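The transition rule proposed in this comment can be sketched as a small state function. This is an illustration under assumptions, not archive code: `DoiState`, `DataciteRejected`, `on_metadata_modified`, and the injected `push_findable` callable (which would submit the record to DataCite as `Findable`) are hypothetical names.

```python
from enum import Enum
from typing import Callable


class DoiState(Enum):
    DRAFT = "draft"
    FINDABLE = "findable"


class DataciteRejected(Exception):
    """Hypothetical error raised when DataCite refuses the metadata record."""


def on_metadata_modified(
    state: DoiState,
    metadata: dict,
    push_findable: Callable[[dict], None],
) -> DoiState:
    """Try to publish the record as Findable after every metadata edit.

    On rejection the prior state is kept: a Draft DOI stays Draft, and an
    already Findable DOI keeps its last accepted record.
    """
    try:
        push_findable(metadata)
    except DataciteRejected:
        return state
    return DoiState.FINDABLE
```

Because `Findable` is never downgraded, a rejected update leaves a previously accepted record in place until a later edit passes validation.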
- For `Draft DOI` (dandiset was not published yet), there is no validation, try to update datacite metadata record while keeping the same target URL
  - **Question to clear up**: what happens to Draft DOI if metadata record is invalid? Does it fail to update altogether? Does it update only the fields it knows about?
- For `Findable DOI` (dandiset was published at least once), we do not update anything since DLP points to that published version.
- **TODO: figure out how to annotate Draft version, so it always says that it is a draft version and thus potentially not used for citation if that could be avoided**
- For `Draft DOI` (dandiset was not published yet): try to update it and make it `Findable`.
  - If that fails, keep it `Draft`: since there is no validation, try to update the datacite metadata record while keeping the same target URL.
  - **Question to clear up**: what happens to Draft DOI if metadata record is invalid? It seems to create one with no metadata, but does it update only the fields it knows about?
- For `Findable DOI`
  - If it is still a draft version but one which had legit metadata, we try to update the metadata. If that fails, we either ignore it or just add a comment somewhere that "record might not reflect the most recent changes to draft version".
    - I think we need to add validation against the datacite metadata record to our validation procedures, and report errors to the user so that users address them before trying to publish. Maybe we should validate only if no other errors (our schema validation) were detected, to reduce noise, or just give a summary that "Metadata is not satisfying datacite model, fix known metadata errors first."
  - If the dandiset was published at least once (has a version), we do not update anything since DLP points to that published version.
- **TODO: figure out how to annotate Draft version, so it always says that it is a draft version and thus potentially not used for citation if that could be avoided**
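One of the options mentioned above, running datacite validation only when our own schema validation found nothing, could look roughly like the sketch below. `collect_validation_errors` and both validator callables are hypothetical; each validator is assumed to return a list of human-readable error strings.

```python
from typing import Callable, List

Validator = Callable[[dict], List[str]]


def collect_validation_errors(
    metadata: dict,
    validate_schema: Validator,
    validate_datacite: Validator,
) -> List[str]:
    """Return errors to show the user, skipping datacite checks if noisy."""
    schema_errors = list(validate_schema(metadata))
    if schema_errors:
        # Our own schema errors come first; datacite validation is not run.
        return schema_errors
    # Prefix datacite errors so users can tell the two sources apart.
    return [f"datacite: {error}" for error in validate_datacite(metadata)]
```

The alternative from the text, always validating but collapsing datacite problems into a single summary line, would only change the branch taken when `schema_errors` is non-empty.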
Regarding validation:
- For `PublishedDandiset`:
  I think we aimed to create a `PublishedDandiset` class that already has all the fields that datacite requires, so if we are able to create a `PublishedDandiset`, we should be able to create dandiset.
  In addition, at the end of `to_datacite` we have `validation_datacite`, which checks against the datacite schema (or at least one of the versions...).
  So I believe once we have a published dandiset and a findable DOI, we should only update with a new published version, and if our schema is right, we should not have problems updating the DOI.
  Of course, datacite can change the schema (or at least the validation function), and we could have issues.
- For `Dandiset`:
  We should run `to_datacite` with option `validate=True` and see if the validation against the datacite schema passes.

I created some tests to simulate the workflow in dandi/dandi-schema#275
A design doc composed with @djarecka to avoid dummy DOIs for dandisets
refs:
- Cite As #1932

TODOs
but could already be checked out by @dandi/archive-maintainers folks since overall idea is formulated already and some early concerns/questions could already be asked/answered