Initial draft of a design document for the Zenodo like DOI per dandiset #2012

Open: yarikoptic wants to merge 4 commits into master from enh-doi-draft

Commits:
- 922e56a Initial draft of a design document for the Zenodo like DOI per dandiset (yarikoptic)
- 92522f8 Elaborate plan a little more and add TODO for a test script across da… (yarikoptic)
- 1d68eea fix indentation (yarikoptic)
- 370fd32 Simplify operation -- no changes to dandiset wide DOI record for a "P… (yarikoptic)

# DOI for Draft Dandisets

Author: Yaroslav O. Halchenko & Dorota Jarecka

## The current approach

- [initial design doc](./doi-generation-1.md)
- overall:
  - inject a fake DOI upon dandiset creation
  - mint a proper DOI only upon dandiset publication (function `create_doi`)

### Issues with the Existing Approach

- [Stop injecting "fake" DOIs into draft dandisets](https://github.com/dandi/dandi-archive/issues/1709)
- [Unpublished Dandisets display a DOI under `Cite As`](https://github.com/dandi/dandi-archive/issues/1932)

## Proposed Solution

The solution was initially proposed and discussed in

- [Create and maintain a "Findable" DOI for the Dandiset as a whole](https://github.com/dandi/dandi-archive/issues/1319)

It boils down to adopting Zenodo's approach of a "concept DOI", which always points to the latest version of the record.

DataCite distinguishes three DOI states ([DataCite](https://support.datacite.org/docs/what-does-the-state-of-the-doi-mean)):

- `Draft`. We currently do not use those.
  *Can be deleted, and they require only the identifier itself in order to be created or saved. They can be updated to either Registered or Findable DOIs. Registered and Findable DOIs may not be returned to the Draft state, which means that changing the state of a Draft is final.*
- `Registered`. Like `Findable` but not indexed for search, so we do not use them.
- `Findable`. The state we use for published dandisets.
  A `Findable` DOI can only be created if its metadata is valid, that is, if it passes validation against the DataCite schema.
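
The DataCite rules quoted above form a small one-way state machine. The following is a minimal sketch of it as a transition table. It assumes the DataCite REST API `event` attribute values `register`, `publish`, and `hide`, where `hide` moves a Findable DOI back to Registered. `DoiState`, `apply_event`, and `can_delete` are hypothetical names used only for illustration.

```python
from enum import Enum


class DoiState(Enum):
    DRAFT = "draft"
    REGISTERED = "registered"
    FINDABLE = "findable"


# DataCite "event" attribute value -> state the DOI ends up in (assumed API values).
# Sending no event at all creates (or keeps) a Draft DOI.
EVENT_TARGET = {
    "register": DoiState.REGISTERED,
    "publish": DoiState.FINDABLE,
    "hide": DoiState.REGISTERED,
}

# Allowed (current state, event) pairs; note that nothing leads back to DRAFT.
ALLOWED = {
    (DoiState.DRAFT, "register"),
    (DoiState.DRAFT, "publish"),
    (DoiState.REGISTERED, "publish"),
    (DoiState.FINDABLE, "hide"),
}


def apply_event(state: DoiState, event: str) -> DoiState:
    """Return the state after `event`, or raise ValueError if it is not allowed."""
    if (state, event) not in ALLOWED:
        raise ValueError(f"cannot apply {event!r} to a {state.value} DOI")
    return EVENT_TARGET[event]


def can_delete(state: DoiState) -> bool:
    """Only Draft DOIs can be deleted; Registered and Findable ones are permanent."""
    return state is DoiState.DRAFT
```

For example, `apply_event(DoiState.DRAFT, "publish")` returns `DoiState.FINDABLE`. No event maps back to `DRAFT`, which mirrors "changing the state of a Draft is final".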

We propose to:

- Instead of a fake DOI, upon creation of a **public** dandiset, mint and use a legitimate `Draft DOI` `10.48324/dandi.{dandiset.id}` with
  - *minimal metadata* entered during the creation request (title, description, license)
  - the DLP URL `https://dandiarchive.org/dandiset/{dandiset.id}`
  - For an embargoed dandiset, mint the `Draft DOI` as well, but **do not** specify any metadata besides the DLP URL.
  - If minting a DOI fails, report the error so that developers learn about the issue. Do not let the failure abort the request; proceed with the creation of the dandiset.
- Upon changes to the metadata record of a public (non-embargoed) dandiset, try to update the DataCite metadata record while keeping the same target URL.
  - For a `Draft DOI` (the dandiset was not published yet), there is no validation.
    - **Question to clear up**: what happens to a Draft DOI if the metadata record is invalid? Does the update fail altogether, or are only the fields DataCite knows about updated?
  - For a `Findable DOI` (the dandiset was published), the metadata record must pass validation, so the update might fail.
    - That should be OK. Alternatively, we could update only the most important metadata fields (title, authors, ...) from the last released version of the dandiset.
  - **TODO: figure out how to annotate the Draft version so that it always states it is a draft version, and is thus ideally not used for citation**
- Upon changes to the metadata record of an embargoed dandiset, do nothing.
- Upon unembargoing a dandiset, update the `Draft DOI` metadata record with the current metadata **after** unembargoing.
- Upon publication of the dandiset:
  - (already done currently) mint a proper `Findable` version DOI `10.48324/dandi.{dandiset.id}/{version}`
  - update the Dandiset-wide DOI (`Draft` or `Findable`) `10.48324/dandi.{dandiset.id}` with the metadata provided for the version DOI, while keeping its URL pointing to the DLP instead of the released version.
    - If the Dandiset-wide DOI was in the `Draft` state, it would be moved to the `Findable` state. This should work, since we know the metadata record passed validation.
      - **Question to clear up**: how to do that via the API
    - **Question to clear up**: what should happen if the metadata record is invalid?
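
The lifecycle rules above can be condensed into a pure function that maps each event to the DataCite requests it would trigger. This keeps the decision logic testable without network access. It is a minimal sketch, not dandi-archive code. `Dandiset`, `doi_body`, `plan`, `mint_on_create`, and the injected `send` callable are all hypothetical.

The request shapes assume the DataCite REST API:
- JSON:API bodies with `type: "dois"`
- `POST /dois` to create a DOI
- `PUT /dois/{doi}` to update one
- `event: "publish"` to make a DOI Findable

The version landing URL `https://dandiarchive.org/dandiset/{id}/{version}` is likewise an assumption. The sketch does not answer the open questions about invalid metadata.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Optional

logger = logging.getLogger(__name__)

PREFIX = "10.48324"
DLP_URL = "https://dandiarchive.org/dandiset/{id}"


@dataclass
class Dandiset:  # hypothetical, simplified model
    id: str                         # e.g. "000123"
    embargoed: bool
    doi_state: str = "draft"        # state of the Dandiset-wide DOI
    metadata: dict = field(default_factory=dict)  # DataCite-style attributes


def doi_body(doi: str, url: str, metadata: Optional[dict] = None,
             event: Optional[str] = None) -> dict:
    """JSON:API body for DataCite `POST /dois` or `PUT /dois/{doi}` (assumed shape)."""
    # doi and url are set last so metadata can never override them.
    attributes = {**(metadata or {}), "doi": doi, "url": url}
    if event is not None:
        attributes["event"] = event  # without an event the DOI stays a Draft
    return {"data": {"type": "dois", "attributes": attributes}}


def plan(action: str, ds: Dandiset, version: Optional[str] = None,
         version_metadata: Optional[dict] = None) -> list:
    """Map one lifecycle action to a list of (method, path, body) requests."""
    doi = f"{PREFIX}/dandi.{ds.id}"
    url = DLP_URL.format(id=ds.id)
    if action == "create":
        # Embargoed: identifier and DLP URL only; public: minimal metadata too.
        meta = None if ds.embargoed else ds.metadata
        return [("POST", "/dois", doi_body(doi, url, meta))]
    if action in ("update_metadata", "unembargo"):
        # Embargoed dandisets are left alone; "unembargo" is meant to run
        # after the embargo is lifted, i.e. with ds.embargoed == False.
        if ds.embargoed:
            return []
        return [("PUT", f"/dois/{doi}", doi_body(doi, url, ds.metadata))]
    if action == "publish":
        vdoi = f"{doi}/{version}"
        vurl = f"{url}/{version}"  # assumed landing page of a released version
        event = "publish" if ds.doi_state == "draft" else None
        return [
            ("POST", "/dois", doi_body(vdoi, vurl, version_metadata, "publish")),
            # Dandiset-wide DOI gets the version metadata but keeps the DLP URL.
            ("PUT", f"/dois/{doi}", doi_body(doi, url, version_metadata, event)),
        ]
    raise ValueError(f"unknown action: {action!r}")


def mint_on_create(ds: Dandiset, send: Callable[[str, str, dict], object]) -> None:
    """Send the creation request; log failures instead of aborting creation."""
    for method, path, body in plan("create", ds):
        try:
            send(method, path, body)
        except Exception:
            logger.exception("Failed to mint Draft DOI for dandiset %s", ds.id)
```

`mint_on_create` logs the failure with a traceback and returns normally. This is one way to inform developers without blocking dandiset creation.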

## Concerns to keep in mind/address

- A draft dandiset might not have sufficient metadata to mint a proper DOI, or the metadata might not be "proper" (i.e., it fails validation), which causes issues with minting a DOI.
  - **Solution**: start with a Draft (not Findable) DOI, and then, upon publication, mint a "Findable" DOI.
  - **Follow-up concern**: after the dandiset and its DOI are published, the metadata of the Draft version of the dandiset could still be changed, potentially making the changed record "invalid" again.
    Should be OK-ish.
  - The DataCite test site produced different validation results than the primary one.

- A `Findable` DOI cannot be deleted, but in principle we allow deletion of dandisets.
  - We might want a dedicated 404 page for deleted dandisets, or at least a message that the dandiset was deleted, ideally describing why it was deleted ("Upon request of maintainer", "Due to violation of terms of service", etc.).
  - Then we adjust the DOI record to point to that page.

- Should we do anything at the dandischema level?

- Should we do anything at the DLP level?

- Should we somehow reflect interactions with DataCite in the Audit log?
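
Because a `Findable` DOI cannot be deleted, "deleting" it in practice means re-targeting its landing URL. Below is a minimal sketch. It assumes that a DataCite `PUT /dois/{doi}` carrying only the `url` attribute updates just the URL. `tombstone_requests` is hypothetical, and the tombstone page URL is passed in because no such page exists yet. The sketch also re-targets the version DOIs, on the assumption that they should lead to the same page.

```python
PREFIX = "10.48324"


def tombstone_requests(dandiset_id: str, versions: list, tombstone_url: str) -> list:
    """Build (method, path, body) requests re-targeting all DOIs of a deleted dandiset.

    Findable DOIs cannot be deleted, so only their landing URL is changed.
    Version DOIs are included on the assumption that they should lead to the
    same "deleted" page as the Dandiset-wide DOI.
    """
    base = f"{PREFIX}/dandi.{dandiset_id}"
    dois = [base] + [f"{base}/{v}" for v in versions]
    return [
        ("PUT", f"/dois/{doi}",
         {"data": {"type": "dois", "attributes": {"url": tombstone_url}}})
        for doi in dois
    ]
```

The reason for deletion would live on the tombstone page itself, so the DOI records need only the new URL.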

## Targets TODO before implementation

- Develop a script that tests the changes introduced here on the DataCite test instance (Fabrica test), applied to all dandisets in the archive:
  - for each dandiset
    - generate a record for the overall *dandiset DOI* corresponding to the metadata of the first release if one exists, otherwise to the metadata of the draft version
    - for each release: mint a new *version DOI* for that release, and possibly update the *dandiset DOI* to reflect changes in metadata
    - update the *dandiset DOI* to the metadata of the draft version
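
The replay order described above can be sketched as a function that turns a dandiset's history into an ordered list of DOI operations. Fetching dandisets from the archive and sending the operations to the DataCite test instance are left out. `Release`, `DandisetHistory`, and `replay` are hypothetical names. "Possibly update" is interpreted here as "update only when the release metadata differs from what the dandiset DOI currently holds".

```python
from dataclasses import dataclass, field


@dataclass
class Release:  # hypothetical
    version: str
    metadata: dict


@dataclass
class DandisetHistory:  # hypothetical
    id: str
    draft_metadata: dict
    releases: list = field(default_factory=list)  # oldest first


def replay(ds: DandisetHistory) -> list:
    """Return ordered (operation, DOI, metadata) steps for one dandiset."""
    base = f"10.48324/dandi.{ds.id}"
    # Dandiset DOI starts from the first release, else from the draft.
    current = ds.releases[0].metadata if ds.releases else ds.draft_metadata
    steps = [("create", base, current)]
    for rel in ds.releases:
        steps.append(("create", f"{base}/{rel.version}", rel.metadata))
        if rel.metadata != current:  # only update when metadata changed
            steps.append(("update", base, rel.metadata))
            current = rel.metadata
    # Finally, point the dandiset DOI metadata at the current draft.
    steps.append(("update", base, ds.draft_metadata))
    return steps
```

For a dandiset that was never released, the plan reduces to two steps: create the *dandiset DOI* from the draft metadata, then update it with the same metadata.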
A thought: perhaps the DOI generation process could be implemented only once a dataset is unembargoed, i.e., embargoed datasets cannot be pointed to by a DOI (even if we get reviewer view-only access). An owner would have to unembargo it.
For the implementation, DOI generation would happen at creation for public dandisets and upon unembargoing for embargoed dandisets.

We could. But one of the goals here is to get away from using "fake DOIs" (#1709), which this would "prevent".
In the current design (which might change), we would make the overall dandiset DOI Findable only upon the initial publication. As an embargoed dataset would never be published, its DOI would remain `Draft` and thus not available to users, so IMHO there is no harm. I think we could even keep updating it with metadata etc. That IMHO would simplify the logic and make "embargoed" less special (thus easier to code, troubleshoot, etc.).

Given demand from Zarr users to get DOIs for dandisets with Zarrs, we had better make them Findable as early as possible in the life cycle of those dandisets (ref: dandi/helpdesk#165, reply in thread). So I think we should proceed that way: make them Findable as soon as DataCite validation passes. We should also inform the user about DataCite model issues as part of the validation.
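
The suggestion to make the Dandiset-wide DOI Findable as soon as DataCite validation passes, while reporting issues to the user, could look roughly like this. It is a sketch under assumptions. `datacite_issues` and `next_event` are hypothetical. They check only the mandatory properties of the DataCite Metadata Schema plus the landing `url`, using DataCite's JSON attribute names. The mandatory properties checked are creators, titles, publisher, publicationYear, and resourceTypeGeneral.

This local pre-check cannot replace DataCite's own validation, which, as noted in the concerns, may even differ between the test and primary instances. A rejected `publish` request therefore still has to be handled.

```python
from typing import Optional, Tuple

# Mandatory DataCite properties, in DataCite JSON attribute naming (simplified).
MANDATORY = ("creators", "titles", "publisher", "publicationYear")


def datacite_issues(attributes: dict) -> list:
    """List missing mandatory properties: a rough local pre-check only."""
    issues = [f"missing {name}" for name in MANDATORY if not attributes.get(name)]
    if not (attributes.get("types") or {}).get("resourceTypeGeneral"):
        issues.append("missing types.resourceTypeGeneral")
    if not attributes.get("url"):
        issues.append("missing url")
    return issues


def next_event(attributes: dict, embargoed: bool,
               state: str) -> Tuple[Optional[str], list]:
    """Return (event to send, issues to show the user) for a metadata update.

    Embargoed dandisets stay Draft and produce no user-facing issues.
    A Draft DOI gets "publish" only once the pre-check finds no issues.
    """
    if embargoed:
        return None, []
    issues = datacite_issues(attributes)
    event = "publish" if state == "draft" and not issues else None
    return event, issues
```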