Initial draft of a design document for the Zenodo like DOI per dandiset #2012

82 changes: 82 additions & 0 deletions doc/design/doi-generation-2.md
@@ -0,0 +1,82 @@
# DOI for Draft Dandisets

Authors: Yaroslav O. Halchenko & Dorota Jarecka

## The current approach

- [initial design doc](./doi-generation-1.md)
- overall:
  - inject a fake DOI upon dandiset creation
  - mint a proper DOI only upon dandiset publication (function `create_doi`)

### Issues with the Existing Approach

- [Stop injecting "fake" DOIs into draft dandisets](https://github.com/dandi/dandi-archive/issues/1709)
- [Unpublished Dandisets display a DOI under `Cite As`](https://github.com/dandi/dandi-archive/issues/1932)

## Proposed Solution

Initially proposed/discussed in

- [Create and maintain a "Findable" DOI for the Dandiset as a whole](https://github.com/dandi/dandi-archive/issues/1319)

and boils down to adopting Zenodo's approach of having a DOI that always points to the latest version of the record.

DataCite distinguishes three DOI states ([DataCite](https://support.datacite.org/docs/what-does-the-state-of-the-doi-mean)):

- `Draft`. We do not use those.
*Can be deleted, and they require only the identifier itself in order to be created or saved. They can be updated to either Registered or Findable DOIs. Registered and Findable DOIs may not be returned to the Draft state, which means that changing the state of a Draft is final.*
- `Registered`. Like `Findable` but not indexed for search, so we do not use them.
- `Findable`. The type we use for published dandisets.
  The metadata must be valid (pass validation against the DataCite schema) for such a DOI to be created.
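
The state rules above can be written down as a small lookup table. This is only a sketch: the `DoiState` enum and the `event_for` and `can_delete` helpers are hypothetical names, and the `register`, `publish`, and `hide` values are the `event` strings that, to our understanding, the DataCite REST API accepts for state changes.

```python
from enum import Enum


class DoiState(Enum):
    DRAFT = "draft"
    REGISTERED = "registered"
    FINDABLE = "findable"


# Allowed state changes and the DataCite `event` assumed to trigger each one.
# Nothing leads back to DRAFT: leaving the Draft state is final.
TRANSITIONS = {
    (DoiState.DRAFT, DoiState.REGISTERED): "register",
    (DoiState.DRAFT, DoiState.FINDABLE): "publish",
    (DoiState.REGISTERED, DoiState.FINDABLE): "publish",
    (DoiState.FINDABLE, DoiState.REGISTERED): "hide",
}


def event_for(current: DoiState, target: DoiState):
    """Return the event needed to move `current` to `target`, or None if no change."""
    if current is target:
        return None
    try:
        return TRANSITIONS[(current, target)]
    except KeyError:
        raise ValueError(
            f"cannot move a DOI from {current.value} to {target.value}"
        ) from None


def can_delete(state: DoiState) -> bool:
    """Only Draft DOIs can be deleted."""
    return state is DoiState.DRAFT
```

For example, `event_for(DoiState.DRAFT, DoiState.FINDABLE)` returns `"publish"`, while asking for a move from `FINDABLE` back to `DRAFT` raises `ValueError`.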

We propose to:

- Instead of a fake DOI, upon creation of a **public** dandiset, mint and use a legitimate `Draft DOI` `10.48324/dandi.{dandiset.id}` with
  - *minimal metadata* entered during the creation request (title, description, license)
  - DLP URL `https://dandiarchive.org/dandiset/{dandiset.id}`
- For an embargoed dandiset, **do not** specify any metadata besides the DLP URL.
Review comment (Member):

A thought: perhaps DOI generation could happen only once a dataset is unembargoed, i.e. embargoed datasets cannot be pointed to by a DOI (even if we get reviewer view-only access). An owner would have to unembargo it.

For implementation, DOI generation would happen at creation for public dandisets and at unembargoing for embargoed dandisets.

Reply (Member Author):

We could. But one of the goals here is to get away from using "fake DOIs" (#1709) which this would "prevent".

In current design (might change) we would make overall dandiset DOI Findable only upon initial publication. As embargoed dataset would never be published, its DOI would remain Draft thus not available to users, and thus IMHO there is no harm. I think we could even keep updating it with metadata etc. That IMHO would simplify the logic and make "embargoed" less special (thus easier to code/troubleshoot etc).

Reply (Member Author):

> In current design (might change) we would make overall dandiset DOI Findable only upon initial publication.

Given demand from zarr users to get DOIs for dandisets with zarrs, we had better make them Findable as early as possible in the life cycle of those dandisets. ref: dandi/helpdesk#165 (reply in thread). So I think we should proceed that way: make them Findable as soon as datacite validation passes. Also inform the user about datacite model issues as part of the validation.

- If minting a DOI fails, we need to raise an exception to inform developers about the issue, but proceed with the creation of the dandiset.
- Upon changes to the dandiset metadata record, for public (non-embargoed) dandisets, try to update the DataCite metadata record while keeping the same target URL.
  - For a `Draft DOI` (dandiset was not published yet), there is no validation.
    - **Question to clear up**: what happens to a Draft DOI if the metadata record is invalid? Does it fail to update altogether? Does it update only the fields it knows about?
  - For a `Findable DOI` (dandiset was published), the metadata record must pass validation, so the update might fail.
    - That should be ok. Alternatively, we could try to update only the most important metadata fields from the last released version of the dandiset (title, authors, ...).
  - **TODO: figure out how to annotate the Draft version so that it always says it is a draft version and thus, if that can be avoided, is not used for citation**
- Upon changes to the dandiset metadata record, for embargoed dandisets, do nothing.
- Upon unembargoing a dandiset: update the `Draft DOI` metadata record with the current metadata **after** unembargoing.
- Upon publication of the dandiset:
  - (already done currently) mint a proper `Findable` version DOI `10.48324/dandi.{dandiset.id}/{version}`
  - update the Dandiset-wide DOI (`Draft` or `Findable`) `10.48324/dandi.{dandiset.id}` with the metadata provided for the version DOI, while keeping the URL pointing to the DLP instead of the released version.
    - if the Dandiset-wide DOI was in the `Draft` state, it would be updated to the `Findable` state (should work since we know the metadata record passed validation).
      - **Question to clear up**: how to do that in the API
    - **Question to clear up**: what happens if the metadata record is invalid?
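
To make the proposal above more concrete, here is a minimal sketch of how the two DOI names and the request body for the Dandiset-wide DOI could be assembled. The helper names (`dandiset_doi`, `version_doi`, `dandiset_doi_payload`) are hypothetical and not part of the current code base. The body layout (`data` / `type: dois` / `attributes`, with `event: publish` to make a DOI Findable) follows our understanding of the DataCite REST API and should be verified against its documentation. The `metadata` argument is assumed to be already converted to DataCite attributes.

```python
DOI_PREFIX = "10.48324"
DLP_URL = "https://dandiarchive.org/dandiset/{dandiset_id}"


def dandiset_doi(dandiset_id: str) -> str:
    """Dandiset-wide DOI, which always resolves to the DLP."""
    return f"{DOI_PREFIX}/dandi.{dandiset_id}"


def version_doi(dandiset_id: str, version: str) -> str:
    """DOI of one published version."""
    return f"{dandiset_doi(dandiset_id)}/{version}"


def dandiset_doi_payload(dandiset_id, metadata=None, *, embargoed=False, publish=False):
    """Build the (assumed) DataCite request body for the Dandiset-wide DOI."""
    if embargoed and publish:
        raise ValueError("an embargoed dandiset cannot get a Findable DOI")
    attributes = {
        "doi": dandiset_doi(dandiset_id),
        # The URL stays on the DLP even when metadata comes from a released version.
        "url": DLP_URL.format(dandiset_id=dandiset_id),
    }
    if metadata and not embargoed:
        # Embargoed dandisets expose nothing besides the DLP URL.
        attributes.update(metadata)
    if publish:
        # Assumed DataCite event that moves a Draft DOI to the Findable state.
        attributes["event"] = "publish"
    return {"data": {"type": "dois", "attributes": attributes}}
```

With this sketch, `dandiset_doi_payload("000123", {"titles": [{"title": "Example"}]})` describes a Draft DOI carrying minimal metadata, and the same call with `embargoed=True` drops everything except the DOI and the DLP URL.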

## Concerns to keep in mind/address

- A draft dandiset might not have sufficient metadata to mint a proper DOI, or the metadata might not be "proper" (fail validation), thus causing issues with minting a DOI
  - **Solution**: start with a Draft (not findable) DOI, and then upon publication mint a "findable" DOI
  - **Follow-up concern**: after the dandiset and DOI are published, the metadata of the Draft version of the dandiset could still be changed.
    This could make the changed record "invalid" again.
    Should be Ok'ish
  - The DataCite test site produced different validation results than the primary one

- A `Findable` DOI cannot be deleted, but in principle we allow for deletion of dandisets.
  - We might want a dedicated 404 page for deleted dandisets, or at least a message that the dandiset was deleted, and ideally describe the reason why it was deleted ("Upon request of maintainer", "Due to violation of terms of service", etc.)
  - Then we adjust the DOI record to point to that page.

- Should we do anything at dandischema level?

- Should we do anything at DLP level?

- Should we somehow reflect interactions with DataCite in Audit log?


## Targets TODO before implementation

- develop a script that tests, against the DataCite test Fabrica, the changes this design would introduce for all dandisets in the archive:
  - for each dandiset
    - generate a record for the overall *dandiset DOI* corresponding to the metadata of the first release if any exists, otherwise to the metadata of the draft version
    - for each release: mint a new *version DOI* for that release + possibly update the *dandiset DOI* to reflect potential changes in metadata
    - update the *dandiset DOI* to the metadata of the draft version
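
The per-dandiset steps above could be planned as an ordered list of operations before anything is sent to DataCite. The sketch below is only an illustration: `plan_doi_operations` and its `(action, doi, metadata)` tuples are hypothetical, `releases` is assumed to be a list of `(version, metadata)` pairs ordered oldest first, and an update is skipped whenever the metadata has not changed (our reading of "possibly update").

```python
def plan_doi_operations(dandiset_id, releases, draft_metadata):
    """Return the ordered DataCite operations to replay for one dandiset.

    releases: list of (version, metadata) pairs, oldest first (assumed shape).
    """
    base = f"10.48324/dandi.{dandiset_id}"
    # Seed the dandiset DOI from the first release if there is one, else from the draft.
    current = releases[0][1] if releases else draft_metadata
    ops = [("create", base, current)]
    for version, metadata in releases:
        # Every release gets its own version DOI.
        ops.append(("create", f"{base}/{version}", metadata))
        if metadata != current:
            # Keep the dandiset DOI in step with the latest release.
            ops.append(("update", base, metadata))
            current = metadata
    if draft_metadata != current:
        # Finish with the metadata of the draft version.
        ops.append(("update", base, draft_metadata))
    return ops
```

Running the plan against the test instance and recording which operations DataCite rejects would show where existing metadata fails validation, without touching production DOIs.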