Dataset Migration API documentation improvements #11192

Open
wants to merge 4 commits into
base: develop
from


@mjlassila mjlassila commented Jan 28, 2025

What this PR does / why we need it:
This PR is related to the topic "Metadata format for Dataset Migration API in 6.5" in the chat.

I added a note to the docs that the OAI-ORE format used in the export is not directly compatible with the Dataset Migration API, and made the example file more comprehensive by including a few additional fields from the Geospatial and the Social Science and Humanities metadata blocks.

Preview at https://dataverse-guide--11192.org.readthedocs.build/en/11192/developers/dataset-migration-api.html

@qqmyers
Member

qqmyers commented Jan 28, 2025

FWIW: The geospatial and social blocks are given local context URIs because those blocks do not have a blockURI defined. See the docs or the citation block for how that can be done -

#metadataBlock name dataverseAlias displayName blockURI
citation Citation Metadata https://dataverse.org/schema/citation/
.

Without that, a block is considered to be locally defined with terms different (w.r.t. URI) from those at other instances. That's possibly useful for experimental blocks, or modified ones, but leads to the problem with the migration API (and I assume semantic API as well) unless the receiving Dataverse instance has blocks that set the blockURI to be the same as the auto-generated one in the source Dataverse.
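
The distinction described above can be sketched in a few lines. This is only an illustration, not Dataverse code: the function name `block_context_uri` and the example site URLs are hypothetical, and the `<siteUrl>/schema/<blockName>#` pattern for auto-generated URIs is an assumption that should be checked against the `@context` of an actual export.

```python
from typing import Optional


def block_context_uri(block_name: str, block_uri: Optional[str], site_url: str) -> str:
    """Return the JSON-LD context URI under which a block's terms would resolve."""
    if block_uri:
        # A blockURI set in the block definition is the same on every instance.
        return block_uri
    # Assumed pattern for blocks without a blockURI: derived from the local site URL.
    return f"{site_url.rstrip('/')}/schema/{block_name}#"


# Hypothetical source and target instances.
source = block_context_uri("geospatial", None, "https://source.example.edu")
target = block_context_uri("geospatial", None, "https://target.example.org")
citation = block_context_uri(
    "citation", "https://dataverse.org/schema/citation/", "https://source.example.edu"
)
```

Under these assumptions `source` and `target` differ, so geospatial terms exported from one instance would not match the block on the other, while `citation` is identical everywhere.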

Member

@pdurbin pdurbin left a comment

Some initial thoughts. Thanks for the PR! ❤️

@@ -5,7 +5,7 @@ The Dataverse software includes several ways to add Datasets originally created

This experimental migration API offers an additional option with some potential advantages:

-* metadata can be specified using the json-ld format used in the OAI-ORE metadata export
+* metadata can be specified using the json-ld format used in the OAI-ORE metadata export (please note that the json-ld generated by OAI-ORE metadata export is not directly compatible with the Migration API, check example file below for reference)
Member

Can we say a little more about how the generated OAI-ORE format is not compatible? I'm a bit confused and not sure what I should be taking away from a look at the example file. 🤔

@@ -31,7 +31,7 @@ To import a dataset with an existing persistent identifier (PID), the provided j
curl -H X-Dataverse-key:$API_TOKEN -X POST $SERVER_URL/api/dataverses/$DATAVERSE_ID/datasets/:startmigration --upload-file dataset-migrate.jsonld
-An example jsonld file is available at :download:`dataset-migrate.jsonld <../_static/api/dataset-migrate.jsonld>` . Note that you would need to replace the PID in the sample file with one supported in your Dataverse instance.
+An example jsonld file is available at :download:`dataset-migrate.jsonld <../_static/api/dataset-migrate.jsonld>` . Note that you would need to replace the PID in the sample file with one supported in your Dataverse instance. You also need to replace the dataverse.siteUrl in jsonld @context with your current Dataverse site URL.
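
The two replacements mentioned in this hunk (the PID and the site URL in `@context`) could be scripted roughly as follows. This is a sketch under stated assumptions: the helper `prepare_for_target` is hypothetical, the PID is assumed to sit in the top-level `@id`, `@context` is assumed to be a plain object, and the sample document is made up for illustration rather than taken from the example file.

```python
import json


def prepare_for_target(doc: dict, source_site: str, target_site: str, new_pid: str) -> dict:
    """Return an adjusted copy of a JSON-LD document for the receiving instance."""
    source_site = source_site.rstrip("/")
    target_site = target_site.rstrip("/")
    out = json.loads(json.dumps(doc))  # deep copy via a JSON round trip
    out["@id"] = new_pid  # assumption: the PID is the top-level @id
    context = out.get("@context")
    if isinstance(context, dict):
        for prefix, uri in context.items():
            # Only rewrite context URIs that point at the source instance.
            if isinstance(uri, str) and uri.startswith(source_site + "/"):
                context[prefix] = target_site + uri[len(source_site):]
    return out


# Made-up sample document; field names and URLs are illustrative only.
sample = {
    "@context": {
        "citation": "https://dataverse.org/schema/citation/",
        "geospatial": "https://source.example.edu/schema/geospatial#",
    },
    "@id": "doi:10.5072/FK2/OLD",
    "geospatial:geographicUnit": "Helsinki",
}
ready = prepare_for_target(
    sample, "https://source.example.edu", "https://target.example.org", "doi:10.5072/FK2/NEW"
)
```

The result could then be written to `dataset-migrate.jsonld` with `json.dump` and uploaded with the curl command shown in the hunk.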

Member

I'm just putting this comment at the bottom.

@mjlassila from our conversation I was sort of hoping you'd include your BaseX and XQuery script at https://gist.github.com/mjlassila/ecdbd11447ccdf87995db20bfc5e686c 😄 . Is it worth writing up and including? (I don't even know how to run it but I'm happy to learn!)

Author

@pdurbin I added details about the OAI-ORE format used in the export and included a simple XQuery script for the conversion. XQuery is a rather esoteric and little-known language, but it is very powerful for tasks such as format conversion.

@mjlassila
Author

> FWIW: The geospatial and social blocks are given local context URIs because those blocks do not have a blockURI defined. See the docs or the citation block for how that can be done [...]
> Without that, a block is considered to be locally defined with terms different (w.r.t. URI) from those at other instances.

@qqmyers: Thanks for the clarification! If the out-of-the-box geospatial and social science blocks do not have a blockURI defined, should I keep the example file as it is or modify it somehow? If I leave these blocks out of the example file's @context, the citation metadata is imported via the Dataset Migration API, but the geospatial and social science metadata is dropped silently.
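
Because the fields are dropped silently, a check before upload may help. The following is a minimal sketch, assuming the metadata fields appear as top-level keys of the form `prefix:term` and that `@context` is a plain object; the helper name `undeclared_prefixes` and the sample document are hypothetical.

```python
def undeclared_prefixes(doc: dict) -> set:
    """Return prefixes used in top-level keys that @context does not declare."""
    context = doc.get("@context")
    declared = set(context) if isinstance(context, dict) else set()
    used = set()
    for key in doc:
        if key.startswith("@") or ":" not in key:
            continue  # JSON-LD keywords and bare terms carry no prefix
        prefix = key.split(":", 1)[0]
        if prefix in ("http", "https"):
            continue  # a full IRI, not a prefixed term
        used.add(prefix)
    return used - declared


# Made-up document where the geospatial prefix was left out of @context.
incomplete = {
    "@context": {"citation": "https://dataverse.org/schema/citation/"},
    "citation:depositor": "Doe, Jane",
    "geospatial:geographicUnit": "Helsinki",
}
problems = undeclared_prefixes(incomplete)
```

Here `problems` contains `geospatial`, which flags the block whose fields would be at risk of being ignored.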

@qqmyers
Member

qqmyers commented Jan 29, 2025

@mjlassila - that's hard to answer clearly. I just created #11195 as an issue to get those blocks changed. Once a change is merged there, we probably want the docs to talk about the current situation and probably still note that migrating from earlier versions has the problem you're documenting. To not hold up your PR, I'd suggest that you just go ahead and document how it is now (add additional explanation if you want) and we'll treat updating the docs as part of completing #11195.

…uggest using XQuery to do json-ld transformation and add simple script for reference.
@cmbz added the FY25 Sprint 15 (2025-01-15 - 2025-01-29) and FY25 Sprint 16 (2025-01-29 - 2025-02-12) labels on Jan 29, 2025
@mjlassila
Author

> To not hold up your PR, I'd suggest that you just go ahead and document how it is now (add additional explanation if you want) and we'll treat updating the docs as part of completing #11195.

@qqmyers - I added a short explanation of why defining a local URI for community metadata blocks is currently necessary.

Labels
FY25 Sprint 15 (2025-01-15 - 2025-01-29), FY25 Sprint 16 (2025-01-29 - 2025-02-12), Size: 3 (A percentage of a sprint. 2.1 hours.)
Projects
Status: Ready for Review ⏩

5 participants