No "identifier" to correspond to the one in datacite XSD/docs? #102
well, I guess it is not there because you do not expect people to provide it -- that "identifier" (DOI) is set by datacite. But nevertheless - it is part of the schema, so likely should be there! |
Version 4.5 uses the DOI as the only valid identifier for the XML. In general, the jsonschema tries to represent the DataCite JSON format and is not an exact copy of the XML schema. I'll add a quick response in dandi/dandi-schema#261, but it looks like you came to the same conclusion. Feel free to reopen if you want to discuss more. |
That's what I am trying overall to grasp here -- where is the ground truth? ;) I thought it was that XML schema, with docs and jsonschema providing alternative serializations. But if it is "DataCite json format" -- where is that one "defined" and how does it relate to the XML schema? (sorry for all the questions)
On that aspect, how does the jsonschema relate to the schema of the "datacite json" formatted output from doi.org? They also seem to diverge quite a bit:
❯ pwd
/home/yoh/proj/datacite/inveniosoftware-datacite/datacite/schemas
❯ curl --silent -LH "Accept: application/vnd.datacite.datacite+json" https://doi.org/10.48324/DANDI.000897/0.240605.1710 | check-jsonschema --traceback-mode full --schemafile datacite-v4.5.json -
Schema validation errors were encountered.
-::$: Additional properties are not allowed ('agency', 'clientId', 'id', 'identifiers', 'providerId', 'state' were unexpected)
-::$.types: Additional properties are not allowed ('bibtex', 'citeproc', 'ris', 'schemaOrg' were unexpected)
-::$.publicationYear: 2024 is not of type 'string'
here is that json pretty printed
{
"id": "https://doi.org/10.48324/dandi.000897/0.240605.1710",
"doi": "10.48324/DANDI.000897/0.240605.1710",
"url": "https://dandiarchive.org/dandiset/000897/0.240605.1710",
"types": {
"ris": "DATA",
"bibtex": "misc",
"citeproc": "dataset",
"schemaOrg": "Dataset",
"resourceType": "Neural Data",
"resourceTypeGeneral": "Dataset"
},
"creators": [
{
"name": "Neupane, Sujaya",
"nameType": "Personal",
"givenName": "Sujaya",
"familyName": "Neupane",
"affiliation": [],
"nameIdentifiers": [
{
"schemeUri": "https://orcid.org/",
"nameIdentifier": "0000-0002-0052-3122",
"nameIdentifierScheme": "ORCID"
}
]
},
{
"name": "Fiete, Ila",
"nameType": "Personal",
"givenName": "Ila",
"familyName": "Fiete",
"affiliation": [],
"nameIdentifiers": [
{
"schemeUri": "https://orcid.org/",
"nameIdentifier": "0000-0003-4738-2539",
"nameIdentifierScheme": "ORCID"
}
]
},
{
"name": "Jazayeri, Mehrdad",
"nameType": "Personal",
"givenName": "Mehrdad",
"familyName": "Jazayeri",
"affiliation": [],
"nameIdentifiers": [
{
"schemeUri": "https://orcid.org/",
"nameIdentifier": "0000-0002-9764-6961",
"nameIdentifierScheme": "ORCID"
}
]
}
],
"titles": [
{
"title": "Neupane_Fiete_Jazayeri_Mental navigation_NHP_EntorhinalCortex"
}
],
"publisher": {
"name": "DANDI Archive"
},
"subjects": [
{
"subject": "entorhinal cortex, cognitive map, mental navigation,"
}
],
"contributors": [
{
"name": "Neupane, Sujaya",
"nameType": "Personal",
"givenName": "Sujaya",
"familyName": "Neupane",
"affiliation": [],
"contributorType": "ContactPerson",
"nameIdentifiers": [
{
"schemeUri": "https://orcid.org/",
"nameIdentifier": "0000-0002-0052-3122",
"nameIdentifierScheme": "ORCID"
}
]
},
{
"name": "Jazayeri, Mehrdad",
"nameType": "Personal",
"givenName": "Mehrdad",
"familyName": "Jazayeri",
"affiliation": [],
"contributorType": "ContactPerson",
"nameIdentifiers": [
{
"schemeUri": "https://orcid.org/",
"nameIdentifier": "0000-0002-9764-6961",
"nameIdentifierScheme": "ORCID"
}
]
}
],
"publicationYear": 2024,
"identifiers": [
{
"identifier": "https://identifiers.org/DANDI:000897/0.240605.1710",
"identifierType": "URL"
},
{
"identifier": "https://dandiarchive.org/dandiset/000897/0.240605.1710",
"identifierType": "URL"
}
],
"rightsList": [
{
"rightsIdentifier": "cc_by_40",
"rightsIdentifierScheme": "SPDX"
}
],
"descriptions": [
{
"description": "The dataset contains electrophysiology data recorded from the entorhinal cortex of two NHPs performing a mental navigation task. The recording probes used were V-probe with 32 channels or 64 channels, manufactured by Plexon Inc. ",
"descriptionType": "Abstract"
}
],
"fundingReferences": [
{
"funderName": "National Institute of Mental Health",
"awardNumber": "NIMH-MH129046",
"funderIdentifier": "https://ror.org/05xj56w78",
"funderIdentifierType": "ROR"
},
{
"funderName": "Natural Science and Engineering Council of Canada",
"awardNumber": "NSERC PDF-516867-2018",
"funderIdentifier": "https://ror.org/01h531d29",
"funderIdentifierType": "ROR"
}
],
"schemaVersion": "http://datacite.org/schema/kernel-4",
"providerId": "dartlib",
"clientId": "dartlib.dandi",
"agency": "datacite",
"state": "findable"
} |
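The three validation errors above point at specific mismatches: extra top-level keys, extra keys under `types`, and an integer `publicationYear`. A minimal sketch of a pre-validation cleanup follows, assuming the goal is only to get past those reported errors. The function name `normalize_doi_org_record` is hypothetical, and this is not how the datacite package or DataCite itself handles these records. Note that it drops `identifiers` entirely, which loses the alternate URLs.

```python
import copy

# Keys that check-jsonschema reported as unexpected in the output above.
EXTRA_TOP_LEVEL = {"agency", "clientId", "id", "identifiers", "providerId", "state"}
EXTRA_TYPES = {"bibtex", "citeproc", "ris", "schemaOrg"}


def normalize_doi_org_record(record):
    """Return a cleaned copy of a doi.org 'datacite+json' record (hypothetical helper)."""
    cleaned = copy.deepcopy(record)  # leave the caller's record untouched
    for key in EXTRA_TOP_LEVEL:
        cleaned.pop(key, None)
    types = cleaned.get("types")
    if isinstance(types, dict):
        for key in EXTRA_TYPES:
            types.pop(key, None)
    # The schema expects publicationYear as a string, doi.org returns an integer.
    year = cleaned.get("publicationYear")
    if isinstance(year, int):
        cleaned["publicationYear"] = str(year)
    return cleaned
```

The cleaned record could then be written to a file and passed to `check-jsonschema` as in the command above; whether it validates fully depends on the rest of the schema.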
oh, and there is yet another json format model in datacite api output
❯ curl --silent -L "https://api.datacite.org/dois/10.48324/DANDI.000897/0.240605.1710" | jq . >| /tmp/dandi-000897-datacite-api.json
❯ jq .data /tmp/dandi-000897-datacite-api.json | check-jsonschema --traceback-mode full --schemafile datacite-v4.5.json -
Schema validation errors were encountered.
-::$: Additional properties are not allowed ('attributes', 'id', 'relationships', 'type' were unexpected)
-::$: 'creators' is a required property
-::$: 'titles' is a required property
-::$: 'publisher' is a required property
-::$: 'publicationYear' is a required property
-::$: 'types' is a required property
-::$: 'schemaVersion' is a required property
My poor brain needs a diagram ... here is what chatgpt gave me (I didn't even try to correct it since I lack the full picture) -- could you improve the relationships there to be more reflective of the situation? (can live edit on https://mermaid.live)
graph TD
XML[DataCite XML Model] -- Basis for --> Doc[DataCite Documentation]
Doc -- Explains mapping to --> JSONSchema[JSON Schema Model]
JSONSchema -- Implements --> API[DataCite API Model]
API -- Serves --> DOI[doi.org vnd.datacite.datacite+json Model]
XML -- Provides structure for --> DOI
Doc -- Guides --> API
JSONSchema -- Validates --> DOI
|
For the last curl call you want the content in
The ground truth for me is the XML and JSON representations at https://github.com/inveniosoftware/datacite/tree/master/tests/data. This comes back to #101 and how we should have a more transparent process for generating those. I'll try to tweak your diagram....one sec |
The root of the problem is that DataCite doesn't make official JSON representations available, nor does it make a JSON schema available. You can't use the JSON from
So we do the best we can. We make our own "DataCite JSON Model" that works as a representation of the metadata and also converts between XML and JSON formats. We test that our examples work for DOI minting, and keep fixed examples at https://github.com/inveniosoftware/datacite/tree/master/tests/data. We make changes each version to try to make the jsonschema work well....but since it's not official, there are no guarantees we make the right decisions on things. Here's my version of the diagram. |
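To illustrate the JSON-to-XML direction mentioned above, and where the DOI ends up, here is a minimal sketch using only the standard library. It assumes the JSON record carries the DOI in a top-level `doi` key (as the doi.org output earlier in this thread does) and emits it as `<identifier identifierType="DOI">` in the kernel-4 namespace. The function `record_to_xml` is hypothetical and is not the datacite package's converter; it covers only the identifier and titles.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the "schemaVersion" value in the record shown earlier.
NS = "http://datacite.org/schema/kernel-4"


def record_to_xml(record):
    """Serialize the DOI and titles of a JSON record to DataCite-style XML (sketch)."""
    ET.register_namespace("", NS)  # emit kernel-4 as the default namespace
    resource = ET.Element(f"{{{NS}}}resource")
    # In the XML, the DOI is the <identifier> element with identifierType="DOI".
    identifier = ET.SubElement(
        resource, f"{{{NS}}}identifier", {"identifierType": "DOI"}
    )
    identifier.text = record["doi"]
    titles = ET.SubElement(resource, f"{{{NS}}}titles")
    for entry in record.get("titles", []):
        ET.SubElement(titles, f"{{{NS}}}title").text = entry["title"]
    return ET.tostring(resource, encoding="unicode")
```

A real converter would also need creators, publisher, publicationYear and resourceType for the result to be valid against the XSD; they are left out here to keep the identifier mapping visible.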
ah, cool, indeed -- I should have checked -- looks closer, although it also does not validate
❯ jq .data.attributes /tmp/dandi-000897-datacite-api.json | check-jsonschema --traceback-mode full --schemafile datacite-v4.5.json -
Schema validation errors were encountered.
-::$: Additional properties are not allowed ('citationCount', 'citationsOverTime', 'contentUrl', 'created', 'downloadCount', 'downloadsOverTime', 'identifiers', 'isActive', 'metadataVersion', 'partCount', 'partOfCount', 'published', 'reason', 'referenceCount', 'registered', 'source', 'state', 'updated', 'versionCount', 'versionOfCount', 'viewCount', 'viewsOverTime', 'xml' were unexpected)
-::$.types: Additional properties are not allowed ('bibtex', 'citeproc', 'ris', 'schemaOrg' were unexpected)
-::$.publisher: 'DANDI Archive' is not of type 'object'
-::$.publicationYear: 2024 is not of type 'string'
-::$.language: None is not of type 'string'
-::$.version: None is not of type 'string'
so it is the "DataCite XML", good! JSON - you mean examples? Correct me if I am wrong: following your explanation, those "official" XML examples are then converted into JSON using your tools and validated against the datacite fabric... right?
But they do operate on JSON records. So how do they verify input records or produce output records -- is there a model, or is there only an "in code" implementation of the XML model? Are sources available?
Just to make sure -- you have that automated, right? FWIW and FTR, here is our testing against api.test.datacite.org -- https://github.com/dandi/dandi-schema/blob/master/dandischema/datacite/tests/test_datacite.py . |
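The `.data.attributes` errors above suggest a few mechanical differences between the API payload and the jsonschema: extra bookkeeping keys, `None` values, a plain-string `publisher`, and an integer `publicationYear`. A minimal sketch of unwrapping and adapting the API response follows. The function name `attributes_to_metadata` and the `allowed_keys` parameter are hypothetical; the caller is assumed to supply the set of property names the schema accepts. The extra keys under `types` are not handled here.

```python
def attributes_to_metadata(api_response, allowed_keys):
    """Extract data.attributes and adapt it toward the jsonschema (sketch).

    `allowed_keys` is the set of top-level property names to keep.
    """
    attrs = api_response["data"]["attributes"]
    metadata = {}
    for key, value in attrs.items():
        # Drop bookkeeping fields (viewCount, state, ...) and null values.
        if key not in allowed_keys or value is None:
            continue
        metadata[key] = value
    # The schema wants publisher as an object; the API returned a plain string.
    publisher = metadata.get("publisher")
    if isinstance(publisher, str):
        metadata["publisher"] = {"name": publisher}
    # The schema wants publicationYear as a string; the API returned an integer.
    year = metadata.get("publicationYear")
    if isinstance(year, int):
        metadata["publicationYear"] = str(year)
    return metadata
```

If the schema file lists its fields under a top-level `properties` key (an assumption about `datacite-v4.5.json`), `allowed_keys` could be built as `set(schema["properties"])` after loading the file with `json.load`.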
I tried to separate out notions of JSON vs JSON Model etc... but just ended up with a total mess ;) giving up on my artsy attempts for now
---
title: DataCite Representations and actors
---
flowchart TD
XMLModel[DataCite XML Model] -- Basis for --> Doc[DataCite Documentation]
XMLModel -- Informs --> JSONModel[DataCite JSON Model]
XML -- Allows user to mint --> DOI[DOI]
XML -- instantiates --> XMLModel
JSON -- instantiates --> JSONModel
JSON -- Allows user to mint --> DOI
JSONModel -- Enhancement --> Representation[doi.org vnd.datacite.datacite+json Model]
JSONSchema -- Validates --> JSON
JSON -- Validated against --> test[api.test.datacite fabric]
XMLModel -- Validated against --> test
test -- Implements --> XMLModel
click XMLModel "https://github.com/datacite/schema"
click Doc "https://datacite-metadata-schema.readthedocs.io"
click JSONModel "https://github.com/inveniosoftware/datacite"
|
You also need to add
We wouldn't include the additional properties...they are added by DataCite but they aren't metadata.
Yup!
As far as I know, only "in code". I believe https://github.com/datacite/bolognese does the serialization and https://github.com/datacite/lupo does the API endpoints....but it's not particularly approachable code.
Yup Line 101 in 26f3974
|
FWIW I did file a sample issue. Let's see what it "brings", if anything. |
Package version (if known): currently v1.1.2-22-g26f3974
Describe the bug
There is "Identifier" defined
among the rest:
but it seems that jsonschema does not have it defined anywhere... for comparison to above here are the properties in jsonschema