Distribution identifier is missing. #142

Open
hcvdwerf opened this issue Sep 1, 2024 · 9 comments
@hcvdwerf
Contributor

hcvdwerf commented Sep 1, 2024

As a researcher, I want to request a distribution. I expect an identifier within the distribution to make linking possible. I think it should be mandatory.

@HNeikes HNeikes assigned HNeikes and brunasv and unassigned HNeikes Sep 4, 2024
@brunasv
Contributor

brunasv commented Sep 5, 2024

A Distribution in DCAT doesn't have an identifier; the Dataset to which the Distribution is connected has one. A Distribution has an access URL (or download URL). I think the FAIR Data Point creates a sort of identifier to link all layers (catalog, dataset, distribution); it would be nice to check this with @kburger.

@JackBroeren

I would like to have a distribution identifier as well. If there are multiple distributions in a dataset with the same data but different formats, I would like to select the distribution in the format that is most convenient for me. So I would like to be able to specify in a request tool that I want a specific distribution. An identifier would be helpful, preferably explicitly provided by a data holder and not generated internally by a tool like FDP. Suppose I don't use an FDP?
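As an illustration of this use case, here is a minimal sketch in plain Python of how a request tool could pick one distribution by an explicit identifier. Everything in it is hypothetical: the `Distribution` class, the `identifier` field (which DCAT does not currently define on distributions), the identifier values, and the example.org URLs are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Distribution:
    identifier: str  # hypothetical ID assigned by the data holder
    media_type: str  # format of this distribution
    access_url: str  # where access to this distribution is described


def select_by_identifier(distributions, identifier):
    """Return the single distribution carrying the given identifier."""
    matches = [d for d in distributions if d.identifier == identifier]
    if len(matches) != 1:
        # Zero matches or duplicates both make the request ambiguous.
        raise LookupError(f"expected exactly one match, found {len(matches)}")
    return matches[0]


# Same data, two formats, each with its own holder-provided identifier.
distributions = [
    Distribution("dist-csv", "text/csv", "https://example.org/access/data.csv"),
    Distribution("dist-xml", "application/xml", "https://example.org/access/data.xml"),
]

chosen = select_by_identifier(distributions, "dist-xml")
print(chosen.access_url)
```

The lookup fails loudly when the identifier is missing or duplicated, which is the situation a holder-provided, globally unique identifier is meant to prevent.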

@hcvdwerf
Contributor Author

@JackBroeren but when the format is different, the access URL is also different: data.csv vs data.xml. Is that what you mean? I can imagine that you don't want to rely on that alone.

@JackBroeren

@hcvdwerf @brunasv The accessURL is foreseen as a URL that refers to an HTML page describing the process for getting access to the distribution. I would not like to have the distribution identifier hidden in this URL; it should be explicitly defined. I saw an EHDS demo last Friday of the catalog and the request process. In that demo a requester does not request access to a dataset but to a specific distribution, as in my earlier example. So a distribution ID is necessary (imho), and it should be specified by the data holder and not by some tool in the middle (different tools can create the same identifier; by the way, I would prefer a globally unique, persistent identifier). A distribution ID should be part of the metadata (like a keyword or a theme).

@kburger
Contributor

kburger commented Oct 14, 2024

Neither DCAT-AP (3) nor HealthDCAT specifies a dct:identifier on a distribution (or a catalog, for that matter), only on a dataset. If it is decided to add it, we'd have to be careful with the cardinality.
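To make the cardinality concern concrete, here is a minimal sketch in plain Python (no RDF library). It treats metadata as (subject, predicate, object) tuples and reports distributions whose number of dct:identifier values falls outside a chosen range. The example.org IRIs, the identifier value, the function name `identifier_violations`, and the default range of exactly one are assumptions for illustration; neither profile defines this constraint on distributions.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT_DISTRIBUTION = "http://www.w3.org/ns/dcat#Distribution"
DCT_IDENTIFIER = "http://purl.org/dc/terms/identifier"


def identifier_violations(triples, min_count=1, max_count=1):
    """Map each offending distribution to its dct:identifier count.

    A distribution offends when its count is outside
    [min_count, max_count]. The default range (exactly one) is an
    assumption, not something DCAT-AP or HealthDCAT prescribes.
    """
    distributions = [
        s for s, p, o in triples if p == RDF_TYPE and o == DCAT_DISTRIBUTION
    ]
    counts = Counter(s for s, p, o in triples if p == DCT_IDENTIFIER)
    return {
        d: counts[d]
        for d in distributions
        if not (min_count <= counts[d] <= max_count)
    }


# Hypothetical metadata: the CSV distribution has an identifier, the XML one does not.
triples = [
    ("https://example.org/dist/csv", RDF_TYPE, DCAT_DISTRIBUTION),
    ("https://example.org/dist/csv", DCT_IDENTIFIER, "dist-csv-001"),
    ("https://example.org/dist/xml", RDF_TYPE, DCAT_DISTRIBUTION),
]

mandatory = identifier_violations(triples)               # cardinality 1..1
optional = identifier_violations(triples, min_count=0)   # cardinality 0..1
print(mandatory, optional)
```

With a mandatory identifier (1..1) the existing XML distribution becomes invalid, while an optional one (0..1) leaves current metadata valid. That difference is the trade-off behind choosing the cardinality.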

@jundahuang9123

@JackBroeren accessURL is more than enough to refer to the distribution. The URL can be used to describe how to get access, but it is intended to point toward the distribution. EHDS places the accessRight entirely at the Dataset level, so requests can only be made for datasets. I didn't see the demo, so I'm not sure whether it was misleading on this point. As you mentioned before that you don't use an FDP, what do you use to expose your metadata? In FDP the distribution-level metadata are not exposed per catalog/dataset. If the metadata are not exposed at all, or not exposed at distribution level, the identifier, however persistent and global, would be of no use to data users/requesters.

@jambelien

jambelien commented Oct 15, 2024

Based on this very good discussion and the various viewpoints: @HNeikes should we discuss this in the Geonovum DCAT-AP expert group as well? So import/copy this to the Geonovum repo?

@HNeikes
Contributor

HNeikes commented Oct 22, 2024

@NielsHoffmann

To add a different perspective to this: in RDF, an instance of a class typically has a URI (by which it can be identified/dereferenced), unless the instance is defined as a 'blank node'. From my perspective, a URI would already be sufficient as an identifier, and you do not need a separately declared property.
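A minimal sketch of that distinction in plain Python, assuming terms are written in N-Triples syntax, where a named node appears as `<IRI>` and a blank node as `_:label`. The helper name `node_iri` and the example.org IRI are hypothetical.

```python
def node_iri(term: str):
    """Return the IRI of a named node, or None for any other term.

    Assumes N-Triples notation: named nodes look like <IRI>, blank
    nodes look like _:label. A blank node has no IRI, so nothing
    outside its own graph can refer to it.
    """
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    return None


# One distribution published with a URI, one published as a blank node.
terms = [
    "<https://example.org/dataset/1/distribution/csv>",
    "_:b0",
]

for term in terms:
    iri = node_iri(term)
    if iri is None:
        print(f"{term}: blank node, no usable identifier")
    else:
        print(f"{term}: can be referenced as {iri}")
```

Under this view the node's URI already serves as the identifier, and only a distribution published as a blank node would need a separate property to be referenceable.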

8 participants