Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow links between dataset types and metadata blocks #11001

Open
wants to merge 20 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
264d648
allow links between dataset types and metadata blocks #10519
pdurbin Nov 4, 2024
1b45992
Merge branch 'develop' into 10519-dataset-types #10519
pdurbin Nov 5, 2024
b6fb92b
Changed: using JPA criteria instead of code looping for DatasetType q…
GPortas Nov 5, 2024
399fe8d
add missing "Command" suffix #10519
pdurbin Nov 5, 2024
be83ada
remove debug line #10519
pdurbin Nov 5, 2024
a7eee45
improve docs #10519
pdurbin Nov 5, 2024
4c45fc0
rename "ownerDataverse" to "dataverse", remove TODOs #10519
pdurbin Nov 5, 2024
4f77a3b
enable codeMeta20 test, disable size=6 test #10519
pdurbin Nov 6, 2024
e7652db
Fixed: removed distinct modifier in JPA query to avoid ignoring results
GPortas Nov 7, 2024
5afad8e
load codemeta metadata block in docker dev persona #10519
pdurbin Nov 8, 2024
1ffeee7
Merge branch 'develop' into 10519-dataset-types #10519
pdurbin Dec 13, 2024
d8a0a54
Merge branch 'develop' of github.com:IQSS/dataverse into 10519-datase…
GPortas Jan 8, 2025
71096f2
fix merge conflicts
stevenwinship Jan 21, 2025
a054737
fix merge conflicts
stevenwinship Jan 21, 2025
a9ed64e
Merge branch 'develop' into 10519-dataset-types #10519
pdurbin Jan 23, 2025
faf1d37
fix JsonPrinter and failing tests #10519
pdurbin Jan 23, 2025
2773c2e
cleanup #10519
pdurbin Jan 23, 2025
e52052a
remove unnecessary sending of content type #10519
pdurbin Jan 23, 2025
57e0c71
clarify docs #10519
pdurbin Jan 23, 2025
9b2fb8d
stop loading Codemeta in Docker, add setDisplayOnCreate API #10519
pdurbin Jan 24, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions doc/release-notes/10519-dataset-types.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
## Dataset Types can be linked to Metadata Blocks

Metadata blocks (e.g. "CodeMeta") can now be linked to dataset types (e.g. "software") using new superuser APIs.

This will have the following effects for the APIs used by the new Dataverse UI ( https://github.com/IQSS/dataverse-frontend ):

- The list of fields shown when creating a dataset will include fields marked as "displayoncreate" (in the tsv/database) for metadata blocks (e.g. "CodeMeta") that are linked to the dataset type (e.g. "software") that is passed to the API.
- The metadata blocks shown when editing a dataset will include metadata blocks (e.g. "CodeMeta") that are linked to the dataset type (e.g. "software") that is passed to the API.

The CodeMeta metadata block is now available in the Dockerized development environment.

For more information, see the guides and #10519.
39 changes: 39 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -529,6 +529,8 @@ The fully expanded example above (without environment variables) looks like this

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE "https://demo.dataverse.org/api/dataverses/root/assignments/6"

.. _list-metadata-blocks-for-a-collection:

List Metadata Blocks Defined on a Dataverse Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expand Down Expand Up @@ -556,6 +558,9 @@ This endpoint supports the following optional query parameters:

- ``returnDatasetFieldTypes``: Whether or not to return the dataset field types present in each metadata block. If not set, the default value is false.
- ``onlyDisplayedOnCreate``: Whether or not to return only the metadata blocks that are displayed on dataset creation. If ``returnDatasetFieldTypes`` is true, only the dataset field types shown on dataset creation will be returned within each metadata block. If not set, the default value is false.
- ``datasetType``: Whether or not to return additional fields from metadata blocks that are linked with a particular dataset type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this parameter also have an action prefix ('returnDataType')? The other params start with the action 'return' or 'onlyDisplayOn'. It's unclear if this is a true/false or a list of datasetTypes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add it to the example to help clarify it's usage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see further down that this is not a true/false but value (I.e. 'software'). Maybe rephrase by removing the 'whether or not'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I fixed up the docs in 57e0c71. Please take a look.


are displayed on dataset creation. If ``returnDatasetFieldTypes`` is true, only the dataset field types shown on dataset creation will be returned within each metadata block. If not set, the default value is false.

An example using the optional query parameters is presented below:

Expand All @@ -573,6 +578,8 @@ The fully expanded example above (without environment variables) looks like this

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/dataverses/root/metadatablocks?returnDatasetFieldTypes=true&onlyDisplayedOnCreate=true"

.. _define-metadata-blocks-for-a-dataverse-collection:

Define Metadata Blocks for a Dataverse Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expand All @@ -598,6 +605,8 @@ The fully expanded example above (without environment variables) looks like this

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST -H "Content-type:application/json" --upload-file define-metadatablocks.json "https://demo.dataverse.org/api/dataverses/root/metadatablocks"

An alternative to defining metadata blocks at a collection level is to create and use a dataset type. See :ref:`api-link-dataset-type`.

Determine if a Dataverse Collection Inherits Its Metadata Blocks from Its Parent
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expand Down Expand Up @@ -3250,6 +3259,36 @@ The fully expanded example above (without environment variables) looks like this

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE "https://demo.dataverse.org/api/datasets/datasetTypes/3"

.. _api-link-dataset-type:

Link Dataset Type with Metadata Blocks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Linking a dataset type with one or more metadata blocks results in additional fields from those blocks appearing in the output from the :ref:`list-metadata-blocks-for-a-collection` API endpoint. The new frontend for Dataverse (https://github.com/IQSS/dataverse-frontend) uses the JSON output from this API endpoint to construct the page that users see when creating or editing a dataset. Once the frontend has been updated to pass in the dataset type (https://github.com/IQSS/dataverse-client-javascript/issues/210), specifying a dataset type in this way can be an alternative way to display additional metadata fields than the traditional method, which is to enabled a metadata block at the collection level (see :ref:`define-metadata-blocks-for-a-dataverse-collection`).

For example, a superuser could create a type called "software" and link it to the "CodeMeta" metadata block (this example is below). Then, once the new front end allows it, the user can specify that they want to create a dataset of type software and see the additional metadata fields from the CodeMeta block when creating or editing their dataset.

This API endpoint is superuser only.

.. code-block:: bash

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export TYPE=software
export JSON='["codeMeta20"]'

curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-Type: application/json" "$SERVER_URL/api/datasets/datasetTypes/$TYPE" -X POST -d $JSON

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -H "Content-Type: application/json" "https://demo.dataverse.org/api/datasets/datasetTypes/software" -X POST -d '["codeMeta20"]'

To update the blocks that are linked, send an array with those blocks.

To remove all links to blocks, send an empty array.

Files
-----

Expand Down
6 changes: 4 additions & 2 deletions doc/sphinx-guides/source/user/dataset-management.rst
Original file line number Diff line number Diff line change
Expand Up @@ -790,21 +790,23 @@ If you deaccession the most recently published version of the dataset but not al
Dataset Types
=============

.. note:: Development of the dataset types feature is ongoing. Please see https://github.com/IQSS/dataverse-pm/issues/307 for details.

Out of the box, all datasets have a dataset type of "dataset". Superusers can add additional types such as "software" or "workflow" using the :ref:`api-add-dataset-type` API endpoint.

Once more than one type appears in search results, a facet called "Dataset Type" will appear allowing you to filter down to a certain type.

If your installation is configured to use DataCite as a persistent ID (PID) provider, the appropriate type ("Dataset", "Software", "Workflow") will be sent to DataCite when the dataset is published for those three types.

Currently, the dataset type can only be specified via API and only when the dataset is created. For details, see the following sections of the API guide:
Currently, specifying a type for a dataset can only be done via API and only when the dataset is created. The type can't currently be changed afterward. For details, see the following sections of the API guide:

- :ref:`api-create-dataset-with-type` (Native API)
- :ref:`api-semantic-create-dataset-with-type` (Semantic API)
- :ref:`import-dataset-with-type`

Dataset types can be listed, added, or deleted via API. See :ref:`api-dataset-types` in the API Guide for more.

Development of the dataset types feature is ongoing. Please see https://github.com/IQSS/dataverse/issues/10489 for details.
Dataset types can be linked with metadata blocks to make fields from those blocks available when datasets of that type are created or edited. See :ref:`api-link-dataset-type` for details.

.. |image1| image:: ./img/DatasetDiagram.png
:class: img-responsive
Expand Down
2 changes: 2 additions & 0 deletions docker-compose-dev.yml
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,8 @@ services:
- dev
networks:
- dataverse
volumes:
- ./docker-dev-volumes/solr/data:/var/solr

dev_dv_initializer:
container_name: "dev_dv_initializer"
Expand Down
9 changes: 9 additions & 0 deletions modules/container-configbaker/scripts/bootstrap/dev/init.sh
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,15 @@ export API_TOKEN
# ${ENV_OUT} comes from bootstrap.sh and will expose the saved information back to the host if enabled.
echo "API_TOKEN=${API_TOKEN}" >> "${ENV_OUT}"

echo "Loading CodeMeta metadata block (needed for API tests)..."
curl "${DATAVERSE_URL}/api/admin/datasetfield/load" -X POST --data-binary @/scripts/bootstrap/base/data/metadatablocks/codemeta.tsv -H "Content-type: text/tab-separated-values"

echo "Fetching Solr schema from Dataverse and running update-fields.sh..."
curl "${DATAVERSE_URL}/api/admin/index/solr/schema" | /scripts/update-fields.sh /var/solr/data/collection1/conf/schema.xml

echo "Reloading Solr..."
curl "http://solr:8983/solr/admin/cores?action=RELOAD&core=collection1"

echo "Publishing root dataverse..."
curl -H "X-Dataverse-key:$API_TOKEN" -X POST "${DATAVERSE_URL}/api/dataverses/:root/actions/:publish"

Expand Down
81 changes: 67 additions & 14 deletions src/main/java/edu/harvard/iq/dataverse/DatasetFieldServiceBean.java
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
package edu.harvard.iq.dataverse;

import edu.harvard.iq.dataverse.dataset.DatasetType;
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
Expand Down Expand Up @@ -871,15 +872,15 @@ public List<DatasetFieldType> findAllDisplayedOnCreateInMetadataBlock(MetadataBl
Root<MetadataBlock> metadataBlockRoot = criteriaQuery.from(MetadataBlock.class);
Root<DatasetFieldType> datasetFieldTypeRoot = criteriaQuery.from(DatasetFieldType.class);

Predicate requiredInDataversePredicate = buildRequiredInDataversePredicate(criteriaBuilder, datasetFieldTypeRoot);
Predicate fieldRequiredInTheInstallation = buildFieldRequiredInTheInstallationPredicate(criteriaBuilder, datasetFieldTypeRoot);

criteriaQuery.where(
criteriaBuilder.and(
criteriaBuilder.equal(metadataBlockRoot.get("id"), metadataBlock.getId()),
datasetFieldTypeRoot.in(metadataBlockRoot.get("datasetFieldTypes")),
criteriaBuilder.or(
criteriaBuilder.isTrue(datasetFieldTypeRoot.get("displayOnCreate")),
requiredInDataversePredicate
fieldRequiredInTheInstallation
)
)
);
Expand All @@ -890,16 +891,39 @@ public List<DatasetFieldType> findAllDisplayedOnCreateInMetadataBlock(MetadataBl
return typedQuery.getResultList();
}

public List<DatasetFieldType> findAllInMetadataBlockAndDataverse(MetadataBlock metadataBlock, Dataverse dataverse, boolean onlyDisplayedOnCreate) {
public List<DatasetFieldType> findAllInMetadataBlockAndDataverse(MetadataBlock metadataBlock, Dataverse dataverse, boolean onlyDisplayedOnCreate, DatasetType datasetType) {
if (!dataverse.isMetadataBlockRoot() && dataverse.getOwner() != null) {
return findAllInMetadataBlockAndDataverse(metadataBlock, dataverse.getOwner(), onlyDisplayedOnCreate);
return findAllInMetadataBlockAndDataverse(metadataBlock, dataverse.getOwner(), onlyDisplayedOnCreate, datasetType);
}

CriteriaBuilder criteriaBuilder = em.getCriteriaBuilder();
CriteriaQuery<DatasetFieldType> criteriaQuery = criteriaBuilder.createQuery(DatasetFieldType.class);

Root<MetadataBlock> metadataBlockRoot = criteriaQuery.from(MetadataBlock.class);
Root<DatasetFieldType> datasetFieldTypeRoot = criteriaQuery.from(DatasetFieldType.class);

// Build the main predicate to include fields that belong to the specified dataverse and metadataBlock and match the onlyDisplayedOnCreate value.
Predicate fieldPresentInDataverse = buildFieldPresentInDataversePredicate(dataverse, onlyDisplayedOnCreate, criteriaQuery, criteriaBuilder, datasetFieldTypeRoot, metadataBlockRoot);

// Build an additional predicate to include fields from the datasetType, if the datasetType is specified and contains the given metadataBlock.
Predicate fieldPresentInDatasetType = buildFieldPresentInDatasetTypePredicate(datasetType, criteriaQuery, criteriaBuilder, datasetFieldTypeRoot, metadataBlockRoot, onlyDisplayedOnCreate);

// Build the final WHERE clause by combining all the predicates.
criteriaQuery.where(
criteriaBuilder.equal(metadataBlockRoot.get("id"), metadataBlock.getId()), // Match the MetadataBlock ID.
datasetFieldTypeRoot.in(metadataBlockRoot.get("datasetFieldTypes")), // Ensure the DatasetFieldType is part of the MetadataBlock.
criteriaBuilder.or(
fieldPresentInDataverse,
fieldPresentInDatasetType
)
);

criteriaQuery.select(datasetFieldTypeRoot);

return em.createQuery(criteriaQuery).getResultList();
}

private Predicate buildFieldPresentInDataversePredicate(Dataverse dataverse, boolean onlyDisplayedOnCreate, CriteriaQuery<DatasetFieldType> criteriaQuery, CriteriaBuilder criteriaBuilder, Root<DatasetFieldType> datasetFieldTypeRoot, Root<MetadataBlock> metadataBlockRoot) {
Root<Dataverse> dataverseRoot = criteriaQuery.from(Dataverse.class);

// Join Dataverse with DataverseFieldTypeInputLevel on the "dataverseFieldTypeInputLevels" attribute, using a LEFT JOIN.
Expand Down Expand Up @@ -930,7 +954,7 @@ public List<DatasetFieldType> findAllInMetadataBlockAndDataverse(MetadataBlock m
Predicate hasNoInputLevelPredicate = criteriaBuilder.not(criteriaBuilder.exists(subquery));

// Define a predicate to include the required fields in Dataverse.
Predicate requiredInDataversePredicate = buildRequiredInDataversePredicate(criteriaBuilder, datasetFieldTypeRoot);
Predicate fieldRequiredInTheInstallation = buildFieldRequiredInTheInstallationPredicate(criteriaBuilder, datasetFieldTypeRoot);

// Define a predicate for displaying DatasetFieldTypes on create.
// If onlyDisplayedOnCreate is true, include fields that:
Expand All @@ -941,28 +965,57 @@ public List<DatasetFieldType> findAllInMetadataBlockAndDataverse(MetadataBlock m
? criteriaBuilder.or(
criteriaBuilder.or(
criteriaBuilder.isTrue(datasetFieldTypeRoot.get("displayOnCreate")),
requiredInDataversePredicate
fieldRequiredInTheInstallation
),
requiredAsInputLevelPredicate
)
: criteriaBuilder.conjunction();

// Build the final WHERE clause by combining all the predicates.
criteriaQuery.where(
// Combine all the predicates.
return criteriaBuilder.and(
criteriaBuilder.equal(dataverseRoot.get("id"), dataverse.getId()), // Match the Dataverse ID.
criteriaBuilder.equal(metadataBlockRoot.get("id"), metadataBlock.getId()), // Match the MetadataBlock ID.
metadataBlockRoot.in(dataverseRoot.get("metadataBlocks")), // Ensure the MetadataBlock is part of the Dataverse.
datasetFieldTypeRoot.in(metadataBlockRoot.get("datasetFieldTypes")), // Ensure the DatasetFieldType is part of the MetadataBlock.
criteriaBuilder.or(includedAsInputLevelPredicate, hasNoInputLevelPredicate), // Include DatasetFieldTypes based on the input level predicates.
displayedOnCreatePredicate // Apply the display-on-create filter if necessary.
);
}

criteriaQuery.select(datasetFieldTypeRoot).distinct(true);

return em.createQuery(criteriaQuery).getResultList();
private Predicate buildFieldPresentInDatasetTypePredicate(DatasetType datasetType,
CriteriaQuery<DatasetFieldType> criteriaQuery,
CriteriaBuilder criteriaBuilder,
Root<DatasetFieldType> datasetFieldTypeRoot,
Root<MetadataBlock> metadataBlockRoot,
boolean onlyDisplayedOnCreate) {
Predicate datasetTypePredicate = criteriaBuilder.isFalse(criteriaBuilder.literal(true)); // Initialize datasetTypePredicate to always false by default
if (datasetType != null) {
// Create a subquery to check for the presence of the specified metadataBlock within the datasetType
Subquery<Long> datasetTypeSubquery = criteriaQuery.subquery(Long.class);
Root<DatasetType> datasetTypeRoot = criteriaQuery.from(DatasetType.class);

// Define a predicate for displaying DatasetFieldTypes on create.
// If onlyDisplayedOnCreate is true, include fields that are either marked as displayed on create OR marked as required.
// Otherwise, use an always-true predicate (conjunction).
Predicate displayedOnCreatePredicate = onlyDisplayedOnCreate ?
criteriaBuilder.or(
criteriaBuilder.isTrue(datasetFieldTypeRoot.get("displayOnCreate")),
buildFieldRequiredInTheInstallationPredicate(criteriaBuilder, datasetFieldTypeRoot)
)
: criteriaBuilder.conjunction();

datasetTypeSubquery.select(criteriaBuilder.literal(1L))
.where(
criteriaBuilder.equal(datasetTypeRoot.get("id"), datasetType.getId()), // Match the DatasetType ID.
metadataBlockRoot.in(datasetTypeRoot.get("metadataBlocks")), // Ensure the metadataBlock is included in the datasetType's list of metadata blocks.
displayedOnCreatePredicate
);

// Now set the datasetTypePredicate to true if the subquery finds a matching metadataBlock
datasetTypePredicate = criteriaBuilder.exists(datasetTypeSubquery);
}
return datasetTypePredicate;
}

private Predicate buildRequiredInDataversePredicate(CriteriaBuilder criteriaBuilder, Root<DatasetFieldType> datasetFieldTypeRoot) {
private Predicate buildFieldRequiredInTheInstallationPredicate(CriteriaBuilder criteriaBuilder, Root<DatasetFieldType> datasetFieldTypeRoot) {
// Predicate to check if the current DatasetFieldType is required.
Predicate isRequired = criteriaBuilder.isTrue(datasetFieldTypeRoot.get("required"));

Expand Down
Loading
Loading