fix: don't duplicate nested compound fields in metadataBlocks returned by search API #11172
2025/01/27: Moving to
Maybe this issue is already solved by #10764? The fix should be available in the next release, I suppose.
@ffritze Thanks for the pointer! I just tested the search API on develop, which includes your changes. (I've also just added more detail to the "how to test" steps above.) The problem doesn't seem to be fixed.
Here's what the search API returns on develop:
(Nested field
Here's what the search API returns with this PR:
What this PR does / why we need it:
The following bug affects metadata blocks which contain deeply nested compound fields (compound fields within compound fields).
Example: My metadata block exampleBlock contains the field "A", which is a compound field and contains the field "A.B", which is also a compound field and contains the fields "A.B.C1" and "A.B.C2". "A.B.C1" and "A.B.C2" are not compound fields (e.g. could be text or numbers).
Then, if I use the search API to search for some query and request the deeply nested metadata field with metadata_fields=exampleBlock:*, the returned JSON duplicates the nested compound field A.B: once as a child of A (correct), and once as its own field without a parent (not correct).
The code change ensures that only fields on the top level of the metadata block are added to the fields array of the returned JSON (each including its children, if there are any). This eliminates the duplication of the child compound field.
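The idea behind the change can be illustrated with a minimal Python sketch. This is not the PR's actual Java code: `FieldType`, its `parent` attribute, and `top_level_fields` are hypothetical names used only to show the filtering rule described above, under the assumption that each field type knows its parent (or has none if it sits at the top level of the block).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldType:
    """Hypothetical stand-in for a metadata field type with an optional parent."""
    name: str
    parent: Optional["FieldType"] = None


def top_level_fields(fields: List[FieldType]) -> List[FieldType]:
    """Keep only fields without a parent; children are emitted via their parent."""
    return [f for f in fields if f.parent is None]


# The example block from the description: A contains A.B, which contains C1 and C2.
a = FieldType("A")
ab = FieldType("A.B", parent=a)
abc1 = FieldType("A.B.C1", parent=ab)
abc2 = FieldType("A.B.C2", parent=ab)

all_fields = [a, ab, abc1, abc2]
names = [f.name for f in top_level_fields(all_fields)]
```

With this rule, only "A" is added to the fields array directly, and "A.B" appears solely as a child of "A".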
I believe this makes sense and doesn't break anything, because according to the documentation, fields below the top level aren't meant to be requested:
https://guides.dataverse.org/en/latest/api/search.html
Which issue(s) this PR closes:
/
Special notes for your reviewer:
/
Suggestions on how to test this:
I would have added a test for this bug, but it wasn't straightforward, because none of the default metadata blocks contain deeply nested compound fields.
The existing tests for metadata_fields=... can be run to confirm that they weren't broken:
mvn test -Dtest="SearchIT#testSearchDynamicMetadataFields"
To test the bug described above, you need to:
1. Create a metadata block containing nested compound fields, e.g. use the one in this archive: nested_compound_test.tar.gz
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file nested_compound_test2.tsv
2. Enable metadata block
3. Update Solr schema, e.g. using this prepared schema.xml: schema.txt (had to rename to .txt to upload here)
docker cp schema.xml solr-1:/var/solr/data/collection1/conf/schema.xml
(or copy wherever your Solr is running)
curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1"
4. Create a dataset using the metadata block, e.g. using a JSON file in the archive linked above
curl -H "X-Dataverse-key: $API_TOKEN" -X POST "http://localhost:8080/api/dataverses/root/datasets" --upload-file nested_compound_test2.json -H 'Content-type:application/json'
5. Go to http://localhost:8080/dataset.xhtml?persistentId=... (using PID returned by the API in 4.)
6. Publish dataset
7. Perform a search that will find the dataset and request the nested metadata block
curl "http://localhost:8080/api/search?q=finches&type=dataset&metadata_fields=nested_compound_test2:*"
8. Check whether the nested compound fields are returned correctly
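The final check can also be done with a small script instead of by eye. The following Python sketch flags field names that appear both at the top level of a block's fields array and nested inside another field. It assumes the usual Dataverse JSON shape, where each field has a typeName and a compound field's value is an object (or a list of objects) mapping child names to child fields; `nested_type_names` and `duplicated_top_level` are hypothetical helpers, and the sample data mirrors the A / A.B example rather than real API output.

```python
def nested_type_names(value):
    """Collect the typeNames of all child fields nested anywhere under a value."""
    names = set()
    if isinstance(value, list):
        # Multi-valued compound field: a list of child-name -> child-field objects.
        for item in value:
            names |= nested_type_names(item)
    elif isinstance(value, dict):
        for child in value.values():
            if isinstance(child, dict) and "typeName" in child:
                names.add(child["typeName"])
                names |= nested_type_names(child.get("value"))
    return names


def duplicated_top_level(fields):
    """Return top-level typeNames that also occur nested inside another field."""
    nested = set()
    for field in fields:
        nested |= nested_type_names(field.get("value"))
    return sorted(f["typeName"] for f in fields if f["typeName"] in nested)


# Illustrative data only (assumed shape), mirroring the example in the description.
leaf_values = {
    "A.B.C1": {"typeName": "A.B.C1", "typeClass": "primitive", "value": "x"},
    "A.B.C2": {"typeName": "A.B.C2", "typeClass": "primitive", "value": "y"},
}
field_ab = {"typeName": "A.B", "typeClass": "compound", "value": leaf_values}
field_a = {"typeName": "A", "typeClass": "compound", "value": {"A.B": field_ab}}

buggy_fields = [field_a, field_ab]  # A.B repeated at the top level
fixed_fields = [field_a]            # A.B only as a child of A

print(duplicated_top_level(buggy_fields))
print(duplicated_top_level(fixed_fields))
```

To apply this to a real response, pass in the fields array of the requested block from each search result item; the exact path to it (for example under data, items, metadataBlocks) is an assumption that should be verified against the actual output.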
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
/
Is there a release notes update needed for this change?:
/
Additional documentation:
/