Loading MeSH datasets
The Jena assembler config file <MTW_HOME_DIR>/instance/conf/mesh.ttl MUST be set up properly! Use the template matching your Jena version:
https://github.com/filak/MTW-MeSH/blob/master/flask-app/instance/conf/mesh_Jena4.ttl
https://github.com/filak/MTW-MeSH/blob/master/flask-app/instance/conf/mesh_Jena5.ttl
Copy the file to <MTW_HOME_DIR>/instance/conf/ and rename it to mesh.ttl
Adjust the paths in mesh.ttl to your <FUSEKI_DATA_DIR>
Use forward slashes, e.g.:
tdb2:location "c:/<FUSEKI_DATA_DIR>/databases/mesh" ;
text:directory "c:/<FUSEKI_DATA_DIR>/indexes/mesh" ;
Validate mesh.ttl (no output = file is OK):
riot --validate mesh.ttl
Copy the mesh.ttl file to:
<FUSEKI_DATA_DIR>/configuration/
Download the official MeSH RDF dataset mesh.nt.gz from https://nlmpubs.nlm.nih.gov/projects/mesh/rdf/
You can use the curl tool to download it:
curl https://nlmpubs.nlm.nih.gov/projects/mesh/rdf/mesh.nt.gz --ssl-no-revoke -O
As of this writing (January 2025), the above is no longer true.
The mesh.nt.gz currently available is still the MeSH 2024 version (hash c9ef004de88b9201b84f90aad2966bfd067af799).
Despite several requests (https://github.com/HHS/meshrdf/issues/212#issuecomment-2539919254) for information on when, or whether, the full RDF dataset for MeSH 2025 will be made available, NLM has remained silent. The release notes are also outdated.
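To check whether the mesh.nt.gz you downloaded is still the MeSH 2024 file, you can compare its digest with the hash quoted above. This is a minimal sketch that assumes the 40-character value is a SHA-1 digest of the gzipped file exactly as downloaded; the hash type is not stated on this page.

```python
import hashlib

# Hash quoted above for the MeSH 2024 mesh.nt.gz (ASSUMED to be a SHA-1 digest).
MESH_2024_SHA1 = "c9ef004de88b9201b84f90aad2966bfd067af799"


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks to keep memory use low."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_mesh_2024(path: str) -> bool:
    """True if the file's SHA-1 matches the known MeSH 2024 hash."""
    return sha1_of_file(path) == MESH_2024_SHA1
```

For example, `print(is_mesh_2024("mesh.nt.gz"))` printing True would mean the server still serves the 2024 dataset.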
The only official MeSH 2025 RDF datasets are available at https://nlmpubs.nlm.nih.gov/projects/mesh/rdf/2025/, but:
- they are not complete: obsolete/inactive items are missing and no meshv:active triples are present
- they are the "name-spaced" version, with the prefix http://id.nlm.nih.gov/mesh/2025/
Information about MeSH item status is vital, both for the translation process and for functional MTW outputs/exports. Existing data workflows, e.g. for updating obsolete MeSH items, rely on the active/inactive status being available.
So what can be done in this situation? Let's try to create the most complete MeSH 2025 RDF version possible.
You can follow this guide, or skip it and just download the final files: mesh.nt.gz and mesh2024_inactive.nt
Download all the official MeSH 2025 XML files here and produce the RDF dataset mesh.nt.gz with the https://github.com/HHS/meshrdf script, with no year in the namespace (!)
OR
Download https://nlmpubs.nlm.nih.gov/projects/mesh/rdf/2025/mesh2025.nt.gz and update the namespace using the MTW script tools/update-ns.py:
py update-ns.py mesh2025.nt.gz http://id.nlm.nih.gov/mesh/2025/ http://id.nlm.nih.gov/mesh/ mesh.nt.gz
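The namespace update is essentially a prefix substitution on every IRI. The sketch below is NOT the MTW tools/update-ns.py script, only an illustration of the idea. It assumes plain string replacement of the old prefix after a `<` is sufficient; it does not parse N-Triples.

```python
import gzip


def update_namespace(src: str, old_ns: str, new_ns: str, dst: str) -> int:
    """Copy a gzipped N-Triples file, rewriting IRIs that start with old_ns.

    Every occurrence of '<' + old_ns is replaced, i.e. IRIs beginning with old_ns.
    A literal containing that exact text would also be changed (assumed not to
    occur in MeSH data). Returns the number of lines that changed.
    """
    old_iri, new_iri = "<" + old_ns, "<" + new_ns
    changed = 0
    with gzip.open(src, "rt", encoding="utf-8") as fin, \
            gzip.open(dst, "wt", encoding="utf-8", newline="\n") as fout:
        for line in fin:
            new_line = line.replace(old_iri, new_iri)
            if new_line != line:
                changed += 1
            fout.write(new_line)
    return changed
```

Equivalent call for the command above: `update_namespace("mesh2025.nt.gz", "http://id.nlm.nih.gov/mesh/2025/", "http://id.nlm.nih.gov/mesh/", "mesh.nt.gz")`. Streaming line by line keeps memory flat even for the full MeSH dump.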
Fortunately, according to the UMLS MeSH 2025 reports, no main headings were deleted, so we can take the inactive items from last year's complete dataset.
Download the complete MeSH 2024 dataset mesh.nt.gz, save it as mesh2024_full.nt.gz, extract the inactive items with the Jena arq tool and the mesh-inactive.sparql query, then convert the result to N-Triples with riot:
arq --data=mesh2024_full.nt.gz --query=mesh-inactive.sparql > mesh2024_inactive.ttl
riot --output=N-TRIPLES mesh2024_inactive.ttl > mesh2024_inactive.nt
The final files are:
- mesh.nt.gz
- mesh2024_inactive.nt
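The contents of mesh-inactive.sparql are not reproduced on this page. As a rough illustration of what such an extraction does, the Python sketch below selects every subject carrying a meshv:active "false" triple and copies all triples about those subjects. It assumes the vocabulary namespace http://id.nlm.nih.gov/mesh/vocab#, a flag serialized as "false"^^xsd:boolean with single spaces between terms, and one statement per line. It may not select exactly the same triples as the MTW query.

```python
import gzip

# ASSUMED serialization of the inactive flag in the MeSH N-Triples dump.
ACTIVE_FALSE = (
    '<http://id.nlm.nih.gov/mesh/vocab#active> '
    '"false"^^<http://www.w3.org/2001/XMLSchema#boolean>'
)


def inactive_subjects(path: str) -> set[str]:
    """First pass over the gzipped dump: collect subject IRIs flagged as inactive."""
    subjects = set()
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if ACTIVE_FALSE in line:
                subjects.add(line.split(" ", 1)[0])
    return subjects


def extract_inactive(src: str, dst: str) -> int:
    """Second pass: write every triple whose subject is inactive to dst as plain N-Triples."""
    subjects = inactive_subjects(src)
    written = 0
    with gzip.open(src, "rt", encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8", newline="\n") as fout:
        for line in fin:
            if line.split(" ", 1)[0] in subjects:
                fout.write(line)
                written += 1
    return written
```

For example, `extract_inactive("mesh2024_full.nt.gz", "mesh2024_inactive.nt")`. Two passes avoid holding the whole dataset in memory; only the set of inactive subject IRIs is kept.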
If you have not translated MeSH before, you can proceed to Import.
Use the trans_only_YYYY_extended.txt file and convert it with the mesh-trx2nt tool.
The file MUST have the following columns/items:
DescriptorUI | ConceptUI | Language | TermType | String | TermUI | ScopeNote | Tree | Created | Relation | ParentCUI
- the header row is optional
- the TermUI column is always empty
- the Relation and ParentCUI columns must be filled only in rows with Custom Concepts (ConceptUI starting with F...) and TermType PEP
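Before running the converter, a quick check of the column rules above can catch problems early. This is a minimal sketch under several assumptions: the file is UTF-8, it is split on a delimiter you pass in (`|` by default, mirroring the column list above; the real delimiter is not stated here), and no field contains that delimiter. The Relation/ParentCUI rule is read as "both filled on custom PEP rows, both empty elsewhere".

```python
COLUMNS = ["DescriptorUI", "ConceptUI", "Language", "TermType", "String",
           "TermUI", "ScopeNote", "Tree", "Created", "Relation", "ParentCUI"]


def check_row(fields: list[str]) -> list[str]:
    """Return the problems found in one row (an empty list means the row looks OK)."""
    if len(fields) != len(COLUMNS):
        return [f"expected {len(COLUMNS)} columns, got {len(fields)}"]
    row = dict(zip(COLUMNS, (f.strip() for f in fields)))
    problems = []
    if row["TermUI"]:
        problems.append("TermUI must be empty")
    custom_pep = row["ConceptUI"].startswith("F") and row["TermType"] == "PEP"
    has_relation = bool(row["Relation"] or row["ParentCUI"])
    if custom_pep and not (row["Relation"] and row["ParentCUI"]):
        problems.append("custom PEP concept needs Relation and ParentCUI")
    if has_relation and not custom_pep:
        problems.append("Relation/ParentCUI only allowed on custom PEP rows")
    return problems


def check_file(path: str, delimiter: str = "|") -> dict[int, list[str]]:
    """Map 1-based line numbers to their problems; an optional header row is skipped."""
    report = {}
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.rstrip("\r\n")
            if not line or (lineno == 1 and line.startswith("DescriptorUI")):
                continue
            problems = check_row(line.split(delimiter))
            if problems:
                report[lineno] = problems
    return report
```

For example, `check_file("trans_only_2023_extended.txt", delimiter="\t")` for a tab-delimited file. An empty dict means no rule violations were found. The sketch reads plain text only, not the gzipped form mesh-trx2nt also accepts.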
Display help - open CMD and run:
mesh-trx2nt -h
usage: mesh-trx2nt inputFile langcode meshxPrefix [options]
Extracting translation dataset from NLM UMLS text file [trans_only_2023_expanded.txt]
positional arguments:
inputFile NLM UMLS text file name (plain or gzipped)
langcode Language code
meshxPrefix MeSH Translation namespace prefix ie. http://my.mesh.com/id/
options:
-h, --help show this help message and exit
--out OUT Output file name prefix
IMPORTANT
The langcode parameter MUST be the same as the TARGET_LANG value in your mtw.ini config file!
The meshxPrefix parameter MUST be the same as the TARGET_NS value in your mtw.ini config file!
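Because both values must match mtw.ini, you can read them from the config instead of retyping them. This is a sketch using Python's configparser. The section that holds TARGET_LANG and TARGET_NS is not named on this page, so the sketch simply takes the first section defining each key. The input file name you pass in is your own.

```python
import configparser


def target_settings(ini_path: str) -> tuple[str, str]:
    """Return (TARGET_LANG, TARGET_NS) from the first mtw.ini section that defines each key."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(ini_path, encoding="utf-8")
    found = {}
    for section in parser.sections():
        for key in ("target_lang", "target_ns"):  # configparser lower-cases option names
            if key not in found and parser.has_option(section, key):
                found[key] = parser.get(section, key).strip()
    missing = {"target_lang", "target_ns"} - found.keys()
    if missing:
        raise KeyError(f"not found in {ini_path}: {sorted(missing)}")
    return found["target_lang"], found["target_ns"]


def trx2nt_command(input_file: str, ini_path: str) -> list[str]:
    """Build the mesh-trx2nt argument list using the values from mtw.ini."""
    lang, ns = target_settings(ini_path)
    return ["mesh-trx2nt", input_file, lang, ns]
```

For example, `print(" ".join(trx2nt_command("trans_only_2023_extended.txt", "mtw.ini")))` prints a command you can paste into CMD.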
Run the conversion: open CMD and run, e.g.:
mesh-trx2nt trans_only_2023_extended.txt fr http://id.mesh.fr/
Download your *.xml translation file from
https://nlmpubs.nlm.nih.gov/projects/mesh/MESH_FILES/.mtms/
Extract the translation data from the MeSH XML as an N-Triples dataset using the mesh-xml2trx tool
Run the extraction script:
mesh-xml2trx *.xml <TARGET_NS>
IMPORTANT: TARGET_NS (the target namespace parameter) is the custom URI prefix for your translation. It MUST be the same as the TARGET_NS value in your mtw.ini config file!
https://github.com/filak/MTW-MeSH/blob/master/flask-app/instance/conf/mtw.ini
e.g.:
mesh-xml2trx czedesc2018.xml.gz http://mesh.medvik.cz/link/
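One way to confirm that the extraction used the intended prefix is to count how many lines of the generated N-Triples file mention an IRI starting with your TARGET_NS. The output file name is not given above, so the path is yours to supply. A count of zero suggests the namespace passed to mesh-xml2trx differs from what you expected. This is a sketch, not an MTW tool.

```python
import gzip


def count_ns_lines(path: str, target_ns: str) -> tuple[int, int]:
    """Return (lines containing an IRI that starts with target_ns, total non-empty lines)."""
    opener = gzip.open if path.endswith(".gz") else open
    needle = "<" + target_ns
    hits = total = 0
    with opener(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            total += 1
            if needle in line:
                hits += 1
    return hits, total
```

For example, `count_ns_lines("<your mesh-xml2trx output>", "http://mesh.medvik.cz/link/")`.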
ALWAYS validate ALL the input files
Run the validation:
No output = dataset is OK
riot --validate *.gz
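To get a per-file verdict, or if your shell does not expand *.gz for you, you can call riot once per file. This is a sketch that assumes riot is on your PATH; shutil.which also resolves riot.bat on Windows via PATHEXT. It relies on the behaviour stated above (a valid file produces no output) and also treats a non-zero exit code as a problem.

```python
import glob
import shutil
import subprocess


def validate_files(pattern: str = "*.gz", riot: str = "riot") -> dict[str, str]:
    """Run `riot --validate` on each file matching pattern.

    Returns {file: output} for files that printed anything or exited non-zero;
    an empty dict means every file validated cleanly.
    """
    exe = shutil.which(riot)
    if exe is None:
        raise FileNotFoundError(f"{riot} not found on PATH")
    problems = {}
    for path in sorted(glob.glob(pattern)):
        result = subprocess.run([exe, "--validate", path],
                                capture_output=True, text=True)
        output = (result.stdout + result.stderr).strip()
        if output or result.returncode != 0:
            problems[path] = output or f"exit code {result.returncode}"
    return problems
```

For example, run `print(validate_files("*.gz"))` from the directory holding the input files.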
Move the input files into a versioned <IMPORT> directory, e.g. .../MeSH-data/2023/import/
Load the MeSH dataset(s) into Apache Jena
Stop Fuseki server instance (if running)
Go to your <IMPORT> directory
Run the import:
tdb2_tdbloader --loc %FUSEKI_BASE%/databases/mesh mesh.nt.gz mesh-trx_ ...
Or, if you do not have a translation, just:
tdb2_tdbloader --loc %FUSEKI_BASE%/databases/mesh mesh.nt.gz
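If you have several translation files, a small helper can assemble the loader command from the contents of the <IMPORT> directory. This is a sketch that assumes FUSEKI_BASE is set as an environment variable, as %FUSEKI_BASE% above implies. The glob mesh-trx_* is a hypothetical pattern for your translation files; adjust it to your actual file names.

```python
import glob
import os


def tdbloader_command(import_dir: str, extra: tuple[str, ...] = (),
                      trx_pattern: str = "mesh-trx_*") -> list[str]:
    """Build a tdb2_tdbloader argument list: mesh.nt.gz, any extra files, then translation files.

    File names are relative, so run the command from inside import_dir.
    trx_pattern is a HYPOTHETICAL file-name pattern for translation datasets.
    """
    base = os.environ["FUSEKI_BASE"]
    files = ["mesh.nt.gz", *extra]
    files += sorted(os.path.basename(p)
                    for p in glob.glob(os.path.join(import_dir, trx_pattern)))
    missing = [f for f in files if not os.path.exists(os.path.join(import_dir, f))]
    if missing:
        raise FileNotFoundError(f"missing in {import_dir}: {missing}")
    return ["tdb2_tdbloader", "--loc", os.path.join(base, "databases", "mesh"), *files]
```

For example, `print(" ".join(tdbloader_command(".", extra=("mesh2024_inactive.nt",))))` prints the command, including the inactive-items file if you prepared it. Run the printed command with Fuseki stopped.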
Create Fuseki search index
Go to your <FUSEKI_DATA_DIR>
cd %FUSEKI_BASE%
Run the indexation - Jena v4:
java -cp %FUSEKI_HOME%/fuseki-server.jar jena.textindexer --desc=configuration/mesh.ttl
Run the indexation - Jena v5+:
java --add-modules jdk.incubator.vector -cp %FUSEKI_HOME%/fuseki-server.jar jena.textindexer --desc=configuration/mesh.ttl
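The two indexing commands differ only in the --add-modules jdk.incubator.vector option used for Jena v5+. This is a sketch that picks the right form from the Jena major version you pass in. It assumes FUSEKI_HOME is set as an environment variable, as the commands above imply.

```python
import os


def textindexer_command(jena_major: int, desc: str = "configuration/mesh.ttl") -> list[str]:
    """Build the jena.textindexer command for the given Jena major version."""
    jar = os.path.join(os.environ["FUSEKI_HOME"], "fuseki-server.jar")
    cmd = ["java"]
    if jena_major >= 5:
        # Extra JVM module used by the Jena v5+ command shown above.
        cmd += ["--add-modules", "jdk.incubator.vector"]
    cmd += ["-cp", jar, "jena.textindexer", f"--desc={desc}"]
    return cmd
```

For example, `subprocess.run(textindexer_command(5), check=True, cwd=os.environ["FUSEKI_BASE"])` runs it from %FUSEKI_BASE%, so the relative configuration/mesh.ttl path resolves as in the steps above.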
Start Fuseki server instance
Restoring the MeSH dataset from a backup
Stop MTW services
Stop your Fuseki instance
Go to your <FUSEKI_DATA_DIR> and make sure the <mesh> directories under the databases and indexes directories are empty!
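Before restoring, you can verify that the target directories really are empty. This is a sketch that assumes the layout <FUSEKI_DATA_DIR>/databases/mesh and <FUSEKI_DATA_DIR>/indexes/mesh used in mesh.ttl above. A directory that does not exist yet is treated as empty.

```python
import os


def non_empty_mesh_dirs(fuseki_data_dir: str) -> list[str]:
    """Return the mesh database/index directories that still contain entries."""
    leftovers = []
    for sub in ("databases", "indexes"):
        path = os.path.join(fuseki_data_dir, sub, "mesh")
        if os.path.isdir(path) and os.listdir(path):
            leftovers.append(path)
    return leftovers
```

For example, `print(non_empty_mesh_dirs(os.environ["FUSEKI_BASE"]))`. An empty list means it is safe to run the import.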
Run the import:
tdb2_tdbloader --loc %FUSEKI_BASE%/databases/mesh %FUSEKI_BASE%/backups/mesh_YYYY-MM-DD_....nq.gz
Create the search index - Jena v4 - run:
java -cp %FUSEKI_HOME%/fuseki-server.jar jena.textindexer --desc=configuration/mesh.ttl
Create the search index - Jena v5+ - run:
java --add-modules jdk.incubator.vector -cp %FUSEKI_HOME%/fuseki-server.jar jena.textindexer --desc=configuration/mesh.ttl
Start your Fuseki instance
Start MTW services
Continue to MeSH Annual Updates