What is here:

This code accompanies our submissions to the N2C2 Shared Task - Track 2, on the extraction of social determinants of health (SDoH) from clinical notes. It was also used in a subsequent study on the effects of different NLP models on downstream medical association study results.

This repository contains the submission script, the two main Python files used to train or apply our BIO-scheme-based SDoH models, and supporting scripts:

sdoh_model_bert_bio.py: The code used for all BERT settings (call sdoh_model_bert_bio.py -h for more detailed information).
sdoh_model_bio.py: The code used for all other settings (call sdoh_model_bio.py -h for more detailed information).
Submission_script.sh: The script that was used to make our submissions for the shared task.
pretrain_embs.py: The script used to pretrain the fastText embeddings (on the MIMIC III and the UCSF data).
association_study_experiments.sh: The script used to conduct the experiments from the arXiv article.
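
For readers unfamiliar with the BIO scheme mentioned above, the following is a minimal sketch of how token-level BIO tags map back to labeled spans. It is an illustration only and is not taken from sdoh_model_bio.py or sdoh_model_bert_bio.py: the function name `bio_to_spans`, the example sentence, and the label names are assumptions made for this example.

```python
from typing import List, Tuple

# (label, start_token, end_token_exclusive); a hypothetical span format
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert a sequence of BIO tags into labeled token spans.

    "B-X" begins a span of type X, "I-X" continues it, and "O" is outside
    any span. A stray "I-X" with no open span of type X is treated
    leniently as the start of a new span.
    """
    spans: List[Span] = []
    label, start = None, None
    # A trailing "O" sentinel flushes a span that runs to the last token.
    for i, tag in enumerate(tags + ["O"]):
        prefix, _, name = tag.partition("-")
        continues = prefix == "I" and name == label
        if label is not None and not continues:
            spans.append((label, start, i))
            label, start = None, None
        if prefix == "B" or (prefix == "I" and label is None):
            label, start = name, i
    return spans


# Illustrative input; the labels are example names, not the models' label set.
tokens = ["Smokes", "one", "pack", "daily", ",", "drinks", "beer"]
tags = ["B-Tobacco", "I-Tobacco", "I-Tobacco", "I-Tobacco", "O", "B-Alcohol", "I-Alcohol"]

for label, start, end in bio_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
```

Each printed line pairs a label with the tokens of its span. The actual models may encode and decode tags differently; see the -h output of the two scripts for their real options.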

What is not here:

The text data (clinical notes from MIMIC III and the University of Washington) and the SDoH annotations were provided by the task organizers under a data sharing agreement that protects patient privacy, so we cannot share them here.

The DNR/DNI annotations can be found in this repository: https://github.com/tuur/code-status-annotations-mimic

References

Results from our submissions are reported in the attached abstract:

Madhumita Sushil, Atul J. Butte, Ewoud Schuit, Artuur M. Leeuwenberg. Cross-institution extraction of social determinants of health from clinical notes: an evaluation of methods. AMIA Natural Language Processing Working Group Pre-Symposium. November, 2022.

Results from the subsequent study on the impact of NLP modeling choices on downstream association study results are published in the Journal of Clinical Epidemiology:

Sushil, Madhumita, et al. Cross-institution natural language processing for reliable clinical association studies: a methodological exploration. Journal of Clinical Epidemiology. 2024 Mar 1;167:111258.
