
Code and data annotations for the N2C2 SDoH challenge and the follow-up article in the Journal of Clinical Epidemiology on the downstream impact of NLP on association study results.


tuur/sdoh_n2c2track2_ucsf_umcu


This code is associated with our submissions to the N2C2 Shared Task, Track 2, on extraction of social determinants of health (SDoH) from clinical notes. It was also used in a subsequent study on how the choice of NLP model affects the results of downstream medical association studies.
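To make the downstream effect concrete, here is a toy sketch: an unadjusted odds ratio between an SDoH exposure and an outcome, computed once from gold-standard exposure labels and once from NLP-style labels that miss a single mention. The function `odds_ratio` and all data below are hypothetical and made up for illustration; this is not the analysis from the study.

```python
from typing import Sequence


def odds_ratio(exposure: Sequence[int], outcome: Sequence[int]) -> float:
    """Unadjusted odds ratio from a 2x2 table of binary exposure/outcome labels."""
    pairs = list(zip(exposure, outcome))
    a = sum(1 for e, o in pairs if e and o)          # exposed, with outcome
    b = sum(1 for e, o in pairs if e and not o)      # exposed, without outcome
    c = sum(1 for e, o in pairs if not e and o)      # unexposed, with outcome
    d = sum(1 for e, o in pairs if not e and not o)  # unexposed, without outcome
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined: an off-diagonal cell is zero")
    return (a * d) / (b * c)


# Hypothetical toy cohort of 10 patients.
outcome = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
gold_exposure = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]  # e.g. from manual annotation
nlp_exposure = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]   # same, but patient 2 is missed

print(odds_ratio(gold_exposure, outcome))  # 5.0
print(odds_ratio(nlp_exposure, outcome))   # about 1.67
```

A single false negative moves patient 2 from the "exposed" to the "unexposed" row, which in this toy case attenuates the estimate from 5.0 to about 1.67. Real studies are larger and typically adjust for covariates, but the same mechanism lets extraction errors bias association estimates.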

What is here:

This repository contains the submission script we used, the two main Python files to train or apply our BIO-scheme-based SDoH models, and supporting scripts:

  • sdoh_model_bert_bio.py: The code used for all BERT settings (call sdoh_model_bert_bio.py -h for more detailed information).
  • sdoh_model_bio.py: The code used for all other settings (call sdoh_model_bio.py -h for more detailed information).
  • Submission_script.sh: The script that was used to make our submissions for the shared task.
  • pretrain_embs.py: The script used to pretrain the fastText embeddings (on the MIMIC III and the UCSF data).
  • association_study_experiments.sh: The script used to conduct the experiments from the arXiv article.
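For readers unfamiliar with the BIO scheme: each token is tagged B-<label> if it begins an annotated span, I-<label> if it continues one, and O otherwise, which turns span extraction into token classification. The sketch below assumes character-offset span annotations, plain whitespace tokenization, and illustrative label names; the function `to_bio` is hypothetical and is not the preprocessing used in `sdoh_model_bio.py` or `sdoh_model_bert_bio.py`.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start char offset, end char offset, label)


def to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Whitespace-tokenize `text` and assign a BIO tag to every token."""
    tagged = []
    pos = 0
    opened = set()  # indices of spans that have already received a B- tag
    for tok in text.split():
        start = text.index(tok, pos)  # character offsets of this token
        end = start + len(tok)
        pos = end
        tag = "O"
        for i, (s_start, s_end, label) in enumerate(spans):
            if start < s_end and end > s_start:  # token overlaps the span
                tag = ("I-" if i in opened else "B-") + label
                opened.add(i)
                break
        tagged.append((tok, tag))
    return tagged


text = "Patient smokes daily and lives alone"
spans = [(8, 20, "Tobacco"), (25, 36, "LivingStatus")]
for token, tag in to_bio(text, spans):
    print(token, tag)
# Patient O
# smokes B-Tobacco
# daily I-Tobacco
# and O
# lives B-LivingStatus
# alone I-LivingStatus
```

Whitespace tokenization keeps the example short; a BERT-based model would instead align tags to subword tokens, for instance by tagging only the first subword of each word.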

What is not here:

The text data (clinical notes from MIMIC III and the University of Washington) and the SDoH annotations were provided by the task organizers under a data sharing agreement to protect patient privacy, so we cannot share them here.

The DNR/DNI annotations can be found in this repository: https://github.com/tuur/code-status-annotations-mimic

References

Results from our submissions are reported in the attached abstract:

Madhumita Sushil, Atul J. Butte, Ewoud Schuit, Artuur M. Leeuwenberg. Cross-institution extraction of social determinants of health from clinical notes: an evaluation of methods. AMIA Natural Language Processing Working Group Pre-Symposium. November, 2022.

Results from the subsequent study on the impact of NLP modeling choices on downstream association study results, published in the Journal of Clinical Epidemiology:

Madhumita Sushil, et al. Cross-institution natural language processing for reliable clinical association studies: a methodological exploration. Journal of Clinical Epidemiology. 2024 Mar 1;167:111258.
