Youth Mental Health Narratives Challenge

Goal of the Competition

Suicide is one of the leading causes of death in the United States for 5-24 year-olds. To better understand the circumstances around youth suicides and inform potential interventions, researchers and policymakers rely on several datasets. One key dataset is the National Violent Death Reporting System (NVDRS), which has been tracking information about violent deaths since 2003. The NVDRS dataset is based on law enforcement reports, medical examiner and coroner reports, and death certificates. It includes both narrative descriptions of each incident and common factor variables like precipitating events. The process of generating consistent narratives and accurate factor variables is time-consuming and prone to error.

In this challenge, solvers helped the CDC extract information from narratives in the NVDRS, improving both the quality and coverage of the NVDRS dataset. Higher-quality data can enable researchers across the country to better understand and prevent youth suicides on a national scale.

This challenge had two tracks:

  1. In the Automated Abstraction track, solvers applied machine learning techniques to automate the population of factor variables from NVDRS' narrative text. The algorithms developed could help streamline the process of manual abstraction and data quality control.
  2. In the Novel Variables track, solvers explored the NVDRS narratives and extracted novel variables that could be used to advance youth mental health research.

What's in this Repository

This repository contains code from winning competitors in the Youth Mental Health Narratives challenge on DrivenData. Code for all winning solutions is open source under the MIT License.

Solution code for the two tracks can be found in the automated-abstraction/ and novel-variables/ subdirectories, respectively. Additional solution details can be found in the reports folder inside the directory for each submission.

Winning code for other DrivenData competitions is available in the competition-winners repository.

Winning Submissions

Automated Abstraction Track

| Place | Team or User | Public Score | Private Score | Summary of Model |
| --- | --- | --- | --- | --- |
| 1st | kbrodt | 0.8706 | 0.8660 | Fine-tuned and performed inference with an ensemble of BigBird and Longformer models using LoRA |
| 2nd | D and T | 0.8676 | 0.8650 | Fine-tuned and performed inference with a weighted ensemble of DeBERTa and Longformer models |
| 3rd | dylanliu | 0.8690 | 0.8636 | Generated and soft-labeled additional data using Qwen and Mistral, then fine-tuned and performed inference with DeBERTa and Gemma models |
| 4th | bbb88 | 0.8655 | 0.8624 | Generated and soft-labeled additional data using Llama, Phi, Mistral, and Yi, then fine-tuned and performed inference with DeBERTa, Gemma, and Llama models |
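
Several of the summaries above mention ensembling. As a rough illustration only, and not the winners' actual code, the sketch below shows one common way a weighted ensemble can combine per-label probabilities from several models into binary factor-variable predictions. The function name `weighted_ensemble`, the model keys, the weights, and the 0.5 threshold are all assumptions made for this example; see each submission's reports folder for what was actually done.

```python
from typing import Dict, List


def weighted_ensemble(
    model_probs: Dict[str, List[List[float]]],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> List[List[int]]:
    """Blend per-label probabilities from several models, then binarize.

    model_probs maps a (hypothetical) model name to a matrix of shape
    [n_narratives][n_labels]; weights maps the same names to blend weights.
    """
    names = list(model_probs)
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_rows = len(model_probs[names[0]])
    predictions = []
    for i in range(n_rows):
        n_labels = len(model_probs[names[0]][i])
        row = []
        for j in range(n_labels):
            # Weighted average of the models' probabilities for this label.
            blended = sum(weights[n] * model_probs[n][i][j] for n in names) / total
            row.append(1 if blended >= threshold else 0)
        predictions.append(row)
    return predictions


# Illustrative inputs: two narratives, two binary factor variables.
probs = {
    "deberta": [[0.9, 0.2], [0.4, 0.7]],
    "longformer": [[0.7, 0.4], [0.2, 0.5]],
}
blend_weights = {"deberta": 0.6, "longformer": 0.4}
result = weighted_ensemble(probs, blend_weights)
```

In practice the weights and threshold would typically be tuned on held-out data rather than fixed by hand.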

Novel Variables Track

| Place | Team or User | Summary of Approach |
| --- | --- | --- |
| 1st & midpoint bonus | verto | Extracted temporal information to determine preceding events and create a time series leading up to a suicide |
| 2nd & midpoint bonus | HealthHackers | Extracted temporal information to determine preceding events and create a time series leading up to a suicide |
| 3rd & midpoint bonus | UM-ATLAS | Extracted temporal information to determine preceding events and create a time series leading up to a suicide |
| Midpoint bonus | jackson5 | Identified panic attacks |
| Midpoint bonus | MPWARE | Extracted temporal information to determine preceding events and create a time series leading up to a suicide |

Winners Blog Post: https://drivendata.co/blog/youth-mental-health-winners

Automated Abstraction Benchmark Blog Post: https://drivendata.co/blog/automated-abstraction-benchmark