Named Entity Recognition Explainer

This project adapts the Integrated Gradients implementation provided by Captum to the named entity recognition (NER) task in order to explain BERT model predictions on the I2B2 2014 PHI dataset. It also extends the Language Interpretability Tool (LIT) to visualize and debug NER BERT models. The project can explain any BERT-based NER model on any dataset in CONLL format.
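
The core idea can be sketched with Captum's LayerIntegratedGradients applied to a HuggingFace token-classification model. The snippet below is a minimal illustration, not the code in explainer.py: the model path, the example sentence, and the token/label indices are placeholders.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification
from captum.attr import LayerIntegratedGradients

# Placeholder path: any BERT NER checkpoint trained with the HuggingFace NER script.
tokenizer = AutoTokenizer.from_pretrained("/path/to/trained/model")
model = AutoModelForTokenClassification.from_pretrained("/path/to/trained/model")
model.eval()

def forward_func(input_ids, attention_mask, token_index, label_index):
    # Scalar score of one label for one token; attributions are taken w.r.t. this output.
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    return logits[:, token_index, label_index]

enc = tokenizer("Patient admitted on 5/12/2014", return_tensors="pt")
baseline_ids = torch.full_like(enc["input_ids"], tokenizer.pad_token_id)

lig = LayerIntegratedGradients(forward_func, model.bert.embeddings)
attributions, delta = lig.attribute(
    inputs=enc["input_ids"],
    baselines=baseline_ids,
    additional_forward_args=(enc["attention_mask"], 4, 1),  # token and label indices are placeholders
    return_convergence_delta=True,
)
token_scores = attributions.sum(dim=-1).squeeze(0)  # one attribution score per wordpiece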

Acknowledgments

Code in this project is based on HuggingFace Transformers and on the extensive examples provided by the Captum and LIT repositories.

Running

Prerequisites

  • Any BERT model trained for NER on a CONLL-format dataset. The model needs to have been trained with the HuggingFace NER script.
  • A dataset in CONLL format (see the format example after this list). The test dataset in the data folder is assumed to be called test.txt.
  • Python 3.x (tested with 3.7).
  • Run pip install -r requirements.txt.
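
For reference, a CONLL-format NER file has one token and one tag per line, separated by whitespace, with a blank line between sentences. The tokens and PHI-style labels below are purely illustrative:

Patient O
John B-PATIENT
was O
admitted O
on O
5/12/2014 B-DATE
. O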

Captum

python explainer.py --data_dir /path/to/data/folder --model_type bert \
--labels /path/to/labels.txt --model_name_or_path /path/to/trained/model \
--max_seq_length 128 --explanations_dir /path/to/store/explanations.html
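
The HTML report written under --explanations_dir can be produced with Captum's text visualization utilities. A rough sketch, reusing token_scores, delta, and enc from the attribution example above; the predicted/true labels and probability shown here are placeholders, and recent Captum versions return an IPython HTML object from visualize_text:

from captum.attr import visualization as viz

# Fields (positional): word attributions, predicted probability, predicted class,
# true class, attributed class, attribution score, raw input tokens, convergence score.
record = viz.VisualizationDataRecord(
    token_scores, 0.98, "B-DATE", "B-DATE", "B-DATE", token_scores.sum(),
    tokenizer.convert_ids_to_tokens(enc["input_ids"][0]), delta,
)
html = viz.visualize_text([record])
with open("/path/to/store/explanations.html", "w") as f:
    f.write(html.data)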

Lit Server

python lit.py --model_path /path/to/trained/model --labels /path/to/labels.txt \
--test_data_dir /path/to/test/data/folder
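
lit.py wraps the trained model in LIT's model API and starts the development server. The outline below is only an approximation of that wiring: the spec fields are illustrative, and run_bert_ner / load_conll_dataset are hypothetical helpers, not functions from this repository.

from lit_nlp import dev_server
from lit_nlp import server_flags
from lit_nlp.api import model as lit_model
from lit_nlp.api import types as lit_types

class NerModel(lit_model.Model):
    # Minimal LIT wrapper: declare what the model consumes and produces.
    def input_spec(self):
        return {"tokens": lit_types.Tokens()}

    def output_spec(self):
        return {"tags": lit_types.SequenceTags(align="tokens")}

    def predict_minibatch(self, inputs):
        # Run the BERT NER model on each example and return per-token tags.
        return [{"tags": run_bert_ner(ex["tokens"])} for ex in inputs]

models = {"ner": NerModel()}
datasets = {"test": load_conll_dataset("/path/to/test/data/folder")}
lit_demo = dev_server.Server(models, datasets, **server_flags.get_flags())
lit_demo.serve()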

Dataset

The dataset used in this project is the I2B2 2014 PHI dataset. It can be requested from the Department of Biomedical Informatics and is provided free of charge to students and researchers. Any NER-annotated CONLL dataset can be used with this project.

Results

Explanation results are stored as an explanations.html file in the provided explanations directory.
