Authorship identification is a common and important task in Natural Language Processing. Defined as predicting the author of a given text, it supports a broad range of applications, from identifying ghostwriters to detecting plagiarism. We show the accuracy advantages of neural network architectures over classical machine learning approaches. We first reproduce approaches from the literature, which we benchmark on the Reuters_50_50 dataset, and then improve on them by introducing ConHAN, a new deep learning architecture combining pre-trained contextualized embeddings (DistilBERT) with hierarchical attention. The best of the models we tested achieves an out-of-sample accuracy of 78%, compared to the literature benchmark of 69%. Despite the small training set, ConHAN shows promising results on this task, and we believe it would benefit from more training data. Further analysis reveals that ConHAN performs best when predicting authors with simple vocabularies, but remains largely agnostic to grammatical differences between sentences.
https://github.com/VictorJouault/ConHAN-Authorship-Identification/blob/main/Paper%20ConHAN.pdf
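A minimal sketch of what a ConHAN-style model could look like, assuming (the abstract does not specify these details) that each document is split into sentences, each sentence is encoded with DistilBERT, and an additive attention layer pools the sentence vectors into a document vector fed to a classifier over the 50 Reuters_50_50 authors. The class name `ConHANSketch`, the `distilbert-base-uncased` checkpoint, the [CLS]-position pooling, and the exact attention form are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical ConHAN-style sketch: DistilBERT sentence encoder + sentence-level
# attention + author classifier. Names and design details are assumptions.
import torch
import torch.nn as nn
from transformers import DistilBertModel, DistilBertTokenizerFast


class ConHANSketch(nn.Module):
    def __init__(self, num_authors: int = 50, hidden: int = 768):
        super().__init__()
        self.encoder = DistilBertModel.from_pretrained("distilbert-base-uncased")
        # Additive (Bahdanau-style) attention over sentence embeddings.
        self.attn_proj = nn.Linear(hidden, hidden)
        self.attn_vec = nn.Linear(hidden, 1, bias=False)
        self.classifier = nn.Linear(hidden, num_authors)

    def forward(self, input_ids, attention_mask):
        # input_ids / attention_mask: (num_sentences, seq_len) for ONE document.
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        sent_emb = out.last_hidden_state[:, 0, :]  # (num_sentences, hidden): [CLS]-position vectors
        scores = self.attn_vec(torch.tanh(self.attn_proj(sent_emb)))  # (num_sentences, 1)
        weights = torch.softmax(scores, dim=0)     # attention weights over sentences
        doc_emb = (weights * sent_emb).sum(dim=0)  # weighted sum -> document vector (hidden,)
        return self.classifier(doc_emb)            # author logits (num_authors,)


if __name__ == "__main__":
    tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
    sentences = ["The markets rallied on Friday.", "Analysts expect further gains."]
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    model = ConHANSketch(num_authors=50)
    logits = model(batch["input_ids"], batch["attention_mask"])
    print(logits.shape)  # torch.Size([50])
```

The sketch processes one document at a time (its sentences form the batch dimension); a practical implementation would likely batch documents, mask padded sentences, and fine-tune or freeze the DistilBERT encoder depending on the training-set size.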