ArxivStudy

This repository contains functions for a computational linguistic analysis of the semantic patterns in the abstracts of scientific publications on 'arxiv.org'.

The script 'Arxiv Scraper.py' scrapes abstracts from articles published on Arxiv.org to build a linguistic corpus. (Please feel free to message me for a copy of this corpus.) The scripts 'LSA_make_term_doc_topic.py' and 'LSA_make_term_doc.py' create a term-document matrix and then run a singular value decomposition (SVD) to project the abstracts into a low-dimensional semantic space, which can then be used for various methods of cluster analysis.
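
For readers unfamiliar with this pipeline, here is a minimal sketch of the general technique (latent semantic analysis: term-document matrix, then truncated SVD). It is not the code in 'LSA_make_term_doc.py'. The function names, the whitespace tokenizer, the raw-count weighting, and the four toy abstracts are all assumptions made for illustration, and it assumes NumPy is installed.

```python
import numpy as np


def build_term_doc_matrix(docs):
    """Return (sorted vocabulary, term-by-document count matrix).

    Hypothetical helper: tokenizes by lowercasing and splitting on whitespace.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokenized):
        for tok in doc:
            matrix[index[tok], j] += 1  # raw term counts, no tf-idf weighting
    return vocab, matrix


def project_documents(matrix, k):
    """Project documents into a k-dimensional space via truncated SVD.

    With matrix = U S V^T, row i of the result holds the first k
    coordinates of document i, i.e. row i of (S_k V_k^T)^T.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (s[:k, None] * vt[:k, :]).T


# Toy stand-ins for scraped abstracts (made up for this example).
abstracts = [
    "neural networks learn image features",
    "deep neural networks classify images",
    "galaxy clusters trace dark matter",
    "dark matter halos host galaxy clusters",
]

vocab, X = build_term_doc_matrix(abstracts)
doc_vectors = project_documents(X, k=2)  # one 2-D point per abstract
```

The rows of `doc_vectors` can then be passed to a clustering routine of your choice. Keeping only the top `k` singular values is what makes the space "low dimensional"; the right value of `k` depends on the corpus.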

The results are in the 'Figures' folder. For reference, here is a sample hierarchical clustering: Research Areas Dendrogram