This repository contains functions for a computational linguistic analysis of semantic patterns in the abstracts of scientific publications on 'arxiv.org'.
The script 'Arxiv Scraper.py' scrapes abstracts from articles published on Arxiv.org to build a linguistic corpus. (Please feel free to message me for a copy of the corpus.) The scripts 'LSA_make_term_doc_topic.py' and 'LSA_make_term_doc.py' create a term-document matrix and then run a singular value decomposition (SVD) to project the abstracts into a low-dimensional semantic space, where they can be grouped with various methods of cluster analysis.
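As a rough illustration of the term-document and SVD steps described above, here is a minimal sketch using only NumPy. It is not the code from the LSA scripts in this repository: the function names, the whitespace tokenizer, the raw-count weighting, and the toy abstracts are all assumptions made for the example.

```python
import numpy as np


def build_term_doc_matrix(docs):
    """Count how often each word occurs in each document.

    Hypothetical helper: tokenizes by lowercasing and splitting on whitespace.
    Returns the sorted vocabulary and a (terms x documents) count matrix.
    """
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.lower().split():
            matrix[index[word], j] += 1
    return vocab, matrix


def project_documents(matrix, k):
    """Project each document onto the top-k singular directions.

    Hypothetical helper: returns one k-dimensional row vector per document.
    """
    # Thin SVD: matrix = U @ diag(s) @ Vt, singular values sorted descending.
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    # Document coordinates in the reduced space are diag(s_k) @ Vt_k.
    return (np.diag(s[:k]) @ Vt[:k, :]).T


# Toy stand-ins for scraped abstracts (invented for this example).
toy_abstracts = [
    "neural networks learn image features",
    "deep neural networks classify image data",
    "galaxies form in dark matter halos",
    "dark matter shapes galaxies and clusters",
]

vocab, term_doc = build_term_doc_matrix(toy_abstracts)
doc_vectors = project_documents(term_doc, k=2)
print(doc_vectors.shape)
```

Each row of `doc_vectors` is one abstract in the reduced space, and these rows are what a clustering method would take as input. A real corpus would usually also need stop-word removal and a weighting scheme such as TF-IDF, which this sketch leaves out.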
The results are in the 'Figures' folder. For reference, here is a sample hierarchical clustering: