A basic search engine for computer science academic papers.
The dataset comes from the DBLP computer science bibliography, which provides open bibliographic information on major computer science journals and proceedings.
The dataset is an XML file with a corresponding DTD (document type definition) file for validation. You can find it here.
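As a rough illustration of how an XML dump like this can be read record by record, here is a minimal sketch using only the Python standard library. It is not the code in data_parser.py: the sample XML, the `RECORD_TAGS` set, and the `iter_records` helper are all hypothetical. The real DBLP file also relies on character entities declared in its DTD, so a DTD-aware parser would probably be needed there; the sample below avoids entities for that reason.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, tiny stand-in for the DBLP dump (the real file is far larger).
SAMPLE_XML = b"""<dblp>
  <article key="journals/example/Doe20">
    <author>Jane Doe</author>
    <title>Indexing Bibliographic Records</title>
    <year>2020</year>
    <journal>Example Journal</journal>
  </article>
  <inproceedings key="conf/example/Roe19">
    <author>Richard Roe</author>
    <author>Jane Doe</author>
    <title>Searching Academic Papers</title>
    <year>2019</year>
    <booktitle>Example Conference</booktitle>
  </inproceedings>
</dblp>"""

# Assumption: only these two record types are of interest in this sketch.
RECORD_TAGS = {"article", "inproceedings"}


def iter_records(source):
    """Yield one dict per publication record, streaming through the XML."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in RECORD_TAGS:
            yield {
                "key": elem.get("key"),
                "type": elem.tag,
                "title": elem.findtext("title", default=""),
                "authors": [a.text for a in elem.findall("author")],
                "year": elem.findtext("year", default=""),
            }
            # Drop the children of the finished record to keep memory use low.
            elem.clear()


records = list(iter_records(io.BytesIO(SAMPLE_XML)))
for record in records:
    print(record["year"], record["title"], record["authors"])
```

Streaming with `iterparse` rather than loading the whole tree is a deliberate choice here, since a full bibliography dump may not fit comfortably in memory.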
- static: Folder containing the CSS and JavaScript files.
- templates: Folder containing the index.html file.
- data_parser.py: Python script containing the functions for parsing the XML file and indexing the academic papers.
- logic.py: Python script containing the functions for search and retrieval.
- main.py: Python script containing the Flask application that renders the search results on the web page.
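To give a feel for what "indexing" and "search and retrieval" mean in the files listed above, here is a minimal, self-contained sketch of an inverted index over paper titles. It does not reproduce data_parser.py or logic.py, and the indexing library the project actually uses is not assumed. The `PAPERS` list and the `tokenize`, `build_index`, and `search` functions are hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical records, shaped like the output of an XML parsing step.
PAPERS = [
    {"key": "p1", "title": "Searching Academic Papers"},
    {"key": "p2", "title": "Indexing Bibliographic Records"},
    {"key": "p3", "title": "Academic Search Engines for Papers"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(papers):
    """Map each term to the set of paper keys whose title contains it."""
    index = defaultdict(set)
    for paper in papers:
        for term in tokenize(paper["title"]):
            index[term].add(paper["key"])
    return index


def search(index, query):
    """Return sorted keys of papers whose title contains every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # index.get avoids creating empty entries for unknown terms.
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


index = build_index(PAPERS)
print(search(index, "academic papers"))
print(search(index, "indexing"))
```

This sketch matches whole terms with AND semantics and does no ranking or stemming; a real search engine would typically add both.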
Run data_parser.py first to build the index directory for the academic papers, or skip this step and use the prebuilt index directory. You can find the index directory here.