A semantic search engine using Facebook AI Similarity Search (FAISS) and language models (BERT and SBERT).
Keywords: Semantic Search, Indexing, Vectors, Embedding, Information Retrieval.
A subset of the ArXiv dataset (10,000 articles) was used for this project.
The modules and libraries used in this project are listed in the requirements.txt file. Install them with the command below.
pip install -r requirements.txt
- data: contains the data file used for this project.
- evaluation: contains code for evaluating the models using SentEval downstream transfer and similarity tasks.
- utils: contains helper functions used for the project.
- static: contains CSS and JavaScript files for the web page.
- templates: contains the HTML file for the web page.
- app.py: A Python file for the search engine web app using Flask.
- faiss_indexing.py: A Python file for setting up the FAISS index.
- finetune.py: A Python file for finetuning the language models.
- semantic_search.py: A Python file for the semantic search.
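To give a rough idea of what the indexing and search files do conceptually, here is a minimal sketch of embedding-based search in plain Python. It is not the code from faiss_indexing.py or semantic_search.py: the function names (normalize, build_index, search), the toy titles, and the 3-dimensional vectors are all made up for illustration, and the hand-written vectors stand in for the embeddings that BERT or SBERT would produce. The brute-force scoring here is the same idea as an exact inner-product search over unit-length vectors, which FAISS performs far more efficiently.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_index(embeddings):
    """Hypothetical stand-in for an index: a list of unit-length vectors."""
    return [normalize(v) for v in embeddings]


def search(index, query, k=2):
    """Return the k (doc_id, score) pairs with the highest cosine similarity."""
    q = normalize(query)
    scored = [(sum(a * b for a, b in zip(q, v)), i) for i, v in enumerate(index)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy data (assumption): real embeddings would come from a language model
# and have hundreds of dimensions, not three.
titles = [
    "Graph neural networks",
    "Transformers for translation",
    "Protein folding",
]
embeddings = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.2, 0.9],
]

index = build_index(embeddings)
results = search(index, [0.2, 0.8, 0.0], k=2)

for doc_id, score in results:
    print(f"{score:.3f}  {titles[doc_id]}")
```

Comparing the query against every vector is fine for a handful of documents but gets slow as the collection grows, which is the reason to use a dedicated similarity search library such as FAISS.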
- Clone the repository
git clone https://github.com/gloryodeyemi/Semantic_Search.git
- Change the directory to the cloned repository folder
%cd .../Semantic_Search/FAISS
- Download the ArXiv dataset and save it to the data folder.
- Install the needed packages
pip install -r requirements.txt
- Set up the index (optional)
python faiss_indexing.py
- Run app.py
python app.py
To run the evaluation Python files, first clone the SentEval toolkit into the project's root directory, then follow the instructions in its README to download the datasets.
- Return to the project's root directory
cd ..
- Clone the SentEval toolkit
git clone https://github.com/facebookresearch/SentEval.git
- Download the datasets by following the instructions in the SentEval README.
Glory Odeyemi is pursuing a Master's degree in Computer Science, with a specialization in Artificial Intelligence, at the University of Windsor, Windsor, ON, Canada. You can connect with her on LinkedIn.