Semantic Search Engine

A semantic search engine using Facebook AI Similarity Search (FAISS) and language models (BERT and SBERT).
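The README does not spell out the search step itself. One common FAISS setup is a flat inner-product index (`IndexFlatIP`) over unit-normalized embeddings, which performs exact cosine-similarity search. The sketch below reproduces that computation in plain Python on toy vectors so the idea is visible without FAISS installed. The function names and vectors are hypothetical and are not taken from semantic_search.py.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k_search(
    query: Sequence[float], corpus: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Exact (brute-force) search: score every document and return the k best
    (document index, cosine similarity) pairs, highest score first."""
    q = normalize(query)
    scored = []
    for i, doc in enumerate(corpus):
        d = normalize(doc)
        scored.append((i, sum(a * b for a, b in zip(q, d))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; in the project these would come from BERT/SBERT.
    docs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
    print(top_k_search([0.9, 0.1, 0.0], docs, k=2))
```

FAISS does the same scoring in optimized native code, and its approximate index types trade a little accuracy for speed on larger collections.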

Keywords: Semantic Search, Indexing, Vectors, Embedding, Information Retrieval.

The Dataset

A subset of the ArXiv dataset (10,000 articles) was used for this project.
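The README does not say how the subset was drawn. The arXiv metadata snapshot distributed on Kaggle is a JSON-lines file (one paper per line, with fields such as `id`, `title` and `abstract`). Assuming that format, a 10,000-article subset could be read as sketched below. The function name and the choice of simply taking the first 10,000 records are assumptions; the project may have sampled differently.

```python
import json
from typing import Dict, List


def load_arxiv_subset(path: str, n: int = 10_000) -> List[Dict[str, str]]:
    """Read up to n records from a JSON-lines file, keeping id, title and abstract."""
    records: List[Dict[str, str]] = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if len(records) >= n:
                break
            if not line.strip():
                continue  # skip blank lines rather than failing in json.loads
            paper = json.loads(line)
            records.append({
                "id": paper.get("id", ""),
                # arXiv titles and abstracts contain hard line breaks; collapse all whitespace.
                "title": " ".join(paper.get("title", "").split()),
                "abstract": " ".join(paper.get("abstract", "").split()),
            })
    return records
```

For example, `load_arxiv_subset("data/arxiv-metadata-oai-snapshot.json")` would return the first 10,000 papers; that file name is hypothetical here.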

Requirements

You can find the modules and libraries used in this project in the requirements.txt file. Install them by running:

pip install -r requirements.txt

Structure

  • data: contains the data file used for this project.

  • evaluation: contains code for evaluating the models using SentEval downstream transfer and similarity tasks.

  • utils: contains helper functions used for the project.

  • static: contains CSS and JavaScript files for the web page.

  • templates: contains the HTML file for the web page.

  • app.py: A Python file for the search engine web app using Flask.

  • faiss_indexing.py: A Python file for setting up the FAISS index.

  • finetune.py: A Python file for fine-tuning the language models.

  • semantic_search.py: A Python file for performing the semantic search.
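finetune.py is not described further. For SBERT, a widely used fine-tuning objective on (query, relevant text) pairs is the in-batch negatives ranking loss, available in sentence-transformers as `MultipleNegativesRankingLoss` (cosine similarity scaled by 20 by default). Whether finetune.py uses this loss is an assumption; the plain-Python sketch below only illustrates what it computes for a single batch, with hypothetical function names.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def in_batch_negatives_loss(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    scale: float = 20.0,
) -> float:
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch serves as a negative."""
    if len(anchors) != len(positives):
        raise ValueError("anchors and positives must have the same length")
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * cosine(a, p) for p in positives]
        m = max(logits)  # subtract the max before exponentiating for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the correct pair
    return total / len(anchors)
```

The loss is low when each anchor is closer to its own positive than to the other positives in the batch, which is why larger batches supply more negatives for free.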

Quickstart Guide

  1. Clone the repository
git clone https://github.com/gloryodeyemi/Semantic_Search.git
  2. Change into the FAISS folder of the cloned repository
cd .../Semantic_Search/FAISS
  3. Download the ArXiv dataset and save it to the data folder.
  4. Install the required packages
pip install -r requirements.txt
  5. Set up the index (optional)
python faiss_indexing.py
  6. Run the web app
python app.py

To run the evaluation scripts, first clone the SentEval toolkit into the project's root directory, then follow the SentEval README instructions to download its datasets.

  1. Return to the project's root directory
cd ..
  2. Clone the SentEval toolkit
git clone https://github.com/facebookresearch/SentEval.git
  3. Download the datasets.
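SentEval's unsupervised similarity tasks (STS12 to STS16) compare the model's cosine similarity for each sentence pair with human similarity judgments and report Pearson and Spearman correlations. To illustrate the second metric, the sketch below computes Spearman correlation as the Pearson correlation of average ranks (ties share the mean rank). In practice `scipy.stats.spearmanr` does this; the function names here are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation between model scores x and gold scores y."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because it only compares orderings, Spearman correlation rewards a model that ranks sentence pairs the way humans do, even if its raw cosine values sit on a different scale from the gold scores.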

Contact

Glory Odeyemi is pursuing a Master's degree in Computer Science (Artificial Intelligence specialization) at the University of Windsor, Windsor, ON, Canada. You can connect with her on LinkedIn.

References

  1. ArXiv dataset
  2. FAISS
  3. BERT
  4. SBERT
  5. SentEval