This project is a Question & Answer system implemented using DistilBERT for text representation and Faiss (Facebook AI Similarity Search) for efficient similarity search in a vector database. The system is designed to provide accurate and relevant answers to user queries by searching through a large collection of documents.
- DistilBERT-based Text Representation: Utilizes the DistilBERT model to convert questions and documents into dense vector representations.
- Faiss Vector Database: Stores the vector representations of the documents for fast similarity search.
- Efficient Retrieval: Finds the most relevant documents to a given question by performing efficient similarity searches in the Faiss vector database.
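
To make the retrieval idea concrete, here is a minimal pure-Python sketch of an exact top-k cosine-similarity search. It is only an illustration of the ranking that a flat (brute-force) vector index computes; it is not this project's code and does not call Faiss. The helper names `l2_normalize` and `top_k`, the toy 3-dimensional vectors, and the choice of cosine similarity as the metric are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def top_k(doc_vecs: Sequence[Sequence[float]],
          query_vec: Sequence[float],
          k: int) -> List[Tuple[int, float]]:
    """Return (document index, cosine similarity) pairs for the k best matches."""
    query = l2_normalize(query_vec)
    scored = []
    for idx, vec in enumerate(doc_vecs):
        doc = l2_normalize(vec)
        score = sum(q * d for q, d in zip(query, doc))
        scored.append((idx, score))
    # Highest similarity first; ties keep the lower document index first.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical values for illustration only).
documents = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
question = [1.0, 0.05, 0.0]

results = top_k(documents, question, k=2)
for doc_index, similarity in results:
    print(doc_index, round(similarity, 4))
```

This loop costs time proportional to the number of documents for every query. A library such as Faiss exists to make the same kind of search fast on large collections.
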
- Python 3.x
- PyTorch
- Transformers
- Faiss
- Streamlit (for the web-based interface)
- Clone the repository:
git clone https://github.com/VuBacktracking/bert-faiss-qa-sytem.git
- Install the dependencies:
pip install -r requirements.txt
- Train and Download the DistilBERT model:
python3 trainer.py
Note: The fine-tuned model is available at https://huggingface.co/vubacktracking/distilbert-base-uncased-finetuned-squad2
- Build the Faiss vector database:
python3 faiss_index.py
- Run the Streamlit app:
streamlit run app.py
Open your web browser and navigate to http://localhost:8501/ to use the web-based Q&A system.
- BERT Embeddings:
  - The preprocessed text is converted into vector embeddings using the DistilBERT model.
- Faiss Indexing:
  - The DistilBERT embeddings of the documents are indexed in the Faiss vector database.
- Query Processing:
  - When a user inputs a question, the question is converted into a DistilBERT embedding.
  - Faiss is used to find the most similar embeddings (i.e., the most relevant documents) to the question embedding.
- Answer Extraction:
  - The relevant documents are ranked, and the most relevant answer passages are extracted and presented to the user.
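
The answer extraction step can be sketched as follows. This assumes the model behaves like a typical extractive question-answering model fine-tuned on SQuAD, which assigns each token a start score and an end score; the answer is then the span with the highest combined score. The function name `best_span`, the `max_answer_len` parameter, and the scores below are hypothetical and written in pure Python for illustration. The project's actual extraction logic may differ.

```python
from typing import Sequence, Tuple


def best_span(start_scores: Sequence[float],
              end_scores: Sequence[float],
              max_answer_len: int = 30) -> Tuple[int, int, float]:
    """Return (start, end, score) of the highest-scoring valid token span.

    A span is valid when start <= end and it covers at most
    max_answer_len tokens.
    """
    best = (0, 0, float("-inf"))
    for start, s_score in enumerate(start_scores):
        last = min(start + max_answer_len, len(end_scores))
        for end in range(start, last):
            score = s_score + end_scores[end]
            if score > best[2]:
                best = (start, end, score)
    return best


# Hypothetical tokens and per-token scores for one retrieved passage.
tokens = ["faiss", "was", "released", "by", "facebook", "ai", "research"]
start_scores = [0.1, 0.0, 0.2, 0.3, 4.0, 1.0, 0.5]
end_scores = [0.0, 0.1, 0.2, 0.1, 1.0, 1.5, 3.5]

start, end, score = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(answer, score)
```

When several passages are retrieved, one simple option is to run this selection on each passage and present the span with the highest score.
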