The MultiSource RAG project implements a Retrieval-Augmented Generation (RAG) system that improves the relevance of generated content by drawing on multiple data sources and comparing several vector stores.
- Hugging Face Models: Utilizes state-of-the-art models from Hugging Face for natural language processing tasks.
- Multiple Vector Stores: Implements three different vector stores—Chroma, Pinecone, and FAISS—to evaluate their performance and effectiveness in data retrieval.
- Sentence Transformers: Employs sentence transformers to encode and retrieve relevant text based on semantic similarity.
- Semantic Chunking: Breaks down texts into meaningful chunks to improve retrieval accuracy.
- Recursive Character Splitting: Enhances the processing of longer texts by recursively splitting them into smaller, manageable segments.
- CRAG (Corrective Retrieval-Augmented Generation): Assesses and refines the retrieved context before generation to improve the quality of responses.
- Benchmarking Tools: Utilizes benchmarking tools to assess the performance of different vector stores and retrieval methods.
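The retrieval flow behind the vector-store and sentence-transformer features above can be sketched in a few lines. This is a minimal, self-contained illustration: the `embed` function below is a toy stand-in (hashed character trigrams) for the real sentence-transformer embeddings the project uses, so only the shape of the pipeline, encode, score by cosine similarity, return top-k, should be taken literally.

```python
import hashlib
import math

DIM = 256  # fixed embedding width for the toy encoder

def embed(text: str) -> list[float]:
    """Toy embedding: hashed character-trigram counts.

    Stand-in for a sentence-transformers model; illustrative only.
    """
    vec = [0.0] * DIM
    t = text.lower()
    for i in range(len(t) - 2):
        h = int(hashlib.md5(t[i:i + 3].encode()).hexdigest(), 16) % DIM
        vec[h] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Vector stores such as FAISS enable fast similarity search.",
    "Bananas are a yellow tropical fruit.",
    "Semantic chunking splits documents into meaningful pieces.",
]
print(retrieve("How do vector stores perform similarity search?", docs, k=1))
```

In the actual project, `embed` is replaced by a sentence-transformers model and the brute-force `sorted` scan is replaced by a Chroma, Pinecone, or FAISS index, which is precisely what the benchmarking compares.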
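Recursive character splitting, listed above, can be summarized as: try coarse separators first (paragraphs), and only fall back to finer ones (lines, sentences, words) when a piece is still too long. The sketch below is a simplified take on that idea (in the spirit of LangChain's `RecursiveCharacterTextSplitter`), not the project's exact implementation; real splitters also merge small fragments back together and support chunk overlap, which is omitted here for brevity.

```python
def recursive_split(
    text: str,
    chunk_size: int = 100,
    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " "),
) -> list[str]:
    """Split text on progressively finer separators until every
    chunk fits within chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    for sep in separators:
        if sep in text:
            chunks: list[str] = []
            for part in text.split(sep):
                # Each part may still be too long; recurse with the
                # same separator list (coarser ones no longer match).
                chunks.extend(recursive_split(part, chunk_size, separators))
            return chunks
    # No separator present: hard-cut at chunk_size as a last resort.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

Keeping paragraph and sentence boundaries intact wherever possible is what makes the resulting chunks more useful for retrieval than fixed-width slicing.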
Clone the repository, open a command prompt in the project directory, and run `python -m notebook`. A Jupyter Notebook page will open in your browser; all of the project code lives in the notebooks there.