RAG system using Hugging Face models, multiple vector stores (Chroma, Pinecone, FAISS), and CRAG, with sentence transformers and benchmarking tools for optimized retrieval and content generation.

chxlm27/MultiSourceRAG

MultiSource RAG (Retrieval-Augmented Generation)

Overview

The MultiSource RAG project implements a Retrieval-Augmented Generation (RAG) system that draws on multiple data sources and vector stores to improve the relevance of generated content.

Features

  • Hugging Face Models: Utilizes state-of-the-art models from Hugging Face for natural language processing tasks.
  • Multiple Vector Stores: Integrates three vector stores (Chroma, Pinecone, and FAISS) to compare their performance and effectiveness in data retrieval.
  • Sentence Transformers: Employs sentence transformers to encode and retrieve relevant text based on semantic similarity.
  • Semantic Chunking: Breaks down texts into meaningful chunks to improve retrieval accuracy.
  • Recursive Character Splitting: Enhances the processing of longer texts by recursively splitting them into smaller, manageable segments.
  • CRAG (Contextual Retrieval-Augmented Generation): Integrates contextual information to improve the quality of generated responses.
  • Benchmarking Tools: Measures and compares the performance of the different vector stores and retrieval methods.
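
To make the recursive character splitting feature concrete, here is a minimal sketch in plain Python. It is not the code from the notebook: the function name `recursive_split`, the separator order, and the default chunk size are all assumptions chosen for illustration, and this README does not say which splitter implementation the project uses. The idea is to split on the coarsest separator first (paragraph breaks), then fall back to finer separators (lines, sentences, words) only for pieces that are still too long.

```python
from typing import List, Sequence


def recursive_split(
    text: str,
    chunk_size: int = 200,
    separators: Sequence[str] = ("\n\n", "\n", ". ", " "),
) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Hypothetical helper for illustration: tries the coarsest separator
    first and recurses with finer ones only for oversized pieces.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    buffer = ""
    for piece in text.split(sep):
        # Greedily merge neighbouring pieces while they still fit in one chunk.
        candidate = piece if not buffer else buffer + sep + piece
        if len(candidate) <= chunk_size:
            buffer = candidate
            continue
        if buffer.strip():
            chunks.append(buffer)
        buffer = ""
        if len(piece) <= chunk_size:
            buffer = piece
        else:
            # The piece alone is too long: retry it with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if buffer.strip():
        chunks.append(buffer)
    return chunks


if __name__ == "__main__":
    sample = (
        "First paragraph about RAG.\n\n"
        "Second paragraph. It has two sentences.\n\n" + "x" * 50
    )
    for chunk in recursive_split(sample, chunk_size=40):
        print(len(chunk), repr(chunk))
```

Every returned chunk is guaranteed to be no longer than `chunk_size`, which matters when chunks are later passed to an embedding model with a fixed input limit. Semantic chunking differs in that it chooses boundaries by meaning rather than by character count.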

Set Up

Clone the repository, open a command prompt in the repository folder, and run: python -m notebook. Jupyter Notebook will open in your browser, and all of the code is in the notebook there.
