DocuWise - Document Chatbot

Description: This application is a basic document question-answering system built on an OpenAI language model (gpt-4o). Users upload documents and ask questions about their content, and receive answers grounded in the context of the uploaded document.

Features:

Document upload and processing (supports PDF files)
Natural language question answering
Adaptive Retrieval-Augmented Generation (RAG) for improved accuracy, including compression of the retrieved chunks
Vector search using FAISS
Feedback loop for continuous improvement
Post-processing, including text splitting and document summarization
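
The text splitting mentioned above can be illustrated with a minimal sketch. It assumes fixed-size character chunks with a configurable overlap; the function name split_text and its default sizes are hypothetical and are not taken from the application's code, which may rely on a LangChain text splitter instead.

```python
from __future__ import annotations


def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character chunks of at most chunk_size,
    where consecutive chunks share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

The overlap keeps a sentence that straddles a chunk boundary at least partly visible in both neighbouring chunks, which tends to help retrieval.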

Architecture

Technologies Used:

LLM Model: GPT-4 (via OpenAI API)
Embedding Model: OpenAI Embeddings
Vector Store: FAISS (Facebook AI Similarity Search)
LLM Orchestration: LangChain
Document Processing: PyPDFLoader for PDF handling, UnstructuredImageLoader for JPG and JPEG images
RAG Implementation: Adaptive RAG for dynamic information retrieval
Web Framework: Streamlit
Language: Python

Setup and Installation

Clone the repository:

    git clone https://github.com/yourusername/your-repo-name.git
    cd your-repo-name

Install dependencies:

    pip install -r requirements.txt

Set up your OpenAI API key in a .env file or as an environment variable.

Run the Streamlit app:

    streamlit run app.py
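
For the API key step, a small startup check can make a missing key fail early with a readable message. The sketch below is an assumption about how such a check might look, not the application's actual code. It uses the variable name OPENAI_API_KEY, which the OpenAI Python client reads by default; this README does not name the variable. The helper get_openai_api_key is hypothetical.

```python
import os


def get_openai_api_key(env=None) -> str:
    """Return the OpenAI API key from the given mapping (default: os.environ).

    Raises RuntimeError with a readable message when the key is missing or blank.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it to a .env file or export it "
            "before running the app."
        )
    return key


if __name__ == "__main__":
    try:
        get_openai_api_key()
        print("API key found.")
    except RuntimeError as err:
        print(err)
```

If the key lives in a .env file, it has to be loaded into the environment before this check runs. A package such as python-dotenv is commonly used for that, though whether this project depends on it is not stated here.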

Usage and Example

Upload your PDF document through the web interface.
Ask questions about the document in natural language.
Receive AI-generated answers based on the document's content.

How It Works

Document Processing: Uploaded PDFs are processed using PyPDFLoader
Embedding: Document content is embedded using OpenAI's embedding model
Indexing: FAISS is used to create a searchable index of the document embeddings
Query Processing: User questions are processed and embedded
Retrieval: Relevant document sections are retrieved using Adaptive RAG
Answer Generation: GPT-4 generates answers based on retrieved context and user query
Post-Processing: Answers undergo splitting and other refinements for improved readability
Feedback Loop: User feedback is incorporated to improve future retrievals and answers
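
The embedding, indexing, and retrieval steps above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the application's implementation: word-count vectors stand in for OpenAI embeddings, a plain Python list stands in for the FAISS index, and simple top-k cosine similarity stands in for Adaptive RAG. All function names (embed, build_index, retrieve, build_prompt) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for an OpenAI embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(chunks):
    """Stand-in for a FAISS index: a list of (chunk, vector) pairs."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble the text that would be sent to the LLM for answer generation."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    chunks = [
        "FAISS builds a searchable index over embedding vectors.",
        "Streamlit renders the web interface for uploads.",
        "PyPDFLoader extracts text from each PDF page.",
    ]
    question = "Which component extracts text from a PDF?"
    top = retrieve(build_index(chunks), question, k=1)
    print(build_prompt(question, top))
```

In the real pipeline the prompt would go to the OpenAI model. The sketch stops at prompt assembly so that it runs without network access or an API key.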

Future Improvements

Support for additional document formats
Multi-document querying
Enhanced user interface for better interaction and visualization
Proper processing of JPEG and JPG files
Use of AI agents

Contributing

Contributions to improve the application are welcome. Please feel free to submit pull requests or open issues to discuss potential enhancements.
