# **Conversational Retrieval Chatbot Documentation**
## **Project Overview**
The Conversational Retrieval Chatbot is a sophisticated AI-powered system designed to interact with users through a chatbot interface. It leverages document embeddings and advanced language models to provide context-aware responses. Key features include document uploading, conversational querying, text-to-speech functionality, and personality-based responses.
### **Features**
- **Upload and Process Documents**: Upload and store documents for querying.
- **Handle Conversational Queries**: Use a retrieval-based chain to respond to user queries.
- **Generate and Stream Text-to-Speech Responses**: Convert text responses to audio and stream them.
- **Personality Queries**: Respond to queries about the bot’s personality.
## **Setup and Installation**
### **Prerequisites**
- **Python 3.8 or higher**
- **`pip`** (Python package installer)
- **API Keys**: Obtain API keys for Deepgram and Groq.
### **1. Clone the Repository**
First, clone the repository to your local machine:
```bash
git clone <repository-url>
cd <repository-directory>
```
### **2. Create and Activate a Virtual Environment**
Create a virtual environment to manage dependencies:
```bash
python -m venv venv
```
Activate the virtual environment:
- **On Unix/macOS**:
```bash
source venv/bin/activate
```
- **On Windows**:
```bash
venv\Scripts\activate
```
### **3. Install Dependencies**
Install the required packages using `pip`:
```bash
pip install -r requirements.txt
```
### **4. Set Up Environment Variables**
Create a `.env` file in the root directory of the project with the following content:
```
GROQ_API_KEY=your_groq_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key
```
Replace `your_groq_api_key` and `your_deepgram_api_key` with your actual API keys.
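If the application loads these values with `python-dotenv` (a common pattern, assumed here rather than confirmed by the source), the keys become available to `app.py` roughly like this:
```python
# Minimal sketch, assuming python-dotenv is used to load the .env file.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

GROQ_API_KEY = os.getenv("GROQ_API_KEY")
DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY")

if not GROQ_API_KEY or not DEEPGRAM_API_KEY:
    raise RuntimeError("Missing GROQ_API_KEY or DEEPGRAM_API_KEY in .env")
```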
### **5. Initialize Vector Store**
Run the server once to initialize the vector store:
```bash
python app.py
```
The vector store will be created or loaded based on existing files.
## **API Endpoints**
### **1. Upload Documents**
**Endpoint:** `/upload`
**Method:** `POST`
**Description:** Upload documents to be processed and stored for querying.
**Request:**
- `multipart/form-data`
- `files[]`: List of documents to upload. Supported formats: PDF, DOCX, DOC, TXT.
**Response:**
- **Success (200 OK):**
```json
{
"message": "Files processed successfully"
}
```
- **Error (400 Bad Request):**
```json
{
"error": "No file part"
}
```
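For illustration, a Flask route matching the behaviour described above might look like the sketch below; the actual handler in `app.py` may validate and process files differently (the `uploads` folder and variable names here are assumptions):
```python
# Hedged sketch of an /upload handler; not the project's exact implementation.
import os
from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename

app = Flask(__name__)
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt"}
UPLOAD_DIR = "uploads"  # assumed temporary location
os.makedirs(UPLOAD_DIR, exist_ok=True)

@app.route("/upload", methods=["POST"])
def upload_files():
    if "files[]" not in request.files:
        return jsonify({"error": "No file part"}), 400
    saved_paths = []
    for f in request.files.getlist("files[]"):
        ext = os.path.splitext(f.filename)[1].lower()
        if ext in ALLOWED_EXTENSIONS:
            path = os.path.join(UPLOAD_DIR, secure_filename(f.filename))
            f.save(path)
            saved_paths.append(path)
    # ...hand saved_paths to the document-processing pipeline here...
    return jsonify({"message": "Files processed successfully"}), 200
```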
### **2. Chat with the Bot**
**Endpoint:** `/chat`
**Method:** `POST`
**Description:** Send a message to the bot and receive a text and audio response.
**Request:**
- `application/json`
- `message`: The user’s query.
**Response:**
- **Success (200 OK):** Streams a response containing text and audio data.
**Text Response:**
```json
{
"text": "Bot’s response text"
}
```
**Audio Response:** Audio content streamed as `application/octet-stream`.
- **Error (400 Bad Request):**
```json
{
"error": "Please upload documents first"
}
```
```json
{
"error": "No message provided"
}
```
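A minimal client call might look like the following sketch; how the text and audio parts are framed in the stream depends on `app.py`, so the parsing here is deliberately naive:
```python
# Naive streaming client for /chat; adjust parsing to the server's actual stream framing.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/chat",
    json={"message": "Hello, bot!"},
    stream=True,
)
resp.raise_for_status()

# Write everything the server streams back to disk for inspection.
with open("chat_response.bin", "wb") as out:
    for chunk in resp.iter_content(chunk_size=4096):
        out.write(chunk)
```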
### **3. Personality Queries**
The bot responds to queries about its personality using predefined responses based on the input. Example queries include:
- "What is your name?"
- "Who created you?"
- "What can you do?"
The responses are customized based on the bot’s personality traits.
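A minimal sketch of this kind of keyword-based matching follows; the actual trigger phrases and replies used by `handle_personality_query` may differ:
```python
# Illustrative keyword matching; the real predefined responses live in app.py.
PERSONALITY_RESPONSES = {
    "what is your name": "I'm the Conversational Retrieval Chatbot.",
    "who created you": "I was built as part of this project.",
    "what can you do": "I can answer questions about the documents you upload.",
}

def handle_personality_query(user_input: str):
    """Return a canned reply if the query matches a personality question, else None."""
    normalized = user_input.lower().strip().rstrip("?")
    for trigger, reply in PERSONALITY_RESPONSES.items():
        if trigger in normalized:
            return reply
    return None
```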
## **File Upload and Processing Workflow**
1. **Upload Files**: Use the `/upload` endpoint to upload documents. Supported formats include `.pdf`, `.docx`, `.doc`, and `.txt`.
2. **Process Files**: Uploaded files are saved temporarily. The appropriate loader (e.g., PyPDFLoader, Docx2txtLoader, TextLoader) extracts text from each file.
3. **Split Text into Chunks**: The text is divided into smaller chunks using a `CharacterTextSplitter` for more manageable processing.
4. **Generate Embeddings**: Convert text chunks into embeddings using a pre-trained Hugging Face model (`sentence-transformers/all-MiniLM-L6-v2`).
5. **Store in Vector Database**: Save the embeddings in a Chroma vector database for efficient retrieval.
6. **Create Conversational Chain**: Construct a `ConversationalRetrievalChain` using the embeddings, allowing the chatbot to fetch relevant information based on user queries.
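Taken together, steps 2–6 correspond roughly to the LangChain sketch below (import paths vary between LangChain versions, and the chunk sizes, persist directory, and Groq model name are assumptions):
```python
# Illustrative pipeline sketch; not the exact code in app.py.
import os

from langchain.document_loaders import PyPDFLoader, Docx2txtLoader, TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain
from langchain_groq import ChatGroq

LOADERS = {".pdf": PyPDFLoader, ".docx": Docx2txtLoader, ".doc": Docx2txtLoader, ".txt": TextLoader}

def build_chain(file_paths):
    # 2. Extract text from each uploaded file with the matching loader
    documents = []
    for path in file_paths:
        loader_cls = LOADERS[os.path.splitext(path)[1].lower()]
        documents.extend(loader_cls(path).load())

    # 3. Split the text into smaller chunks
    splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_documents(documents)

    # 4. Embed the chunks with the MiniLM sentence-transformer
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

    # 5. Store the embeddings in a persistent Chroma vector database
    vector_store = Chroma.from_documents(chunks, embeddings, persist_directory="chroma_db")

    # 6. Build the retrieval chain on top of a ChatGroq model (model name is illustrative)
    llm = ChatGroq(model_name="llama3-8b-8192", temperature=0)
    return ConversationalRetrievalChain.from_llm(llm=llm, retriever=vector_store.as_retriever())
```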
## **Chat Workflow**
1. **Receive User Query**:
- **Endpoint**: `/chat`
- **Method**: POST
- **Description**: The user sends a query to this endpoint in JSON format.
**Example Request:**
```json
{
"message": "Tell me about World War II."
}
```
2. **Retrieve Relevant Chunks**:
- **Action**: Fetch the most relevant text chunks from the vector database based on the user query.
3. **Generate Response**:
- **Action**: Pass the relevant chunks and conversation history to the `ChatGroq` language model to generate a text response.
**Example Text Response:**
```json
{
"text": "World War II was a global conflict that lasted from 1939 to 1945..."
}
```
4. **Generate Audio Response**:
- **Action**: Convert the text response to audio using Deepgram’s text-to-speech service.
**Example Request to Deepgram:**
```json
{
"text": "World War II was a global conflict that lasted from 1939 to 1945...",
"voice": "en_us_male"
}
```
5. **Stream Audio Response**:
- **Action**: Stream the generated audio back to the client. The response includes both text and audio.
**Example Response:**
```json
{
"text": "World War II was a global conflict that lasted from 1939 to 1945...",
"audio_url": "<url_to_audio_stream>"
}
```
The client can use this URL to stream the audio response.
6. **Handle Errors**:
- **Action**: Return appropriate error messages for issues such as invalid input or external service failures.
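An illustrative server-side sketch of steps 2–5 follows; the chain invocation format and the Deepgram endpoint details are assumptions, so check `app.py` and Deepgram's API reference for the exact calls:
```python
# Hedged sketch of the answer-and-speak flow; not the project's exact implementation.
import os
import requests

def answer_and_speak(chain, question, chat_history):
    # Steps 2-3: retrieve relevant chunks and generate the text answer
    result = chain.invoke({"question": question, "chat_history": chat_history})
    text_response = result["answer"]

    # Step 4: convert the answer to speech with Deepgram's TTS REST API
    tts = requests.post(
        "https://api.deepgram.com/v1/speak?model=aura-asteria-en",  # voice/model is illustrative
        headers={
            "Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}",
            "Content-Type": "application/json",
        },
        json={"text": text_response},
        stream=True,
    )
    tts.raise_for_status()

    # Step 5: return the text plus an iterator of audio chunks for streaming to the client
    return text_response, tts.iter_content(chunk_size=4096)
```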
## **Code Overview**
### **Main Components**
- **Flask App**: Initializes and runs the web server.
- **`generateAudio(text)`**: Generates speech from text using Deepgram and streams the audio back to the client.
- **`create_conversational_chain(vector_store)`**: Creates a conversational retrieval chain using the `ChatGroq` model and a vector store.
- **`load_or_create_vector_store()`**: Loads or creates a vector store for document embeddings.
- **`upload_files()`**: Endpoint to upload and process documents.
- **`chat()`**: Endpoint to handle user queries and generate responses.
- **`remove_emojis(text)`**: Removes emojis from text to ensure cleaner responses.
- **`handle_personality_query(user_input)`**: Provides predefined responses based on user queries about the bot’s personality.
- **`add_personality_to_response(response, personality)`**: Adjusts the response based on the bot’s personality.
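As one example of these helpers, `remove_emojis(text)` can be implemented with a Unicode-range regex; the ranges covered in `app.py` may be broader or narrower:
```python
# Possible shape of remove_emojis; the exact ranges in app.py may differ.
import re

EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # symbols, pictographs, emoticons, extended pictographs
    "\U00002700-\U000027BF"  # dingbats
    "\U0001F1E6-\U0001F1FF"  # regional indicator (flag) symbols
    "]+",
    flags=re.UNICODE,
)

def remove_emojis(text: str) -> str:
    """Strip emoji characters so text-to-speech output stays clean."""
    return EMOJI_PATTERN.sub("", text)
```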
### **Debug Statements**
Throughout the code, debug statements are included to trace the flow and verify operations.
## **Running the Server**
Start the Flask server in debug mode:
```bash
python app.py
```
The server will be accessible at `http://127.0.0.1:5000`.
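The entry point in `app.py` typically looks like this (a conventional Flask pattern, assumed here):
```python
# Conventional Flask entry point; app.py may configure host/port differently.
if __name__ == "__main__":
    app.run(debug=True)  # serves on http://127.0.0.1:5000 by default
```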
## **Testing the API**
You can test the API using tools like `curl`, Postman, or directly via Python scripts.
**Example using `curl` to upload a file:**
```bash
curl -X POST -F "files[]=@path/to/your/file.pdf" http://127.0.0.1:5000/upload
```
**Example using `curl` to chat:**
```bash
curl -X POST -H "Content-Type: application/json" -d '{"message": "Hello, bot!"}' http://127.0.0.1:5000/chat
```
## **Troubleshooting**
- **Application Not Starting**:
- Ensure all environment variables are correctly set.
- Check for missing dependencies in `requirements.txt`.
- **File Upload Issues**:
- Verify the file formats and sizes.
- Check server logs for errors related to file processing.
- **API Errors**:
- Validate API requests and responses.
- Ensure external services (Groq, Deepgram) are accessible and operational.
- **Audio Streaming Problems**:
- Check the URL returned by Deepgram and ensure it is correct.
- Verify audio content is properly formatted and streamed.
## **License**
This project is licensed under the MIT License. See the `LICENSE` file for more details.