Merge pull request #1325 from kritinv/use-case-tutorials-QA-agent
Use Case Tutorials QA agent
penguine-ip authored Jan 30, 2025
2 parents a67509b + 5446004 commit 884e88f
Showing 6 changed files with 459 additions and 11 deletions.
15 changes: 13 additions & 2 deletions docs/sidebarTutorials.js
@@ -2,6 +2,17 @@ module.exports = {
tutorials: [
"tutorial-introduction",
"tutorial-setup",
{
type: "category",
label: "RAG QA Agent ",
items: [
"qa-agent-introduction",
"qa-agent-generating-a-synthetic-dataset",
"qa-agent-defining-an-evaluation-criteria",
"qa-agent-choosing-metrics",
],
collapsed: false,
},
{
type: "category",
label: "Legal Document Summarizer",
@@ -15,7 +26,7 @@ module.exports = {
"legal-doc-summarizer-maintaining-a-dataset",
"legal-doc-summarizer-pulling-dataset",
],
collapsed: false,
collapsed: true,
},
{
type: "category",
@@ -32,7 +43,7 @@ module.exports = {
"tutorial-production-monitoring",
"tutorial-production-evaluation",
],
collapsed: false,
collapsed: true,
},
],
};
5 changes: 5 additions & 0 deletions docs/tutorials/qa-agent-choosing-metrics.mdx
@@ -0,0 +1,5 @@
---
id: qa-agent-choosing-metrics
title: Choosing the right Metrics for your QA Agent
sidebar_label: Choosing your Metrics
---
73 changes: 73 additions & 0 deletions docs/tutorials/qa-agent-defining-an-evaluation-criteria.mdx
@@ -0,0 +1,73 @@
---
id: qa-agent-defining-an-evaluation-criteria
title: Defining Evaluation Criteria
sidebar_label: Defining Evaluation Criteria
---

In the previous section, we generated an entire synthetic QA dataset in just a few seconds. In this section, we'll use some of those queries to observe how our RAG QA agent responds and to pin down which factors we care about in its responses. This will help us narrow down our evaluation criteria, which we'll need in order to select the right metrics.

## Revisiting the Generated User Queries

Let's take the first three generated goldens and see how our QA agent responds to them. Since we're only examining three user queries, you can simply copy and paste their `input`s from Confident AI into your codebase.

```python
input_1 = "Examine CloudMate's unique enterprise offerings: comprehensive compliance tools and dedicated support services."
input_2 = "If CloudMate introduces an Ultra plan, doubling current storage and security, what pricing might be expected?"
input_3 = "How does MadeUpCompany's adherence to GDPR, HIPAA, and SOC 2 ensure data protection?"
```

:::note
As you can see, these synthetic questions are valuable because they test your QA agent on edge cases rather than only simple lookups. For example, `input_2` asks about the pricing of a hypothetical new plan.
:::
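If the `dataset` object from the previous section is still in memory, you could collect these inputs programmatically instead of copy-pasting them. The following is a minimal sketch that assumes only that each golden exposes an `input` attribute, as the printed `Golden` in the previous section does. `first_inputs` is a hypothetical helper, and the `SimpleNamespace` objects merely stand in for real goldens so the snippet runs on its own.

```python
from itertools import islice
from types import SimpleNamespace


def first_inputs(goldens, n=3):
    """Return the `input` of the first `n` goldens (hypothetical helper).

    Works with any iterable of objects that have an `input` attribute,
    e.g. `dataset.goldens` from the previous section.
    """
    return [golden.input for golden in islice(goldens, n)]


# Stand-ins for real goldens so this sketch runs without DeepEval.
# The fourth entry is an illustrative placeholder showing that only n are taken.
goldens = [
    SimpleNamespace(input="Examine CloudMate's unique enterprise offerings: comprehensive compliance tools and dedicated support services."),
    SimpleNamespace(input="If CloudMate introduces an Ultra plan, doubling current storage and security, what pricing might be expected?"),
    SimpleNamespace(input="How does MadeUpCompany's adherence to GDPR, HIPAA, and SOC 2 ensure data protection?"),
    SimpleNamespace(input="What does the DataWiz Starter plan include?"),
]

input_1, input_2, input_3 = first_inputs(goldens)
```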

## Generating QA Agent Responses for Inspection

We'll pass these `input`s into our QA agent and see how it responds:

```python
input_1 = "Examine CloudMate's unique enterprise offerings: comprehensive compliance tools and dedicated support services."
print(qa_agent.generate(input_1))
# CloudMate offers a secure and scalable cloud storage solution designed for businesses
# of all sizes. Its enterprise features include comprehensive compliance tools such as
# military-grade encryption, role-based access control, and multi-factor authentication.
# Additionally, CloudMate provides dedicated support services, including priority customer
# assistance and a dedicated account manager for enterprise clients.
```

The QA agent answered our first input correctly, providing relevant details about CloudMate's compliance tools and dedicated support services.

```python
input_2 = "If CloudMate introduces an Ultra plan, doubling current storage and security, what pricing might be expected?"
print(qa_agent.generate(input_2))
# The CloudMate Ultra plan will provide unlimited storage and top-tier security features
# at a fixed cost of $99.99/month.
```

The QA agent gave a speculative response about a hypothetical Ultra plan, stating an invented price as fact. Nothing in the knowledge base mentions an Ultra plan, so the answer has no factual basis.

```python
input_3 = "How does MadeUpCompany's adherence to GDPR, HIPAA, and SOC 2 ensure data protection?"
print(qa_agent.generate(input_3))
# MadeUpCompany ensures data protection through strict adherence to GDPR, HIPAA, and SOC 2 standards.
# This includes robust encryption protocols, access control mechanisms, and regular security audits
# to safeguard sensitive data.
```

While the response is generally relevant to data protection, it incorrectly assumes knowledge about MadeUpCompany when the QA agent is meant to provide answers specifically about CloudMate. Instead, the system should have either declined to answer or clarified its limitations.
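Writing one `print` call per query stops scaling once you inspect more than a handful of queries. Below is a minimal sketch of a review loop, assuming your agent exposes a `generate(str) -> str` method as `qa_agent.generate` does above. `collect_responses` and `fake_generate` are hypothetical names; the fake agent exists only so the snippet runs on its own. Passing each golden's `expected_output` alongside lets you compare the two side by side.

```python
from typing import Callable, Dict, List, Optional, Sequence


def collect_responses(
    generate: Callable[[str], str],
    inputs: Sequence[str],
    expected_outputs: Optional[Sequence[Optional[str]]] = None,
) -> List[Dict[str, Optional[str]]]:
    """Run each query through the agent and keep the results for side-by-side review."""
    if expected_outputs is None:
        expected_outputs = [None] * len(inputs)
    if len(expected_outputs) != len(inputs):
        raise ValueError("inputs and expected_outputs must have the same length")
    return [
        {"input": query, "actual_output": generate(query), "expected_output": expected}
        for query, expected in zip(inputs, expected_outputs)
    ]


# Stand-in agent so the sketch runs on its own; in practice pass `qa_agent.generate`.
def fake_generate(query: str) -> str:
    return f"(answer to: {query})"


rows = collect_responses(fake_generate, ["What does the Basic plan cost?"])
for row in rows:
    print(f"Q: {row['input']}\nA: {row['actual_output']}\n")
```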

## Defining the Evaluation Criteria

By reviewing these three responses, we can start to understand what we care about in our QA agent's replies. While the tone could be more personable, the main issue is that some answers are **speculative** when a question falls outside the scope of the company's knowledge base, and some are **irrelevant**. These qualities form our evaluation criteria.

:::info
As you inspect more responses from your QA agent, your evaluation criteria will expand, especially as you incorporate product and user feedback.
:::

For now, we'll focus on two key criteria:

1. The answers must be _non-speculative_.
2. The answers must be _relevant_.
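Proper metrics come later; in the meantime, a deliberately crude heuristic can make the _non-speculative_ criterion concrete for one narrow case: flag any dollar amount in an answer that never appears in the knowledge base. This is a hypothetical sketch, not a DeepEval metric, and the regex ignores formats such as `$1,000`. Run against the pricing lines from the knowledge base, it catches the invented Ultra plan price from `input_2`.

```python
import re
from typing import List

# Matches amounts like "$49" or "$29.99"; thousands separators are not handled.
PRICE_PATTERN = re.compile(r"\$\d+(?:\.\d{2})?")


def ungrounded_prices(answer: str, context: str) -> List[str]:
    """Return dollar amounts in `answer` that never appear in `context`."""
    known = set(PRICE_PATTERN.findall(context))
    return [price for price in PRICE_PATTERN.findall(answer) if price not in known]


# Pricing lines copied from the knowledge base.
pricing = (
    "Basic: $9.99/month – 100GB storage, essential security features\n"
    "Professional: $29.99/month – 1TB storage, enhanced security, priority support\n"
    "Starter: $49/month – Basic analytics, limited AI insights\n"
    "Growth: $99/month – Advanced machine learning models, predictive analytics\n"
)

answer_2 = (
    "The CloudMate Ultra plan will provide unlimited storage and top-tier "
    "security features at a fixed cost of $99.99/month."
)
print(ungrounded_prices(answer_2, pricing))  # ['$99.99']
```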

:::note
In the next section, we’ll explore how these evaluation criteria help in selecting the right metrics for assessing our QA agent.
:::
230 changes: 230 additions & 0 deletions docs/tutorials/qa-agent-generating-a-synthetic-dataset.mdx
@@ -0,0 +1,230 @@
---
id: qa-agent-generating-a-synthetic-dataset
title: Generating a Comprehensive Synthetic Dataset from your Knowledge Base
sidebar_label: Generating Synthetic Datasets
---

In this section, we'll generate a synthetic dataset from a text file containing information about MadeUpCompany, the company of interest, and its core product offerings. The goal is to create a dataset containing a **wide variety of questions that users might ask**.

:::tip
In QA use cases, synthetic datasets are often superior to human-curated evaluation datasets because they are more diverse and capture more edge cases, especially when using various evolution techniques. They can also be generated much more quickly.
:::

## Generating a Synthetic Dataset

Generating a dataset in `DeepEval` is simple: create an `EvaluationDataset` and call the `generate_goldens_from_docs` method with the documents in your knowledge base. In this example, we'll pass a single `.txt` file, but you can include multiple documents in `.txt`, `.pdf`, or `.docx` format.

<details><summary>Click here to see the contents of <strong>madeup_company.txt</strong></summary>
<p>

```
About MadeUpCompany
MadeUpCompany is a pioneering technology firm founded in 2010, specializing in cloud computing, data analytics, and machine learning. Headquartered in San Francisco, California, we have a global presence with satellite offices in New York, London, and Tokyo. Our mission is to empower businesses and individuals with cutting-edge technology that enhances efficiency, scalability, and innovation.
With a diverse team of experts from various industries—including AI research, cybersecurity, and enterprise software development—we push the boundaries of what’s possible. Our commitment to continuous improvement, security, and customer success has earned us recognition as a leader in the tech space.
Our Values
At MadeUpCompany, we believe in:
Innovation – Continuously developing and refining solutions that meet the evolving needs of businesses.
Security & Privacy – Implementing world-class security protocols to protect our customers' data.
Customer-Centric Approach – Designing intuitive, powerful tools that make complex technology accessible.
Sustainability – Ensuring our infrastructure is energy-efficient and environmentally responsible.
Products and Services
We offer a comprehensive suite of cloud-based solutions that streamline operations, enhance decision-making, and power AI-driven insights.
CloudMate – Secure and Scalable Cloud Storage
CloudMate is our flagship cloud storage solution, designed for businesses of all sizes. Features include:
✅ Seamless data migration with automated backups
✅ Military-grade encryption and multi-factor authentication
✅ Role-based access control for enterprise security
✅ AI-powered file organization and search capabilities
DataWiz – Advanced Data Analytics
DataWiz transforms raw data into actionable insights using cutting-edge machine learning models. Features include:
📊 Predictive analytics for demand forecasting and customer behavior modeling
📊 Real-time dashboards with customizable reporting
📊 API integrations with popular business intelligence tools
📊 Automated anomaly detection for fraud prevention and operational efficiency
Custom AI Solutions
We provide tailored machine learning models to optimize business workflows, automate repetitive tasks, and enhance decision-making. From NLP-based chatbots to AI-driven recommendation engines, we develop bespoke AI solutions for various industries.
Pricing
We offer flexible pricing plans to meet the needs of individuals, small businesses, and large enterprises.
CloudMate Plans
Basic: $9.99/month – 100GB storage, essential security features
Professional: $29.99/month – 1TB storage, enhanced security, priority support
Enterprise: Custom pricing – Unlimited storage, advanced compliance tools, dedicated account manager
DataWiz Plans
Starter: $49/month – Basic analytics, limited AI insights
Growth: $99/month – Advanced machine learning models, predictive analytics
Enterprise: Custom pricing – Full AI customization, dedicated data scientists
Custom AI Solutions – Pricing is determined based on project scope and complexity. Contact our sales team for a personalized quote.
Technical Support
Our award-winning customer support team is available 24/7 to assist with any technical issues. Support channels include:
📞 Toll-free phone support
💬 Live chat assistance
📧 Email support with guaranteed response within 6 hours
📚 Comprehensive FAQ and user guides available on our website
👥 Community forum for peer-to-peer discussions and best practices
Most technical issues are resolved within 24 hours, ensuring minimal downtime for your business.
Security and Compliance
Security is at the heart of everything we do. MadeUpCompany adheres to the highest security and regulatory standards, including:
🔒 GDPR, HIPAA, and SOC 2 Compliance – Ensuring global security and data protection compliance.
🔒 End-to-End Encryption – Protecting data in transit and at rest with AES-256 encryption.
🔒 Zero Trust Architecture – Implementing rigorous access control and continuous authentication.
🔒 DDoS Protection & Advanced Threat Detection – Safeguarding against cyber threats with AI-powered monitoring.
Our team continuously updates security measures to stay ahead of evolving cyber risks.
Account Management
Managing your MadeUpCompany services is simple and intuitive via our online portal. Customers can:
✔️ Upgrade or downgrade plans at any time
✔️ Access billing history and download invoices
✔️ Manage multiple users and set role-based permissions
✔️ Track storage and analytics usage in real time
For enterprise accounts, we offer dedicated account managers who provide strategic guidance and personalized support.
Refund and Cancellation Policy
We stand by the quality of our services and offer a 30-day money-back guarantee on all plans.
If you're not satisfied, you can request a full refund within the first 30 days.
After 30 days, you may cancel your subscription at any time, and we’ll issue a prorated refund based on your remaining subscription period.
Enterprise contracts include a flexible exit clause, ensuring fair terms for long-term clients.
Upcoming Features
We are constantly evolving and introducing new features based on customer feedback. Here’s what’s coming soon:
🚀 AI-Driven Data Insights – DataWiz will introduce automated trend forecasting powered by deep learning.
🚀 Collaboration Tools for CloudMate – Enhanced real-time document editing and team workspaces for seamless collaboration.
🚀 Zero-Knowledge Encryption – An optional feature for businesses requiring absolute data confidentiality.
We value our customers' input and prioritize updates that deliver the most impact.
Why Choose MadeUpCompany?
✔️ Over 1 million satisfied users worldwide
✔️ Trusted by Fortune 500 companies
✔️ Featured in TechCrunch, Forbes, and Wired as a top innovator
✔️ Unmatched customer support and security
Whether you're a startup, an enterprise, or an individual user, MadeUpCompany provides the tools you need to thrive in the digital age.
For more information, visit our website at www.madeupcompany.com or contact our sales team at [email protected]. 🚀
```

</p>
</details>

```python
from deepeval.dataset import EvaluationDataset

dataset = EvaluationDataset()
dataset.generate_goldens_from_docs(document_paths=["madeup_company.txt"])
```
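The example passes one `.txt` file, but `document_paths` takes a list, so a folder of documents can be gathered first. This is a minimal sketch: `find_documents` and the `knowledge_base` folder are hypothetical, and the extension set simply mirrors the formats mentioned above (`.txt`, `.pdf`, `.docx`).

```python
from pathlib import Path
from typing import List

# Formats mentioned in this tutorial; the suffix check is case-sensitive.
SUPPORTED_SUFFIXES = {".txt", ".pdf", ".docx"}


def find_documents(folder: str) -> List[str]:
    """Return sorted paths of files directly inside `folder` with a supported extension."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix in SUPPORTED_SUFFIXES
    )


# Hypothetical usage with the dataset from above:
# dataset.generate_goldens_from_docs(document_paths=find_documents("knowledge_base"))
```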

:::info
By default, the synthesizer requires an OpenAI API key for generation. However, you can fully **customize the synthetic generation process** with custom models and various parameters. You can learn more about the [synthetic generation process here](/docs/synthesizer-introduction).
:::

Each row in a dataset is called a `Golden`. When generating from documents, each golden contains an `input` (the user query), an `expected_output` (the ideal LLM response to that query), and a `context` (the ideal context to be retrieved). A golden differs from an `LLMTestCase` in that it does not require an `actual_output`, which is generated at evaluation time.
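To make the golden/test-case distinction concrete, here is a plain-Python sketch using stand-in dataclasses. These are not DeepEval's own `Golden` or `LLMTestCase` classes; the names `GoldenSketch`, `EvalCaseSketch`, and `to_test_case` are hypothetical. The golden carries everything except `actual_output`, which only exists once the agent is called at evaluation time.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class GoldenSketch:
    """Stand-in for a golden: input, expected output, and context, but no actual output."""
    input: str
    expected_output: Optional[str] = None
    context: List[str] = field(default_factory=list)


@dataclass
class EvalCaseSketch:
    """Stand-in for a test case: the same fields plus the agent's actual output."""
    input: str
    actual_output: str
    expected_output: Optional[str] = None
    context: List[str] = field(default_factory=list)


def to_test_case(golden: GoldenSketch, generate: Callable[[str], str]) -> EvalCaseSketch:
    # The actual output is produced here, at evaluation time, by calling the agent.
    return EvalCaseSketch(
        input=golden.input,
        actual_output=generate(golden.input),
        expected_output=golden.expected_output,
        context=list(golden.context),
    )
```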

You can access the list of goldens from the dataset object:

```python
print(dataset.goldens[0])
```

<details><summary>Click here to see the generated golden</summary>
<p>

```python
Golden(
input="Examine CloudMate's unique enterprise offerings: comprehensive compliance tools and dedicated support services.",
expected_output="""
CloudMate's Enterprise plan offers a robust set of features tailored for large organizations seeking scalable cloud solutions. Key offerings include:
1. **Unlimited Storage**: Ideal for enterprises with substantial data storage requirements.
2. **Advanced Compliance Tools**: These tools help organizations adhere to industry regulations and standards, ensuring data privacy and protection.
3. **Dedicated Account Manager**: Provides personalized support and guidance to maximize the use of CloudMate services.
Combined with enterprise-grade security measures like role-based access control, military-grade encryption, and multi-factor authentication, CloudMate's Enterprise plan ensures both security and seamless operation of critical business functions. Custom pricing options allow for tailored solutions specific to each enterprise's needs.
""",
context=[
"""Basic: $9.99/month – 100GB storage, essential security features
Professional: $29.99/month – 1TB storage, enhanced security, priority support
Enterprise: Custom pricing – Unlimited storage, advanced compliance tools, dedicated account manager
DataWiz Plans
Starter: $49/month – Basic analytics, limited AI insights
Growth: $99/month – Advanced machine learning models, predictive analytics
Enterprise: Custom pricing – Full AI customization
""",
"""✅ Military-grade encryption and multi-factor authentication
✅ Role-based access control for enterprise security
✅ AI-powered file organization and search capabilities
DataWiz – Advanced Data Analytics
DataWiz transforms raw data into actionable insights using cutting-edge machine learning models. Features include:
📊 Predictive analytics for demand forecasting and customer behavior modeling
📊 Real-time dashboards with customizable reporting
📊 API integrations with popular business
""",
""" intelligence tools
📊 Automated anomaly detection for fraud prevention and operational efficiency
Custom AI Solutions
We provide tailored machine learning models to optimize business workflows, automate repetitive tasks, and enhance decision-making. From NLP-based chatbots to AI-driven recommendation engines, we develop bespoke AI solutions for various industries.
Pricing
We offer flexible pricing plans to meet the needs of individuals, small businesses, and large enterprises.
CloudMate Plans
"""
]
)
```

</p>
</details>

## Reviewing the Synthetic Dataset

We'll push our dataset to Confident AI for review. To do so, call the `push` method and give your dataset a unique name.

```python
dataset.push("MadeUpCompany QA Dataset")
```

:::info
While it's possible to manually inspect each golden, Confident AI allows you to review all the generated goldens at the same time, as well as **make edits, add annotations, and invite team members** (which is especially helpful if you have a customer support team helping you build the QA dataset).
:::

<div
style={{
display: "flex",
alignItems: "center",
justifyContent: "center",
marginBottom: "20px",
}}
>
<video width="100%" autoPlay loop muted playsInline>
<source
src="https://confident-docs.s3.us-east-1.amazonaws.com/qa-agent-review-dataset.mp4"
type="video/mp4"
/>
</video>
</div>

Now that we have our dataset, we can begin testing some of the generated `input`s (user queries) on our QA agent in the next section.