Merge pull request #1325 from kritinv/use-case-tutorials-QA-agent

Use Case Tutorials QA agent
confident-ai · Jan 30, 2025 · 884e88f · 884e88f
2 parents a67509b + 5446004
commit 884e88f
Show file tree

Hide file tree

Showing 6 changed files with 459 additions and 11 deletions.
diff --git a/docs/sidebarTutorials.js b/docs/sidebarTutorials.js
@@ -2,6 +2,17 @@ module.exports = {
   tutorials: [
     "tutorial-introduction",
     "tutorial-setup",
+    {
+      type: "category",
+      label: "RAG QA Agent ",
+      items: [
+        "qa-agent-introduction",
+        "qa-agent-generating-a-synthetic-dataset",
+        "qa-agent-defining-an-evaluation-criteria",
+        "qa-agent-choosing-metrics",
+      ],
+      collapsed: false,
+    },
     {
       type: "category",
       label: "Legal Document Summarizer",
@@ -15,7 +26,7 @@ module.exports = {
         "legal-doc-summarizer-maintaining-a-dataset",
         "legal-doc-summarizer-pulling-dataset",
       ],
-      collapsed: false,
+      collapsed: true,
     },
     {
       type: "category",
@@ -32,7 +43,7 @@ module.exports = {
         "tutorial-production-monitoring",
         "tutorial-production-evaluation",
       ],
-      collapsed: false,
+      collapsed: true,
     },
   ],
 };
diff --git a/docs/tutorials/qa-agent-choosing-metrics.mdx b/docs/tutorials/qa-agent-choosing-metrics.mdx
@@ -0,0 +1,5 @@
+---
+id: qa-agent-choosing-metrics
+title: Choosing the right Metrics for your QA Agent
+sidebar_label: Choosing your Metrics
+---
diff --git a/docs/tutorials/qa-agent-defining-an-evaluation-criteria.mdx b/docs/tutorials/qa-agent-defining-an-evaluation-criteria.mdx
@@ -0,0 +1,73 @@
+---
+id: qa-agent-defining-an-evaluation-criteria
+title: Defining an Evaluation Criteria
+sidebar_label: Defining an Evaluation Criteria
+---
+
+In the previous section, we generated an entire synthetic QA dataset in just a few seconds. In this section, we'll be using some of those queries to observe how our RAG QA Agent responds, and carve out our what factors we care about in our QA agent's response. This will help us narrow down our evaluation criteria, which we'll need in order to select the right metrics for evaluation.
+
+## Revisting the Generated User Queries
+
+Let's examine the first three golden generations and generate responses for them. Since we're only examining three user queries, you can simply copy and paste these `input`s from Confident AI into your codebase.
+
+```python
+input_1 = "Examine CloudMate's unique enterprise offerings: comprehensive compliance tools and dedicated support services."
+input_2 = "If CloudMate introduces an Ultra plan, doubling current storage and security, what pricing might be expected?"
+input_3 = "How does MadeUpCompany's adherence to GDPR, HIPAA, and SOC 2 ensure data protection?"
+```
+
+:::note
+As you can see, these synthetic questions are great because they challenge your QA agent on edge cases and go beyond asking simple questions. For example, `input_2` asks about the pricing of a potential new plan offering.
+:::
+
+## Generating QA Agent Responses for Inspection
+
+We'll pass these `input`s into our QA agent and see how it responds:
+
+```python
+input_1 = "Examine CloudMate's unique enterprise offerings: comprehensive compliance tools and dedicated support services."
+print(qa_agent.generate(input_1))
+# CloudMate offers a secure and scalable cloud storage solution designed for businesses
+# of all sizes. Its enterprise features include comprehensive compliance tools such as
+# military-grade encryption, role-based access control, and multi-factor authentication.
+# Additionally, CloudMate provides dedicated support services, including priority customer
+# assistance and a dedicated account manager for enterprise clients.
+```
+
+The QA agent answered our first input correctly, providing relevant details about CloudMate's compliance tools and dedicated support services.
+
+```python
+input_2 = "Examine CloudMate's unique enterprise offerings: comprehensive compliance tools and dedicated support services."
+print(qa_agent.generate(input_2))
+# The CloudMate Ultra plan will provide unlimited storage and top-tier security features
+# at a fixed cost of $99.99/month.
+```
+
+The QA agent provided a speculative response about a potential Ultra plan. While the answer is reasonable, it lacks a factual basis unless CloudMate has officially announced such a plan.
+
+```python
+input_3 = "How does MadeUpCompany's adherence to GDPR, HIPAA, and SOC 2 ensure data protection?"
+print(qa_agent.generate(input_3))
+# "MadeUpCompany ensures data protection through strict adherence to GDPR, HIPAA, and SOC 2 standards.
+# This includes robust encryption protocols, access control mechanisms, and regular security audits
+# to safeguard sensitive data."
+```
+
+While the response is generally relevant to data protection, it incorrectly assumes knowledge about MadeUpCompany when the QA agent is meant to provide answers specifically about CloudMate. Instead, the system should have either declined to answer or clarified its limitations.
+
+## Defining the Evaluation Criteria
+
+By reviewing these three responses, we can start to understand what we care about in our QA agent's replies. While the tone could be more personable, the main issue here is that some of the answers are **speculative** when asked questions outside the scope of the company's informational knowledge context, and some are **irrelevant**. These values we care about form our evaluation criteria.
+
+:::info
+As you inspect more responses from your QA agent, your evaluation criteria will expand, especially when it comes to product and user feedback.
+:::
+
+For now, we'll focus on two key criteria:
+
+1. The answers must be _non-speculative_.
+2. The answers must be _relevant_.
+
+:::note
+In the next section, we’ll explore how these evaluation criteria help in selecting the right metrics for assessing our QA agent.
+:::
diff --git a/docs/tutorials/qa-agent-generating-a-synthetic-dataset.mdx b/docs/tutorials/qa-agent-generating-a-synthetic-dataset.mdx
@@ -0,0 +1,230 @@
+---
+id: qa-agent-generating-a-synthetic-dataset
+title: Generating a Comprehensive Synthetic Dataset from your Knowledge Base
+sidebar_label: Generating Synthetic Datasets
+---
+
+In this section, we'll be generating a synthetic dataset from a text file containing information about MadeUpCompany, the company of interest, and its core product offerings. The goal is to create a dataset containing a **wide variety questions that users might potentially ask**.
+
+:::tip
+In QA use cases, synthetic datasets are often superior to human-curated evaluation datasets because they are more diverse and capture more edge cases, especially when using various evolution techniques. They can also be generated much more quickly.
+:::
+
+### Generating a Synthetic Dataset
+
+Generating a dataset in `DeepEval` is simple. Simply create an `EvaluationDataset` and call the `synthesize` method with the documents in you knowledge base. In this example, we'll pass a single `.txt` file, but you can include multiple documents in `.txt`, `.pdf`, or `.docx` format.
+
+<details><summary>Click here to see the contents of <strong>datawiz_information.txt</strong></summary>
+<p>
+
+```
+About MadeUpCompany
+MadeUpCompany is a pioneering technology firm founded in 2010, specializing in cloud computing, data analytics, and machine learning. Headquartered in San Francisco, California, we have a global presence with satellite offices in New York, London, and Tokyo. Our mission is to empower businesses and individuals with cutting-edge technology that enhances efficiency, scalability, and innovation.
+
+With a diverse team of experts from various industries—including AI research, cybersecurity, and enterprise software development—we push the boundaries of what’s possible. Our commitment to continuous improvement, security, and customer success has earned us recognition as a leader in the tech space.
+
+Our Values
+At MadeUpCompany, we believe in:
+
+Innovation – Continuously developing and refining solutions that meet the evolving needs of businesses.
+Security & Privacy – Implementing world-class security protocols to protect our customers' data.
+Customer-Centric Approach – Designing intuitive, powerful tools that make complex technology accessible.
+Sustainability – Ensuring our infrastructure is energy-efficient and environmentally responsible.
+Products and Services
+We offer a comprehensive suite of cloud-based solutions that streamline operations, enhance decision-making, and power AI-driven insights.
+
+CloudMate – Secure and Scalable Cloud Storage
+CloudMate is our flagship cloud storage solution, designed for businesses of all sizes. Features include:
+✅ Seamless data migration with automated backups
+✅ Military-grade encryption and multi-factor authentication
+✅ Role-based access control for enterprise security
+✅ AI-powered file organization and search capabilities
+
+DataWiz – Advanced Data Analytics
+DataWiz transforms raw data into actionable insights using cutting-edge machine learning models. Features include:
+📊 Predictive analytics for demand forecasting and customer behavior modeling
+📊 Real-time dashboards with customizable reporting
+📊 API integrations with popular business intelligence tools
+📊 Automated anomaly detection for fraud prevention and operational efficiency
+
+Custom AI Solutions
+We provide tailored machine learning models to optimize business workflows, automate repetitive tasks, and enhance decision-making. From NLP-based chatbots to AI-driven recommendation engines, we develop bespoke AI solutions for various industries.
+
+Pricing
+We offer flexible pricing plans to meet the needs of individuals, small businesses, and large enterprises.
+
+CloudMate Plans
+
+Basic: $9.99/month – 100GB storage, essential security features
+Professional: $29.99/month – 1TB storage, enhanced security, priority support
+Enterprise: Custom pricing – Unlimited storage, advanced compliance tools, dedicated account manager
+DataWiz Plans
+
+Starter: $49/month – Basic analytics, limited AI insights
+Growth: $99/month – Advanced machine learning models, predictive analytics
+Enterprise: Custom pricing – Full AI customization, dedicated data scientists
+Custom AI Solutions – Pricing is determined based on project scope and complexity. Contact our sales team for a personalized quote.
+
+Technical Support
+Our award-winning customer support team is available 24/7 to assist with any technical issues. Support channels include:
+📞 Toll-free phone support
+💬 Live chat assistance
+📧 Email support with guaranteed response within 6 hours
+📚 Comprehensive FAQ and user guides available on our website
+👥 Community forum for peer-to-peer discussions and best practices
+
+Most technical issues are resolved within 24 hours, ensuring minimal downtime for your business.
+
+Security and Compliance
+Security is at the heart of everything we do. MadeUpCompany adheres to the highest security and regulatory standards, including:
+
+🔒 GDPR, HIPAA, and SOC 2 Compliance – Ensuring global security and data protection compliance.
+🔒 End-to-End Encryption – Protecting data in transit and at rest with AES-256 encryption.
+🔒 Zero Trust Architecture – Implementing rigorous access control and continuous authentication.
+🔒 DDoS Protection & Advanced Threat Detection – Safeguarding against cyber threats with AI-powered monitoring.
+
+Our team continuously updates security measures to stay ahead of evolving cyber risks.
+
+Account Management
+Managing your MadeUpCompany services is simple and intuitive via our online portal. Customers can:
+
+✔️ Upgrade or downgrade plans at any time
+✔️ Access billing history and download invoices
+✔️ Manage multiple users and set role-based permissions
+✔️ Track storage and analytics usage in real time
+
+For enterprise accounts, we offer dedicated account managers who provide strategic guidance and personalized support.
+
+Refund and Cancellation Policy
+We stand by the quality of our services and offer a 30-day money-back guarantee on all plans.
+
+If you're not satisfied, you can request a full refund within the first 30 days.
+After 30 days, you may cancel your subscription at any time, and we’ll issue a prorated refund based on your remaining subscription period.
+Enterprise contracts include a flexible exit clause, ensuring fair terms for long-term clients.
+Upcoming Features
+We are constantly evolving and introducing new features based on customer feedback. Here’s what’s coming soon:
+
+🚀 AI-Driven Data Insights – DataWiz will introduce automated trend forecasting powered by deep learning.
+🚀 Collaboration Tools for CloudMate – Enhanced real-time document editing and team workspaces for seamless collaboration.
+🚀 Zero-Knowledge Encryption – An optional feature for businesses requiring absolute data confidentiality.
+
+We value our customers' input and prioritize updates that deliver the most impact.
+
+Why Choose MadeUpCompany?
+
+✔️ Over 1 million satisfied users worldwide
+✔️ Trusted by Fortune 500 companies
+✔️ Featured in TechCrunch, Forbes, and Wired as a top innovator
+✔️ Unmatched customer support and security
+
+Whether you're a startup, an enterprise, or an individual user, MadeUpCompany provides the tools you need to thrive in the digital age.
+
+For more information, visit our website at www.madeupcompany.com or contact our sales team at [email protected]. 🚀
+```
+
+</p>
+</details>
+
+```python
+from deepeval.dataset import EvaluationDataset
+
+dataset = EvaluationDataset()
+dataset.generate_goldens_from_docs(document_paths=["madeup_company.txt"])
+```
+
+:::info
+By default the synthesizer requires an OpenAI API key for generation. However, you can fully **customize the synthetic generation process** with custom models and various parameters. You can learn more about the [synthetic generation process here](/docs/synthesizer-introduction).
+:::
+
+A row of dataset is called a `Golden`. When generating from documents, each golden contains an input (the user query), the expected output (ideal LLM response to user query), and the context (ideal context to be retrieved). A golden is different from an `LLMTestCase` because it does not require the actual output, which is generated at evaluation time.
+
+You can access the list of goldens from the dataset object:
+
+```python
+print(dataset.goldens[0])
+```
+
+<details><summary>Click here to see the generated golden</summary>
+<p>
+
+```python
+Golden(
+    input="Examine CloudMate's unique enterprise offerings: comprehensive compliance tools and dedicated support services."
+    expected_output="""
+        CloudMate's Enterprise plan offers a robust set of features tailored for large organizations seeking scalable cloud solutions. Key offerings include:
+
+        1. **Unlimited Storage**: Ideal for enterprises with substantial data storage requirements.
+
+        2. **Advanced Compliance Tools**: These tools help organizations adhere to industry regulations and standards, ensuring data privacy and protection.
+
+        3. **Dedicated Account Manager**: Provides personalized support and guidance to maximize the use of CloudMate services.
+
+        Combined with enterprise-grade security measures like role-based access control, military-grade encryption, and multi-factor authentication, CloudMate's Enterprise plan ensures both security and seamless operation of critical business functions. Custom pricing options allow for tailored solutions specific to each enterprise's needs.
+    """
+    context=[
+        """Basic: $9.99/month – 100GB storage, essential security features
+            Professional: $29.99/month – 1TB storage, enhanced security, priority support
+            Enterprise: Custom pricing – Unlimited storage, advanced compliance tools, dedicated account manager
+            DataWiz Plans
+
+            Starter: $49/month – Basic analytics, limited AI insights
+            Growth: $99/month – Advanced machine learning models, predictive analytics
+            Enterprise: Custom pricing – Full AI customization
+        """,
+        """✅ Military-grade encryption and multi-factor authentication
+            ✅ Role-based access control for enterprise security
+            ✅ AI-powered file organization and search capabilities
+
+            DataWiz – Advanced Data Analytics
+            DataWiz transforms raw data into actionable insights using cutting-edge machine learning models. Features include:
+            📊 Predictive analytics for demand forecasting and customer behavior modeling
+            📊 Real-time dashboards with customizable reporting
+            📊 API integrations with popular business
+        """,
+        """ intelligence tools
+            📊 Automated anomaly detection for fraud prevention and operational efficiency
+
+            Custom AI Solutions
+            We provide tailored machine learning models to optimize business workflows, automate repetitive tasks, and enhance decision-making. From NLP-based chatbots to AI-driven recommendation engines, we develop bespoke AI solutions for various industries.
+
+            Pricing
+            We offer flexible pricing plans to meet the needs of individuals, small businesses, and large enterprises.
+
+            CloudMate Plans
+        """
+    ]
+)
+```
+
+</p>
+</details>
+
+## Reviewing the Synthetic Dataset
+
+We'll be pushing our dataset to Confident AI for review. To do so, simply call the `push` method and give your dataset a unique name.
+
+```python
+dataset.push("MadeUpCompany QA Dataset")
+```
+
+:::info
+While it's possible to manually inspect each golden, Confident AI allows you to review all the generated goldens at the same time, as well as **make edits, add annotations, and invite team members** (which is especially helpful if you have a customer support team helping you build the QA dataset).
+:::
+
+<div
+  style={{
+    display: "flex",
+    alignItems: "center",
+    justifyContent: "center",
+    marginBottom: "20px",
+  }}
+>
+  <video width="100%" autoPlay loop muted playsInlines>
+    <source
+      src="https://confident-docs.s3.us-east-1.amazonaws.com/qa-agent-review-dataset.mp4"
+      type="video/mp4"
+    />
+  </video>
+</div>
+
+Now that we have our dataset, we can beginning testing some of the generated `input`s or user queries in our QA agent in the next section.