diff --git a/tutorials/27_First_RAG_Pipeline.ipynb b/tutorials/27_First_RAG_Pipeline.ipynb index d9467ff..1e5b0b6 100644 --- a/tutorials/27_First_RAG_Pipeline.ipynb +++ b/tutorials/27_First_RAG_Pipeline.ipynb @@ -1,1446 +1,1480 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "2OvkPji9O-qX" - }, - "source": [ - "# Tutorial: Creating Your First QA Pipeline with Retrieval-Augmentation\n", - "\n", - "- **Level**: Beginner\n", - "- **Time to complete**: 10 minutes\n", - "- **Components Used**: [`InMemoryDocumentStore`](https://docs.haystack.deepset.ai/docs/inmemorydocumentstore), [`SentenceTransformersDocumentEmbedder`](https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder), [`SentenceTransformersTextEmbedder`](https://docs.haystack.deepset.ai/docs/sentencetransformerstextembedder), [`InMemoryEmbeddingRetriever`](https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever), [`PromptBuilder`](https://docs.haystack.deepset.ai/docs/promptbuilder), [`OpenAIGenerator`](https://docs.haystack.deepset.ai/docs/openaigenerator)\n", - "- **Prerequisites**: You must have an [OpenAI API Key](https://platform.openai.com/api-keys).\n", - "- **Goal**: After completing this tutorial, you'll have learned the new prompt syntax and how to use PromptBuilder and OpenAIGenerator to build a generative question-answering pipeline with retrieval-augmentation.\n", - "\n", - "> This tutorial uses Haystack 2.0. To learn more, read the [Haystack 2.0 announcement](https://haystack.deepset.ai/blog/haystack-2-release) or visit the [Haystack 2.0 Documentation](https://docs.haystack.deepset.ai/docs/intro)." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "LFqHcXYPO-qZ" - }, - "source": [ - "## Overview\n", - "\n", - "This tutorial shows you how to create a generative question-answering pipeline using the retrieval-augmentation ([RAG](https://www.deepset.ai/blog/llms-retrieval-augmentation)) approach with Haystack 2.0. 
The process involves four main components: [SentenceTransformersTextEmbedder](https://docs.haystack.deepset.ai/docs/sentencetransformerstextembedder) for creating an embedding for the user query, [InMemoryBM25Retriever](https://docs.haystack.deepset.ai/docs/inmemorybm25retriever) for fetching relevant documents, [PromptBuilder](https://docs.haystack.deepset.ai/docs/promptbuilder) for creating a template prompt, and [OpenAIGenerator](https://docs.haystack.deepset.ai/docs/openaigenerator) for generating responses.\n", - "\n", - "For this tutorial, you'll use the Wikipedia pages of [Seven Wonders of the Ancient World](https://en.wikipedia.org/wiki/Wonders_of_the_World) as Documents, but you can replace them with any text you want.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "QXjVlbPiO-qZ" - }, - "source": [ - "## Preparing the Colab Environment\n", - "\n", - "- [Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration)\n", - "- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/logging)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Kww5B_vXO-qZ" - }, - "source": [ - "## Installing Haystack\n", - "\n", - "Install Haystack 2.0 and other required packages with `pip`:" - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "2OvkPji9O-qX" + }, + "source": [ + "# Tutorial: Creating Your First QA Pipeline with Retrieval-Augmentation\n", + "\n", + "- **Level**: Beginner\n", + "- **Time to complete**: 10 minutes\n", + "- **Components Used**: [`InMemoryDocumentStore`](https://docs.haystack.deepset.ai/docs/inmemorydocumentstore), [`SentenceTransformersDocumentEmbedder`](https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder), [`SentenceTransformersTextEmbedder`](https://docs.haystack.deepset.ai/docs/sentencetransformerstextembedder), [`InMemoryEmbeddingRetriever`](https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever), 
[`PromptBuilder`](https://docs.haystack.deepset.ai/docs/promptbuilder), [`OpenAIChatGenerator`](https://docs.haystack.deepset.ai/docs/openaichatgenerator)\n", + "- **Prerequisites**: You must have an [OpenAI API Key](https://platform.openai.com/api-keys).\n", + "- **Goal**: After completing this tutorial, you'll have learned the new prompt syntax and how to use PromptBuilder and OpenAIChatGenerator to build a generative question-answering pipeline with retrieval-augmentation.\n", + "\n", + "> This tutorial uses Haystack 2.0. To learn more, read the [Haystack 2.0 announcement](https://haystack.deepset.ai/blog/haystack-2-release) or visit the [Haystack 2.0 Documentation](https://docs.haystack.deepset.ai/docs/intro)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LFqHcXYPO-qZ" + }, + "source": [ + "## Overview\n", + "\n", + "This tutorial shows you how to create a generative question-answering pipeline using the retrieval-augmentation ([RAG](https://www.deepset.ai/blog/llms-retrieval-augmentation)) approach with Haystack 2.0. 
The process involves four main components: [SentenceTransformersTextEmbedder](https://docs.haystack.deepset.ai/docs/sentencetransformerstextembedder) for creating an embedding for the user query, [InMemoryBM25Retriever](https://docs.haystack.deepset.ai/docs/inmemorybm25retriever) for fetching relevant documents, [PromptBuilder](https://docs.haystack.deepset.ai/docs/promptbuilder) for creating a template prompt, and [OpenAIChatGenerator](https://docs.haystack.deepset.ai/docs/openaichatgenerator) for generating responses.\n", + "\n", + "For this tutorial, you'll use the Wikipedia pages of [Seven Wonders of the Ancient World](https://en.wikipedia.org/wiki/Wonders_of_the_World) as Documents, but you can replace them with any text you want.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QXjVlbPiO-qZ" + }, + "source": [ + "## Preparing the Colab Environment\n", + "\n", + "- [Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration)\n", + "- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/logging)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Kww5B_vXO-qZ" + }, + "source": [ + "## Installing Haystack\n", + "\n", + "Install Haystack 2.0 and other required packages with `pip`:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" }, + "id": "UQbU8GUfO-qZ", + "outputId": "c33579e9-5557-43bd-a3c5-63b8373770c7" + }, + "outputs": [ { - "cell_type": "code", - "execution_count": 1, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "UQbU8GUfO-qZ", - "outputId": "c33579e9-5557-43bd-a3c5-63b8373770c7" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Requirement already satisfied: haystack-ai in /usr/local/lib/python3.10/dist-packages (2.0.0b8)\n", - "Requirement already satisfied: boilerpy3 in /usr/local/lib/python3.10/dist-packages (from 
haystack-ai) (1.0.7)\n", - "Requirement already satisfied: haystack-bm25 in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (1.0.2)\n", - "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (3.1.3)\n", - "Requirement already satisfied: lazy-imports in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (0.3.1)\n", - "Requirement already satisfied: more-itertools in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (10.1.0)\n", - "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (3.2.1)\n", - "Requirement already satisfied: openai>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (1.13.3)\n", - "Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (1.5.3)\n", - "Requirement already satisfied: posthog in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (3.5.0)\n", - "Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (6.0.1)\n", - "Requirement already satisfied: tenacity in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (8.2.3)\n", - "Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (4.66.2)\n", - "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (4.10.0)\n", - "Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.10/dist-packages (from openai>=1.1.0->haystack-ai) (3.7.1)\n", - "Requirement already satisfied: distro<2,>=1.7.0 in /usr/lib/python3/dist-packages (from openai>=1.1.0->haystack-ai) (1.7.0)\n", - "Requirement already satisfied: httpx<1,>=0.23.0 in /usr/local/lib/python3.10/dist-packages (from openai>=1.1.0->haystack-ai) (0.27.0)\n", - "Requirement already satisfied: pydantic<3,>=1.9.0 in /usr/local/lib/python3.10/dist-packages (from openai>=1.1.0->haystack-ai) 
(2.6.3)\n", - "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from openai>=1.1.0->haystack-ai) (1.3.1)\n", - "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from haystack-bm25->haystack-ai) (1.25.2)\n", - "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->haystack-ai) (2.1.5)\n", - "Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas->haystack-ai) (2.8.2)\n", - "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->haystack-ai) (2023.4)\n", - "Requirement already satisfied: requests<3.0,>=2.7 in /usr/local/lib/python3.10/dist-packages (from posthog->haystack-ai) (2.31.0)\n", - "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from posthog->haystack-ai) (1.16.0)\n", - "Requirement already satisfied: monotonic>=1.5 in /usr/local/lib/python3.10/dist-packages (from posthog->haystack-ai) (1.6)\n", - "Requirement already satisfied: backoff>=1.10.0 in /usr/local/lib/python3.10/dist-packages (from posthog->haystack-ai) (2.2.1)\n", - "Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->openai>=1.1.0->haystack-ai) (3.6)\n", - "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->openai>=1.1.0->haystack-ai) (1.2.0)\n", - "Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->openai>=1.1.0->haystack-ai) (2024.2.2)\n", - "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->openai>=1.1.0->haystack-ai) (1.0.4)\n", - "Requirement already satisfied: h11<0.15,>=0.13 in /usr/local/lib/python3.10/dist-packages (from httpcore==1.*->httpx<1,>=0.23.0->openai>=1.1.0->haystack-ai) (0.14.0)\n", - "Requirement 
already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1.9.0->openai>=1.1.0->haystack-ai) (0.6.0)\n", - "Requirement already satisfied: pydantic-core==2.16.3 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1.9.0->openai>=1.1.0->haystack-ai) (2.16.3)\n", - "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3.0,>=2.7->posthog->haystack-ai) (3.3.2)\n", - "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3.0,>=2.7->posthog->haystack-ai) (2.0.7)\n", - "Requirement already satisfied: datasets>=2.6.1 in /usr/local/lib/python3.10/dist-packages (2.18.0)\n", - "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from datasets>=2.6.1) (3.13.1)\n", - "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.6.1) (1.25.2)\n", - "Requirement already satisfied: pyarrow>=12.0.0 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.6.1) (14.0.2)\n", - "Requirement already satisfied: pyarrow-hotfix in /usr/local/lib/python3.10/dist-packages (from datasets>=2.6.1) (0.6)\n", - "Requirement already satisfied: dill<0.3.9,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.6.1) (0.3.8)\n", - "Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from datasets>=2.6.1) (1.5.3)\n", - "Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.6.1) (2.31.0)\n", - "Requirement already satisfied: tqdm>=4.62.1 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.6.1) (4.66.2)\n", - "Requirement already satisfied: xxhash in /usr/local/lib/python3.10/dist-packages (from datasets>=2.6.1) (3.4.1)\n", - "Requirement already satisfied: multiprocess in /usr/local/lib/python3.10/dist-packages (from datasets>=2.6.1) (0.70.16)\n", - 
"Requirement already satisfied: fsspec[http]<=2024.2.0,>=2023.1.0 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.6.1) (2023.6.0)\n", - "Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from datasets>=2.6.1) (3.9.3)\n", - "Requirement already satisfied: huggingface-hub>=0.19.4 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.6.1) (0.20.3)\n", - "Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from datasets>=2.6.1) (23.2)\n", - "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.6.1) (6.0.1)\n", - "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.6.1) (1.3.1)\n", - "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.6.1) (23.2.0)\n", - "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.6.1) (1.4.1)\n", - "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.6.1) (6.0.5)\n", - "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.6.1) (1.9.4)\n", - "Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.6.1) (4.0.3)\n", - "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.19.4->datasets>=2.6.1) (4.10.0)\n", - "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets>=2.6.1) (3.3.2)\n", - "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets>=2.6.1) (3.6)\n", - "Requirement already satisfied: urllib3<3,>=1.21.1 
in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets>=2.6.1) (2.0.7)\n", - "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets>=2.6.1) (2024.2.2)\n", - "Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets>=2.6.1) (2.8.2)\n", - "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets>=2.6.1) (2023.4)\n", - "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas->datasets>=2.6.1) (1.16.0)\n", - "Requirement already satisfied: sentence-transformers>=2.2.0 in /usr/local/lib/python3.10/dist-packages (2.5.1)\n", - "Requirement already satisfied: transformers<5.0.0,>=4.32.0 in /usr/local/lib/python3.10/dist-packages (from sentence-transformers>=2.2.0) (4.38.2)\n", - "Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from sentence-transformers>=2.2.0) (4.66.2)\n", - "Requirement already satisfied: torch>=1.11.0 in /usr/local/lib/python3.10/dist-packages (from sentence-transformers>=2.2.0) (2.1.0+cu121)\n", - "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from sentence-transformers>=2.2.0) (1.25.2)\n", - "Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (from sentence-transformers>=2.2.0) (1.2.2)\n", - "Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from sentence-transformers>=2.2.0) (1.11.4)\n", - "Requirement already satisfied: huggingface-hub>=0.15.1 in /usr/local/lib/python3.10/dist-packages (from sentence-transformers>=2.2.0) (0.20.3)\n", - "Requirement already satisfied: Pillow in /usr/local/lib/python3.10/dist-packages (from sentence-transformers>=2.2.0) (9.4.0)\n", - "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages 
(from huggingface-hub>=0.15.1->sentence-transformers>=2.2.0) (3.13.1)\n", - "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.15.1->sentence-transformers>=2.2.0) (2023.6.0)\n", - "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.15.1->sentence-transformers>=2.2.0) (2.31.0)\n", - "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.15.1->sentence-transformers>=2.2.0) (6.0.1)\n", - "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.15.1->sentence-transformers>=2.2.0) (4.10.0)\n", - "Requirement already satisfied: packaging>=20.9 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.15.1->sentence-transformers>=2.2.0) (23.2)\n", - "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.11.0->sentence-transformers>=2.2.0) (1.12)\n", - "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.11.0->sentence-transformers>=2.2.0) (3.2.1)\n", - "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.11.0->sentence-transformers>=2.2.0) (3.1.3)\n", - "Requirement already satisfied: triton==2.1.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.11.0->sentence-transformers>=2.2.0) (2.1.0)\n", - "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers<5.0.0,>=4.32.0->sentence-transformers>=2.2.0) (2023.12.25)\n", - "Requirement already satisfied: tokenizers<0.19,>=0.14 in /usr/local/lib/python3.10/dist-packages (from transformers<5.0.0,>=4.32.0->sentence-transformers>=2.2.0) (0.15.2)\n", - "Requirement already satisfied: safetensors>=0.4.1 in /usr/local/lib/python3.10/dist-packages (from 
transformers<5.0.0,>=4.32.0->sentence-transformers>=2.2.0) (0.4.2)\n", - "Requirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from scikit-learn->sentence-transformers>=2.2.0) (1.3.2)\n", - "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn->sentence-transformers>=2.2.0) (3.3.0)\n", - "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.11.0->sentence-transformers>=2.2.0) (2.1.5)\n", - "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub>=0.15.1->sentence-transformers>=2.2.0) (3.3.2)\n", - "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub>=0.15.1->sentence-transformers>=2.2.0) (3.6)\n", - "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub>=0.15.1->sentence-transformers>=2.2.0) (2.0.7)\n", - "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub>=0.15.1->sentence-transformers>=2.2.0) (2024.2.2)\n", - "Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.11.0->sentence-transformers>=2.2.0) (1.3.0)\n" - ] - } - ], - "source": [ - "%%bash\n", - "\n", - "pip install haystack-ai\n", - "pip install \"datasets>=2.6.1\"\n", - "pip install \"sentence-transformers>=3.0.0\"" - ] + "name": "stdout", + "output_type": "stream", + "text": [ + "Defaulting to user installation because normal site-packages is not writeable\n", + "Requirement already satisfied: haystack-ai==2.8.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (2.8.0)\n", + "Requirement already satisfied: haystack-experimental in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages 
(from haystack-ai==2.8.0) (0.3.0)\n", + "Requirement already satisfied: jinja2 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from haystack-ai==2.8.0) (3.1.4)\n", + "Requirement already satisfied: lazy-imports in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from haystack-ai==2.8.0) (0.3.1)\n", + "Requirement already satisfied: more-itertools in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from haystack-ai==2.8.0) (10.2.0)\n", + "Requirement already satisfied: networkx in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from haystack-ai==2.8.0) (3.2.1)\n", + "Requirement already satisfied: numpy in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from haystack-ai==2.8.0) (1.26.4)\n", + "Requirement already satisfied: openai>=1.1.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from haystack-ai==2.8.0) (1.31.1)\n", + "Requirement already satisfied: pandas in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from haystack-ai==2.8.0) (2.2.2)\n", + "Requirement already satisfied: posthog in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from haystack-ai==2.8.0) (3.5.0)\n", + "Requirement already satisfied: python-dateutil in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from haystack-ai==2.8.0) (2.9.0.post0)\n", + "Requirement already satisfied: pyyaml in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from haystack-ai==2.8.0) (6.0.1)\n", + "Requirement already satisfied: requests in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from haystack-ai==2.8.0) (2.32.3)\n", + "Requirement already satisfied: tenacity!=8.4.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from haystack-ai==2.8.0) (8.3.0)\n", + "Requirement already satisfied: tqdm in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from haystack-ai==2.8.0) 
(4.66.4)\n", + "Requirement already satisfied: typing-extensions>=4.7 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from haystack-ai==2.8.0) (4.12.1)\n", + "Requirement already satisfied: anyio<5,>=3.5.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from openai>=1.1.0->haystack-ai==2.8.0) (4.4.0)\n", + "Requirement already satisfied: distro<2,>=1.7.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from openai>=1.1.0->haystack-ai==2.8.0) (1.9.0)\n", + "Requirement already satisfied: httpx<1,>=0.23.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from openai>=1.1.0->haystack-ai==2.8.0) (0.27.0)\n", + "Requirement already satisfied: pydantic<3,>=1.9.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from openai>=1.1.0->haystack-ai==2.8.0) (2.7.3)\n", + "Requirement already satisfied: sniffio in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from openai>=1.1.0->haystack-ai==2.8.0) (1.3.1)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from jinja2->haystack-ai==2.8.0) (2.1.5)\n", + "Requirement already satisfied: pytz>=2020.1 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from pandas->haystack-ai==2.8.0) (2024.1)\n", + "Requirement already satisfied: tzdata>=2022.7 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from pandas->haystack-ai==2.8.0) (2024.1)\n", + "Requirement already satisfied: six>=1.5 in /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/site-packages (from python-dateutil->haystack-ai==2.8.0) (1.15.0)\n", + "Requirement already satisfied: monotonic>=1.5 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from posthog->haystack-ai==2.8.0) (1.6)\n", + "Requirement already satisfied: backoff>=1.10.0 in 
/Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from posthog->haystack-ai==2.8.0) (2.2.1)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from requests->haystack-ai==2.8.0) (3.3.2)\n", + "Requirement already satisfied: idna<4,>=2.5 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from requests->haystack-ai==2.8.0) (3.7)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from requests->haystack-ai==2.8.0) (1.26.18)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from requests->haystack-ai==2.8.0) (2024.6.2)\n", + "Requirement already satisfied: exceptiongroup>=1.0.2 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from anyio<5,>=3.5.0->openai>=1.1.0->haystack-ai==2.8.0) (1.2.1)\n", + "Requirement already satisfied: httpcore==1.* in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from httpx<1,>=0.23.0->openai>=1.1.0->haystack-ai==2.8.0) (1.0.5)\n", + "Requirement already satisfied: h11<0.15,>=0.13 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from httpcore==1.*->httpx<1,>=0.23.0->openai>=1.1.0->haystack-ai==2.8.0) (0.14.0)\n", + "Requirement already satisfied: annotated-types>=0.4.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from pydantic<3,>=1.9.0->openai>=1.1.0->haystack-ai==2.8.0) (0.7.0)\n", + "Requirement already satisfied: pydantic-core==2.18.4 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from pydantic<3,>=1.9.0->openai>=1.1.0->haystack-ai==2.8.0) (2.18.4)\n" + ] }, { - "cell_type": "markdown", - "metadata": { - "id": "Wl_jYERtO-qa" - }, - "source": [ - "### Enabling Telemetry\n", - "\n", - "Knowing you're using this tutorial helps us decide where to invest our efforts to build a 
better product but you can always opt out by commenting the following line. See [Telemetry](https://docs.haystack.deepset.ai/docs/enabling-telemetry) for more details." - ] + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49m/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip\u001b[0m\n" + ] }, { - "cell_type": "code", - "execution_count": 2, - "metadata": { - "id": "A76B4S49O-qa" - }, - "outputs": [], - "source": [ - "from haystack.telemetry import tutorial_running\n", - "\n", - "tutorial_running(27)" - ] + "name": "stdout", + "output_type": "stream", + "text": [ + "Defaulting to user installation because normal site-packages is not writeable\n", + "Requirement already satisfied: datasets>=2.6.1 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (3.1.0)\n", + "Requirement already satisfied: filelock in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from datasets>=2.6.1) (3.14.0)\n", + "Requirement already satisfied: numpy>=1.17 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from datasets>=2.6.1) (1.26.4)\n", + "Requirement already satisfied: pyarrow>=15.0.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from datasets>=2.6.1) (18.1.0)\n", + "Requirement already satisfied: dill<0.3.9,>=0.3.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from datasets>=2.6.1) (0.3.8)\n", + "Requirement already satisfied: pandas in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from datasets>=2.6.1) (2.2.2)\n", + "Requirement already satisfied: requests>=2.32.2 in 
/Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from datasets>=2.6.1) (2.32.3)\n", + "Requirement already satisfied: tqdm>=4.66.3 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from datasets>=2.6.1) (4.66.4)\n", + "Requirement already satisfied: xxhash in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from datasets>=2.6.1) (3.5.0)\n", + "Requirement already satisfied: multiprocess<0.70.17 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from datasets>=2.6.1) (0.70.16)\n", + "Requirement already satisfied: fsspec<=2024.9.0,>=2023.1.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from fsspec[http]<=2024.9.0,>=2023.1.0->datasets>=2.6.1) (2024.6.0)\n", + "Requirement already satisfied: aiohttp in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from datasets>=2.6.1) (3.11.10)\n", + "Requirement already satisfied: huggingface-hub>=0.23.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from datasets>=2.6.1) (0.23.3)\n", + "Requirement already satisfied: packaging in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from datasets>=2.6.1) (24.0)\n", + "Requirement already satisfied: pyyaml>=5.1 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from datasets>=2.6.1) (6.0.1)\n", + "Requirement already satisfied: aiohappyeyeballs>=2.3.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from aiohttp->datasets>=2.6.1) (2.4.4)\n", + "Requirement already satisfied: aiosignal>=1.1.2 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from aiohttp->datasets>=2.6.1) (1.3.1)\n", + "Requirement already satisfied: async-timeout<6.0,>=4.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from aiohttp->datasets>=2.6.1) (5.0.1)\n", + "Requirement already satisfied: attrs>=17.3.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from 
aiohttp->datasets>=2.6.1) (24.2.0)\n", + "Requirement already satisfied: frozenlist>=1.1.1 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from aiohttp->datasets>=2.6.1) (1.5.0)\n", + "Requirement already satisfied: multidict<7.0,>=4.5 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from aiohttp->datasets>=2.6.1) (6.1.0)\n", + "Requirement already satisfied: propcache>=0.2.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from aiohttp->datasets>=2.6.1) (0.2.1)\n", + "Requirement already satisfied: yarl<2.0,>=1.17.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from aiohttp->datasets>=2.6.1) (1.18.3)\n", + "Requirement already satisfied: typing-extensions>=3.7.4.3 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from huggingface-hub>=0.23.0->datasets>=2.6.1) (4.12.1)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from requests>=2.32.2->datasets>=2.6.1) (3.3.2)\n", + "Requirement already satisfied: idna<4,>=2.5 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from requests>=2.32.2->datasets>=2.6.1) (3.7)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from requests>=2.32.2->datasets>=2.6.1) (1.26.18)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from requests>=2.32.2->datasets>=2.6.1) (2024.6.2)\n", + "Requirement already satisfied: python-dateutil>=2.8.2 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from pandas->datasets>=2.6.1) (2.9.0.post0)\n", + "Requirement already satisfied: pytz>=2020.1 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from pandas->datasets>=2.6.1) (2024.1)\n", + "Requirement already satisfied: tzdata>=2022.7 in 
/Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from pandas->datasets>=2.6.1) (2024.1)\n", + "Requirement already satisfied: six>=1.5 in /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/site-packages (from python-dateutil>=2.8.2->pandas->datasets>=2.6.1) (1.15.0)\n" + ] }, { - "cell_type": "markdown", - "metadata": { - "id": "_lvfew16O-qa" - }, - "source": [ - "## Fetching and Indexing Documents\n", - "\n", - "You'll start creating your question answering system by downloading the data and indexing the data with its embeddings to a DocumentStore. \n", - "\n", - "In this tutorial, you will take a simple approach to writing documents and their embeddings into the DocumentStore. For a full indexing pipeline with preprocessing, cleaning and splitting, check out our tutorial on [Preprocessing Different File Types](https://haystack.deepset.ai/tutorials/30_file_type_preprocessing_index_pipeline).\n", - "\n", - "\n", - "### Initializing the DocumentStore\n", - "\n", - "Initialize a DocumentStore to index your documents. A DocumentStore stores the Documents that the question answering system uses to find answers to your questions. In this tutorial, you'll be using the `InMemoryDocumentStore`." 
- ] + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49m/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip\u001b[0m\n" + ] }, { - "cell_type": "code", - "execution_count": 3, - "metadata": { - "id": "CbVN-s5LO-qa" - }, - "outputs": [], - "source": [ - "from haystack.document_stores.in_memory import InMemoryDocumentStore\n", - "\n", - "document_store = InMemoryDocumentStore()" - ] + "name": "stdout", + "output_type": "stream", + "text": [ + "Defaulting to user installation because normal site-packages is not writeable\n", + "Requirement already satisfied: sentence-transformers>=3.0.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (3.0.0)\n", + "Requirement already satisfied: transformers<5.0.0,>=4.34.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from sentence-transformers>=3.0.0) (4.41.2)\n", + "Requirement already satisfied: tqdm in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from sentence-transformers>=3.0.0) (4.66.4)\n", + "Requirement already satisfied: torch>=1.11.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from sentence-transformers>=3.0.0) (2.3.1)\n", + "Requirement already satisfied: numpy in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from sentence-transformers>=3.0.0) (1.26.4)\n", + "Requirement already satisfied: scikit-learn in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from sentence-transformers>=3.0.0) (1.5.0)\n", + "Requirement already satisfied: scipy in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from 
sentence-transformers>=3.0.0) (1.13.1)\n", + "Requirement already satisfied: huggingface-hub>=0.15.1 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from sentence-transformers>=3.0.0) (0.23.3)\n", + "Requirement already satisfied: Pillow in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from sentence-transformers>=3.0.0) (10.3.0)\n", + "Requirement already satisfied: filelock in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from huggingface-hub>=0.15.1->sentence-transformers>=3.0.0) (3.14.0)\n", + "Requirement already satisfied: fsspec>=2023.5.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from huggingface-hub>=0.15.1->sentence-transformers>=3.0.0) (2024.6.0)\n", + "Requirement already satisfied: packaging>=20.9 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from huggingface-hub>=0.15.1->sentence-transformers>=3.0.0) (24.0)\n", + "Requirement already satisfied: pyyaml>=5.1 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from huggingface-hub>=0.15.1->sentence-transformers>=3.0.0) (6.0.1)\n", + "Requirement already satisfied: requests in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from huggingface-hub>=0.15.1->sentence-transformers>=3.0.0) (2.32.3)\n", + "Requirement already satisfied: typing-extensions>=3.7.4.3 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from huggingface-hub>=0.15.1->sentence-transformers>=3.0.0) (4.12.1)\n", + "Requirement already satisfied: sympy in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from torch>=1.11.0->sentence-transformers>=3.0.0) (1.12.1)\n", + "Requirement already satisfied: networkx in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from torch>=1.11.0->sentence-transformers>=3.0.0) (3.2.1)\n", + "Requirement already satisfied: jinja2 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from 
torch>=1.11.0->sentence-transformers>=3.0.0) (3.1.4)\n", + "Requirement already satisfied: regex!=2019.12.17 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from transformers<5.0.0,>=4.34.0->sentence-transformers>=3.0.0) (2024.5.15)\n", + "Requirement already satisfied: tokenizers<0.20,>=0.19 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from transformers<5.0.0,>=4.34.0->sentence-transformers>=3.0.0) (0.19.1)\n", + "Requirement already satisfied: safetensors>=0.4.1 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from transformers<5.0.0,>=4.34.0->sentence-transformers>=3.0.0) (0.4.3)\n", + "Requirement already satisfied: joblib>=1.2.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from scikit-learn->sentence-transformers>=3.0.0) (1.4.2)\n", + "Requirement already satisfied: threadpoolctl>=3.1.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from scikit-learn->sentence-transformers>=3.0.0) (3.5.0)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from jinja2->torch>=1.11.0->sentence-transformers>=3.0.0) (2.1.5)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from requests->huggingface-hub>=0.15.1->sentence-transformers>=3.0.0) (3.3.2)\n", + "Requirement already satisfied: idna<4,>=2.5 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from requests->huggingface-hub>=0.15.1->sentence-transformers>=3.0.0) (3.7)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from requests->huggingface-hub>=0.15.1->sentence-transformers>=3.0.0) (1.26.18)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from 
requests->huggingface-hub>=0.15.1->sentence-transformers>=3.0.0) (2024.6.2)\n", + "Requirement already satisfied: mpmath<1.4.0,>=1.1.0 in /Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages (from sympy->torch>=1.11.0->sentence-transformers>=3.0.0) (1.3.0)\n" + ] }, { - "cell_type": "markdown", - "metadata": { - "id": "yL8nuJdWO-qa" - }, - "source": [ - "> `InMemoryDocumentStore` is the simplest DocumentStore to get started with. It requires no external dependencies and it's a good option for smaller projects and debugging. But it doesn't scale up so well to larger Document collections, so it's not a good choice for production systems. To learn more about the different types of external databases that Haystack supports, see [DocumentStore Integrations](https://haystack.deepset.ai/integrations?type=Document+Store)." - ] - }, + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49m/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip\u001b[0m\n" + ] + } + ], + "source": [ + "%%bash\n", + "\n", + "pip install haystack-ai==2.8.0\n", + "pip install \"datasets>=2.6.1\"\n", + "pip install \"sentence-transformers>=3.0.0\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Wl_jYERtO-qa" + }, + "source": [ + "### Enabling Telemetry\n", + "\n", + "Knowing you're using this tutorial helps us decide where to invest our efforts to build a better product but you can always opt out by commenting the following line. See [Telemetry](https://docs.haystack.deepset.ai/docs/enabling-telemetry) for more details." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "A76B4S49O-qa" + }, + "outputs": [ { - "cell_type": "markdown", - "metadata": { - "id": "XvLVaFHTO-qb" - }, - "source": [ - "The DocumentStore is now ready. Now it's time to fill it with some Documents." - ] + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", + " from .autonotebook import tqdm as notebook_tqdm\n" + ] + } + ], + "source": [ + "from haystack.telemetry import tutorial_running\n", + "\n", + "tutorial_running(27)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_lvfew16O-qa" + }, + "source": [ + "## Fetching and Indexing Documents\n", + "\n", + "You'll start creating your question answering system by downloading the data and indexing the data with its embeddings to a DocumentStore. \n", + "\n", + "In this tutorial, you will take a simple approach to writing documents and their embeddings into the DocumentStore. For a full indexing pipeline with preprocessing, cleaning and splitting, check out our tutorial on [Preprocessing Different File Types](https://haystack.deepset.ai/tutorials/30_file_type_preprocessing_index_pipeline).\n", + "\n", + "\n", + "### Initializing the DocumentStore\n", + "\n", + "Initialize a DocumentStore to index your documents. A DocumentStore stores the Documents that the question answering system uses to find answers to your questions. In this tutorial, you'll be using the `InMemoryDocumentStore`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "CbVN-s5LO-qa" + }, + "outputs": [], + "source": [ + "from haystack.document_stores.in_memory import InMemoryDocumentStore\n", + "\n", + "document_store = InMemoryDocumentStore()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yL8nuJdWO-qa" + }, + "source": [ + "> `InMemoryDocumentStore` is the simplest DocumentStore to get started with. It requires no external dependencies and is a good option for smaller projects and debugging. However, it doesn't scale well to larger Document collections, so it's not a good choice for production systems. To learn more about the different types of external databases that Haystack supports, see [DocumentStore Integrations](https://haystack.deepset.ai/integrations?type=Document+Store)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XvLVaFHTO-qb" + }, + "source": [ + "The DocumentStore is now ready. It's time to fill it with some Documents." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HryYZP9ZO-qb" + }, + "source": [ + "### Fetch the Data\n", + "\n", + "You'll use the Wikipedia pages of [Seven Wonders of the Ancient World](https://en.wikipedia.org/wiki/Wonders_of_the_World) as Documents. We preprocessed the data and uploaded it to Hugging Face as a dataset: [Seven Wonders](https://huggingface.co/datasets/bilgeyucel/seven-wonders). Thus, you don't need to perform any additional cleaning or splitting.\n", + "\n", + "Fetch the data and convert it into Haystack Documents:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" }, - { - "cell_type": "markdown", - "metadata": { - "id": "HryYZP9ZO-qb" - }, - "source": [ - "### Fetch the Data\n", - "\n", - "You'll use the Wikipedia pages of [Seven Wonders of the Ancient World](https://en.wikipedia.org/wiki/Wonders_of_the_World) as Documents. 
We preprocessed the data and uploaded to a Hugging Face Space: [Seven Wonders](https://huggingface.co/datasets/bilgeyucel/seven-wonders). Thus, you don't need to perform any additional cleaning or splitting.\n", - "\n", - "Fetch the data and convert it into Haystack Documents:" - ] + "id": "INdC3WvLO-qb", + "outputId": "1af43d0f-2999-4de4-d152-b3cca9fb49e6" + }, + "outputs": [], + "source": [ + "from datasets import load_dataset\n", + "from haystack import Document\n", + "\n", + "dataset = load_dataset(\"bilgeyucel/seven-wonders\", split=\"train\")\n", + "docs = [Document(content=doc[\"content\"], meta=doc[\"meta\"]) for doc in dataset]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "czMjWwnxPA-3" + }, + "source": [ + "### Initialize a Document Embedder\n", + "\n", + "To store your data in the DocumentStore with embeddings, initialize a [SentenceTransformersDocumentEmbedder](https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder) with the model name and call `warm_up()` to download the embedding model.\n", + "\n", + "> If you'd like, you can use a different [Embedder](https://docs.haystack.deepset.ai/docs/embedders) for your documents." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" }, + "id": "EUmAH9sEn3R7", + "outputId": "ee54b59b-4d4a-45eb-c1a9-0b7b248f1dd4" + }, + "outputs": [ { - "cell_type": "code", - "execution_count": 4, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "INdC3WvLO-qb", - "outputId": "1af43d0f-2999-4de4-d152-b3cca9fb49e6" - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning: \n", - "The secret `HF_TOKEN` does not exist in your Colab secrets.\n", - "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n", - "You will be able to reuse this secret in all of your notebooks.\n", - "Please note that authentication is recommended but still optional to access public models or datasets.\n", - " warnings.warn(\n" - ] - } - ], - "source": [ - "from datasets import load_dataset\n", - "from haystack import Document\n", - "\n", - "dataset = load_dataset(\"bilgeyucel/seven-wonders\", split=\"train\")\n", - "docs = [Document(content=doc[\"content\"], meta=doc[\"meta\"]) for doc in dataset]" - ] + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/amna.mubashar/Library/Python/3.9/lib/python/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. 
If you want to force a new download, use `force_download=True`.\n", + " warnings.warn(\n" + ] + } + ], + "source": [ + "from haystack.components.embedders import SentenceTransformersDocumentEmbedder\n", + "\n", + "doc_embedder = SentenceTransformersDocumentEmbedder(model=\"sentence-transformers/all-MiniLM-L6-v2\")\n", + "doc_embedder.warm_up()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9y4iJE_SrS4K" + }, + "source": [ + "### Write Documents to the DocumentStore\n", + "\n", + "Run the `doc_embedder` with the Documents. The embedder will create an embedding for each document and save it in the Document object's `embedding` field. Then, you can write the Documents to the DocumentStore with the `write_documents()` method." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 66, + "referenced_widgets": [ + "7d482188c12d4a7886f20a65d3402c59", + "2a3ec74419ae4a02ac0210db66133415", + "ddeff9a822404adbbc3cad97a939bc0c", + "36d341ab3a044709b5af2e8ab97559bc", + "88fc33e1ab78405e911b5eafa512c935", + "91e5d4b0ede848319ef0d3b558d57d19", + "d2428c21707d43f2b6f07bfafbace8bb", + "7fdb2c859e454e72888709a835f7591e", + "6b8334e071a3438397ba6435aac69f58", + "5f5cfa425cac4d37b2ea29e53b4ed900", + "3c59a82dac5c476b9a3e3132094e1702" + ] }, + "id": "ETpQKftLplqh", + "outputId": "b9c8658c-90c8-497c-e765-97487c0daf8e" + }, + "outputs": [ { - "cell_type": "markdown", - "metadata": { - "id": "czMjWwnxPA-3" - }, - "source": [ - "### Initalize a Document Embedder\n", - "\n", - "To store your data in the DocumentStore with embeddings, initialize a [SentenceTransformersDocumentEmbedder](https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder) with the model name and call `warm_up()` to download the embedding model.\n", - "\n", - "> If you'd like, you can use a different [Embedder](https://docs.haystack.deepset.ai/docs/embedders) for your documents." 
- ] + "name": "stderr", + "output_type": "stream", + "text": [ + "Batches: 100%|██████████| 5/5 [00:01<00:00, 3.09it/s]\n" + ] }, { - "cell_type": "code", - "execution_count": 5, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "EUmAH9sEn3R7", - "outputId": "ee54b59b-4d4a-45eb-c1a9-0b7b248f1dd4" - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.10/dist-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()\n", - " return self.fget.__get__(instance, owner)()\n" - ] - } - ], - "source": [ - "from haystack.components.embedders import SentenceTransformersDocumentEmbedder\n", - "\n", - "doc_embedder = SentenceTransformersDocumentEmbedder(model=\"sentence-transformers/all-MiniLM-L6-v2\")\n", - "doc_embedder.warm_up()" + "data": { + "text/plain": [ + "151" ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "docs_with_embeddings = doc_embedder.run(docs)\n", + "document_store.write_documents(docs_with_embeddings[\"documents\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IdojTxg6uubn" + }, + "source": [ + "## Building the RAG Pipeline\n", + "\n", + "The next step is to build a [Pipeline](https://docs.haystack.deepset.ai/docs/pipelines) to generate answers for the user query following the RAG approach. To create the pipeline, you first need to initialize each component, add them to your pipeline, and connect them." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0uyV6-u-u56P" + }, + "source": [ + "### Initialize a Text Embedder\n", + "\n", + "Initialize a text embedder to create an embedding for the user query. 
The created embedding will later be used by the Retriever to retrieve relevant documents from the DocumentStore.\n", + "\n", + "> ⚠️ Notice that you used the `sentence-transformers/all-MiniLM-L6-v2` model to create embeddings for your documents earlier. You need to use the same model to embed the user queries so that query and document embeddings are comparable." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "LyJY2yW628dl" + }, + "outputs": [], + "source": [ + "from haystack.components.embedders import SentenceTransformersTextEmbedder\n", + "\n", + "text_embedder = SentenceTransformersTextEmbedder(model=\"sentence-transformers/all-MiniLM-L6-v2\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0_cj-5m-O-qb" + }, + "source": [ + "### Initialize the Retriever\n", + "\n", + "Initialize an [InMemoryEmbeddingRetriever](https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever) and make it use the InMemoryDocumentStore you initialized earlier in this tutorial. This Retriever will fetch the documents most relevant to the query." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "id": "-uo-6fjiO-qb" + }, + "outputs": [], + "source": [ + "from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever\n", + "\n", + "retriever = InMemoryEmbeddingRetriever(document_store)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6CEuQpB7O-qb" + }, + "source": [ + "### Define a Template Prompt\n", + "\n", + "Create a custom prompt for a generative question answering task using the RAG approach. The prompt should take in two parameters: `documents`, which are retrieved from a document store, and a `question` from the user. 
Use the Jinja2 looping syntax to combine the content of the retrieved documents in the prompt.\n", + "\n", + "Next, initialize a [ChatPromptBuilder](https://docs.haystack.deepset.ai/docs/chatpromptbuilder) instance with your prompt template. 
The ChatPromptBuilder, when given the necessary values, fills in the template variables and generates a complete prompt. This lets you tailor the prompt to each question and its retrieved context." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "id": "ObahTh45FqOT" + }, + "outputs": [], + "source": [ + "from haystack.components.builders import ChatPromptBuilder\n", + "from haystack.dataclasses import ChatMessage\n", + "\n", + "template = [ChatMessage.from_user(\"\"\"\n", + "Given the following information, answer the question.\n", + "\n", + "Context:\n", + "{% for document in documents %}\n", + " {{ document.content }}\n", + "{% endfor %}\n", + "\n", + "Question: {{question}}\n", + "Answer:\n", + "\"\"\")]\n", + "\n", + "prompt_builder = ChatPromptBuilder(template=template)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HR14lbfcFtXj" + }, + "source": [ + "### Initialize a ChatGenerator\n", + "\n", + "\n", + "ChatGenerators are the components that interact with large language models (LLMs). Now, set the `OPENAI_API_KEY` environment variable and initialize an [OpenAIChatGenerator](https://docs.haystack.deepset.ai/docs/OpenAIChatGenerator) that can communicate with OpenAI GPT models. When you initialize it, provide a model name:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" }, - { - "cell_type": "markdown", - "metadata": { - "id": "9y4iJE_SrS4K" - }, - "source": [ - "### Write Documents to the DocumentStore\n", - "\n", - "Run the `doc_embedder` with the Documents. The embedder will create embeddings for each document and save these embeddings in Document object's `embedding` field. Then, you can write the Documents to the DocumentStore with `write_documents()` method." 
- ] + "id": "SavE_FAqfApo", + "outputId": "1afbf2e8-ae63-41ff-c37f-5123b2103356" + }, + "outputs": [], + "source": [ + "import os\n", + "from getpass import getpass\n", + "from haystack.components.generators.chat import OpenAIChatGenerator\n", + "\n", + "if \"OPENAI_API_KEY\" not in os.environ:\n", + " os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter OpenAI API key:\")\n", + "chat_generator = OpenAIChatGenerator(model=\"gpt-4o-mini\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nenbo2SvycHd" + }, + "source": [ + "> You can replace `OpenAIChatGenerator` in your pipeline with another `ChatGenerator`. Check out the full list of chat generators [here](https://docs.haystack.deepset.ai/docs/generators)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1bfHwOQwycHe" + }, + "source": [ + "### Build the Pipeline\n", + "\n", + "To build a pipeline, add all components to your pipeline and connect them. Create connections from `text_embedder`'s \"embedding\" output to the \"query_embedding\" input of `retriever`, from `retriever` to `prompt_builder`, and from `prompt_builder` to `llm`. Explicitly connect the output of `retriever` with the \"documents\" input of the `prompt_builder` to make the connection obvious, as `prompt_builder` has two inputs (\"documents\" and \"question\").\n", + "\n", + "For more information on pipelines and creating connections, refer to the [Creating Pipelines](https://docs.haystack.deepset.ai/docs/creating-pipelines) documentation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 }, + "id": "f6NFmpjEO-qb", + "outputId": "89fd1b48-5189-4401-9cf8-15f55c503676" + }, + "outputs": [], + "source": [ + "from haystack import Pipeline\n", + "\n", + "basic_rag_pipeline = Pipeline()\n", + "# Add components to your pipeline\n", + "basic_rag_pipeline.add_component(\"text_embedder\", text_embedder)\n", + "basic_rag_pipeline.add_component(\"retriever\", retriever)\n", + "basic_rag_pipeline.add_component(\"prompt_builder\", prompt_builder)\n", + "basic_rag_pipeline.add_component(\"llm\", chat_generator)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ { - "cell_type": "code", - "execution_count": 6, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 66, - "referenced_widgets": [ - "7d482188c12d4a7886f20a65d3402c59", - "2a3ec74419ae4a02ac0210db66133415", - "ddeff9a822404adbbc3cad97a939bc0c", - "36d341ab3a044709b5af2e8ab97559bc", - "88fc33e1ab78405e911b5eafa512c935", - "91e5d4b0ede848319ef0d3b558d57d19", - "d2428c21707d43f2b6f07bfafbace8bb", - "7fdb2c859e454e72888709a835f7591e", - "6b8334e071a3438397ba6435aac69f58", - "5f5cfa425cac4d37b2ea29e53b4ed900", - "3c59a82dac5c476b9a3e3132094e1702" - ] - }, - "id": "ETpQKftLplqh", - "outputId": "b9c8658c-90c8-497c-e765-97487c0daf8e" - }, - "outputs": [ - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7d482188c12d4a7886f20a65d3402c59", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "Batches: 0%| | 0/5 [00:00\n", + "🚅 Components\n", + " - text_embedder: SentenceTransformersTextEmbedder\n", + " - retriever: InMemoryEmbeddingRetriever\n", + " - prompt_builder: ChatPromptBuilder\n", + " - llm: OpenAIChatGenerator\n", + "🛤️ Connections\n", + " - text_embedder.embedding -> retriever.query_embedding (List[float])\n", + " - retriever.documents -> 
prompt_builder.documents (List[Document])\n", + " - prompt_builder.prompt -> llm.messages (List[ChatMessage])" ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Now, connect the components to each other\n", + "basic_rag_pipeline.connect(\"text_embedder.embedding\", \"retriever.query_embedding\")\n", + "basic_rag_pipeline.connect(\"retriever\", \"prompt_builder\")\n", + "basic_rag_pipeline.connect(\"prompt_builder.prompt\", \"llm.messages\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6NqyLhx7O-qc" + }, + "source": [ + "That's it! Your RAG pipeline is ready to generate answers to questions!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DBAyF5tVO-qc" + }, + "source": [ + "## Asking a Question\n", + "\n", + "When asking a question, use the `run()` method of the pipeline. Make sure to provide the question to both the `text_embedder` and the `prompt_builder`. This ensures that the `{{question}}` variable in the template prompt gets replaced with your specific question." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 86, + "referenced_widgets": [ + "4e6e97b6d54f4f80bb7e8b25aba8e616", + "1a820c06a7a049d8b6c9ff300284d06e", + "58ff4e0603a74978a134f63533859be5", + "8bdb8bfae31d4f4cb6c3b0bf43120eed", + "39a68d9a5c274e2dafaa2d1f86eea768", + "d0cfe5dacdfc431a91b4c4741123e2d0", + "e7f1e1a14bb740d18827dd78bbe7b2e3", + "3fda06f905b445a488efdd2dd08c0939", + "2bc341a780f7498ba9cd475468841bb5", + "d7218475e23b420a8c03d00ca4ab8718", + "a694abaf765f4d1b82fa0138e59c6793" + ] }, + "id": "Vnt283M5O-qc", + "outputId": "d2843a73-3ad5-4daa-8d1e-a58de7aa2bb0" + }, + "outputs": [ { - "cell_type": "markdown", - "metadata": { - "id": "IdojTxg6uubn" - }, - "source": [ - "## Building the RAG Pipeline\n", - "\n", - "The next step is to build a [Pipeline](https://docs.haystack.deepset.ai/docs/pipelines) to generate answers for the user query following the RAG approach. To create the pipeline, you first need to initialize each component, add them to your pipeline, and connect them." - ] + "name": "stderr", + "output_type": "stream", + "text": [ + "Batches: 100%|██████████| 1/1 [00:00<00:00, 4.19it/s]\n" + ] }, { - "cell_type": "markdown", - "metadata": { - "id": "0uyV6-u-u56P" - }, - "source": [ - "### Initialize a Text Embedder\n", - "\n", - "Initialize a text embedder to create an embedding for the user query. The created embedding will later be used by the Retriever to retrieve relevant documents from the DocumentStore.\n", - "\n", - "> ⚠️ Notice that you used `sentence-transformers/all-MiniLM-L6-v2` model to create embeddings for your documents before. This is why you need to use the same model to embed the user queries." - ] + "name": "stdout", + "output_type": "stream", + "text": [ + "ChatMessage(content='The Colossus of Rhodes was a statue of the Greek sun-god Helios, thought to be approximately 70 cubits, or about 33 meters (108 feet) tall. 
Although no definitive description of its appearance survives, ancient accounts suggest it featured a standard rendering of Helios from that era. It likely had curly hair with bronze or silver spikes representing flames radiating from his head, similar to depictions found on Rhodian coins.\\n\\nThe statue was constructed using iron tie bars and brass plates, forming a skin that covered a core filled with stone blocks. The details of the face and head followed common artistic conventions of the time, with the design possibly reflecting a pose of shielding the eyes with one hand, resembling the way a person looks toward the sun. While it was built to celebrate the victory of Rhodes over an attacking army, its exact posture and additional details remain subjects of speculation due to the lack of surviving descriptions.', role=, name=None, meta={'model': 'gpt-4o-mini-2024-07-18', 'index': 0, 'finish_reason': 'stop', 'usage': {'completion_tokens': 187, 'prompt_tokens': 2405, 'total_tokens': 2592, 'prompt_tokens_details': {'cached_tokens': 2176, 'audio_tokens': 0}, 'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0}}})\n" + ] + } + ], + "source": [ + "question = \"What does Rhodes Statue look like?\"\n", + "\n", + "response = basic_rag_pipeline.run({\"text_embedder\": {\"text\": question}, \"prompt_builder\": {\"question\": question}})\n", + "\n", + "print(response[\"llm\"][\"replies\"][0])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IWQN-aoGO-qc" + }, + "source": [ + "Here are some other example questions to test:" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "id": "_OHUQ5xxO-qc" + }, + "outputs": [], + "source": [ + "examples = [\n", + " \"Where is Gardens of Babylon?\",\n", + " \"Why did people build Great Pyramid of Giza?\",\n", + " \"What does Rhodes Statue look like?\",\n", + " \"Why did people visit the Temple of 
Artemis?\",\n", + " \"What is the importance of Colossus of Rhodes?\",\n", + " \"What happened to the Tomb of Mausolus?\",\n", + " \"How did Colossus of Rhodes collapse?\",\n", + "]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XueCK3y4O-qc" + }, + "source": [ + "## What's next\n", + "\n", + "🎉 Congratulations! You've learned how to create a generative QA system for your documents with the RAG approach.\n", + "\n", + "If you liked this tutorial, you may also enjoy:\n", + "- [Filtering Documents with Metadata](https://haystack.deepset.ai/tutorials/31_metadata_filtering)\n", + "- [Preprocessing Different File Types](https://haystack.deepset.ai/tutorials/30_file_type_preprocessing_index_pipeline)\n", + "- [Creating a Hybrid Retrieval Pipeline](https://haystack.deepset.ai/tutorials/33_hybrid_retrieval)\n", + "\n", + "To stay up to date on the latest Haystack developments, you can [subscribe to our newsletter](https://landing.deepset.ai/haystack-community-updates) and [join the Haystack Discord community](https://discord.gg/haystack).\n", + "\n", + "Thanks for reading!" 
+ ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + }, + "orig_nbformat": 4, + "vscode": { + "interpreter": { + "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" + } + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "1a820c06a7a049d8b6c9ff300284d06e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d0cfe5dacdfc431a91b4c4741123e2d0", + "placeholder": "​", + "style": "IPY_MODEL_e7f1e1a14bb740d18827dd78bbe7b2e3", + "value": "Batches: 100%" + } }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": { - "id": "LyJY2yW628dl" - }, - "outputs": [], - "source": [ - "from haystack.components.embedders import SentenceTransformersTextEmbedder\n", - "\n", - "text_embedder = SentenceTransformersTextEmbedder(model=\"sentence-transformers/all-MiniLM-L6-v2\")" - ] + "2a3ec74419ae4a02ac0210db66133415": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + 
"_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_91e5d4b0ede848319ef0d3b558d57d19", + "placeholder": "​", + "style": "IPY_MODEL_d2428c21707d43f2b6f07bfafbace8bb", + "value": "Batches: 100%" + } }, - { - "cell_type": "markdown", - "metadata": { - "id": "0_cj-5m-O-qb" - }, - "source": [ - "### Initialize the Retriever\n", - "\n", - "Initialize a [InMemoryEmbeddingRetriever](https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever) and make it use the InMemoryDocumentStore you initialized earlier in this tutorial. This Retriever will get the relevant documents to the query." - ] + "2bc341a780f7498ba9cd475468841bb5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": { - "id": "-uo-6fjiO-qb" - }, - "outputs": [], - "source": [ - "from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever\n", - "\n", - "retriever = InMemoryEmbeddingRetriever(document_store)" - ] + "36d341ab3a044709b5af2e8ab97559bc": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5f5cfa425cac4d37b2ea29e53b4ed900", + 
"placeholder": "​", + "style": "IPY_MODEL_3c59a82dac5c476b9a3e3132094e1702", + "value": " 5/5 [00:01<00:00,  3.35it/s]" + } }, - { - "cell_type": "markdown", - "metadata": { - "id": "6CEuQpB7O-qb" - }, - "source": [ - "### Define a Template Prompt\n", - "\n", - "Create a custom prompt for a generative question answering task using the RAG approach. The prompt should take in two parameters: `documents`, which are retrieved from a document store, and a `question` from the user. Use the Jinja2 looping syntax to combine the content of the retrieved documents in the prompt.\n", - "\n", - "Next, initialize a [PromptBuilder](https://docs.haystack.deepset.ai/docs/promptbuilder) instance with your prompt template. The PromptBuilder, when given the necessary values, will automatically fill in the variable values and generate a complete prompt. This approach allows for a more tailored and effective question-answering experience." - ] + "39a68d9a5c274e2dafaa2d1f86eea768": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": 
null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": { - "id": "ObahTh45FqOT" - }, - "outputs": [], - "source": [ - "from haystack.components.builders import PromptBuilder\n", - "\n", - "template = \"\"\"\n", - "Given the following information, answer the question.\n", - "\n", - "Context:\n", - "{% for document in documents %}\n", - " {{ document.content }}\n", - "{% endfor %}\n", - "\n", - "Question: {{question}}\n", - "Answer:\n", - "\"\"\"\n", - "\n", - "prompt_builder = PromptBuilder(template=template)" - ] + "3c59a82dac5c476b9a3e3132094e1702": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } }, - { - "cell_type": "markdown", - "metadata": { - "id": "HR14lbfcFtXj" - }, - "source": [ - "### Initialize a Generator\n", - "\n", - "\n", - "Generators are the components that interact with large language models (LLMs). Now, set `OPENAI_API_KEY` environment variable and initialize a [OpenAIGenerator](https://docs.haystack.deepset.ai/docs/OpenAIGenerator) that can communicate with OpenAI GPT models. 
As you initialize, provide a model name:" - ] + "3fda06f905b445a488efdd2dd08c0939": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "SavE_FAqfApo", - "outputId": "1afbf2e8-ae63-41ff-c37f-5123b2103356" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Enter OpenAI API key: ··········\n" - ] - } + "4e6e97b6d54f4f80bb7e8b25aba8e616": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + 
"_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_1a820c06a7a049d8b6c9ff300284d06e", + "IPY_MODEL_58ff4e0603a74978a134f63533859be5", + "IPY_MODEL_8bdb8bfae31d4f4cb6c3b0bf43120eed" ], - "source": [ - "import os\n", - "from getpass import getpass\n", - "from haystack.components.generators import OpenAIGenerator\n", - "\n", - "if \"OPENAI_API_KEY\" not in os.environ:\n", - " os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter OpenAI API key:\")\n", - "generator = OpenAIGenerator(model=\"gpt-4o-mini\")" - ] + "layout": "IPY_MODEL_39a68d9a5c274e2dafaa2d1f86eea768" + } }, - { - "cell_type": "markdown", - "metadata": { - "id": "nenbo2SvycHd" - }, - "source": [ - "> You can replace `OpenAIGenerator` in your pipeline with another `Generator`. Check out the full list of generators [here](https://docs.haystack.deepset.ai/docs/generators)." - ] + "58ff4e0603a74978a134f63533859be5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3fda06f905b445a488efdd2dd08c0939", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_2bc341a780f7498ba9cd475468841bb5", + "value": 1 + } }, - { - "cell_type": "markdown", - "metadata": { - "id": "1bfHwOQwycHe" - }, - "source": [ - "### Build the Pipeline\n", - "\n", - "To build a pipeline, add all components to your pipeline and connect them. 
Create connections from `text_embedder`'s \"embedding\" output to \"query_embedding\" input of `retriever`, from `retriever` to `prompt_builder` and from `prompt_builder` to `llm`. Explicitly connect the output of `retriever` with \"documents\" input of the `prompt_builder` to make the connection obvious as `prompt_builder` has two inputs (\"documents\" and \"question\").\n", - "\n", - "For more information on pipelines and creating connections, refer to [Creating Pipelines](https://docs.haystack.deepset.ai/docs/creating-pipelines) documentation." - ] + "5f5cfa425cac4d37b2ea29e53b4ed900": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 1000 - }, - "id": "f6NFmpjEO-qb", - "outputId": 
"89fd1b48-5189-4401-9cf8-15f55c503676" - }, - "outputs": [ - { - "data": { - "image/jpeg": "", - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } + "6b8334e071a3438397ba6435aac69f58": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "7d482188c12d4a7886f20a65d3402c59": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_2a3ec74419ae4a02ac0210db66133415", + "IPY_MODEL_ddeff9a822404adbbc3cad97a939bc0c", + "IPY_MODEL_36d341ab3a044709b5af2e8ab97559bc" ], - "source": [ - "from haystack import Pipeline\n", - "\n", - "basic_rag_pipeline = Pipeline()\n", - "# Add components to your pipeline\n", - "basic_rag_pipeline.add_component(\"text_embedder\", text_embedder)\n", - "basic_rag_pipeline.add_component(\"retriever\", retriever)\n", - "basic_rag_pipeline.add_component(\"prompt_builder\", prompt_builder)\n", - "basic_rag_pipeline.add_component(\"llm\", generator)\n", - "\n", - "# Now, connect the components to each other\n", - "basic_rag_pipeline.connect(\"text_embedder.embedding\", \"retriever.query_embedding\")\n", - 
"basic_rag_pipeline.connect(\"retriever\", \"prompt_builder.documents\")\n", - "basic_rag_pipeline.connect(\"prompt_builder\", \"llm\")" - ] + "layout": "IPY_MODEL_88fc33e1ab78405e911b5eafa512c935" + } }, - { - "cell_type": "markdown", - "metadata": { - "id": "6NqyLhx7O-qc" - }, - "source": [ - "That's it! Your RAG pipeline is ready to generate answers to questions!" - ] + "7fdb2c859e454e72888709a835f7591e": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } }, - { - "cell_type": "markdown", - "metadata": { - "id": "DBAyF5tVO-qc" - }, - "source": [ - "## Asking a Question\n", - "\n", - "When asking a question, use the `run()` method of the pipeline. Make sure to provide the question to both the `text_embedder` and the `prompt_builder`. 
This ensures that the `{{question}}` variable in the template prompt gets replaced with your specific question." - ] + "88fc33e1ab78405e911b5eafa512c935": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 86, - "referenced_widgets": [ - "4e6e97b6d54f4f80bb7e8b25aba8e616", - "1a820c06a7a049d8b6c9ff300284d06e", - "58ff4e0603a74978a134f63533859be5", - "8bdb8bfae31d4f4cb6c3b0bf43120eed", - "39a68d9a5c274e2dafaa2d1f86eea768", - "d0cfe5dacdfc431a91b4c4741123e2d0", - "e7f1e1a14bb740d18827dd78bbe7b2e3", - "3fda06f905b445a488efdd2dd08c0939", - "2bc341a780f7498ba9cd475468841bb5", - "d7218475e23b420a8c03d00ca4ab8718", - "a694abaf765f4d1b82fa0138e59c6793" - ] - }, - "id": "Vnt283M5O-qc", - "outputId": 
"d2843a73-3ad5-4daa-8d1e-a58de7aa2bb0" - }, - "outputs": [ - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4e6e97b6d54f4f80bb7e8b25aba8e616", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "Batches: 0%| | 0/1 [00:00