diff --git a/notebooks/254-llm-chatbot/254-rag-chatbot.ipynb b/notebooks/254-llm-chatbot/254-rag-chatbot.ipynb index 07f7054f55d..feb37f8404b 100644 --- a/notebooks/254-llm-chatbot/254-rag-chatbot.ipynb +++ b/notebooks/254-llm-chatbot/254-rag-chatbot.ipynb @@ -83,7 +83,7 @@ "\"accelerate\"\\\n", "\"openvino-nightly\"\\\n", "\"gradio\"\\\n", - "\"onnx\" \"einops\" \"transformers_stream_generator\" \"tiktoken\" \"transformers>=4.38.1\" \"bitsandbytes\" \"chromadb\" \"sentence_transformers\" \"langchain>=0.1.7\" \"langchainhub\" \"unstructured\" \"scikit-learn\" \"python-docx\" \"pdfminer.six\"" + "\"onnx\" \"chromadb\" \"sentence_transformers\" \"langchain>=0.1.7\" \"langchainhub\" \"transformers>=4.37.0\" \"unstructured\" \"scikit-learn\" \"python-docx\" \"pdfminer.six\" \"bitsandbytes\"" ] }, { @@ -122,6 +122,7 @@ " except OSError:\n", " notebook_login()\n", "```\n", + "* **mini-cpm-2b-dpo** - MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. After Direct Preference Optimization (DPO) fine-tuning, MiniCPM outperforms many popular 7b, 13b and 70b models. More details can be found in the [model card](https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp16).\n", "* **red-pajama-3b-chat** - A 2.8B parameter pre-trained language model based on GPT-NEOX architecture. It was developed by Together Computer and leaders from the open-source AI community. The model is fine-tuned on OASST1 and Dolly2 datasets to enhance chatting ability. More details about model can be found in [HuggingFace model card](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1).\n", "* **gemma-7b-it** - Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. This model is instruction-tuned version of 7B parameters model. More details about model can be found in [model card](https://huggingface.co/google/gemma-7b-it).\n", ">**Note**: run model with demo, you will need to accept license agreement. \n", @@ -155,7 +156,7 @@ " except OSError:\n", " notebook_login()\n", "```\n", - "* **qwen1.5-1.8b-chat/qwen1.5-7b-chat** - Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Qwen1.5 is a language model series including decoder language models of different model sizes. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, mixture of sliding window attention and full attention. You can find more details about model in the [model repository](https://huggingface.co/Qwen).\n", + "* **qwen1.5-0.5b-chat/qwen1.5-1.8b-chat/qwen1.5-7b-chat** - Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Qwen1.5 is a language model series including decoder language models of different model sizes. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, mixture of sliding window attention and full attention. 
You can find more details about the model in the [model repository](https://huggingface.co/Qwen).\n", "* **qwen-7b-chat** - Qwen-7B is the 7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model, which is pretrained on a large volume of data, including web texts, books, codes, etc. For more details about Qwen, please refer to the [GitHub](https://github.com/QwenLM/Qwen) code repository.\n", "* **mpt-7b-chat** - MPT-7B is part of the family of MosaicPretrainedTransformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference. These architectural changes include performance-optimized layer implementations and the elimination of context length limits by replacing positional embeddings with Attention with Linear Biases ([ALiBi](https://arxiv.org/abs/2108.12409)). Thanks to these modifications, MPT models can be trained with high throughput efficiency and stable convergence. MPT-7B-chat is a chatbot-like model for dialogue generation. It was built by finetuning MPT-7B on the [ShareGPT-Vicuna](https://huggingface.co/datasets/jeffwan/sharegpt_vicuna), [HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3), [Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca), [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf), and [Evol-Instruct](https://huggingface.co/datasets/victor123/evol_instruct_70k) datasets. More details about the model can be found in [blog post](https://www.mosaicml.com/blog/mpt-7b), [repository](https://github.com/mosaicml/llm-foundry/) and [HuggingFace model card](https://huggingface.co/mosaicml/mpt-7b-chat).\n", "* **chatglm3-6b** - ChatGLM3-6B is the latest open-source model in the ChatGLM series. While retaining many excellent features such as smooth dialogue and low deployment threshold from the previous two generations, ChatGLM3-6B employs a more diverse training dataset, more sufficient training steps, and a more reasonable training strategy. ChatGLM3-6B adopts a newly designed [Prompt format](https://github.com/THUDM/ChatGLM3/blob/main/PROMPT_en.md), in addition to the normal multi-turn dialogue. You can find more details about model in the [model card](https://huggingface.co/THUDM/chatglm3-6b)\n", @@ -164,7 +165,8 @@ "* **neural-chat-7b-v3-1** - Mistral-7b model fine-tuned using Intel Gaudi. The model fine-tuned on the open source dataset [Open-Orca/SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca) and aligned with [Direct Preference Optimization (DPO) algorithm](https://arxiv.org/abs/2305.18290). More details can be found in [model card](https://huggingface.co/Intel/neural-chat-7b-v3-1) and [blog post](https://medium.com/@NeuralCompressor/the-practice-of-supervised-finetuning-and-direct-preference-optimization-on-habana-gaudi2-a1197d8a3cd3).\n", "* **notus-7b-v1** - Notus is a collection of fine-tuned models using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290). and related [RLHF](https://huggingface.co/blog/rlhf) techniques. This model is the first version, fine-tuned with DPO over zephyr-7b-sft. Following a data-first approach, the only difference between Notus-7B-v1 and Zephyr-7B-beta is the preference dataset used for dDPO. Proposed approach for dataset creation helps to effectively fine-tune Notus-7b that surpasses Zephyr-7B-beta and Claude 2 on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). 
More details about model can be found in [model card](https://huggingface.co/argilla/notus-7b-v1).\n", "* **youri-7b-chat** - Youri-7b-chat is a Llama2 based model. [Rinna Co., Ltd.](https://rinna.co.jp/) conducted further pre-training for the Llama2 model with a mixture of English and Japanese datasets to improve Japanese task capability. The model is publicly released on Hugging Face hub. You can find detailed information at the [rinna/youri-7b-chat project page](https://huggingface.co/rinna/youri-7b). \n", - "* **baichuan2-7b-chat** - Baichuan 2 is the new generation of large-scale open-source language models launched by [Baichuan Intelligence inc](https://www.baichuan-ai.com/home). It is trained on a high-quality corpus with 2.6 trillion tokens and has achieved the best performance in authoritative Chinese and English benchmarks of the same size." + "* **baichuan2-7b-chat** - Baichuan 2 is the new generation of large-scale open-source language models launched by [Baichuan Intelligence Inc.](https://www.baichuan-ai.com/home). It is trained on a high-quality corpus with 2.6 trillion tokens and has achieved the best performance in authoritative Chinese and English benchmarks of the same size.\n", + "* **internlm2-chat-1.8b** - InternLM2 is the second-generation InternLM series. Compared to the previous generation model, it shows significant improvements in various capabilities, including reasoning, mathematics, and coding. More details about the model can be found in the [model repository](https://huggingface.co/internlm)." ] }, { @@ -184,15 +186,15 @@ "name": "stderr", "output_type": "stream", "text": [ - "2024-03-09 06:18:48.945368: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n", - "2024-03-09 06:18:48.948759: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.\n", - "2024-03-09 06:18:48.991023: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", - "2024-03-09 06:18:48.991056: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", - "2024-03-09 06:18:48.991087: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", - "2024-03-09 06:18:48.998911: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.\n", - "2024-03-09 06:18:49.000066: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", + "2024-03-06 07:05:19.617312: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. 
To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n", + "2024-03-06 07:05:19.620814: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.\n", + "2024-03-06 07:05:19.663621: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", + "2024-03-06 07:05:19.663653: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", + "2024-03-06 07:05:19.663683: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", + "2024-03-06 07:05:19.671963: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.\n", + "2024-03-06 07:05:19.673938: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", - "2024-03-09 06:18:49.793544: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n" + "2024-03-06 07:05:20.726709: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n" ] } ], @@ -240,7 +242,7 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "b450e5be76a2410785bdbc9604116d53", + "model_id": "5875d10008c442c38ff1d90da874b8dc", "version_major": 2, "version_minor": 0 }, @@ -270,32 +272,32 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 15, "id": "184d1678-0e73-4f35-8af5-1a7d291c2e6e", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "e2075555292c4b91a4125bb5e556856a", + "model_id": "c8d393ddf227409d84313cde097d9896", "version_major": 2, "version_minor": 0 }, "text/plain": [ - "Dropdown(description='Model:', index=3, options=('tiny-llama-1b-chat', 'gemma-2b-it', 'red-pajama-3b-chat', 'g…" + "Dropdown(description='Model:', options=('tiny-llama-1b-chat', 'gemma-2b-it', 'red-pajama-3b-chat', 'gemma-7b-i…" ] }, - "execution_count": 3, + "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "llm_model_ids = [model_id for model_id, model_config in SUPPORTED_LLM_MODELS[model_language.value].items() if model_config.get(\"rag_prompt_template\")]\n", + "llm_model_ids = list(SUPPORTED_LLM_MODELS[model_language.value])\n", "\n", "llm_model_id = widgets.Dropdown(\n", " options=llm_model_ids,\n", - " value=llm_model_ids[0],\n", + " value=llm_model_ids[4],\n", " description=\"Model:\",\n", " disabled=False,\n", ")\n", @@ -305,7 +307,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 16, "id": "49ea95f8", "metadata": {}, "outputs": [ @@ -313,7 +315,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "Selected LLM model zephyr-7b-beta\n" + "Selected LLM model tiny-llama-1b-chat\n" ] } ], @@ -380,14 +382,14 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 10, "id": "c6a38153", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": 
"7e20021257b145519c5eccff50f93b1d", + "model_id": "10a3596a41864effbe8fb9d81723f3ed", "version_major": 2, "version_minor": 0 }, @@ -401,7 +403,7 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "75a4ba1905434e77a2147d94b281fcbb", + "model_id": "da04e6b87e41474194e2de8219da7303", "version_major": 2, "version_minor": 0 }, @@ -415,7 +417,7 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "097690bb77a54eca87851d64b189dda5", + "model_id": "0532ba4230d440aeb3f10cd7becf9156", "version_major": 2, "version_minor": 0 }, @@ -453,7 +455,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 11, "id": "2020d522", "metadata": {}, "outputs": [], @@ -652,7 +654,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 12, "id": "8e127215", "metadata": {}, "outputs": [ @@ -660,7 +662,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "Size of model with INT4 compressed weights is 6943.14 MB\n" + "Size of model with INT4 compressed weights is 1837.58 MB\n" ] } ], @@ -696,14 +698,14 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 17, "id": "ff80e6eb-7923-40ef-93d8-5e6c56e50667", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "533cbd205e284549ab4bf3bba2baea41", + "model_id": "d7e6f5925ad0446ca94e882a8c6503fc", "version_major": 2, "version_minor": 0 }, @@ -711,7 +713,7 @@ "Dropdown(description='Embedding Model:', options=('all-mpnet-base-v2',), value='all-mpnet-base-v2')" ] }, - "execution_count": 8, + "execution_count": 17, "metadata": {}, "output_type": "execute_result" } @@ -794,12 +796,12 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "a208ab4dd63b4a06b4a1cda727653924", + "model_id": "ee9a2eefa59e420693ba647d8d5b70c6", "version_major": 2, "version_minor": 0 }, "text/plain": [ - "Dropdown(description='Device:', options=('CPU', 'GPU', 'AUTO'), value='CPU')" + "Dropdown(description='Device:', options=('CPU', 'AUTO'), value='CPU')" ] }, "execution_count": 11, @@ -856,12 +858,12 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "a6d9632cd2b54e0ba4cf3ca72fad4dc9", + "model_id": "b31a332c59b847269e7a395d34319ad6", "version_major": 2, "version_minor": 0 }, "text/plain": [ - "Dropdown(description='Device:', options=('CPU', 'GPU', 'AUTO'), value='CPU')" + "Dropdown(description='Device:', options=('CPU', 'AUTO'), value='CPU')" ] }, "execution_count": 13, @@ -980,12 +982,12 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "a1f3803a98f2431fbb4614e265bdd401", + "model_id": "d40ce9ed20ac455ab3a92366690955a1", "version_major": 2, "version_minor": 0 }, "text/plain": [ - "Dropdown(description='Model to run:', options=('INT4',), value='INT4')" + "Dropdown(description='Model to run:', options=('FP16',), value='FP16')" ] }, "execution_count": 17, @@ -1014,7 +1016,7 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": 18, "id": "f7f708db-8de1-4efd-94b2-fcabc48d52f4", "metadata": {}, "outputs": [ @@ -1022,13 +1024,81 @@ "name": "stdout", "output_type": "stream", "text": [ - "Loading model from zephyr-7b-beta/INT4_compressed_weights\n" + "Loading model from chatglm3-6b/FP16\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "fcdbf25a78d84edaaf992eef0ff48814", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "tokenizer_config.json: 0%| | 0.00/1.41k [00:00 but got instead\n" - ] - } - ], - "source": 
[ - "streamer = TextIteratorStreamer(\n", - " tok, timeout=60.0, skip_prompt=True, skip_special_tokens=True\n", - ")\n", - "generate_kwargs = dict(\n", - " model=ov_model,\n", - " tokenizer=tok,\n", - " max_new_tokens=256,\n", - " streamer=streamer,\n", - ")\n", - "if stop_tokens is not None:\n", - " generate_kwargs[\"stopping_criteria\"] = StoppingCriteriaList(stop_tokens)\n", - " \n", - "pipe = pipeline(\"text-generation\", **generate_kwargs)\n", - "llm = HuggingFacePipeline(pipeline=pipe)" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "id": "2312b613-1bd3-4920-97fe-57a8949e22fb", - "metadata": {}, - "outputs": [ - { - "ename": "AttributeError", - "evalue": "'OVModelForCausalLM' object has no attribute 'parameters'", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", - "Cell \u001b[0;32mIn[31], line 4\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtransformers\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m AutoModelForCausalLM, AutoTokenizer, LocalAgent\n\u001b[1;32m 3\u001b[0m agent \u001b[38;5;241m=\u001b[39m LocalAgent(ov_model, tok)\n\u001b[0;32m----> 4\u001b[0m \u001b[43magent\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mhello\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/tools/agents.py:348\u001b[0m, in \u001b[0;36mAgent.run\u001b[0;34m(self, task, return_code, remote, **kwargs)\u001b[0m\n\u001b[1;32m 326\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 327\u001b[0m \u001b[38;5;124;03mSends a request to the agent.\u001b[39;00m\n\u001b[1;32m 328\u001b[0m \n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 345\u001b[0m \u001b[38;5;124;03m```\u001b[39;00m\n\u001b[1;32m 346\u001b[0m \u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 347\u001b[0m prompt \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mformat_prompt(task)\n\u001b[0;32m--> 348\u001b[0m result \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mgenerate_one\u001b[49m\u001b[43m(\u001b[49m\u001b[43mprompt\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mstop\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mTask:\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 349\u001b[0m explanation, code \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mclean_code_for_run(result)\n\u001b[1;32m 351\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mlog(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m==Explanation from the agent==\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;132;01m{\u001b[39;00mexplanation\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/tools/agents.py:744\u001b[0m, in \u001b[0;36mLocalAgent.generate_one\u001b[0;34m(self, prompt, stop)\u001b[0m\n\u001b[1;32m 743\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m 
\u001b[38;5;21mgenerate_one\u001b[39m(\u001b[38;5;28mself\u001b[39m, prompt, stop):\n\u001b[0;32m--> 744\u001b[0m encoded_inputs \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mtokenizer(prompt, return_tensors\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mpt\u001b[39m\u001b[38;5;124m\"\u001b[39m)\u001b[38;5;241m.\u001b[39mto(\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_model_device\u001b[49m)\n\u001b[1;32m 745\u001b[0m src_len \u001b[38;5;241m=\u001b[39m encoded_inputs[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124minput_ids\u001b[39m\u001b[38;5;124m\"\u001b[39m]\u001b[38;5;241m.\u001b[39mshape[\u001b[38;5;241m1\u001b[39m]\n\u001b[1;32m 746\u001b[0m stopping_criteria \u001b[38;5;241m=\u001b[39m StoppingCriteriaList([StopSequenceCriteria(stop, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mtokenizer)])\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/tools/agents.py:740\u001b[0m, in \u001b[0;36mLocalAgent._model_device\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 738\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mhasattr\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmodel, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mhf_device_map\u001b[39m\u001b[38;5;124m\"\u001b[39m):\n\u001b[1;32m 739\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mlist\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmodel\u001b[38;5;241m.\u001b[39mhf_device_map\u001b[38;5;241m.\u001b[39mvalues())[\u001b[38;5;241m0\u001b[39m]\n\u001b[0;32m--> 740\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m param \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mmodel\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mparameters\u001b[49m():\n\u001b[1;32m 741\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m param\u001b[38;5;241m.\u001b[39mdevice\n", - "\u001b[0;31mAttributeError\u001b[0m: 'OVModelForCausalLM' object has no attribute 'parameters'" - ] - } - ], - "source": [ - "from transformers import AutoModelForCausalLM, AutoTokenizer, LocalAgent\n", - "\n", - "agent = LocalAgent(ov_model, tok)\n", - "agent.run(\"hello\")" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "id": "9dc78bda-4be2-4d7b-b0cb-7deb4ed17b16", - "metadata": {}, - "outputs": [ - { - "ename": "AttributeError", - "evalue": "'HuggingFacePipeline' object has no attribute 'bind_tools'", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", - "Cell \u001b[0;32mIn[29], line 36\u001b[0m\n\u001b[1;32m 31\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m base\u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mexponent\n\u001b[1;32m 34\u001b[0m tools \u001b[38;5;241m=\u001b[39m [multiply, add, exponentiate]\n\u001b[0;32m---> 36\u001b[0m llm_with_tools \u001b[38;5;241m=\u001b[39m \u001b[43mllm\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mbind_tools\u001b[49m(tools)\n\u001b[1;32m 38\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01magents\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mformat_scratchpad\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mxml\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m 
(\n\u001b[1;32m 39\u001b[0m format_xml,\n\u001b[1;32m 40\u001b[0m )\n\u001b[1;32m 41\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01magents\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01moutput_parsers\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mxml\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m XMLAgentOutputParser\n", - "\u001b[0;31mAttributeError\u001b[0m: 'HuggingFacePipeline' object has no attribute 'bind_tools'" - ] - } - ], - "source": [ - "from langchain_core.tools import tool\n", - "from langchain import hub\n", - "from langchain.agents import AgentExecutor, create_react_agent\n", - "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n", - "\n", - "prompt = ChatPromptTemplate.from_messages(\n", - " [\n", - " (\n", - " \"system\",\n", - " \"You are very powerful assistant, but don't know current events\",\n", - " ),\n", - " (\"user\", \"{input}\"),\n", - " MessagesPlaceholder(variable_name=\"agent_scratchpad\"),\n", - " ]\n", - ")\n", - "\n", - "@tool\n", - "def multiply(first_int: int, second_int: int) -> int:\n", - " \"\"\"Multiply two integers together.\"\"\"\n", - " return first_int * second_int\n", - " \n", - "@tool\n", - "def add(first_int: int, second_int: int) -> int:\n", - " \"Add two integers.\"\n", - " return first_int + second_int\n", - "\n", - "\n", - "@tool\n", - "def exponentiate(base: int, exponent: int) -> int:\n", - " \"Exponentiate the base to the exponent power.\"\n", - " return base**exponent\n", - "\n", - "\n", - "tools = [multiply, add, exponentiate]\n", - "\n", - "llm_with_tools = llm.bind_tools(tools)\n", - "\n", - "from langchain.agents.format_scratchpad.xml import (\n", - " format_xml,\n", - ")\n", - "from langchain.agents.output_parsers.xml import XMLAgentOutputParser\n", - "\n", - "agent = (\n", - " {\n", - " \"input\": lambda x: x[\"input\"],\n", - " \"agent_scratchpad\": lambda x: format_xml(\n", - " x[\"intermediate_steps\"]\n", - " ),\n", - " }\n", - " | prompt\n", - " | llm_with_tools\n", - " | XMLAgentOutputParser()\n", - ")\n", - "\n", - "\n", - "# prompt = hub.pull(\"hwchase17/react\")\n", - "# agent = create_react_agent(llm, tools, prompt)\n", - "agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)\n", - "agent_executor.invoke(\n", - " {\n", - " \"input\": \"Take 3 to the fifth power and multiply that by the sum of twelve and three, then square the whole result\"\n", - " }\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 44, - "id": "65b2ebdc-8a50-4580-abeb-3234ea5598a6", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "\n", - "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n" - ] - }, - { - "ename": "TypeError", - "evalue": "TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", - "Cell \u001b[0;32mIn[44], line 34\u001b[0m\n\u001b[1;32m 28\u001b[0m agent \u001b[38;5;241m=\u001b[39m create_structured_chat_agent(llm, tools, prompt)\n\u001b[1;32m 30\u001b[0m agent_executor \u001b[38;5;241m=\u001b[39m AgentExecutor(\n\u001b[1;32m 31\u001b[0m agent\u001b[38;5;241m=\u001b[39magent, tools\u001b[38;5;241m=\u001b[39mtools, 
verbose\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m, handle_parsing_errors\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m\n\u001b[1;32m 32\u001b[0m )\n\u001b[0;32m---> 34\u001b[0m \u001b[43magent_executor\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43minvoke\u001b[49m\u001b[43m(\u001b[49m\u001b[43m{\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43minput\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m \u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mwhat is LangChain?\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m}\u001b[49m\u001b[43m)\u001b[49m\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain/chains/base.py:163\u001b[0m, in \u001b[0;36mChain.invoke\u001b[0;34m(self, input, config, **kwargs)\u001b[0m\n\u001b[1;32m 161\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mBaseException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 162\u001b[0m run_manager\u001b[38;5;241m.\u001b[39mon_chain_error(e)\n\u001b[0;32m--> 163\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m e\n\u001b[1;32m 164\u001b[0m run_manager\u001b[38;5;241m.\u001b[39mon_chain_end(outputs)\n\u001b[1;32m 166\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m include_run_info:\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain/chains/base.py:153\u001b[0m, in \u001b[0;36mChain.invoke\u001b[0;34m(self, input, config, **kwargs)\u001b[0m\n\u001b[1;32m 150\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 151\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_validate_inputs(inputs)\n\u001b[1;32m 152\u001b[0m outputs \u001b[38;5;241m=\u001b[39m (\n\u001b[0;32m--> 153\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_call\u001b[49m\u001b[43m(\u001b[49m\u001b[43minputs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mrun_manager\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrun_manager\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 154\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m new_arg_supported\n\u001b[1;32m 155\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_call(inputs)\n\u001b[1;32m 156\u001b[0m )\n\u001b[1;32m 158\u001b[0m final_outputs: Dict[\u001b[38;5;28mstr\u001b[39m, Any] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mprep_outputs(\n\u001b[1;32m 159\u001b[0m inputs, outputs, return_only_outputs\n\u001b[1;32m 160\u001b[0m )\n\u001b[1;32m 161\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mBaseException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain/agents/agent.py:1391\u001b[0m, in \u001b[0;36mAgentExecutor._call\u001b[0;34m(self, inputs, run_manager)\u001b[0m\n\u001b[1;32m 1389\u001b[0m \u001b[38;5;66;03m# We now enter the agent loop (until it returns something).\u001b[39;00m\n\u001b[1;32m 1390\u001b[0m \u001b[38;5;28;01mwhile\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_should_continue(iterations, time_elapsed):\n\u001b[0;32m-> 1391\u001b[0m next_step_output \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_take_next_step\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 1392\u001b[0m \u001b[43m 
\u001b[49m\u001b[43mname_to_tool_map\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1393\u001b[0m \u001b[43m \u001b[49m\u001b[43mcolor_mapping\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1394\u001b[0m \u001b[43m \u001b[49m\u001b[43minputs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1395\u001b[0m \u001b[43m \u001b[49m\u001b[43mintermediate_steps\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1396\u001b[0m \u001b[43m \u001b[49m\u001b[43mrun_manager\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrun_manager\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1397\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1398\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(next_step_output, AgentFinish):\n\u001b[1;32m 1399\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_return(\n\u001b[1;32m 1400\u001b[0m next_step_output, intermediate_steps, run_manager\u001b[38;5;241m=\u001b[39mrun_manager\n\u001b[1;32m 1401\u001b[0m )\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain/agents/agent.py:1097\u001b[0m, in \u001b[0;36mAgentExecutor._take_next_step\u001b[0;34m(self, name_to_tool_map, color_mapping, inputs, intermediate_steps, run_manager)\u001b[0m\n\u001b[1;32m 1088\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_take_next_step\u001b[39m(\n\u001b[1;32m 1089\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 1090\u001b[0m name_to_tool_map: Dict[\u001b[38;5;28mstr\u001b[39m, BaseTool],\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 1094\u001b[0m run_manager: Optional[CallbackManagerForChainRun] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 1095\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Union[AgentFinish, List[Tuple[AgentAction, \u001b[38;5;28mstr\u001b[39m]]]:\n\u001b[1;32m 1096\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_consume_next_step(\n\u001b[0;32m-> 1097\u001b[0m [\n\u001b[1;32m 1098\u001b[0m a\n\u001b[1;32m 1099\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m a \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_iter_next_step(\n\u001b[1;32m 1100\u001b[0m name_to_tool_map,\n\u001b[1;32m 1101\u001b[0m color_mapping,\n\u001b[1;32m 1102\u001b[0m inputs,\n\u001b[1;32m 1103\u001b[0m intermediate_steps,\n\u001b[1;32m 1104\u001b[0m run_manager,\n\u001b[1;32m 1105\u001b[0m )\n\u001b[1;32m 1106\u001b[0m ]\n\u001b[1;32m 1107\u001b[0m )\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain/agents/agent.py:1097\u001b[0m, in \u001b[0;36m\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m 1088\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_take_next_step\u001b[39m(\n\u001b[1;32m 1089\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 1090\u001b[0m name_to_tool_map: Dict[\u001b[38;5;28mstr\u001b[39m, BaseTool],\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 1094\u001b[0m run_manager: Optional[CallbackManagerForChainRun] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 1095\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Union[AgentFinish, List[Tuple[AgentAction, \u001b[38;5;28mstr\u001b[39m]]]:\n\u001b[1;32m 1096\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_consume_next_step(\n\u001b[0;32m-> 1097\u001b[0m [\n\u001b[1;32m 1098\u001b[0m a\n\u001b[1;32m 
1099\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m a \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_iter_next_step(\n\u001b[1;32m 1100\u001b[0m name_to_tool_map,\n\u001b[1;32m 1101\u001b[0m color_mapping,\n\u001b[1;32m 1102\u001b[0m inputs,\n\u001b[1;32m 1103\u001b[0m intermediate_steps,\n\u001b[1;32m 1104\u001b[0m run_manager,\n\u001b[1;32m 1105\u001b[0m )\n\u001b[1;32m 1106\u001b[0m ]\n\u001b[1;32m 1107\u001b[0m )\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain/agents/agent.py:1125\u001b[0m, in \u001b[0;36mAgentExecutor._iter_next_step\u001b[0;34m(self, name_to_tool_map, color_mapping, inputs, intermediate_steps, run_manager)\u001b[0m\n\u001b[1;32m 1122\u001b[0m intermediate_steps \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_prepare_intermediate_steps(intermediate_steps)\n\u001b[1;32m 1124\u001b[0m \u001b[38;5;66;03m# Call the LLM to see what to do.\u001b[39;00m\n\u001b[0;32m-> 1125\u001b[0m output \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43magent\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mplan\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 1126\u001b[0m \u001b[43m \u001b[49m\u001b[43mintermediate_steps\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1127\u001b[0m \u001b[43m \u001b[49m\u001b[43mcallbacks\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrun_manager\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_child\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mif\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mrun_manager\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01melse\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mNone\u001b[39;49;00m\u001b[43m,\u001b[49m\n\u001b[1;32m 1128\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43minputs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1129\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1130\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m OutputParserException \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 1131\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mhandle_parsing_errors, \u001b[38;5;28mbool\u001b[39m):\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain/agents/agent.py:387\u001b[0m, in \u001b[0;36mRunnableAgent.plan\u001b[0;34m(self, intermediate_steps, callbacks, **kwargs)\u001b[0m\n\u001b[1;32m 381\u001b[0m \u001b[38;5;66;03m# Use streaming to make sure that the underlying LLM is invoked in a streaming\u001b[39;00m\n\u001b[1;32m 382\u001b[0m \u001b[38;5;66;03m# fashion to make it possible to get access to the individual LLM tokens\u001b[39;00m\n\u001b[1;32m 383\u001b[0m \u001b[38;5;66;03m# when using stream_log with the Agent Executor.\u001b[39;00m\n\u001b[1;32m 384\u001b[0m \u001b[38;5;66;03m# Because the response from the plan is not a generator, we need to\u001b[39;00m\n\u001b[1;32m 385\u001b[0m \u001b[38;5;66;03m# accumulate the output into final output and return that.\u001b[39;00m\n\u001b[1;32m 386\u001b[0m final_output: Any \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[0;32m--> 387\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m chunk \u001b[38;5;129;01min\u001b[39;00m 
\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mrunnable\u001b[38;5;241m.\u001b[39mstream(inputs, config\u001b[38;5;241m=\u001b[39m{\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcallbacks\u001b[39m\u001b[38;5;124m\"\u001b[39m: callbacks}):\n\u001b[1;32m 388\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m final_output \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 389\u001b[0m final_output \u001b[38;5;241m=\u001b[39m chunk\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/runnables/base.py:2446\u001b[0m, in \u001b[0;36mRunnableSequence.stream\u001b[0;34m(self, input, config, **kwargs)\u001b[0m\n\u001b[1;32m 2440\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mstream\u001b[39m(\n\u001b[1;32m 2441\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 2442\u001b[0m \u001b[38;5;28minput\u001b[39m: Input,\n\u001b[1;32m 2443\u001b[0m config: Optional[RunnableConfig] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 2444\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Optional[Any],\n\u001b[1;32m 2445\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Iterator[Output]:\n\u001b[0;32m-> 2446\u001b[0m \u001b[38;5;28;01myield from\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mtransform(\u001b[38;5;28miter\u001b[39m([\u001b[38;5;28minput\u001b[39m]), config, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs)\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/runnables/base.py:2433\u001b[0m, in \u001b[0;36mRunnableSequence.transform\u001b[0;34m(self, input, config, **kwargs)\u001b[0m\n\u001b[1;32m 2427\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mtransform\u001b[39m(\n\u001b[1;32m 2428\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 2429\u001b[0m \u001b[38;5;28minput\u001b[39m: Iterator[Input],\n\u001b[1;32m 2430\u001b[0m config: Optional[RunnableConfig] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 2431\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Optional[Any],\n\u001b[1;32m 2432\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Iterator[Output]:\n\u001b[0;32m-> 2433\u001b[0m \u001b[38;5;28;01myield from\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_transform_stream_with_config(\n\u001b[1;32m 2434\u001b[0m \u001b[38;5;28minput\u001b[39m,\n\u001b[1;32m 2435\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_transform,\n\u001b[1;32m 2436\u001b[0m patch_config(config, run_name\u001b[38;5;241m=\u001b[39m(config \u001b[38;5;129;01mor\u001b[39;00m {})\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrun_name\u001b[39m\u001b[38;5;124m\"\u001b[39m) \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mname),\n\u001b[1;32m 2437\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs,\n\u001b[1;32m 2438\u001b[0m )\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/runnables/base.py:1513\u001b[0m, in \u001b[0;36mRunnable._transform_stream_with_config\u001b[0;34m(self, input, transformer, config, run_type, **kwargs)\u001b[0m\n\u001b[1;32m 1511\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 1512\u001b[0m \u001b[38;5;28;01mwhile\u001b[39;00m 
\u001b[38;5;28;01mTrue\u001b[39;00m:\n\u001b[0;32m-> 1513\u001b[0m chunk: Output \u001b[38;5;241m=\u001b[39m \u001b[43mcontext\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mnext\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43miterator\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;66;03m# type: ignore\u001b[39;00m\n\u001b[1;32m 1514\u001b[0m \u001b[38;5;28;01myield\u001b[39;00m chunk\n\u001b[1;32m 1515\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m final_output_supported:\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/runnables/base.py:2397\u001b[0m, in \u001b[0;36mRunnableSequence._transform\u001b[0;34m(self, input, run_manager, config)\u001b[0m\n\u001b[1;32m 2388\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m step \u001b[38;5;129;01min\u001b[39;00m steps:\n\u001b[1;32m 2389\u001b[0m final_pipeline \u001b[38;5;241m=\u001b[39m step\u001b[38;5;241m.\u001b[39mtransform(\n\u001b[1;32m 2390\u001b[0m final_pipeline,\n\u001b[1;32m 2391\u001b[0m patch_config(\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 2394\u001b[0m ),\n\u001b[1;32m 2395\u001b[0m )\n\u001b[0;32m-> 2397\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m output \u001b[38;5;129;01min\u001b[39;00m final_pipeline:\n\u001b[1;32m 2398\u001b[0m \u001b[38;5;28;01myield\u001b[39;00m output\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/runnables/base.py:1051\u001b[0m, in \u001b[0;36mRunnable.transform\u001b[0;34m(self, input, config, **kwargs)\u001b[0m\n\u001b[1;32m 1048\u001b[0m final: Input\n\u001b[1;32m 1049\u001b[0m got_first_val \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mFalse\u001b[39;00m\n\u001b[0;32m-> 1051\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m chunk \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28minput\u001b[39m:\n\u001b[1;32m 1052\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m got_first_val:\n\u001b[1;32m 1053\u001b[0m final \u001b[38;5;241m=\u001b[39m chunk\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/runnables/base.py:4173\u001b[0m, in \u001b[0;36mRunnableBindingBase.transform\u001b[0;34m(self, input, config, **kwargs)\u001b[0m\n\u001b[1;32m 4167\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mtransform\u001b[39m(\n\u001b[1;32m 4168\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 4169\u001b[0m \u001b[38;5;28minput\u001b[39m: Iterator[Input],\n\u001b[1;32m 4170\u001b[0m config: Optional[RunnableConfig] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 4171\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Any,\n\u001b[1;32m 4172\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Iterator[Output]:\n\u001b[0;32m-> 4173\u001b[0m \u001b[38;5;28;01myield from\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mbound\u001b[38;5;241m.\u001b[39mtransform(\n\u001b[1;32m 4174\u001b[0m \u001b[38;5;28minput\u001b[39m,\n\u001b[1;32m 4175\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_merge_configs(config),\n\u001b[1;32m 4176\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39m{\u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mkwargs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs},\n\u001b[1;32m 4177\u001b[0m )\n", - "File 
\u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/runnables/base.py:1061\u001b[0m, in \u001b[0;36mRunnable.transform\u001b[0;34m(self, input, config, **kwargs)\u001b[0m\n\u001b[1;32m 1058\u001b[0m final \u001b[38;5;241m=\u001b[39m final \u001b[38;5;241m+\u001b[39m chunk \u001b[38;5;66;03m# type: ignore[operator]\u001b[39;00m\n\u001b[1;32m 1060\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m got_first_val:\n\u001b[0;32m-> 1061\u001b[0m \u001b[38;5;28;01myield from\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mstream(final, config, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs)\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/language_models/llms.py:409\u001b[0m, in \u001b[0;36mBaseLLM.stream\u001b[0;34m(self, input, config, stop, **kwargs)\u001b[0m\n\u001b[1;32m 399\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mstream\u001b[39m(\n\u001b[1;32m 400\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 401\u001b[0m \u001b[38;5;28minput\u001b[39m: LanguageModelInput,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 405\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Any,\n\u001b[1;32m 406\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Iterator[\u001b[38;5;28mstr\u001b[39m]:\n\u001b[1;32m 407\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mtype\u001b[39m(\u001b[38;5;28mself\u001b[39m)\u001b[38;5;241m.\u001b[39m_stream \u001b[38;5;241m==\u001b[39m BaseLLM\u001b[38;5;241m.\u001b[39m_stream:\n\u001b[1;32m 408\u001b[0m \u001b[38;5;66;03m# model doesn't implement streaming, so use default implementation\u001b[39;00m\n\u001b[0;32m--> 409\u001b[0m \u001b[38;5;28;01myield\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43minvoke\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43minput\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mconfig\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconfig\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mstop\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstop\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 410\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 411\u001b[0m prompt \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_convert_input(\u001b[38;5;28minput\u001b[39m)\u001b[38;5;241m.\u001b[39mto_string()\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/language_models/llms.py:273\u001b[0m, in \u001b[0;36mBaseLLM.invoke\u001b[0;34m(self, input, config, stop, **kwargs)\u001b[0m\n\u001b[1;32m 263\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21minvoke\u001b[39m(\n\u001b[1;32m 264\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 265\u001b[0m \u001b[38;5;28minput\u001b[39m: LanguageModelInput,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 269\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Any,\n\u001b[1;32m 270\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m \u001b[38;5;28mstr\u001b[39m:\n\u001b[1;32m 271\u001b[0m config \u001b[38;5;241m=\u001b[39m ensure_config(config)\n\u001b[1;32m 272\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m (\n\u001b[0;32m--> 273\u001b[0m 
\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mgenerate_prompt\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 274\u001b[0m \u001b[43m \u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_convert_input\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43minput\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 275\u001b[0m \u001b[43m \u001b[49m\u001b[43mstop\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstop\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 276\u001b[0m \u001b[43m \u001b[49m\u001b[43mcallbacks\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconfig\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mcallbacks\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 277\u001b[0m \u001b[43m \u001b[49m\u001b[43mtags\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconfig\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mtags\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 278\u001b[0m \u001b[43m \u001b[49m\u001b[43mmetadata\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconfig\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mmetadata\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 279\u001b[0m \u001b[43m \u001b[49m\u001b[43mrun_name\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconfig\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mrun_name\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 280\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 281\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 282\u001b[0m \u001b[38;5;241m.\u001b[39mgenerations[\u001b[38;5;241m0\u001b[39m][\u001b[38;5;241m0\u001b[39m]\n\u001b[1;32m 283\u001b[0m \u001b[38;5;241m.\u001b[39mtext\n\u001b[1;32m 284\u001b[0m )\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/language_models/llms.py:568\u001b[0m, in \u001b[0;36mBaseLLM.generate_prompt\u001b[0;34m(self, prompts, stop, callbacks, **kwargs)\u001b[0m\n\u001b[1;32m 560\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mgenerate_prompt\u001b[39m(\n\u001b[1;32m 561\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 562\u001b[0m prompts: List[PromptValue],\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 565\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Any,\n\u001b[1;32m 566\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m LLMResult:\n\u001b[1;32m 567\u001b[0m prompt_strings \u001b[38;5;241m=\u001b[39m [p\u001b[38;5;241m.\u001b[39mto_string() \u001b[38;5;28;01mfor\u001b[39;00m p \u001b[38;5;129;01min\u001b[39;00m prompts]\n\u001b[0;32m--> 568\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m 
self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/language_models/llms.py:741, in BaseLLM.generate --> output = self._generate_helper(prompts, stop, run_managers, bool(new_arg_supported), **kwargs)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/language_models/llms.py:592, in BaseLLM._generate_helper --> output = self._generate(prompts, stop=stop, run_manager=run_managers[0] if run_managers else None, **kwargs)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_community/llms/huggingface_pipeline.py:261, in HuggingFacePipeline._generate --> responses = self.pipeline(batch_prompts, stop_sequence=stop, return_full_text=False, **pipeline_kwargs)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/pipelines/text_generation.py:241, in TextGenerationPipeline.__call__ --> return super().__call__(text_inputs, **kwargs)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/pipelines/base.py:1148, in Pipeline.__call__ --> preprocess_params, forward_params, postprocess_params = self._sanitize_parameters(**kwargs)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/pipelines/text_generation.py:171, in TextGenerationPipeline._sanitize_parameters --> stop_sequence_ids = self.tokenizer.encode(stop_sequence, add_special_tokens=False)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2600, in PreTrainedTokenizerBase.encode --> encoded_inputs = self.encode_plus(text, text_pair=text_pair, add_special_tokens=add_special_tokens, ...)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3008, in PreTrainedTokenizerBase.encode_plus --> return self._encode_plus(text=text, ...)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py:576, in PreTrainedTokenizerFast._encode_plus --> batched_output = self._batch_encode_plus(batched_input, ...)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py:504, in PreTrainedTokenizerFast._batch_encode_plus --> encodings = self._tokenizer.encode_batch(batch_text_or_text_pairs, add_special_tokens=add_special_tokens, is_pretokenized=is_split_into_words)\n",
-    "\u001b[0;31mTypeError\u001b[0m: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]"
-   ]
-  }
- ],
- "source": [
-  "from langchain_core.tools import tool\n",
-  "from langchain import hub\n",
-  "from langchain.agents import AgentExecutor, create_structured_chat_agent\n",
-  "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
-  "\n",
-  "@tool\n",
-  "def multiply(first_int: int, second_int: int) -> int:\n",
-  "    \"\"\"Multiply two integers together.\"\"\"\n",
-  "    return first_int * second_int\n",
-  "\n",
-  "@tool\n",
-  "def add(first_int: int, second_int: int) -> int:\n",
-  "    \"Add two integers.\"\n",
-  "    return first_int + second_int\n",
-  "\n",
-  "\n",
-  "@tool\n",
-  "def exponentiate(base: int, exponent: int) -> int:\n",
-  "    \"Exponentiate the base to the exponent power.\"\n",
-  "    return base**exponent\n",
-  "\n",
-  "\n",
-  "tools = [multiply, add, exponentiate]\n",
-  "\n",
-  "\n",
-  "prompt = hub.pull(\"hwchase17/structured-chat-agent\")\n",
-  "\n",
-  "agent = create_structured_chat_agent(llm, tools, prompt)\n",
-  "\n",
-  "agent_executor = AgentExecutor(\n",
-  "    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True\n",
-  ")\n",
-  "\n",
-  "agent_executor.invoke({\"input\": \"what is LangChain?\"})"
- ]
- },
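The removed agent cell above is what produced the deleted `TypeError` output: the structured-chat agent passes a list of stop strings, LangChain's `HuggingFacePipeline._generate` forwards that list to the transformers pipeline as `stop_sequence=stop`, and `TextGenerationPipeline._sanitize_parameters` then calls `self.tokenizer.encode(stop_sequence, add_special_tokens=False)`, which only accepts a single string. A minimal workaround sketch using the public `StoppingCriteria` API instead of the `stop_sequence` path (the class name and the stop strings below are illustrative, not from the notebook):

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnStrings(StoppingCriteria):
    """Stop generation once the decoded tail ends with any stop string."""

    def __init__(self, stop_strings, tokenizer):
        self.stop_strings = stop_strings
        self.tokenizer = tokenizer

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Decode only the last few tokens to keep the per-step check cheap.
        tail = self.tokenizer.decode(input_ids[0][-16:], skip_special_tokens=True)
        return any(tail.endswith(s) for s in self.stop_strings)

# `tokenizer` and the model are assumed to be the objects created earlier in
# the notebook; passing stopping_criteria to generate() avoids encoding the
# stop list and also supports multi-token stop sequences, which the pipeline
# warns it cannot handle via stop_sequence.
# model.generate(**inputs, stopping_criteria=StoppingCriteriaList(
#     [StopOnStrings(["Observation:"], tokenizer)]))
```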
int:\n", - " \"Exponentiate the base to the exponent power.\"\n", - " return base**exponent\n", - "\n", - "\n", - "tools = [multiply, add, exponentiate]\n", - "\n", - "\n", - "prompt = hub.pull(\"hwchase17/structured-chat-agent\")\n", - "\n", - "agent = create_structured_chat_agent(llm, tools, prompt)\n", - "\n", - "agent_executor = AgentExecutor(\n", - " agent=agent, tools=tools, verbose=True, handle_parsing_errors=True\n", - ")\n", - "\n", - "agent_executor.invoke({\"input\": \"what is LangChain?\"})" - ] - }, { "attachments": {}, "cell_type": "markdown", @@ -1359,7 +1194,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 31, "id": "5b97eeeb", "metadata": {}, "outputs": [], @@ -1433,38 +1268,10 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": null, "id": "0908e5e9-4dcb-4fc8-8480-3cf70fd5e934", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Running on local URL: http://10.3.233.99:5344\n" - ] - }, - { - "ename": "ValueError", - "evalue": "When localhost is not accessible, a shareable link must be created. Please set share=True or check your proxy settings to allow access to localhost.", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", - "Cell \u001b[0;32mIn[22], line 324\u001b[0m\n\u001b[1;32m 318\u001b[0m demo\u001b[38;5;241m.\u001b[39mqueue()\n\u001b[1;32m 319\u001b[0m \u001b[38;5;66;03m# if you are launching remotely, specify server_name and server_port\u001b[39;00m\n\u001b[1;32m 320\u001b[0m \u001b[38;5;66;03m# demo.launch(server_name='your server name', server_port='server port in int')\u001b[39;00m\n\u001b[1;32m 321\u001b[0m \u001b[38;5;66;03m# if you have any issue to launch on your platform, you can pass share=True to launch method:\u001b[39;00m\n\u001b[1;32m 322\u001b[0m \u001b[38;5;66;03m# demo.launch(share=True)\u001b[39;00m\n\u001b[1;32m 323\u001b[0m \u001b[38;5;66;03m# it creates a publicly shareable link for the interface. 
Read more in the docs: https://gradio.app/docs/\u001b[39;00m\n\u001b[0;32m--> 324\u001b[0m \u001b[43mdemo\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mlaunch\u001b[49m\u001b[43m(\u001b[49m\u001b[43mserver_name\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43m10.3.233.99\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mserver_port\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;241;43m5344\u001b[39;49m\u001b[43m)\u001b[49m\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/gradio/blocks.py:2165\u001b[0m, in \u001b[0;36mBlocks.launch\u001b[0;34m(self, inline, inbrowser, share, debug, max_threads, auth, auth_message, prevent_thread_lock, show_error, server_name, server_port, height, width, favicon_path, ssl_keyfile, ssl_certfile, ssl_keyfile_password, ssl_verify, quiet, show_api, allowed_paths, blocked_paths, root_path, app_kwargs, state_session_capacity, share_server_address, share_server_protocol, auth_dependency, _frontend)\u001b[0m\n\u001b[1;32m 2157\u001b[0m \u001b[38;5;66;03m# If running in a colab or not able to access localhost,\u001b[39;00m\n\u001b[1;32m 2158\u001b[0m \u001b[38;5;66;03m# a shareable link must be created.\u001b[39;00m\n\u001b[1;32m 2159\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m (\n\u001b[1;32m 2160\u001b[0m _frontend\n\u001b[1;32m 2161\u001b[0m \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m wasm_utils\u001b[38;5;241m.\u001b[39mIS_WASM\n\u001b[1;32m 2162\u001b[0m \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m networking\u001b[38;5;241m.\u001b[39murl_ok(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mlocal_url)\n\u001b[1;32m 2163\u001b[0m \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mshare\n\u001b[1;32m 2164\u001b[0m ):\n\u001b[0;32m-> 2165\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[1;32m 2166\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mWhen localhost is not accessible, a shareable link must be created. Please set share=True or check your proxy settings to allow access to localhost.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 2167\u001b[0m )\n\u001b[1;32m 2169\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mis_colab \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m quiet:\n\u001b[1;32m 2170\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m debug:\n", - "\u001b[0;31mValueError\u001b[0m: When localhost is not accessible, a shareable link must be created. Please set share=True or check your proxy settings to allow access to localhost." - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "device must be of type but got instead\n", - "Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\n" - ] - } - ], + "outputs": [], "source": [ "from langchain.prompts import PromptTemplate\n", "from langchain.vectorstores import Chroma\n", @@ -1789,12 +1596,12 @@ "# if you have any issue to launch on your platform, you can pass share=True to launch method:\n", "# demo.launch(share=True)\n", "# it creates a publicly shareable link for the interface. 
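The notebook changes above go together: the hard-coded `demo.launch(server_name='10.3.233.99', server_port=5344)` is replaced by a plain `demo.launch()`, because binding to a specific address is what raised the deleted `ValueError` whenever localhost was not reachable and no share link was requested. A hedged sketch of a remote-friendly launch fallback (the try/except wrapper and the placeholder UI are illustrative, not part of the notebook):

```python
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("RAG chatbot placeholder")  # stand-in for the notebook's real UI

demo.queue()
try:
    demo.launch()  # default: bind to localhost on a free port
except ValueError:
    # Raised when localhost is not reachable (e.g. behind a proxy);
    # a public share link is Gradio's documented fallback.
    demo.launch(share=True)
```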
diff --git a/notebooks/254-llm-chatbot/config.py b/notebooks/254-llm-chatbot/config.py
index f8ad87e77b5..87eb36ffb66 100644
--- a/notebooks/254-llm-chatbot/config.py
+++ b/notebooks/254-llm-chatbot/config.py
@@ -132,16 +132,16 @@ def internlm_partial_text_processor(partial_text, new_text):
     "mistral-7b": {
         "model_id": "mistralai/Mistral-7B-v0.1",
         "remote": False,
-        "start_message": f"<|system|>\n{DEFAULT_SYSTEM_PROMPT}</s>\n",
-        "history_template": "<|user|>\n{user}</s> \n<|assistant|>\n{assistant}</s> \n",
-        "current_message_template": "<|user|>\n{user}</s> \n<|assistant|>\n{assistant}",
-        "rag_prompt_template": f"""<|system|> {DEFAULT_RAG_PROMPT }"""
+        "start_message": f"[INST] <<SYS>>\n{DEFAULT_SYSTEM_PROMPT }\n<</SYS>>\n\n",
+        "history_template": "{user}[/INST]{assistant}</s><s>[INST]",
+        "current_message_template": "{user} [/INST]{assistant}",
+        "tokenizer_kwargs": {"add_special_tokens": False},
+        "partial_text_processor": llama_partial_text_processor,
+        "rag_prompt_template": f""" [INST] {DEFAULT_RAG_PROMPT } [/INST] """
         + """
-        <|user|>
-        Question: {question}
+        [INST] Question: {question}
         Context: {context}
-        Answer: 
-        <|assistant|>""",
+        Answer: [/INST]""",
     },
     "zephyr-7b-beta": {
         "model_id": "HuggingFaceH4/zephyr-7b-beta",
@@ -187,8 +187,8 @@ def internlm_partial_text_processor(partial_text, new_text):
         },
     },
     "Chinese":{
-        "qwen1.5-1.8b-chat": {
-            "model_id": "Qwen/Qwen1.5-1.8B-Chat",
+        "qwen1.5-0.5b-chat": {
+            "model_id": "Qwen/Qwen1.5-0.5B-Chat",
             "remote": False,
             "start_message": DEFAULT_SYSTEM_PROMPT_CHINESE,
             "stop_tokens": ["<|im_end|>", "<|endoftext|>"],
@@ -200,11 +200,18 @@ def internlm_partial_text_processor(partial_text, new_text):
             已知内容: {context}
             回答: <|im_end|><|im_start|>assistant""",
         },
-        "qwen1.5-0.5b-chat": {
-            "model_id": "Qwen/Qwen1.5-0.5B-Chat",
+        "qwen1.5-1.8b-chat": {
+            "model_id": "Qwen/Qwen1.5-1.8B-Chat",
             "remote": False,
             "start_message": DEFAULT_SYSTEM_PROMPT_CHINESE,
             "stop_tokens": ["<|im_end|>", "<|endoftext|>"],
+            "rag_prompt_template": f"""<|im_start|>system
+            {DEFAULT_RAG_PROMPT_CHINESE }<|im_end|>"""
+            + """
+            <|im_start|>user
+            问题: {question}
+            已知内容: {context}
+            回答: <|im_end|><|im_start|>assistant""",
         },
         "qwen1.5-7b-chat": {
             "model_id": "Qwen/Qwen1.5-7B-Chat",
@@ -267,6 +274,12 @@ def internlm_partial_text_processor(partial_text, new_text):
             "remote": False,
             "start_message": DEFAULT_SYSTEM_PROMPT_CHINESE,
             "stop_tokens": [2],
+            "rag_prompt_template": f"""{DEFAULT_RAG_PROMPT_CHINESE }"""
+            + """
+            问题: {question}
+            已知内容: {context}
+            回答:
+            """,
         },
         "internlm2-chat-1.8b": {
             "model_id": "internlm/internlm2-chat-1_8b",
@@ -275,6 +288,13 @@ def internlm_partial_text_processor(partial_text, new_text):
             "remote": False,
             "start_message": DEFAULT_SYSTEM_PROMPT_CHINESE,
             "stop_tokens": [2, 92542],
             "partial_text_processor": internlm_partial_text_processor,
+            "rag_prompt_template": f"""<|im_start|>system
+            {DEFAULT_RAG_PROMPT_CHINESE }<|im_end|>"""
+            + """
+            <|im_start|>user
+            问题: {question}
+            已知内容: {context}
+            回答: <|im_end|><|im_start|>assistant""",
         },
     },
     "Japanese":{
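A note on the added `rag_prompt_template` entries: each template concatenates an f-string, which bakes the fixed system prompt (`DEFAULT_RAG_PROMPT` / `DEFAULT_RAG_PROMPT_CHINESE`) in once at import time, with a plain string whose `{question}` and `{context}` placeholders are left for LangChain to fill per query. A small sketch of that downstream use (the system text here is a stand-in; the `PromptTemplate` call mirrors the notebook's RAG chain):

```python
from langchain.prompts import PromptTemplate

DEFAULT_RAG_PROMPT_CHINESE = "基于以下已知信息，请简洁并专业地回答用户的问题。"  # stand-in text

rag_prompt_template = f"""<|im_start|>system
{DEFAULT_RAG_PROMPT_CHINESE }<|im_end|>""" + """
<|im_start|>user
问题: {question}
已知内容: {context}
回答: <|im_end|><|im_start|>assistant"""

# Only {question} and {context} survive as runtime template variables:
prompt = PromptTemplate.from_template(rag_prompt_template)
print(prompt.format(question="什么是OpenVINO？", context="(retrieved document chunks)"))
```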
f"""<|im_start|>system + {DEFAULT_RAG_PROMPT_CHINESE }<|im_end|>""" + + """ + <|im_start|>user + 问题: {question} + 已知内容: {context} + 回答: <|im_end|><|im_start|>assistant""", }, }, "Japanese":{ diff --git a/notebooks/254-llm-chatbot/ov_llm_model.py b/notebooks/254-llm-chatbot/ov_llm_model.py index 10760dd79dd..9b7444edc5b 100644 --- a/notebooks/254-llm-chatbot/ov_llm_model.py +++ b/notebooks/254-llm-chatbot/ov_llm_model.py @@ -208,6 +208,25 @@ class OVCHATGLMModel(OVModelForCausalLM): """ Optimum intel compatible model wrapper for CHATGLM2 """ + + def __init__( + self, + model: "Model", + config: "PretrainedConfig" = None, + device: str = "CPU", + dynamic_shapes: bool = True, + ov_config: Optional[Dict[str, str]] = None, + model_save_dir: Optional[Union[str, Path]] = None, + **kwargs, + ): + NormalizedConfigManager._conf["chatglm"] = NormalizedTextConfig.with_args( + num_layers="num_hidden_layers", + num_attention_heads="num_attention_heads", + hidden_size="hidden_size", + ) + super().__init__( + model, config, device, dynamic_shapes, ov_config, model_save_dir, **kwargs + ) def _reshape(self, model: "Model", *args, **kwargs): shapes = {} @@ -224,12 +243,68 @@ def _reshape(self, model: "Model", *args, **kwargs): shapes[inputs][1] = -1 model.reshape(shapes) return model + + @classmethod + def _from_pretrained( + cls, + model_id: Union[str, Path], + config: PretrainedConfig, + use_auth_token: Optional[Union[bool, str, None]] = None, + revision: Optional[Union[str, None]] = None, + force_download: bool = False, + cache_dir: Optional[str] = None, + file_name: Optional[str] = None, + subfolder: str = "", + from_onnx: bool = False, + local_files_only: bool = False, + load_in_8bit: bool = False, + **kwargs, + ): + model_path = Path(model_id) + default_file_name = OV_XML_FILE_NAME + file_name = file_name or default_file_name + + model_cache_path = cls._cached_file( + model_path=model_path, + use_auth_token=use_auth_token, + revision=revision, + force_download=force_download, + cache_dir=cache_dir, + file_name=file_name, + subfolder=subfolder, + local_files_only=local_files_only, + ) + + model = cls.load_model(model_cache_path, load_in_8bit=load_in_8bit) + init_cls = OVCHATGLMModel + + return init_cls( + model=model, config=config, model_save_dir=model_cache_path.parent, **kwargs + ) class OVQWENModel(OVModelForCausalLM): """ Optimum intel compatible model wrapper for QWEN """ + def __init__( + self, + model: "Model", + config: "PretrainedConfig" = None, + device: str = "CPU", + dynamic_shapes: bool = True, + ov_config: Optional[Dict[str, str]] = None, + model_save_dir: Optional[Union[str, Path]] = None, + **kwargs, + ): + NormalizedConfigManager._conf["qwen"] = NormalizedTextConfig.with_args( + num_layers="num_hidden_layers", + num_attention_heads="num_attention_heads", + hidden_size="hidden_size", + ) + super().__init__( + model, config, device, dynamic_shapes, ov_config, model_save_dir, **kwargs + ) def _reshape(self, model: "Model", *args, **kwargs): shapes = {}