diff --git a/notebooks/254-llm-chatbot/254-rag-chatbot.ipynb b/notebooks/254-llm-chatbot/254-rag-chatbot.ipynb index 07f7054f55d..feb37f8404b 100644 --- a/notebooks/254-llm-chatbot/254-rag-chatbot.ipynb +++ b/notebooks/254-llm-chatbot/254-rag-chatbot.ipynb @@ -83,7 +83,7 @@ "\"accelerate\"\\\n", "\"openvino-nightly\"\\\n", "\"gradio\"\\\n", - "\"onnx\" \"einops\" \"transformers_stream_generator\" \"tiktoken\" \"transformers>=4.38.1\" \"bitsandbytes\" \"chromadb\" \"sentence_transformers\" \"langchain>=0.1.7\" \"langchainhub\" \"unstructured\" \"scikit-learn\" \"python-docx\" \"pdfminer.six\"" + "\"onnx\" \"chromadb\" \"sentence_transformers\" \"langchain>=0.1.7\" \"langchainhub\" \"transformers>=4.37.0\" \"unstructured\" \"scikit-learn\" \"python-docx\" \"pdfminer.six\" \"bitsandbytes\"" ] }, { @@ -122,6 +122,7 @@ " except OSError:\n", " notebook_login()\n", "```\n", + "* **mini-cpm-2b-dpo** - MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. After Direct Preference Optimization (DPO) fine-tuning, MiniCPM outperforms many popular 7b, 13b and 70b models. More details can be found in the [model card](https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp16).\n", "* **red-pajama-3b-chat** - A 2.8B parameter pre-trained language model based on GPT-NEOX architecture. It was developed by Together Computer and leaders from the open-source AI community. The model is fine-tuned on OASST1 and Dolly2 datasets to enhance chatting ability. More details about model can be found in [HuggingFace model card](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1).\n", "* **gemma-7b-it** - Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. This model is instruction-tuned version of 7B parameters model. More details about model can be found in [model card](https://huggingface.co/google/gemma-7b-it).\n", ">**Note**: run model with demo, you will need to accept license agreement. \n", @@ -155,7 +156,7 @@ " except OSError:\n", " notebook_login()\n", "```\n", - "* **qwen1.5-1.8b-chat/qwen1.5-7b-chat** - Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Qwen1.5 is a language model series including decoder language models of different model sizes. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, mixture of sliding window attention and full attention. You can find more details about model in the [model repository](https://huggingface.co/Qwen).\n", + "* **qwen1.5-0.5b-chat/qwen1.5-1.8b-chat/qwen1.5-7b-chat** - Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Qwen1.5 is a language model series including decoder language models of different model sizes. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, mixture of sliding window attention and full attention. 
You can find more details about the model in the [model repository](https://huggingface.co/Qwen).\n", "* **qwen-7b-chat** - Qwen-7B is the 7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model, which is pretrained on a large volume of data, including web texts, books, codes, etc. For more details about Qwen, please refer to the [GitHub](https://github.com/QwenLM/Qwen) code repository.\n", "* **mpt-7b-chat** - MPT-7B is part of the family of MosaicPretrainedTransformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference. These architectural changes include performance-optimized layer implementations and the elimination of context length limits by replacing positional embeddings with Attention with Linear Biases ([ALiBi](https://arxiv.org/abs/2108.12409)). Thanks to these modifications, MPT models can be trained with high throughput efficiency and stable convergence. MPT-7B-chat is a chatbot-like model for dialogue generation. It was built by finetuning MPT-7B on the [ShareGPT-Vicuna](https://huggingface.co/datasets/jeffwan/sharegpt_vicuna), [HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3), [Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca), [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf), and [Evol-Instruct](https://huggingface.co/datasets/victor123/evol_instruct_70k) datasets. More details about the model can be found in [blog post](https://www.mosaicml.com/blog/mpt-7b), [repository](https://github.com/mosaicml/llm-foundry/) and [HuggingFace model card](https://huggingface.co/mosaicml/mpt-7b-chat).\n", "* **chatglm3-6b** - ChatGLM3-6B is the latest open-source model in the ChatGLM series. While retaining many excellent features such as smooth dialogue and low deployment threshold from the previous two generations, ChatGLM3-6B employs a more diverse training dataset, more sufficient training steps, and a more reasonable training strategy. ChatGLM3-6B adopts a newly designed [Prompt format](https://github.com/THUDM/ChatGLM3/blob/main/PROMPT_en.md), in addition to the normal multi-turn dialogue. You can find more details about model in the [model card](https://huggingface.co/THUDM/chatglm3-6b)\n", @@ -164,7 +165,8 @@ "* **neural-chat-7b-v3-1** - Mistral-7b model fine-tuned using Intel Gaudi. The model fine-tuned on the open source dataset [Open-Orca/SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca) and aligned with [Direct Preference Optimization (DPO) algorithm](https://arxiv.org/abs/2305.18290). More details can be found in [model card](https://huggingface.co/Intel/neural-chat-7b-v3-1) and [blog post](https://medium.com/@NeuralCompressor/the-practice-of-supervised-finetuning-and-direct-preference-optimization-on-habana-gaudi2-a1197d8a3cd3).\n", "* **notus-7b-v1** - Notus is a collection of fine-tuned models using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290). and related [RLHF](https://huggingface.co/blog/rlhf) techniques. This model is the first version, fine-tuned with DPO over zephyr-7b-sft. Following a data-first approach, the only difference between Notus-7B-v1 and Zephyr-7B-beta is the preference dataset used for dDPO. Proposed approach for dataset creation helps to effectively fine-tune Notus-7b that surpasses Zephyr-7B-beta and Claude 2 on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). 
More details about model can be found in [model card](https://huggingface.co/argilla/notus-7b-v1).\n", "* **youri-7b-chat** - Youri-7b-chat is a Llama2 based model. [Rinna Co., Ltd.](https://rinna.co.jp/) conducted further pre-training for the Llama2 model with a mixture of English and Japanese datasets to improve Japanese task capability. The model is publicly released on Hugging Face hub. You can find detailed information at the [rinna/youri-7b-chat project page](https://huggingface.co/rinna/youri-7b). \n", - "* **baichuan2-7b-chat** - Baichuan 2 is the new generation of large-scale open-source language models launched by [Baichuan Intelligence inc](https://www.baichuan-ai.com/home). It is trained on a high-quality corpus with 2.6 trillion tokens and has achieved the best performance in authoritative Chinese and English benchmarks of the same size." + "* **baichuan2-7b-chat** - Baichuan 2 is the new generation of large-scale open-source language models launched by [Baichuan Intelligence Inc.](https://www.baichuan-ai.com/home). It is trained on a high-quality corpus with 2.6 trillion tokens and has achieved the best performance in authoritative Chinese and English benchmarks of the same size.\n", + "* **internlm2-chat-1.8b** - InternLM2 is the second-generation InternLM series. Compared to the previous generation model, it shows significant improvements in various capabilities, including reasoning, mathematics, and coding. More details about the model can be found in the [model repository](https://huggingface.co/internlm)." ] }, { @@ -184,15 +186,15 @@ "name": "stderr", "output_type": "stream", "text": [ - "2024-03-09 06:18:48.945368: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n", - "2024-03-09 06:18:48.948759: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.\n", - "2024-03-09 06:18:48.991023: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", - "2024-03-09 06:18:48.991056: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", - "2024-03-09 06:18:48.991087: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", - "2024-03-09 06:18:48.998911: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.\n", - "2024-03-09 06:18:49.000066: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", + "2024-03-06 07:05:19.617312: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. 
To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n", + "2024-03-06 07:05:19.620814: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.\n", + "2024-03-06 07:05:19.663621: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", + "2024-03-06 07:05:19.663653: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", + "2024-03-06 07:05:19.663683: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", + "2024-03-06 07:05:19.671963: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.\n", + "2024-03-06 07:05:19.673938: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", - "2024-03-09 06:18:49.793544: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n" + "2024-03-06 07:05:20.726709: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n" ] } ], @@ -240,7 +242,7 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "b450e5be76a2410785bdbc9604116d53", + "model_id": "5875d10008c442c38ff1d90da874b8dc", "version_major": 2, "version_minor": 0 }, @@ -270,32 +272,32 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 15, "id": "184d1678-0e73-4f35-8af5-1a7d291c2e6e", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "e2075555292c4b91a4125bb5e556856a", + "model_id": "c8d393ddf227409d84313cde097d9896", "version_major": 2, "version_minor": 0 }, "text/plain": [ - "Dropdown(description='Model:', index=3, options=('tiny-llama-1b-chat', 'gemma-2b-it', 'red-pajama-3b-chat', 'g…" + "Dropdown(description='Model:', options=('tiny-llama-1b-chat', 'gemma-2b-it', 'red-pajama-3b-chat', 'gemma-7b-i…" ] }, - "execution_count": 3, + "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "llm_model_ids = [model_id for model_id, model_config in SUPPORTED_LLM_MODELS[model_language.value].items() if model_config.get(\"rag_prompt_template\")]\n", + "llm_model_ids = list(SUPPORTED_LLM_MODELS[model_language.value])\n", "\n", "llm_model_id = widgets.Dropdown(\n", " options=llm_model_ids,\n", - " value=llm_model_ids[0],\n", + " value=llm_model_ids[4],\n", " description=\"Model:\",\n", " disabled=False,\n", ")\n", @@ -305,7 +307,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 16, "id": "49ea95f8", "metadata": {}, "outputs": [ @@ -313,7 +315,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "Selected LLM model zephyr-7b-beta\n" + "Selected LLM model tiny-llama-1b-chat\n" ] } ], @@ -380,14 +382,14 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 10, "id": "c6a38153", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": 
"7e20021257b145519c5eccff50f93b1d", + "model_id": "10a3596a41864effbe8fb9d81723f3ed", "version_major": 2, "version_minor": 0 }, @@ -401,7 +403,7 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "75a4ba1905434e77a2147d94b281fcbb", + "model_id": "da04e6b87e41474194e2de8219da7303", "version_major": 2, "version_minor": 0 }, @@ -415,7 +417,7 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "097690bb77a54eca87851d64b189dda5", + "model_id": "0532ba4230d440aeb3f10cd7becf9156", "version_major": 2, "version_minor": 0 }, @@ -453,7 +455,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 11, "id": "2020d522", "metadata": {}, "outputs": [], @@ -652,7 +654,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 12, "id": "8e127215", "metadata": {}, "outputs": [ @@ -660,7 +662,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "Size of model with INT4 compressed weights is 6943.14 MB\n" + "Size of model with INT4 compressed weights is 1837.58 MB\n" ] } ], @@ -696,14 +698,14 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 17, "id": "ff80e6eb-7923-40ef-93d8-5e6c56e50667", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "533cbd205e284549ab4bf3bba2baea41", + "model_id": "d7e6f5925ad0446ca94e882a8c6503fc", "version_major": 2, "version_minor": 0 }, @@ -711,7 +713,7 @@ "Dropdown(description='Embedding Model:', options=('all-mpnet-base-v2',), value='all-mpnet-base-v2')" ] }, - "execution_count": 8, + "execution_count": 17, "metadata": {}, "output_type": "execute_result" } @@ -794,12 +796,12 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "a208ab4dd63b4a06b4a1cda727653924", + "model_id": "ee9a2eefa59e420693ba647d8d5b70c6", "version_major": 2, "version_minor": 0 }, "text/plain": [ - "Dropdown(description='Device:', options=('CPU', 'GPU', 'AUTO'), value='CPU')" + "Dropdown(description='Device:', options=('CPU', 'AUTO'), value='CPU')" ] }, "execution_count": 11, @@ -856,12 +858,12 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "a6d9632cd2b54e0ba4cf3ca72fad4dc9", + "model_id": "b31a332c59b847269e7a395d34319ad6", "version_major": 2, "version_minor": 0 }, "text/plain": [ - "Dropdown(description='Device:', options=('CPU', 'GPU', 'AUTO'), value='CPU')" + "Dropdown(description='Device:', options=('CPU', 'AUTO'), value='CPU')" ] }, "execution_count": 13, @@ -980,12 +982,12 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "a1f3803a98f2431fbb4614e265bdd401", + "model_id": "d40ce9ed20ac455ab3a92366690955a1", "version_major": 2, "version_minor": 0 }, "text/plain": [ - "Dropdown(description='Model to run:', options=('INT4',), value='INT4')" + "Dropdown(description='Model to run:', options=('FP16',), value='FP16')" ] }, "execution_count": 17, @@ -1014,7 +1016,7 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": 18, "id": "f7f708db-8de1-4efd-94b2-fcabc48d52f4", "metadata": {}, "outputs": [ @@ -1022,13 +1024,81 @@ "name": "stdout", "output_type": "stream", "text": [ - "Loading model from zephyr-7b-beta/INT4_compressed_weights\n" + "Loading model from chatglm3-6b/FP16\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "fcdbf25a78d84edaaf992eef0ff48814", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "tokenizer_config.json: 0%| | 0.00/1.41k [00:00 but got instead\n" - ] - } - ], - "source": 
[ - "streamer = TextIteratorStreamer(\n", - " tok, timeout=60.0, skip_prompt=True, skip_special_tokens=True\n", - ")\n", - "generate_kwargs = dict(\n", - " model=ov_model,\n", - " tokenizer=tok,\n", - " max_new_tokens=256,\n", - " streamer=streamer,\n", - ")\n", - "if stop_tokens is not None:\n", - " generate_kwargs[\"stopping_criteria\"] = StoppingCriteriaList(stop_tokens)\n", - " \n", - "pipe = pipeline(\"text-generation\", **generate_kwargs)\n", - "llm = HuggingFacePipeline(pipeline=pipe)" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "id": "2312b613-1bd3-4920-97fe-57a8949e22fb", - "metadata": {}, - "outputs": [ - { - "ename": "AttributeError", - "evalue": "'OVModelForCausalLM' object has no attribute 'parameters'", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", - "Cell \u001b[0;32mIn[31], line 4\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtransformers\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m AutoModelForCausalLM, AutoTokenizer, LocalAgent\n\u001b[1;32m 3\u001b[0m agent \u001b[38;5;241m=\u001b[39m LocalAgent(ov_model, tok)\n\u001b[0;32m----> 4\u001b[0m \u001b[43magent\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mhello\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/tools/agents.py:348\u001b[0m, in \u001b[0;36mAgent.run\u001b[0;34m(self, task, return_code, remote, **kwargs)\u001b[0m\n\u001b[1;32m 326\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 327\u001b[0m \u001b[38;5;124;03mSends a request to the agent.\u001b[39;00m\n\u001b[1;32m 328\u001b[0m \n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 345\u001b[0m \u001b[38;5;124;03m```\u001b[39;00m\n\u001b[1;32m 346\u001b[0m \u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 347\u001b[0m prompt \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mformat_prompt(task)\n\u001b[0;32m--> 348\u001b[0m result \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mgenerate_one\u001b[49m\u001b[43m(\u001b[49m\u001b[43mprompt\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mstop\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mTask:\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 349\u001b[0m explanation, code \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mclean_code_for_run(result)\n\u001b[1;32m 351\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mlog(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m==Explanation from the agent==\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;132;01m{\u001b[39;00mexplanation\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/tools/agents.py:744\u001b[0m, in \u001b[0;36mLocalAgent.generate_one\u001b[0;34m(self, prompt, stop)\u001b[0m\n\u001b[1;32m 743\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m 
\u001b[38;5;21mgenerate_one\u001b[39m(\u001b[38;5;28mself\u001b[39m, prompt, stop):\n\u001b[0;32m--> 744\u001b[0m encoded_inputs \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mtokenizer(prompt, return_tensors\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mpt\u001b[39m\u001b[38;5;124m\"\u001b[39m)\u001b[38;5;241m.\u001b[39mto(\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_model_device\u001b[49m)\n\u001b[1;32m 745\u001b[0m src_len \u001b[38;5;241m=\u001b[39m encoded_inputs[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124minput_ids\u001b[39m\u001b[38;5;124m\"\u001b[39m]\u001b[38;5;241m.\u001b[39mshape[\u001b[38;5;241m1\u001b[39m]\n\u001b[1;32m 746\u001b[0m stopping_criteria \u001b[38;5;241m=\u001b[39m StoppingCriteriaList([StopSequenceCriteria(stop, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mtokenizer)])\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/tools/agents.py:740\u001b[0m, in \u001b[0;36mLocalAgent._model_device\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 738\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mhasattr\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmodel, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mhf_device_map\u001b[39m\u001b[38;5;124m\"\u001b[39m):\n\u001b[1;32m 739\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mlist\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmodel\u001b[38;5;241m.\u001b[39mhf_device_map\u001b[38;5;241m.\u001b[39mvalues())[\u001b[38;5;241m0\u001b[39m]\n\u001b[0;32m--> 740\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m param \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mmodel\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mparameters\u001b[49m():\n\u001b[1;32m 741\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m param\u001b[38;5;241m.\u001b[39mdevice\n", - "\u001b[0;31mAttributeError\u001b[0m: 'OVModelForCausalLM' object has no attribute 'parameters'" - ] - } - ], - "source": [ - "from transformers import AutoModelForCausalLM, AutoTokenizer, LocalAgent\n", - "\n", - "agent = LocalAgent(ov_model, tok)\n", - "agent.run(\"hello\")" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "id": "9dc78bda-4be2-4d7b-b0cb-7deb4ed17b16", - "metadata": {}, - "outputs": [ - { - "ename": "AttributeError", - "evalue": "'HuggingFacePipeline' object has no attribute 'bind_tools'", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", - "Cell \u001b[0;32mIn[29], line 36\u001b[0m\n\u001b[1;32m 31\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m base\u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mexponent\n\u001b[1;32m 34\u001b[0m tools \u001b[38;5;241m=\u001b[39m [multiply, add, exponentiate]\n\u001b[0;32m---> 36\u001b[0m llm_with_tools \u001b[38;5;241m=\u001b[39m \u001b[43mllm\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mbind_tools\u001b[49m(tools)\n\u001b[1;32m 38\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01magents\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mformat_scratchpad\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mxml\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m 
(\n\u001b[1;32m 39\u001b[0m format_xml,\n\u001b[1;32m 40\u001b[0m )\n\u001b[1;32m 41\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01magents\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01moutput_parsers\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mxml\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m XMLAgentOutputParser\n", - "\u001b[0;31mAttributeError\u001b[0m: 'HuggingFacePipeline' object has no attribute 'bind_tools'" - ] - } - ], - "source": [ - "from langchain_core.tools import tool\n", - "from langchain import hub\n", - "from langchain.agents import AgentExecutor, create_react_agent\n", - "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n", - "\n", - "prompt = ChatPromptTemplate.from_messages(\n", - " [\n", - " (\n", - " \"system\",\n", - " \"You are very powerful assistant, but don't know current events\",\n", - " ),\n", - " (\"user\", \"{input}\"),\n", - " MessagesPlaceholder(variable_name=\"agent_scratchpad\"),\n", - " ]\n", - ")\n", - "\n", - "@tool\n", - "def multiply(first_int: int, second_int: int) -> int:\n", - " \"\"\"Multiply two integers together.\"\"\"\n", - " return first_int * second_int\n", - " \n", - "@tool\n", - "def add(first_int: int, second_int: int) -> int:\n", - " \"Add two integers.\"\n", - " return first_int + second_int\n", - "\n", - "\n", - "@tool\n", - "def exponentiate(base: int, exponent: int) -> int:\n", - " \"Exponentiate the base to the exponent power.\"\n", - " return base**exponent\n", - "\n", - "\n", - "tools = [multiply, add, exponentiate]\n", - "\n", - "llm_with_tools = llm.bind_tools(tools)\n", - "\n", - "from langchain.agents.format_scratchpad.xml import (\n", - " format_xml,\n", - ")\n", - "from langchain.agents.output_parsers.xml import XMLAgentOutputParser\n", - "\n", - "agent = (\n", - " {\n", - " \"input\": lambda x: x[\"input\"],\n", - " \"agent_scratchpad\": lambda x: format_xml(\n", - " x[\"intermediate_steps\"]\n", - " ),\n", - " }\n", - " | prompt\n", - " | llm_with_tools\n", - " | XMLAgentOutputParser()\n", - ")\n", - "\n", - "\n", - "# prompt = hub.pull(\"hwchase17/react\")\n", - "# agent = create_react_agent(llm, tools, prompt)\n", - "agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)\n", - "agent_executor.invoke(\n", - " {\n", - " \"input\": \"Take 3 to the fifth power and multiply that by the sum of twelve and three, then square the whole result\"\n", - " }\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 44, - "id": "65b2ebdc-8a50-4580-abeb-3234ea5598a6", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "\n", - "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n" - ] - }, - { - "ename": "TypeError", - "evalue": "TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", - "Cell \u001b[0;32mIn[44], line 34\u001b[0m\n\u001b[1;32m 28\u001b[0m agent \u001b[38;5;241m=\u001b[39m create_structured_chat_agent(llm, tools, prompt)\n\u001b[1;32m 30\u001b[0m agent_executor \u001b[38;5;241m=\u001b[39m AgentExecutor(\n\u001b[1;32m 31\u001b[0m agent\u001b[38;5;241m=\u001b[39magent, tools\u001b[38;5;241m=\u001b[39mtools, 
verbose\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m, handle_parsing_errors\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m\n\u001b[1;32m 32\u001b[0m )\n\u001b[0;32m---> 34\u001b[0m \u001b[43magent_executor\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43minvoke\u001b[49m\u001b[43m(\u001b[49m\u001b[43m{\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43minput\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m \u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mwhat is LangChain?\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m}\u001b[49m\u001b[43m)\u001b[49m\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain/chains/base.py:163\u001b[0m, in \u001b[0;36mChain.invoke\u001b[0;34m(self, input, config, **kwargs)\u001b[0m\n\u001b[1;32m 161\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mBaseException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 162\u001b[0m run_manager\u001b[38;5;241m.\u001b[39mon_chain_error(e)\n\u001b[0;32m--> 163\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m e\n\u001b[1;32m 164\u001b[0m run_manager\u001b[38;5;241m.\u001b[39mon_chain_end(outputs)\n\u001b[1;32m 166\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m include_run_info:\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain/chains/base.py:153\u001b[0m, in \u001b[0;36mChain.invoke\u001b[0;34m(self, input, config, **kwargs)\u001b[0m\n\u001b[1;32m 150\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 151\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_validate_inputs(inputs)\n\u001b[1;32m 152\u001b[0m outputs \u001b[38;5;241m=\u001b[39m (\n\u001b[0;32m--> 153\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_call\u001b[49m\u001b[43m(\u001b[49m\u001b[43minputs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mrun_manager\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrun_manager\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 154\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m new_arg_supported\n\u001b[1;32m 155\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_call(inputs)\n\u001b[1;32m 156\u001b[0m )\n\u001b[1;32m 158\u001b[0m final_outputs: Dict[\u001b[38;5;28mstr\u001b[39m, Any] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mprep_outputs(\n\u001b[1;32m 159\u001b[0m inputs, outputs, return_only_outputs\n\u001b[1;32m 160\u001b[0m )\n\u001b[1;32m 161\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mBaseException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain/agents/agent.py:1391\u001b[0m, in \u001b[0;36mAgentExecutor._call\u001b[0;34m(self, inputs, run_manager)\u001b[0m\n\u001b[1;32m 1389\u001b[0m \u001b[38;5;66;03m# We now enter the agent loop (until it returns something).\u001b[39;00m\n\u001b[1;32m 1390\u001b[0m \u001b[38;5;28;01mwhile\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_should_continue(iterations, time_elapsed):\n\u001b[0;32m-> 1391\u001b[0m next_step_output \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_take_next_step\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 1392\u001b[0m \u001b[43m 
\u001b[49m\u001b[43mname_to_tool_map\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1393\u001b[0m \u001b[43m \u001b[49m\u001b[43mcolor_mapping\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1394\u001b[0m \u001b[43m \u001b[49m\u001b[43minputs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1395\u001b[0m \u001b[43m \u001b[49m\u001b[43mintermediate_steps\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1396\u001b[0m \u001b[43m \u001b[49m\u001b[43mrun_manager\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrun_manager\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1397\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1398\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(next_step_output, AgentFinish):\n\u001b[1;32m 1399\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_return(\n\u001b[1;32m 1400\u001b[0m next_step_output, intermediate_steps, run_manager\u001b[38;5;241m=\u001b[39mrun_manager\n\u001b[1;32m 1401\u001b[0m )\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain/agents/agent.py:1097\u001b[0m, in \u001b[0;36mAgentExecutor._take_next_step\u001b[0;34m(self, name_to_tool_map, color_mapping, inputs, intermediate_steps, run_manager)\u001b[0m\n\u001b[1;32m 1088\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_take_next_step\u001b[39m(\n\u001b[1;32m 1089\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 1090\u001b[0m name_to_tool_map: Dict[\u001b[38;5;28mstr\u001b[39m, BaseTool],\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 1094\u001b[0m run_manager: Optional[CallbackManagerForChainRun] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 1095\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Union[AgentFinish, List[Tuple[AgentAction, \u001b[38;5;28mstr\u001b[39m]]]:\n\u001b[1;32m 1096\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_consume_next_step(\n\u001b[0;32m-> 1097\u001b[0m [\n\u001b[1;32m 1098\u001b[0m a\n\u001b[1;32m 1099\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m a \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_iter_next_step(\n\u001b[1;32m 1100\u001b[0m name_to_tool_map,\n\u001b[1;32m 1101\u001b[0m color_mapping,\n\u001b[1;32m 1102\u001b[0m inputs,\n\u001b[1;32m 1103\u001b[0m intermediate_steps,\n\u001b[1;32m 1104\u001b[0m run_manager,\n\u001b[1;32m 1105\u001b[0m )\n\u001b[1;32m 1106\u001b[0m ]\n\u001b[1;32m 1107\u001b[0m )\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain/agents/agent.py:1097\u001b[0m, in \u001b[0;36m\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m 1088\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_take_next_step\u001b[39m(\n\u001b[1;32m 1089\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 1090\u001b[0m name_to_tool_map: Dict[\u001b[38;5;28mstr\u001b[39m, BaseTool],\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 1094\u001b[0m run_manager: Optional[CallbackManagerForChainRun] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 1095\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Union[AgentFinish, List[Tuple[AgentAction, \u001b[38;5;28mstr\u001b[39m]]]:\n\u001b[1;32m 1096\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_consume_next_step(\n\u001b[0;32m-> 1097\u001b[0m [\n\u001b[1;32m 1098\u001b[0m a\n\u001b[1;32m 
1099\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m a \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_iter_next_step(\n\u001b[1;32m 1100\u001b[0m name_to_tool_map,\n\u001b[1;32m 1101\u001b[0m color_mapping,\n\u001b[1;32m 1102\u001b[0m inputs,\n\u001b[1;32m 1103\u001b[0m intermediate_steps,\n\u001b[1;32m 1104\u001b[0m run_manager,\n\u001b[1;32m 1105\u001b[0m )\n\u001b[1;32m 1106\u001b[0m ]\n\u001b[1;32m 1107\u001b[0m )\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain/agents/agent.py:1125\u001b[0m, in \u001b[0;36mAgentExecutor._iter_next_step\u001b[0;34m(self, name_to_tool_map, color_mapping, inputs, intermediate_steps, run_manager)\u001b[0m\n\u001b[1;32m 1122\u001b[0m intermediate_steps \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_prepare_intermediate_steps(intermediate_steps)\n\u001b[1;32m 1124\u001b[0m \u001b[38;5;66;03m# Call the LLM to see what to do.\u001b[39;00m\n\u001b[0;32m-> 1125\u001b[0m output \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43magent\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mplan\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 1126\u001b[0m \u001b[43m \u001b[49m\u001b[43mintermediate_steps\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1127\u001b[0m \u001b[43m \u001b[49m\u001b[43mcallbacks\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrun_manager\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_child\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mif\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mrun_manager\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01melse\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mNone\u001b[39;49;00m\u001b[43m,\u001b[49m\n\u001b[1;32m 1128\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43minputs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1129\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1130\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m OutputParserException \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 1131\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mhandle_parsing_errors, \u001b[38;5;28mbool\u001b[39m):\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain/agents/agent.py:387\u001b[0m, in \u001b[0;36mRunnableAgent.plan\u001b[0;34m(self, intermediate_steps, callbacks, **kwargs)\u001b[0m\n\u001b[1;32m 381\u001b[0m \u001b[38;5;66;03m# Use streaming to make sure that the underlying LLM is invoked in a streaming\u001b[39;00m\n\u001b[1;32m 382\u001b[0m \u001b[38;5;66;03m# fashion to make it possible to get access to the individual LLM tokens\u001b[39;00m\n\u001b[1;32m 383\u001b[0m \u001b[38;5;66;03m# when using stream_log with the Agent Executor.\u001b[39;00m\n\u001b[1;32m 384\u001b[0m \u001b[38;5;66;03m# Because the response from the plan is not a generator, we need to\u001b[39;00m\n\u001b[1;32m 385\u001b[0m \u001b[38;5;66;03m# accumulate the output into final output and return that.\u001b[39;00m\n\u001b[1;32m 386\u001b[0m final_output: Any \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[0;32m--> 387\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m chunk \u001b[38;5;129;01min\u001b[39;00m 
\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mrunnable\u001b[38;5;241m.\u001b[39mstream(inputs, config\u001b[38;5;241m=\u001b[39m{\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcallbacks\u001b[39m\u001b[38;5;124m\"\u001b[39m: callbacks}):\n\u001b[1;32m 388\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m final_output \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 389\u001b[0m final_output \u001b[38;5;241m=\u001b[39m chunk\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/runnables/base.py:2446\u001b[0m, in \u001b[0;36mRunnableSequence.stream\u001b[0;34m(self, input, config, **kwargs)\u001b[0m\n\u001b[1;32m 2440\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mstream\u001b[39m(\n\u001b[1;32m 2441\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 2442\u001b[0m \u001b[38;5;28minput\u001b[39m: Input,\n\u001b[1;32m 2443\u001b[0m config: Optional[RunnableConfig] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 2444\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Optional[Any],\n\u001b[1;32m 2445\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Iterator[Output]:\n\u001b[0;32m-> 2446\u001b[0m \u001b[38;5;28;01myield from\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mtransform(\u001b[38;5;28miter\u001b[39m([\u001b[38;5;28minput\u001b[39m]), config, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs)\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/runnables/base.py:2433\u001b[0m, in \u001b[0;36mRunnableSequence.transform\u001b[0;34m(self, input, config, **kwargs)\u001b[0m\n\u001b[1;32m 2427\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mtransform\u001b[39m(\n\u001b[1;32m 2428\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 2429\u001b[0m \u001b[38;5;28minput\u001b[39m: Iterator[Input],\n\u001b[1;32m 2430\u001b[0m config: Optional[RunnableConfig] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 2431\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Optional[Any],\n\u001b[1;32m 2432\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Iterator[Output]:\n\u001b[0;32m-> 2433\u001b[0m \u001b[38;5;28;01myield from\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_transform_stream_with_config(\n\u001b[1;32m 2434\u001b[0m \u001b[38;5;28minput\u001b[39m,\n\u001b[1;32m 2435\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_transform,\n\u001b[1;32m 2436\u001b[0m patch_config(config, run_name\u001b[38;5;241m=\u001b[39m(config \u001b[38;5;129;01mor\u001b[39;00m {})\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrun_name\u001b[39m\u001b[38;5;124m\"\u001b[39m) \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mname),\n\u001b[1;32m 2437\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs,\n\u001b[1;32m 2438\u001b[0m )\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/runnables/base.py:1513\u001b[0m, in \u001b[0;36mRunnable._transform_stream_with_config\u001b[0;34m(self, input, transformer, config, run_type, **kwargs)\u001b[0m\n\u001b[1;32m 1511\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 1512\u001b[0m \u001b[38;5;28;01mwhile\u001b[39;00m 
\u001b[38;5;28;01mTrue\u001b[39;00m:\n\u001b[0;32m-> 1513\u001b[0m chunk: Output \u001b[38;5;241m=\u001b[39m \u001b[43mcontext\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mnext\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43miterator\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;66;03m# type: ignore\u001b[39;00m\n\u001b[1;32m 1514\u001b[0m \u001b[38;5;28;01myield\u001b[39;00m chunk\n\u001b[1;32m 1515\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m final_output_supported:\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/runnables/base.py:2397\u001b[0m, in \u001b[0;36mRunnableSequence._transform\u001b[0;34m(self, input, run_manager, config)\u001b[0m\n\u001b[1;32m 2388\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m step \u001b[38;5;129;01min\u001b[39;00m steps:\n\u001b[1;32m 2389\u001b[0m final_pipeline \u001b[38;5;241m=\u001b[39m step\u001b[38;5;241m.\u001b[39mtransform(\n\u001b[1;32m 2390\u001b[0m final_pipeline,\n\u001b[1;32m 2391\u001b[0m patch_config(\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 2394\u001b[0m ),\n\u001b[1;32m 2395\u001b[0m )\n\u001b[0;32m-> 2397\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m output \u001b[38;5;129;01min\u001b[39;00m final_pipeline:\n\u001b[1;32m 2398\u001b[0m \u001b[38;5;28;01myield\u001b[39;00m output\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/runnables/base.py:1051\u001b[0m, in \u001b[0;36mRunnable.transform\u001b[0;34m(self, input, config, **kwargs)\u001b[0m\n\u001b[1;32m 1048\u001b[0m final: Input\n\u001b[1;32m 1049\u001b[0m got_first_val \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mFalse\u001b[39;00m\n\u001b[0;32m-> 1051\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m chunk \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28minput\u001b[39m:\n\u001b[1;32m 1052\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m got_first_val:\n\u001b[1;32m 1053\u001b[0m final \u001b[38;5;241m=\u001b[39m chunk\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/runnables/base.py:4173\u001b[0m, in \u001b[0;36mRunnableBindingBase.transform\u001b[0;34m(self, input, config, **kwargs)\u001b[0m\n\u001b[1;32m 4167\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mtransform\u001b[39m(\n\u001b[1;32m 4168\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 4169\u001b[0m \u001b[38;5;28minput\u001b[39m: Iterator[Input],\n\u001b[1;32m 4170\u001b[0m config: Optional[RunnableConfig] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 4171\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Any,\n\u001b[1;32m 4172\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Iterator[Output]:\n\u001b[0;32m-> 4173\u001b[0m \u001b[38;5;28;01myield from\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mbound\u001b[38;5;241m.\u001b[39mtransform(\n\u001b[1;32m 4174\u001b[0m \u001b[38;5;28minput\u001b[39m,\n\u001b[1;32m 4175\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_merge_configs(config),\n\u001b[1;32m 4176\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39m{\u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mkwargs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs},\n\u001b[1;32m 4177\u001b[0m )\n", - "File 
\u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/runnables/base.py:1061\u001b[0m, in \u001b[0;36mRunnable.transform\u001b[0;34m(self, input, config, **kwargs)\u001b[0m\n\u001b[1;32m 1058\u001b[0m final \u001b[38;5;241m=\u001b[39m final \u001b[38;5;241m+\u001b[39m chunk \u001b[38;5;66;03m# type: ignore[operator]\u001b[39;00m\n\u001b[1;32m 1060\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m got_first_val:\n\u001b[0;32m-> 1061\u001b[0m \u001b[38;5;28;01myield from\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mstream(final, config, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs)\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/language_models/llms.py:409\u001b[0m, in \u001b[0;36mBaseLLM.stream\u001b[0;34m(self, input, config, stop, **kwargs)\u001b[0m\n\u001b[1;32m 399\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mstream\u001b[39m(\n\u001b[1;32m 400\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 401\u001b[0m \u001b[38;5;28minput\u001b[39m: LanguageModelInput,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 405\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Any,\n\u001b[1;32m 406\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Iterator[\u001b[38;5;28mstr\u001b[39m]:\n\u001b[1;32m 407\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mtype\u001b[39m(\u001b[38;5;28mself\u001b[39m)\u001b[38;5;241m.\u001b[39m_stream \u001b[38;5;241m==\u001b[39m BaseLLM\u001b[38;5;241m.\u001b[39m_stream:\n\u001b[1;32m 408\u001b[0m \u001b[38;5;66;03m# model doesn't implement streaming, so use default implementation\u001b[39;00m\n\u001b[0;32m--> 409\u001b[0m \u001b[38;5;28;01myield\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43minvoke\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43minput\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mconfig\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconfig\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mstop\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstop\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 410\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 411\u001b[0m prompt \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_convert_input(\u001b[38;5;28minput\u001b[39m)\u001b[38;5;241m.\u001b[39mto_string()\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/language_models/llms.py:273\u001b[0m, in \u001b[0;36mBaseLLM.invoke\u001b[0;34m(self, input, config, stop, **kwargs)\u001b[0m\n\u001b[1;32m 263\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21minvoke\u001b[39m(\n\u001b[1;32m 264\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 265\u001b[0m \u001b[38;5;28minput\u001b[39m: LanguageModelInput,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 269\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Any,\n\u001b[1;32m 270\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m \u001b[38;5;28mstr\u001b[39m:\n\u001b[1;32m 271\u001b[0m config \u001b[38;5;241m=\u001b[39m ensure_config(config)\n\u001b[1;32m 272\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m (\n\u001b[0;32m--> 273\u001b[0m 
\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mgenerate_prompt\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 274\u001b[0m \u001b[43m \u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_convert_input\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43minput\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 275\u001b[0m \u001b[43m \u001b[49m\u001b[43mstop\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstop\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 276\u001b[0m \u001b[43m \u001b[49m\u001b[43mcallbacks\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconfig\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mcallbacks\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 277\u001b[0m \u001b[43m \u001b[49m\u001b[43mtags\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconfig\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mtags\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 278\u001b[0m \u001b[43m \u001b[49m\u001b[43mmetadata\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconfig\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mmetadata\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 279\u001b[0m \u001b[43m \u001b[49m\u001b[43mrun_name\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconfig\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mrun_name\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 280\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 281\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 282\u001b[0m \u001b[38;5;241m.\u001b[39mgenerations[\u001b[38;5;241m0\u001b[39m][\u001b[38;5;241m0\u001b[39m]\n\u001b[1;32m 283\u001b[0m \u001b[38;5;241m.\u001b[39mtext\n\u001b[1;32m 284\u001b[0m )\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/language_models/llms.py:568\u001b[0m, in \u001b[0;36mBaseLLM.generate_prompt\u001b[0;34m(self, prompts, stop, callbacks, **kwargs)\u001b[0m\n\u001b[1;32m 560\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mgenerate_prompt\u001b[39m(\n\u001b[1;32m 561\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 562\u001b[0m prompts: List[PromptValue],\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 565\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Any,\n\u001b[1;32m 566\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m LLMResult:\n\u001b[1;32m 567\u001b[0m prompt_strings \u001b[38;5;241m=\u001b[39m [p\u001b[38;5;241m.\u001b[39mto_string() \u001b[38;5;28;01mfor\u001b[39;00m p \u001b[38;5;129;01min\u001b[39;00m prompts]\n\u001b[0;32m--> 568\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m 
self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/language_models/llms.py:741, in BaseLLM.generate --> output = self._generate_helper(prompts, stop, run_managers, bool(new_arg_supported), **kwargs)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_core/language_models/llms.py:592, in BaseLLM._generate_helper --> output = self._generate(prompts, stop=stop, run_manager=run_managers[0] if run_managers else None, **kwargs)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/langchain_community/llms/huggingface_pipeline.py:261, in HuggingFacePipeline._generate --> responses = self.pipeline(batch_prompts, stop_sequence=stop, return_full_text=False, **pipeline_kwargs)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/pipelines/text_generation.py:241, in TextGenerationPipeline.__call__ --> return super().__call__(text_inputs, **kwargs)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/pipelines/base.py:1148, in Pipeline.__call__ --> preprocess_params, forward_params, postprocess_params = self._sanitize_parameters(**kwargs)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/pipelines/text_generation.py:171, in TextGenerationPipeline._sanitize_parameters --> stop_sequence_ids = self.tokenizer.encode(stop_sequence, add_special_tokens=False)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2600, in PreTrainedTokenizerBase.encode --> encoded_inputs = self.encode_plus(text, text_pair=text_pair, add_special_tokens=add_special_tokens, ...)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3008, in PreTrainedTokenizerBase.encode_plus --> return self._encode_plus(text=text, ...)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py:576, in PreTrainedTokenizerFast._encode_plus --> batched_output = self._batch_encode_plus(batched_input, ...)\n",
-    "File ~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py:504, in PreTrainedTokenizerFast._batch_encode_plus --> encodings = self._tokenizer.encode_batch(batch_text_or_text_pairs, add_special_tokens=add_special_tokens, is_pretokenized=is_split_into_words)\n",
-    "\u001b[0;31mTypeError\u001b[0m: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]"
-   ]
-  }
- ],
- "source": [
-  "from langchain_core.tools import tool\n",
-  "from langchain import hub\n",
-  "from langchain.agents import AgentExecutor, create_structured_chat_agent\n",
-  "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
-  "\n",
-  "@tool\n",
-  "def multiply(first_int: int, second_int: int) -> int:\n",
-  "    \"\"\"Multiply two integers together.\"\"\"\n",
-  "    return first_int * second_int\n",
-  "\n",
-  "@tool\n",
-  "def add(first_int: int, second_int: int) -> int:\n",
-  "    \"Add two integers.\"\n",
-  "    return first_int + second_int\n",
-  "\n",
-  "\n",
-  "@tool\n",
-  "def exponentiate(base: int, exponent: int) -> int:\n",
-  "    \"Exponentiate the base to the exponent power.\"\n",
-  "    return base**exponent\n",
-  "\n",
-  "\n",
-  "tools = [multiply, add, exponentiate]\n",
-  "\n",
-  "\n",
-  "prompt = hub.pull(\"hwchase17/structured-chat-agent\")\n",
-  "\n",
-  "agent = create_structured_chat_agent(llm, tools, prompt)\n",
-  "\n",
-  "agent_executor = AgentExecutor(\n",
-  "    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True\n",
-  ")\n",
-  "\n",
-  "agent_executor.invoke({\"input\": \"what is LangChain?\"})"
- ]
- },
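The removed agent cell above is what produced the deleted `TypeError` output: the structured-chat agent passes a list of stop strings, LangChain's `HuggingFacePipeline._generate` forwards that list to the transformers pipeline as `stop_sequence=stop`, and `TextGenerationPipeline._sanitize_parameters` then calls `self.tokenizer.encode(stop_sequence, add_special_tokens=False)`, which only accepts a single string. A minimal workaround sketch using the public `StoppingCriteria` API instead of the `stop_sequence` path (the class name and the stop strings below are illustrative, not from the notebook):

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnStrings(StoppingCriteria):
    """Stop generation once the decoded tail ends with any stop string."""

    def __init__(self, stop_strings, tokenizer):
        self.stop_strings = stop_strings
        self.tokenizer = tokenizer

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Decode only the last few tokens to keep the per-step check cheap.
        tail = self.tokenizer.decode(input_ids[0][-16:], skip_special_tokens=True)
        return any(tail.endswith(s) for s in self.stop_strings)

# `tokenizer` and the model are assumed to be the objects created earlier in
# the notebook; passing stopping_criteria to generate() avoids encoding the
# stop list and also supports multi-token stop sequences, which the pipeline
# warns it cannot handle via stop_sequence.
# model.generate(**inputs, stopping_criteria=StoppingCriteriaList(
#     [StopOnStrings(["Observation:"], tokenizer)]))
```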
int:\n", - " \"Exponentiate the base to the exponent power.\"\n", - " return base**exponent\n", - "\n", - "\n", - "tools = [multiply, add, exponentiate]\n", - "\n", - "\n", - "prompt = hub.pull(\"hwchase17/structured-chat-agent\")\n", - "\n", - "agent = create_structured_chat_agent(llm, tools, prompt)\n", - "\n", - "agent_executor = AgentExecutor(\n", - " agent=agent, tools=tools, verbose=True, handle_parsing_errors=True\n", - ")\n", - "\n", - "agent_executor.invoke({\"input\": \"what is LangChain?\"})" - ] - }, { "attachments": {}, "cell_type": "markdown", @@ -1359,7 +1194,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 31, "id": "5b97eeeb", "metadata": {}, "outputs": [], @@ -1433,38 +1268,10 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": null, "id": "0908e5e9-4dcb-4fc8-8480-3cf70fd5e934", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Running on local URL: http://10.3.233.99:5344\n" - ] - }, - { - "ename": "ValueError", - "evalue": "When localhost is not accessible, a shareable link must be created. Please set share=True or check your proxy settings to allow access to localhost.", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", - "Cell \u001b[0;32mIn[22], line 324\u001b[0m\n\u001b[1;32m 318\u001b[0m demo\u001b[38;5;241m.\u001b[39mqueue()\n\u001b[1;32m 319\u001b[0m \u001b[38;5;66;03m# if you are launching remotely, specify server_name and server_port\u001b[39;00m\n\u001b[1;32m 320\u001b[0m \u001b[38;5;66;03m# demo.launch(server_name='your server name', server_port='server port in int')\u001b[39;00m\n\u001b[1;32m 321\u001b[0m \u001b[38;5;66;03m# if you have any issue to launch on your platform, you can pass share=True to launch method:\u001b[39;00m\n\u001b[1;32m 322\u001b[0m \u001b[38;5;66;03m# demo.launch(share=True)\u001b[39;00m\n\u001b[1;32m 323\u001b[0m \u001b[38;5;66;03m# it creates a publicly shareable link for the interface. 
Read more in the docs: https://gradio.app/docs/\u001b[39;00m\n\u001b[0;32m--> 324\u001b[0m \u001b[43mdemo\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mlaunch\u001b[49m\u001b[43m(\u001b[49m\u001b[43mserver_name\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43m10.3.233.99\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mserver_port\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;241;43m5344\u001b[39;49m\u001b[43m)\u001b[49m\n", - "File \u001b[0;32m~/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/gradio/blocks.py:2165\u001b[0m, in \u001b[0;36mBlocks.launch\u001b[0;34m(self, inline, inbrowser, share, debug, max_threads, auth, auth_message, prevent_thread_lock, show_error, server_name, server_port, height, width, favicon_path, ssl_keyfile, ssl_certfile, ssl_keyfile_password, ssl_verify, quiet, show_api, allowed_paths, blocked_paths, root_path, app_kwargs, state_session_capacity, share_server_address, share_server_protocol, auth_dependency, _frontend)\u001b[0m\n\u001b[1;32m 2157\u001b[0m \u001b[38;5;66;03m# If running in a colab or not able to access localhost,\u001b[39;00m\n\u001b[1;32m 2158\u001b[0m \u001b[38;5;66;03m# a shareable link must be created.\u001b[39;00m\n\u001b[1;32m 2159\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m (\n\u001b[1;32m 2160\u001b[0m _frontend\n\u001b[1;32m 2161\u001b[0m \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m wasm_utils\u001b[38;5;241m.\u001b[39mIS_WASM\n\u001b[1;32m 2162\u001b[0m \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m networking\u001b[38;5;241m.\u001b[39murl_ok(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mlocal_url)\n\u001b[1;32m 2163\u001b[0m \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mshare\n\u001b[1;32m 2164\u001b[0m ):\n\u001b[0;32m-> 2165\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[1;32m 2166\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mWhen localhost is not accessible, a shareable link must be created. Please set share=True or check your proxy settings to allow access to localhost.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 2167\u001b[0m )\n\u001b[1;32m 2169\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mis_colab \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m quiet:\n\u001b[1;32m 2170\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m debug:\n", - "\u001b[0;31mValueError\u001b[0m: When localhost is not accessible, a shareable link must be created. Please set share=True or check your proxy settings to allow access to localhost." - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "device must be of type but got instead\n", - "Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\n" - ] - } - ], + "outputs": [], "source": [ "from langchain.prompts import PromptTemplate\n", "from langchain.vectorstores import Chroma\n", @@ -1789,12 +1596,12 @@ "# if you have any issue to launch on your platform, you can pass share=True to launch method:\n", "# demo.launch(share=True)\n", "# it creates a publicly shareable link for the interface. 
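The notebook changes above go together: the hard-coded `demo.launch(server_name='10.3.233.99', server_port=5344)` is replaced by a plain `demo.launch()`, because binding to a specific address is what raised the deleted `ValueError` whenever localhost was not reachable and no share link was requested. A hedged sketch of a remote-friendly launch fallback (the try/except wrapper and the placeholder UI are illustrative, not part of the notebook):

```python
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("RAG chatbot placeholder")  # stand-in for the notebook's real UI

demo.queue()
try:
    demo.launch()  # default: bind to localhost on a free port
except ValueError:
    # Raised when localhost is not reachable (e.g. behind a proxy);
    # a public share link is Gradio's documented fallback.
    demo.launch(share=True)
```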
diff --git a/notebooks/254-llm-chatbot/config.py b/notebooks/254-llm-chatbot/config.py
index f8ad87e77b5..87eb36ffb66 100644
--- a/notebooks/254-llm-chatbot/config.py
+++ b/notebooks/254-llm-chatbot/config.py
@@ -132,16 +132,16 @@ def internlm_partial_text_processor(partial_text, new_text):
     "mistral-7b": {
         "model_id": "mistralai/Mistral-7B-v0.1",
         "remote": False,
-        "start_message": f"<|system|>\n{DEFAULT_SYSTEM_PROMPT}</s>\n",
-        "history_template": "<|user|>\n{user}</s> \n<|assistant|>\n{assistant}</s> \n",
-        "current_message_template": "<|user|>\n{user}</s> \n<|assistant|>\n{assistant}",
-        "rag_prompt_template": f"""<|system|> {DEFAULT_RAG_PROMPT }"""
+        "start_message": f"[INST] <<SYS>>\n{DEFAULT_SYSTEM_PROMPT }\n<</SYS>>\n\n",
+        "history_template": "{user}[/INST]{assistant}</s><s>[INST]",
+        "current_message_template": "{user} [/INST]{assistant}",
+        "tokenizer_kwargs": {"add_special_tokens": False},
+        "partial_text_processor": llama_partial_text_processor,
+        "rag_prompt_template": f""" [INST] {DEFAULT_RAG_PROMPT } [/INST] """
         + """
-        <|user|>
-        Question: {question}
+        [INST] Question: {question}
         Context: {context}
-        Answer: 
-        <|assistant|>""",
+        Answer: [/INST]""",
     },
     "zephyr-7b-beta": {
         "model_id": "HuggingFaceH4/zephyr-7b-beta",
@@ -187,8 +187,8 @@ def internlm_partial_text_processor(partial_text, new_text):
         },
     },
     "Chinese":{
-        "qwen1.5-1.8b-chat": {
-            "model_id": "Qwen/Qwen1.5-1.8B-Chat",
+        "qwen1.5-0.5b-chat": {
+            "model_id": "Qwen/Qwen1.5-0.5B-Chat",
             "remote": False,
             "start_message": DEFAULT_SYSTEM_PROMPT_CHINESE,
             "stop_tokens": ["<|im_end|>", "<|endoftext|>"],
@@ -200,11 +200,18 @@ def internlm_partial_text_processor(partial_text, new_text):
             已知内容: {context}
             回答: <|im_end|><|im_start|>assistant""",
         },
-        "qwen1.5-0.5b-chat": {
-            "model_id": "Qwen/Qwen1.5-0.5B-Chat",
+        "qwen1.5-1.8b-chat": {
+            "model_id": "Qwen/Qwen1.5-1.8B-Chat",
             "remote": False,
             "start_message": DEFAULT_SYSTEM_PROMPT_CHINESE,
             "stop_tokens": ["<|im_end|>", "<|endoftext|>"],
+            "rag_prompt_template": f"""<|im_start|>system
+            {DEFAULT_RAG_PROMPT_CHINESE }<|im_end|>"""
+            + """
+            <|im_start|>user
+            问题: {question}
+            已知内容: {context}
+            回答: <|im_end|><|im_start|>assistant""",
         },
         "qwen1.5-7b-chat": {
             "model_id": "Qwen/Qwen1.5-7B-Chat",
@@ -267,6 +274,12 @@ def internlm_partial_text_processor(partial_text, new_text):
             "remote": False,
             "start_message": DEFAULT_SYSTEM_PROMPT_CHINESE,
             "stop_tokens": [2],
+            "rag_prompt_template": f"""{DEFAULT_RAG_PROMPT_CHINESE }"""
+            + """
+            问题: {question}
+            已知内容: {context}
+            回答:
+            """,
         },
         "internlm2-chat-1.8b": {
             "model_id": "internlm/internlm2-chat-1_8b",
@@ -275,6 +288,13 @@ def internlm_partial_text_processor(partial_text, new_text):
             "remote": False,
             "start_message": DEFAULT_SYSTEM_PROMPT_CHINESE,
             "stop_tokens": [2, 92542],
             "partial_text_processor": internlm_partial_text_processor,
+            "rag_prompt_template": f"""<|im_start|>system
+            {DEFAULT_RAG_PROMPT_CHINESE }<|im_end|>"""
+            + """
+            <|im_start|>user
+            问题: {question}
+            已知内容: {context}
+            回答: <|im_end|><|im_start|>assistant""",
         },
     },
     "Japanese":{
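A note on the added `rag_prompt_template` entries: each template concatenates an f-string, which bakes the fixed system prompt (`DEFAULT_RAG_PROMPT` / `DEFAULT_RAG_PROMPT_CHINESE`) in once at import time, with a plain string whose `{question}` and `{context}` placeholders are left for LangChain to fill per query. A small sketch of that downstream use (the system text here is a stand-in; the `PromptTemplate` call mirrors the notebook's RAG chain):

```python
from langchain.prompts import PromptTemplate

DEFAULT_RAG_PROMPT_CHINESE = "基于以下已知信息，请简洁并专业地回答用户的问题。"  # stand-in text

rag_prompt_template = f"""<|im_start|>system
{DEFAULT_RAG_PROMPT_CHINESE }<|im_end|>""" + """
<|im_start|>user
问题: {question}
已知内容: {context}
回答: <|im_end|><|im_start|>assistant"""

# Only {question} and {context} survive as runtime template variables:
prompt = PromptTemplate.from_template(rag_prompt_template)
print(prompt.format(question="什么是OpenVINO？", context="(retrieved document chunks)"))
```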
f"""<|im_start|>system + {DEFAULT_RAG_PROMPT_CHINESE }<|im_end|>""" + + """ + <|im_start|>user + 问题: {question} + 已知内容: {context} + 回答: <|im_end|><|im_start|>assistant""", }, }, "Japanese":{ diff --git a/notebooks/254-llm-chatbot/ov_llm_model.py b/notebooks/254-llm-chatbot/ov_llm_model.py index 10760dd79dd..9b7444edc5b 100644 --- a/notebooks/254-llm-chatbot/ov_llm_model.py +++ b/notebooks/254-llm-chatbot/ov_llm_model.py @@ -208,6 +208,25 @@ class OVCHATGLMModel(OVModelForCausalLM): """ Optimum intel compatible model wrapper for CHATGLM2 """ + + def __init__( + self, + model: "Model", + config: "PretrainedConfig" = None, + device: str = "CPU", + dynamic_shapes: bool = True, + ov_config: Optional[Dict[str, str]] = None, + model_save_dir: Optional[Union[str, Path]] = None, + **kwargs, + ): + NormalizedConfigManager._conf["chatglm"] = NormalizedTextConfig.with_args( + num_layers="num_hidden_layers", + num_attention_heads="num_attention_heads", + hidden_size="hidden_size", + ) + super().__init__( + model, config, device, dynamic_shapes, ov_config, model_save_dir, **kwargs + ) def _reshape(self, model: "Model", *args, **kwargs): shapes = {} @@ -224,12 +243,68 @@ def _reshape(self, model: "Model", *args, **kwargs): shapes[inputs][1] = -1 model.reshape(shapes) return model + + @classmethod + def _from_pretrained( + cls, + model_id: Union[str, Path], + config: PretrainedConfig, + use_auth_token: Optional[Union[bool, str, None]] = None, + revision: Optional[Union[str, None]] = None, + force_download: bool = False, + cache_dir: Optional[str] = None, + file_name: Optional[str] = None, + subfolder: str = "", + from_onnx: bool = False, + local_files_only: bool = False, + load_in_8bit: bool = False, + **kwargs, + ): + model_path = Path(model_id) + default_file_name = OV_XML_FILE_NAME + file_name = file_name or default_file_name + + model_cache_path = cls._cached_file( + model_path=model_path, + use_auth_token=use_auth_token, + revision=revision, + force_download=force_download, + cache_dir=cache_dir, + file_name=file_name, + subfolder=subfolder, + local_files_only=local_files_only, + ) + + model = cls.load_model(model_cache_path, load_in_8bit=load_in_8bit) + init_cls = OVCHATGLMModel + + return init_cls( + model=model, config=config, model_save_dir=model_cache_path.parent, **kwargs + ) class OVQWENModel(OVModelForCausalLM): """ Optimum intel compatible model wrapper for QWEN """ + def __init__( + self, + model: "Model", + config: "PretrainedConfig" = None, + device: str = "CPU", + dynamic_shapes: bool = True, + ov_config: Optional[Dict[str, str]] = None, + model_save_dir: Optional[Union[str, Path]] = None, + **kwargs, + ): + NormalizedConfigManager._conf["qwen"] = NormalizedTextConfig.with_args( + num_layers="num_hidden_layers", + num_attention_heads="num_attention_heads", + hidden_size="hidden_size", + ) + super().__init__( + model, config, device, dynamic_shapes, ov_config, model_save_dir, **kwargs + ) def _reshape(self, model: "Model", *args, **kwargs): shapes = {}