Generate pipeline #334

pavel-esir · 2024-03-30T16:01:53Z

LLM return logits with probabilities of each token, these probabilities can be converted to tokens/words with different technics: greedy decoding, beam search decoding, random sampling, etc.

This requires writing user unfriendly post-processing even for the simplest scenario of greedy decoding. In order to make live easier we we combined all decoding scenarios into a single function call, where the decoding method and parameters are specified by arguments.

In this PR we provide a user friendly API for text generation inspired by generate method from HuggingFace transformers library.

enable calling tokenizers/detokenizers from LLMPipeline
add callback for streaming mode - done partially, need to improve
rewritten samples with the current approach: causal_lm/cpp/generate_pipeline/generate_sample.cpp#L73-L83
Multibatch greedy decoding
Speculative decoding
Grouped Beam Search decoding: ready for batch 1, need to rebase multibatch support after merging Add multi prompt support for beam search #349
Random sampling

Example 1: Greedy search generation

LLMPipeline pipe(model_path, device);

// Will try to load config from generation_config.json.
// but if not found default velues for gready search will be used
GenerationConfig config = pipe.generation_config();

cout << pipe(prompt, config.max_new_tokens(20));

Example 2: TextStreaming mode

LLMPipeline pipe(model_path, device);

GenerationConfig config = pipe.generation_config();

auto text_streamer = TextStreamer{pipe};
auto text_streamer_callback = [&text_streamer](std::vector<int64_t>&& tokens, LLMPipeline& pipe){
    text_streamer.put(tokens[0]);
};

pipe(prompt, config.max_new_tokens(20).set_callback(text_streamer_callback));
text_streamer.end();

CVS-132907 CVS-137920

text_generation/causal_lm/cpp/generate_pipeline/generate_pipeline.hpp

text_generation/causal_lm/cpp/CMakeLists.txt

text_generation/causal_lm/cpp/generate_pipeline/generate_pipeline.hpp

text_generation/causal_lm/cpp/generate_pipeline/generation_config.hpp

text_generation/causal_lm/cpp/generate_pipeline/generate_sample.cpp

text_generation/causal_lm/cpp/generate_pipeline/generate_pipeline.hpp

…ine.hpp

src/README.md

sbalandi · 2024-06-04T10:28:59Z

src/README.md

+pipe = ov_ov_genai.LLMPipeline(model_path)
+
+config = {'num_groups': 3, 'group_size': 5, 'diversity_penalty': 1.5}
+pipe.set_generation_cofnig(config)


I tried and it does not work like this for me, should it work ? It woks for me if config is GenerationConfig object and I got with get_generation_config before

Is it possible to add description about options , which we can configure in config ?

src/README.md

fix ignore_eos fix batched detokenization add generation config validation removed CPU and redundant getting KV cache

* Leftovers * Leftovers * retrigger

* Split text samples to sepparate folders * correct path * correct * correct path

* Assume GenAI is installec * put --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release back

sbalandi · 2024-06-05T21:14:09Z

src/python/py_generate_pipeline.cpp

+        // todo: if input_ids is a ov::Tensor/numpy tensor
+
+        .def("get_tokenizer", &LLMPipeline::get_tokenizer)
+        .def("start_chat", &LLMPipeline::start_chat)


I tried this api start_chat/finish_chat on models form notebook https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot

Two models work fine: llama-3-8b-instruct INT8 and tiny-llama-1b-chat INT8

But with other models I had 2 types of errors:

phi-3-mini-instruct INT8, red-pajama-3b-chat INT4, gemma-2b-it INT4 and gemma-7b-it FP16 fails on the first call of pipe.generate() with error

msg = pipe(q) RuntimeError: bad_expected_access

notus-7b-v1 INT4 notus-7b-v1 FP16 neural-chat-7b-v3-1 INT8mistral-7b INT4 zephyr-7b-beta INT8 - first question is okay, fails on the first call of pipe.generate() with error:

# RuntimeError: Exception from src/inference/src/cpp/infer_request.cpp:223: # Exception from src/plugins/intel_cpu/src/node.cpp:1626: # Shape inference of Select node with name __module.model/aten::masked_fill/Select_1 failed: Exception from src/plugins/intel_cpu/src/shape_inference/custom/eltwise.cpp:45: # Eltwise shape infer input shapes dim index: 3 mismatch

several calls of pipe.generate() without start_chat() don't lead to fails

sbalandi · 2024-06-05T21:21:55Z

src/python/py_generate_pipeline.cpp

+            device (str): Device to run the model on (e.g., CPU, GPU). Default is 'CPU'.
+        )")
+
+        .def(py::init([](py::object infer_request, 


I tried to create an infer request from outside with core.compile_model().create_infer_request() and put it here and got an error:

TypeError: __init__(): incompatible constructor arguments. The following argument types are supported: 1. openvino_genai.py_generate_pipeline.LLMPipeline(model_path: str, device: str = 'CPU') 2. openvino_genai.py_generate_pipeline.LLMPipeline(model_path: str, tokenizer: ov::genai::Tokenizer, device: str = 'CPU') 3. openvino_genai.py_generate_pipeline.LLMPipeline(infer_request: object, tokenizer: ov::genai::Tokenizer, config: Optional[ov::genai::GenerationConfig])

Is it right behavior ? How could I create the infer_requst to put it here ?

The exact python line could help. The error message suggests, it's possible to do, but the args were incorrect:
3. openvino_genai.py_generate_pipeline.LLMPipeline(infer_request: object, tokenizer: ov::genai::Tokenizer, config: Optional[ov::genai::GenerationConfig])

ilya-lavrenov · 2024-06-07T11:07:20Z

.github/workflows/causal_lm_cpp.yml

-          python -m pip install --upgrade-strategy eager -r text_generation/causal_lm/cpp/requirements.txt
-          python -m pip install ./thirdparty/openvino_tokenizers/[transformers]
-          sudo apt-get install libtbb-dev
+          python -m pip install --upgrade-strategy eager -r ./samples/cpp/requirements.txt


@Wovchena here we install older OV first, then override with newer one.
Should we change lines order? BTW, why not to install OV Tokenizers simple PyPi?

They install identical OV. ./samples/cpp/requirements.txt has --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release

ilya-lavrenov · 2024-06-07T11:10:48Z

CMakeLists.txt

+        ./samples/cpp/multinomial_causal_lm
+        # Don't install prompt_lookup_decoding_lm and speculative_decoding_lm because they don't use openvino_genai library and arent verifyed yet.
+    DESTINATION samples/cpp/ COMPONENT cpp_samples_genai)
+install(FILES ./samples/cpp/requirements.txt DESTINATION samples/cpp/ COMPONENT cpp_samples_genai)


@Wovchena
why these 2 install rules are not part of samples/cpp/CMakeLists.txt ?

This rule shouldn't be included in the resulting package

ilya-lavrenov · 2024-06-07T11:11:38Z

CMakeLists.txt

+install(FILES ./samples/cpp/requirements.txt DESTINATION samples/cpp/ COMPONENT cpp_samples_genai)
+install(FILES LICENSE DESTINATION licensing COMPONENT licensing_genai RENAME LICENSE-GENAI)
+install(FILES third-party-programs.txt DESTINATION licensing COMPONENT licensing_genai RENAME third-party-programs-genai.txt)
+if(MSVC AND NOT DEFINED CPACK_GENERATOR)


@Wovchena
MSVC is compiler / cmake generator
WIN32 is platform.

Should we use WIN32 here?

ilya-lavrenov · 2024-06-07T11:13:03Z

samples/cpp/beam_search_causal_lm/CMakeLists.txt

+
+find_package(OpenVINOGenAI REQUIRED PATHS
+    "${CMAKE_BINARY_DIR}"  # Reuse the package from the build.
+    ${OpenVINO_DIR}  # GenAI may be installed alogside OpenVINO.


@Wovchena
do we need it now?
setupvars.sh exposes OpenVINOGenAI_DIR

OpenVINO_DIR a good guess anyway.

ilya-lavrenov · 2024-06-07T11:14:33Z

samples/cpp/beam_search_causal_lm/CMakeLists.txt

+)
+add_executable(beam_search_causal_lm beam_search_causal_lm.cpp)
+target_link_libraries(beam_search_causal_lm PRIVATE openvino::genai)
+target_compile_features(beam_search_causal_lm PRIVATE cxx_std_17)


@Wovchena
I don't see that we use some C++17 features in this sample

src/cpp/src/multinomial_decoding.cpp

ilya-lavrenov · 2024-06-07T11:54:08Z

src/cpp/src/llm_pipeline.cpp

+            // previous prompt generation in chat dialog stops with the end of sentence token, 
+            // need to append this token to the current prompt
+            if (is_chat_conversation && !m_is_cache_empty)
+                text = m_tokenizer.get_eos_token() + text;


we agreed to drop it (to emulate skip_scial_tokens=True)

src/cpp/src/generation_config.cpp

src/cpp/src/llm_pipeline.cpp

ilya-lavrenov · 2024-06-07T11:59:13Z

src/cpp/src/llm_pipeline.cpp

+        return result;        
+    }
+
+    std::string apply_chat_template(const std::vector<std::pair<std::string, std::string>>& prompts) const {


@pavel-esir
should it be vector<unordered_map> ?

LLM return logits with probabilities of each token, these probabilities can be converted to tokens/words with different technics: greedy decoding, beam search decoding, random sampling, etc. This requires writing user unfriendly post-processing even for the simplest scenario of greedy decoding. In order to make live easier we we combined all decoding scenarios into a single function call, where the decoding method and parameters are specified by arguments. In this PR we provide a user friendly API for text generation inspired by `generate` method from HuggingFace transformers library. - [x] enable calling tokenizers/detokenizers from LLMPipeline - [ ] add callback for streaming mode - done partially, need to improve - [x] rewritten samples with the current approach: [causal_lm/cpp/generate_pipeline/generate_sample.cpp#L73-L83](https://github.com/pavel-esir/openvino.genai/blob/generate_pipeline/text_generation/causal_lm/cpp/generate_pipeline/generate_sample.cpp#L73-L83) - [x] Multibatch greedy decoding - [ ] Speculative decoding - [ ] Grouped Beam Search decoding: ready for batch 1, need to rebase multibatch support after merging openvinotoolkit#349 - [x] Random sampling Example 1: Greedy search generation ``` LLMPipeline pipe(model_path, device); // Will try to load config from generation_config.json. // but if not found default velues for gready search will be used GenerationConfig config = pipe.generation_config(); cout << pipe(prompt, config.max_new_tokens(20)); ``` Example 2: TextStreaming mode ``` LLMPipeline pipe(model_path, device); GenerationConfig config = pipe.generation_config(); auto text_streamer = TextStreamer{pipe}; auto text_streamer_callback = [&text_streamer](std::vector<int64_t>&& tokens, LLMPipeline& pipe){ text_streamer.put(tokens[0]); }; pipe(prompt, config.max_new_tokens(20).set_callback(text_streamer_callback)); text_streamer.end(); ``` CVS-132907 CVS-137920 --------- Co-authored-by: Wovchena <[email protected]> Co-authored-by: Ilya Lavrenov <[email protected]> Co-authored-by: Alexander Suvorov <[email protected]> Co-authored-by: Yaroslav Tarkan <[email protected]> Co-authored-by: Xiake Sun <[email protected]> Co-authored-by: wenyi5608 <[email protected]> Co-authored-by: Ekaterina Aidova <[email protected]> Co-authored-by: guozhong wang <[email protected]> Co-authored-by: Chen Peter <[email protected]>

LLM return logits with probabilities of each token, these probabilities can be converted to tokens/words with different technics: greedy decoding, beam search decoding, random sampling, etc. This requires writing user unfriendly post-processing even for the simplest scenario of greedy decoding. In order to make live easier we we combined all decoding scenarios into a single function call, where the decoding method and parameters are specified by arguments. In this PR we provide a user friendly API for text generation inspired by `generate` method from HuggingFace transformers library. - [x] enable calling tokenizers/detokenizers from LLMPipeline - [ ] add callback for streaming mode - done partially, need to improve - [x] rewritten samples with the current approach: [causal_lm/cpp/generate_pipeline/generate_sample.cpp#L73-L83](https://github.com/pavel-esir/openvino.genai/blob/generate_pipeline/text_generation/causal_lm/cpp/generate_pipeline/generate_sample.cpp#L73-L83) - [x] Multibatch greedy decoding - [ ] Speculative decoding - [ ] Grouped Beam Search decoding: ready for batch 1, need to rebase multibatch support after merging #349 - [x] Random sampling Example 1: Greedy search generation ``` LLMPipeline pipe(model_path, device); // Will try to load config from generation_config.json. // but if not found default velues for gready search will be used GenerationConfig config = pipe.generation_config(); cout << pipe(prompt, config.max_new_tokens(20)); ``` Example 2: TextStreaming mode ``` LLMPipeline pipe(model_path, device); GenerationConfig config = pipe.generation_config(); auto text_streamer = TextStreamer{pipe}; auto text_streamer_callback = [&text_streamer](std::vector<int64_t>&& tokens, LLMPipeline& pipe){ text_streamer.put(tokens[0]); }; pipe(prompt, config.max_new_tokens(20).set_callback(text_streamer_callback)); text_streamer.end(); ``` CVS-132907 CVS-137920 --------- Co-authored-by: Wovchena <[email protected]> Co-authored-by: Ilya Lavrenov <[email protected]> Co-authored-by: Alexander Suvorov <[email protected]> Co-authored-by: Yaroslav Tarkan <[email protected]> Co-authored-by: Xiake Sun <[email protected]> Co-authored-by: wenyi5608 <[email protected]> Co-authored-by: Ekaterina Aidova <[email protected]> Co-authored-by: guozhong wang <[email protected]> Co-authored-by: Chen Peter <[email protected]>

commit adec0e0 Author: Irina Efode <[email protected]> Date: Tue Jun 11 14:32:45 2024 +0400 Remove extra token desc commit a64f30a Author: Irina Efode <[email protected]> Date: Tue Jun 11 13:36:01 2024 +0400 Working sampler commit 05048ff Author: Irina Efode <[email protected]> Date: Tue Jun 11 13:23:43 2024 +0400 check commit e349418 Merge: bfaa55a 0b1ce98 Author: Irina Efode <[email protected]> Date: Mon Jun 10 23:11:58 2024 +0400 Merge remote-tracking branch 'ilavrenov_upstream/ct-beam-search' into penalties commit 0b1ce98 Merge: 16d857e 2da1556 Author: Ilya Lavrenov <[email protected]> Date: Mon Jun 10 18:52:20 2024 +0400 Merge pull request openvinotoolkit#21 from iefode/n_support Support num_return_seq for multinomial case commit bfaa55a Author: Irina Efode <[email protected]> Date: Mon Jun 10 17:42:01 2024 +0400 Fix tests commit fa0efb6 Author: Irina Efode <[email protected]> Date: Mon Jun 10 16:41:04 2024 +0400 Config tests commit 7551303 Author: Irina Efode <[email protected]> Date: Mon Jun 10 15:34:14 2024 +0400 Implement LogitTransformers. todo config check commit 16d857e Merge: 76148c5 1ee4f38 Author: Ilya Lavrenov <[email protected]> Date: Mon Jun 10 10:41:27 2024 +0200 Merge remote-tracking branch 'upstream/master' into ct-beam-search commit 1ee4f38 Author: guozhong wang <[email protected]> Date: Sun Jun 9 18:26:57 2024 +0800 Add option --prompt_index (openvinotoolkit#481) Run the corresponding prompt according to the option prompt index commit 9902928 Author: Pavel Esir <[email protected]> Date: Fri Jun 7 20:57:47 2024 +0200 Generate pipeline (openvinotoolkit#334) LLM return logits with probabilities of each token, these probabilities can be converted to tokens/words with different technics: greedy decoding, beam search decoding, random sampling, etc. This requires writing user unfriendly post-processing even for the simplest scenario of greedy decoding. In order to make live easier we we combined all decoding scenarios into a single function call, where the decoding method and parameters are specified by arguments. In this PR we provide a user friendly API for text generation inspired by `generate` method from HuggingFace transformers library. - [x] enable calling tokenizers/detokenizers from LLMPipeline - [ ] add callback for streaming mode - done partially, need to improve - [x] rewritten samples with the current approach: [causal_lm/cpp/generate_pipeline/generate_sample.cpp#L73-L83](https://github.com/pavel-esir/openvino.genai/blob/generate_pipeline/text_generation/causal_lm/cpp/generate_pipeline/generate_sample.cpp#L73-L83) - [x] Multibatch greedy decoding - [ ] Speculative decoding - [ ] Grouped Beam Search decoding: ready for batch 1, need to rebase multibatch support after merging openvinotoolkit#349 - [x] Random sampling Example 1: Greedy search generation ``` LLMPipeline pipe(model_path, device); // Will try to load config from generation_config.json. // but if not found default velues for gready search will be used GenerationConfig config = pipe.generation_config(); cout << pipe(prompt, config.max_new_tokens(20)); ``` Example 2: TextStreaming mode ``` LLMPipeline pipe(model_path, device); GenerationConfig config = pipe.generation_config(); auto text_streamer = TextStreamer{pipe}; auto text_streamer_callback = [&text_streamer](std::vector<int64_t>&& tokens, LLMPipeline& pipe){ text_streamer.put(tokens[0]); }; pipe(prompt, config.max_new_tokens(20).set_callback(text_streamer_callback)); text_streamer.end(); ``` CVS-132907 CVS-137920 --------- Co-authored-by: Wovchena <[email protected]> Co-authored-by: Ilya Lavrenov <[email protected]> Co-authored-by: Alexander Suvorov <[email protected]> Co-authored-by: Yaroslav Tarkan <[email protected]> Co-authored-by: Xiake Sun <[email protected]> Co-authored-by: wenyi5608 <[email protected]> Co-authored-by: Ekaterina Aidova <[email protected]> Co-authored-by: guozhong wang <[email protected]> Co-authored-by: Chen Peter <[email protected]> commit 2da1556 Author: Irina Efode <[email protected]> Date: Thu Jun 6 19:24:45 2024 +0400 library/src/continuous_batching_pipeline.cpp commit 7b48fa4 Author: Irina Efode <[email protected]> Date: Thu Jun 6 15:03:05 2024 +0400 enable streaming for greedy commit 5c601e0 Author: Irina Efode <[email protected]> Date: Thu Jun 6 13:29:47 2024 +0400 Comments commit 4f73d36 Author: Irina Efode <[email protected]> Date: Wed Jun 5 22:46:04 2024 +0400 Enable frequency and presence penalties commit 5e49c46 Author: Irina Efode <[email protected]> Date: Wed Jun 5 11:56:31 2024 +0400 Fix python tests commit eb4a219 Author: Irina Efode <[email protected]> Date: Tue Jun 4 22:38:43 2024 +0400 fix assert place commit f4d8461 Author: Irina Efode <[email protected]> Date: Tue Jun 4 22:22:37 2024 +0400 Correct accumulation commit 55448a1 Merge: 1128792 76148c5 Author: Irina Efode <[email protected]> Date: Tue Jun 4 18:56:42 2024 +0400 Merge remote-tracking branch 'ilavrenov_upstream/ct-beam-search' into n_support commit 1128792 Author: Irina Efode <[email protected]> Date: Tue Jun 4 18:52:38 2024 +0400 test commit e245041 Author: Irina Efode <[email protected]> Date: Tue Jun 4 18:52:03 2024 +0400 Apply comments commit 561cde0 Author: guozhong wang <[email protected]> Date: Tue Jun 4 16:27:08 2024 +0800 using sdpa for statble diffusion (openvinotoolkit#458) Co-authored-by: Chen Peter <[email protected]> commit 04510d4 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon Jun 3 17:37:41 2024 +0000 Bump optimum[openvino] from 1.19.2 to 1.20.0 in /text_generation/causal_lm/cpp (openvinotoolkit#467) commit db4a88f Merge: e5d33f5 b63bda2 Author: Irina Efode <[email protected]> Date: Mon Jun 3 13:17:32 2024 +0400 Merge remote-tracking branch 'ilavrenov_upstream/ct-beam-search' into n_support commit e5d33f5 Merge: fe29df9 bcdcefc Author: Irina Efode <[email protected]> Date: Fri May 31 14:11:13 2024 +0400 Merge remote-tracking branch 'ilavrenov_upstream/ct-beam-search' into n_support commit fe29df9 Author: Irina Efode <[email protected]> Date: Fri May 31 14:06:51 2024 +0400 Tests + Readme commit 7af72aa Author: Irina Efode <[email protected]> Date: Wed May 29 15:16:23 2024 +0400 Squashed commit of the following: commit 28af66d Author: Anastasiia Pnevskaia <[email protected]> Date: Tue May 28 15:40:15 2024 +0200 Added cache_size to python binding of scheduler config. commit 65a793a Author: Anastasiia Pnevskaia <[email protected]> Date: Tue May 28 15:12:16 2024 +0200 Fixed tests. commit 033558e Author: Irina Efode <[email protected]> Date: Wed May 29 00:40:48 2024 +0400 One more change commit dbae0bf Merge: f992591 2c2799f Author: Irina Efode <[email protected]> Date: Wed May 29 00:38:52 2024 +0400 Merge master, without py tests commit a5b14c7 Author: Lyalyushkin Nikolay <[email protected]> Date: Tue May 28 16:15:42 2024 +0200 grammar corrector support in WWB (openvinotoolkit#462) This PR introduces support for `AutoForSeq2SeqLM` models in WWB. Previously, WWB only supported `AutoForCasualLM`, assuming that the `generate` method copies the prompt to the output. But AutoForSeq2SeqLM generates output differently: there is no copy of the prompt, and it directly generates output. The fix was checked on the [example](https://gist.github.com/ljaljushkin/5a489a27cd0020ddbd42ea7ae54be688). It evaluates grammar correction with Seq2Seq model using WWB. commit f992591 Author: Irina Efode <[email protected]> Date: Tue May 28 17:39:17 2024 +0400 tmp commit 7e771f1 Author: Liwenke <[email protected]> Date: Tue May 28 15:24:15 2024 +0800 Note for wikitext data set connection issue (openvinotoolkit#452) Co-authored-by: Chen Peter <[email protected]> commit 24ef06e Author: guozhong wang <[email protected]> Date: Tue May 28 14:23:19 2024 +0800 Force to generate more tokens (openvinotoolkit#457) commit 1ed7539 Author: guozhong wang <[email protected]> Date: Tue May 28 09:44:45 2024 +0800 Correct flan-t5 output size (openvinotoolkit#451) openvinotoolkit#358 --------- Co-authored-by: Chen Peter <[email protected]> commit b5a9f28 Author: Irina Efode <[email protected]> Date: Mon May 27 23:48:03 2024 +0400 Extend in beam support commit edc53e5 Author: Irina Efode <[email protected]> Date: Fri May 24 17:59:48 2024 +0400 remove extra commit 9038308 Author: Irina Efode <[email protected]> Date: Fri May 24 16:20:13 2024 +0400 Improve multinomial commit c453e3e Author: Irina Efode <[email protected]> Date: Fri May 24 15:42:48 2024 +0400 Support num_return_seq for multinomial case commit e6f05c6 Author: guozhong wang <[email protected]> Date: Thu May 23 11:36:50 2024 +0800 Output median min and avg values to csv (openvinotoolkit#450) Co-authored-by: Chen Peter <[email protected]> commit 25909cc Author: guozhong wang <[email protected]> Date: Thu May 23 11:12:27 2024 +0800 verify beam search 1st token optimization (openvinotoolkit#426) The minimum version of transformers to get 1st and 2nd tokens latency is v4.40-release. commit 03e78fe Author: Chen Peter <[email protected]> Date: Wed May 22 13:06:11 2024 +0800 Revert "Force to generate "inference count" tokens" (openvinotoolkit#455) Reverts openvinotoolkit#289 to unblock the release. Since it causes the performance regression of some models. (WIP to investigate the reason) commit 05a0f36 Author: Ekaterina Aidova <[email protected]> Date: Tue May 21 20:33:26 2024 +0400 fix path based configuration (openvinotoolkit#456) commit 41b07d3 Author: Ekaterina Aidova <[email protected]> Date: Fri May 17 06:20:18 2024 +0400 Fix md5 hash for env that does not support usedforsecurity arg (openvinotoolkit#445) I got an error running benchmarking on my working machine (python3.8, ubuntu20) due to unsupported args for hashlib. ``` [ ERROR ] An exception occurred [ INFO ] Traceback (most recent call last): File "benchmark.py", line 532, in main iter_data_list, pretrain_time = CASE_TO_BENCH[model_args['use_case']](model_path, framework, args.device, model_args, args.num_iters) File "benchmark.py", line 194, in run_text_generation_benchmark run_text_generation(input_text, num, model, tokenizer, args, iter_data_list, warmup_md5, prompt_idx, bench_hook, model_precision, proc_id) File "benchmark.py", line 131, in run_text_generation result_md5_list.append(hashlib.md5(result_text.encode(), usedforsecurity=False).hexdigest()) TypeError: openssl_md5() takes at most 1 argument (2 given) ``` Based on this [StackOverflow issue](https://stackoverflow.com/questions/54717862/how-do-i-know-if-the-usedforsecurity-flag-is-supported-by-hashlib-md5), not all clients support this argument and usage hashlib.new("md5") vs hashlib.md5 should be safe for usage in both cases commit d473e96 Author: guozhong wang <[email protected]> Date: Fri May 17 10:09:27 2024 +0800 output no hook data warning when it is text gen model (openvinotoolkit#449) commit cad3abb Author: guozhong wang <[email protected]> Date: Thu May 16 17:28:49 2024 +0800 Fix an attempt to add a string value to a numerical value (openvinotoolkit#447) commit 93f7670 Author: Ekaterina Aidova <[email protected]> Date: Thu May 16 11:49:08 2024 +0400 update optimum intel commit in llm bench (openvinotoolkit#444) commit d73346c Author: Yaroslav Tarkan <[email protected]> Date: Wed May 15 14:24:30 2024 +0300 Fix noise images generated for '--num' > 1 in Stable Diffusion sample (openvinotoolkit#441) Fixes openvinotoolkit#405

pavel-esir force-pushed the generate_pipeline branch 5 times, most recently from b139e69 to b8026a9 Compare April 4, 2024 14:44

pavel-esir commented Apr 5, 2024

View reviewed changes

text_generation/causal_lm/cpp/generate_pipeline/generate_pipeline.hpp Outdated Show resolved Hide resolved

pavel-esir force-pushed the generate_pipeline branch 3 times, most recently from a6f16ea to d551da3 Compare April 11, 2024 19:15

sammysun0711 reviewed Apr 12, 2024

View reviewed changes

text_generation/causal_lm/cpp/CMakeLists.txt Outdated Show resolved Hide resolved

pavel-esir marked this pull request as ready for review April 12, 2024 11:53

pavel-esir requested review from ilya-lavrenov, olpipi and as-suvorov April 12, 2024 11:55

pavel-esir commented Apr 12, 2024

View reviewed changes

text_generation/causal_lm/cpp/generate_pipeline/generate_pipeline.hpp Outdated Show resolved Hide resolved

ilya-lavrenov reviewed Apr 16, 2024

View reviewed changes

text_generation/causal_lm/cpp/generate_pipeline/generate_pipeline.hpp Outdated Show resolved Hide resolved

ilya-lavrenov reviewed Apr 16, 2024

View reviewed changes

text_generation/causal_lm/cpp/generate_pipeline/generate_pipeline.hpp Outdated Show resolved Hide resolved

text_generation/causal_lm/cpp/generate_pipeline/generate_pipeline.hpp Outdated Show resolved Hide resolved

pavel-esir and others added 12 commits April 17, 2024 09:49

initial generate

ba91fde

LLM pipeline

9d85a0e

Added calculating for several batches

b21c6c1

Greedy search works

e52e90d

rename to GenerationConfig

745a804

Add fluent interface

8895ed0

Update text_generation/causal_lm/cpp/generate_pipeline/generate_pipel…

b24977d

…ine.hpp

cosmetic changes in main

c933ca0

greedy search with batches and left padding works

c43e901

combine LLModel with LLMPipeline

5a914f6

wip: enable calling tokenize/detokenize for LLMPipeline

c1e0c9d

add callback to generate

8d66353

Merge branch 'master' into generate_pipeline

59c1096

sbalandi reviewed Jun 4, 2024

View reviewed changes

tsavina reviewed Jun 4, 2024

View reviewed changes

src/README.md Show resolved Hide resolved

pavel-esir and others added 6 commits June 5, 2024 12:35

read special tokens only from tokenizer_config.json and config.json

28ebc87

fix ignore_eos fix batched detokenization add generation config validation removed CPU and redundant getting KV cache

Leftovers (#18)

da96019

* Leftovers * Leftovers * retrigger

minor typos fix

a7f73a6

Split text samples to separate folders (#19)

a74baa2

* Split text samples to sepparate folders * correct path * correct * correct path

update llm_bench (#17)

13ebf9f

Assume GenAI is installed (#20)

67b1cfa

* Assume GenAI is installec * put --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release back

sbalandi reviewed Jun 5, 2024

View reviewed changes

pavel-esir added 3 commits June 5, 2024 23:33

fix segfault in tests

0bd9cb3

fix converting unfinished utf strings

b618673

load special tokens leftovers

80a17be

ilya-lavrenov mentioned this pull request Jun 6, 2024

Initial support for chat templates ilya-lavrenov/openvino.genai#27

Closed

pavel-esir added 3 commits June 6, 2024 21:19

add config loading tests

743f348

commit forgotten py_generate_pipeline.cpp

51a9a73

fix ScopedVar in Tokenizer for ov_tokenizers_path

2494df1

Wovchena approved these changes Jun 7, 2024

View reviewed changes

pavel-esir and others added 3 commits June 7, 2024 10:43

skip config modification in tmp dir on Win

8f1399f

return back win tests after disabling cleanup

57830ba

Disable unfinished utf string test in Win

2175796

ilya-lavrenov reviewed Jun 7, 2024

View reviewed changes

iefode self-requested a review June 7, 2024 15:16

disable failing win workflows

7c07136

pavel-esir mentioned this pull request Jun 7, 2024

Generate pipeline leftovers batch 1 pavel-esir/openvino.genai#24

Closed

Wovchena merged commit 9902928 into openvinotoolkit:master Jun 7, 2024
27 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generate pipeline #334

Generate pipeline #334

pavel-esir commented Mar 30, 2024 •

edited by ilya-lavrenov

Loading

sbalandi Jun 4, 2024

sbalandi Jun 4, 2024

sbalandi Jun 5, 2024

sbalandi Jun 5, 2024

Wovchena Jun 6, 2024

ilya-lavrenov Jun 7, 2024

Wovchena Jun 7, 2024

ilya-lavrenov Jun 7, 2024

Wovchena Jun 7, 2024

ilya-lavrenov Jun 7, 2024

ilya-lavrenov Jun 7, 2024

Wovchena Jun 7, 2024

ilya-lavrenov Jun 7, 2024

ilya-lavrenov Jun 7, 2024

ilya-lavrenov Jun 7, 2024

Generate pipeline #334

Generate pipeline #334

Conversation

pavel-esir commented Mar 30, 2024 • edited by ilya-lavrenov Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pavel-esir commented Mar 30, 2024 •

edited by ilya-lavrenov

Loading