forked from pytorch/serve
testing branches #1
Open
udaij12 wants to merge 47 commits into udaij12:master from pytorch:master
Conversation
* Implement stateful inference session timeout * Fix Instant.MAX to Millis conversion bug * Add regression tests for sequence continuous batching session timeouts * Lint test file * Update documentation * Update examples/stateful/sequence_batching/Readme.md Co-authored-by: Matthias Reso <[email protected]> * Update examples/stateful/sequence_batching/Readme.md Co-authored-by: Matthias Reso <[email protected]> * Update examples/stateful/sequence_continuous_batching/Readme.md Co-authored-by: Matthias Reso <[email protected]> --------- Co-authored-by: Matthias Reso <[email protected]>
* Leave response and sendError when request is canceled * Add comment referencing the issue. * fix: catch canceled job before closing too
Co-authored-by: Ankith Gunapal <[email protected]>
Co-authored-by: Ankith Gunapal <[email protected]>
…on (#3276) * RAG based LLM usecase * RAG based LLM usecase * Changes for deploying RAG * Updated README * Added main blog * Added main blog assets * Added main blog assets * Added use case to index html * Added benchmark config * Minor edits to README * Added new MD for Gen AI usecases * Added link to GV3 tutorial * Addressed review comments * Update examples/usecases/RAG_based_LLM_serving/README.md Co-authored-by: Matthias Reso <[email protected]> * Update examples/usecases/RAG_based_LLM_serving/README.md Co-authored-by: Matthias Reso <[email protected]> * Addressed review comments --------- Co-authored-by: Matthias Reso <[email protected]>
* Add some hints for java devs * Fix spell check
* add kserve gpu tests * add a check to validate gpu mem usage * fix typos * fix typo
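The commit above adds a check that validates GPU memory usage during the kserve tests. A minimal sketch of such a check in Python, assuming PyTorch is available; the threshold and device index are placeholders, not the actual values used by the test:

```python
import torch

# Hedged illustration of a GPU memory usage check; the 90% threshold and
# device 0 are assumptions, not the test's real configuration.
if torch.cuda.is_available():
    free_bytes, total_bytes = torch.cuda.mem_get_info(0)
    used_fraction = 1.0 - free_bytes / total_bytes
    assert used_fraction < 0.9, f"GPU memory usage too high: {used_fraction:.0%}"
    print(f"GPU memory in use: {used_fraction:.0%}")
else:
    print("No GPU available; skipping memory check")
```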
* configurable start up time, minimum working example * remove startuptimeout from async worker for now before I confirm what model_load_timeout is * doc updates * remove extra spaces in model manager * apply formatting * remove worker command logging * add tests for long startup timeout * worker thread add logging response timeout if worker state isn't worker_started * add startuptimeout to registerWorkflow function * add startuptimeout to the correct word in spellchecker * working example * small refactor * small refactor * added default value for model status * Update ts_scripts/spellcheck_conf/wordlist.txt * Fix java unit tests * Fix regression test * add startup_timeout for test to cast it to int --------- Co-authored-by: Matthias Reso <[email protected]>
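The configurable startup timeout described above would typically be exercised when registering a model through the management API. A minimal sketch in Python, assuming the option is exposed as a `startup_timeout` query parameter in seconds (the parameter name, port, and default are assumptions drawn from this commit log, not confirmed API details):

```python
import requests

# Register a model and give it a longer startup window than the default.
# "startup_timeout" is the parameter name suggested by the commit messages;
# adjust to match the actual management API if it differs.
resp = requests.post(
    "http://localhost:8081/models",
    params={
        "url": "my_model.mar",      # model archive reachable by the server
        "initial_workers": 1,
        "startup_timeout": 600,     # allow slow-loading models 10 minutes to start
    },
)
print(resp.status_code, resp.text)
```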
…pos (#3291) * Add REPO_URL in Dockerfile * Add repo url to docker files and build_image.sh * Enable repo url in docker ci workflow * Update docs for repo url

* upgrade to PyTorch 2.4 * remove text classifier models from sanity * update aoti test * added deprecation warning for examples * Update requirements/torch_darwin.txt Co-authored-by: Matthias Reso <[email protected]> --------- Co-authored-by: Matthias Reso <[email protected]>
* add TorchServe with Intel® Extension for PyTorch* guidance * update the content with an existing doc in torchserve example --------- Co-authored-by: Ankith Gunapal <[email protected]>
* Replace git: in correct variable... * Use env.FORK for CI like #3266 did * DEBUG: dump github context * DEBUG: dump github context * Get repo url from head information
* Make repo url work in merge queue ci, which is not a pull request.... * dump context
* Forward additional url segments as url_paths in request header to model * Fix vllm test and clean preproc * First attempt to enable OpenAI api for models served via vllm * fix streaming in openai api * Add OpenAIServingCompletion usage example * Add lora modules to vllm engine * Finish openai completion integration; removed req openai client; updated lora example to llama 3.1 * fix lint * Update mistral + llama3 vllm example * Remove openai client from url path test * Add openai chat api to vllm example * Added v1/models endpoint for vllm example * Remove accidental breakpoint() * Add comment to new url_path
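With additional URL segments forwarded to the model, an OpenAI-style completion request can be sent to the served vllm model with any HTTP client. A minimal sketch, where the URL path and model name are hypothetical placeholders; check the vllm example README for the exact route your deployment exposes:

```python
import requests

# Hypothetical OpenAI-compatible completion call against a TorchServe-hosted
# vllm model. Path and model name are assumptions for illustration only.
payload = {
    "model": "llama3.1-8b",
    "prompt": "Write a haiku about GPUs.",
    "max_tokens": 64,
    "stream": False,
}
resp = requests.post(
    "http://localhost:8080/predictions/llama3.1-8b/1.0/v1/completions",
    json=payload,
)
print(resp.json())
```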
…xample (#3300) * Update quickstart llm docker readme; added ts.llm_launcher example * fix wording
Typo fixes in docstrings and comments.
* testing on graviton * testing on graviton * testing on graviton * checking python * rmv python * changing back to python * testing cpu instead * adding torchtext * adding torchtext * testing torchtext * removing two tests * removing pytorch test * adding numpy upgrade * adding numpy upgrade * testing full ci * testing full ci * testing full ci * skipping grpc * adding graviton ci * adding graviton ci * adding ci cpu graviton * adding ci cpu graviton * adding env * skipping a test for now * fixing env variable * removing scripted 3&4 * small changes * fixing lint * fixing lint * fixing lint * removing torchtext --------- Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Ankith Gunapal <[email protected]>
* Use spawn instead of fork method for vllm * Fix lint
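Switching from the fork to the spawn start method matters because fork copies the parent's state (including an already-initialized CUDA context), which can deadlock or crash multi-threaded runtimes like vllm. A generic Python illustration of selecting spawn, not the actual TorchServe code:

```python
import multiprocessing as mp

def worker(rank: int) -> None:
    # CUDA and other thread-heavy runtimes are initialized here, inside the
    # child process, which is safe under the "spawn" start method.
    print(f"worker {rank} starting")

if __name__ == "__main__":
    # "spawn" starts a fresh interpreter instead of copying the parent process.
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=worker, args=(i,)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```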
* Example to demonstrate building a custom endpoint plugin * Fix linter errors * Update readme to include two models * Update Readme and handle case of no registered models
* fixing upload version * fixing upload version * fixing upload version
Signed-off-by: Emmanuel Ferdman <[email protected]>
Co-authored-by: Matthias Reso <[email protected]> Co-authored-by: Ankith Gunapal <[email protected]>
* adding graviton docker image * testing multiplatform ci * testing multiplatform ci * testing multiplatform ci * adding new builder * removing arm * removing arm * testing arm * tests * testing driver command * testing driver command * testing on newer instance * testing on newer instance * testing newer * rm command * changing platform * testing only amd * testing both arch * testing both arch * testing both * remove builder * remove builder * adding amd * building cache * cache 3 * cache 4 * cache 4 * final test * reverting temp changes * testing official release * testing official release * testing official release * adding kserve changes * kserve nightly --------- Co-authored-by: Ankith Gunapal <[email protected]>
* adding graviton docker image * testing multiplatform ci * testing multiplatform ci * testing multiplatform ci * adding new builder * removing arm * removing arm * testing arm * tests * testing driver command * testing driver command * testing on newer instance * testing on newer instance * testing newer * rm command * changing platform * testing only amd * testing both arch * testing both arch * testing both * remove builder * remove builder * adding amd * building cache * cache 3 * cache 4 * cache 4 * final test * reverting temp changes * testing official release * testing official release * testing official release * adding kserve changes * kserve nightly * adding build context --------- Co-authored-by: Ankith Gunapal <[email protected]>
* TRT LLM Integration with LORA * TRT LLM Integration with LORA * TRT LLM Integration with LORA * TRT LLM Integration with LORA * Added launcher support for trt_llm * updated README * updated README * Using the API that supports async generate * Review comments * Apply suggestions from code review Co-authored-by: Matthias Reso <[email protected]> * addressed review comments * Addressed review comments * Updated the async logic based on review comments * Made max_batch_size and kv_cache size configurable for the launcher * fixing lint --------- Co-authored-by: Matthias Reso <[email protected]>
* Bump vllm from 0.5.0 to 0.5.5 in /examples/large_models/vllm Bumps [vllm](https://github.com/vllm-project/vllm) from 0.5.0 to 0.5.5. - [Release notes](https://github.com/vllm-project/vllm/releases) - [Commits](vllm-project/vllm@v0.5.0...v0.5.5) --- updated-dependencies: - dependency-name: vllm dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> * Update examples/large_models/vllm/requirements.txt Update vllm dependency to match PT version --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Matthias Reso <[email protected]>
* adding graviton docker image * testing multiplatform ci * testing multiplatform ci * testing multiplatform ci * adding new builder * removing arm * removing arm * testing arm * tests * testing driver command * testing driver command * testing on newer instance * testing on newer instance * testing newer * rm command * changing platform * testing only amd * testing both arch * testing both arch * testing both * remove builder * remove builder * adding amd * building cache * cache 3 * cache 4 * cache 4 * final test * reverting temp changes * testing official release * testing official release * testing official release * adding kserve changes * kserve nightly * adding build context * changing nightly push * remove push on official * remove push on official --------- Co-authored-by: Ankith Gunapal <[email protected]>
Co-authored-by: Boyu Chen <[email protected]> Co-authored-by: Naman Nandan <[email protected]>
Co-authored-by: Matthias Reso <[email protected]> Co-authored-by: Ankith Gunapal <[email protected]>
* testing adding startup * testing adding default * testing newer docker * adding default values for model * testing specific docker * testing specific docker image * fixed format * merge ready * adding source of truth * adding source of truth * fixing tests * format * removing defaults --------- Co-authored-by: Ankith Gunapal <[email protected]>
* set model_snapshot_path to None to prevent unbound local error * address PR comments with pythonic usage, fix README * small change * revert formatting changes
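The fix above initializes `model_snapshot_path` before the conditional that may or may not assign it. A generic Python illustration of the pattern and the `UnboundLocalError` it prevents; the function and variable below are placeholders, not the project's actual code:

```python
def resolve_snapshot(use_snapshot: bool):
    # Without this initialization, the return statement raises
    # UnboundLocalError whenever use_snapshot is False, because the name
    # is only assigned inside the conditional branch.
    model_snapshot_path = None
    if use_snapshot:
        model_snapshot_path = "/tmp/model_snapshot"
    return model_snapshot_path

print(resolve_snapshot(False))  # prints None instead of raising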
Co-authored-by: Ankith Gunapal <[email protected]>
Bumps [onnx](https://github.com/onnx/onnx) from 1.16.0 to 1.17.0. - [Release notes](https://github.com/onnx/onnx/releases) - [Changelog](https://github.com/onnx/onnx/blob/main/docs/Changelog-ml.md) - [Commits](onnx/onnx@v1.16.0...v1.17.0) --- updated-dependencies: - dependency-name: onnx dependency-type: direct:development ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Added OV SDXL registration to chat_bot app * sdxl image generation * pass model params * fixes * fixes * llm-sd pipeline * store images * need to fix sd_xl checkbox * fix for num_of_img==1 * fix for 1 img, total time * perf fixes * fixes * llm with torch.compile * fixed token auth issue, ui fixes * gpt fast version, bad quality of output prompts * rm extra files, updated readme * added llama params, sd default res 768, better prompts * fix, updated default workers num * button for prompts generation * fix * fix * Changed SDXL to LCM SDXL * updated lcm example * updated lcm example * updated lcm example * add llm_sd_app * Updated llm_diffusion_serving_app * Updated llm_diffusion_serving_app * Update llm_diffusion_serving_app * Update llm_diffusion_serving_app * Update examples/usecases/llm_diffusion_serving_app/Readme.md Co-authored-by: Ankith Gunapal <[email protected]> * Update llm_diffusion_serving_app * update llm_diffusion_serving_app * update llm_diffusion_serving_app * update llm_diffusion_serving_app * Update llm_diffusion_serving_app * Update llm_diffusion_serving_app * Minor Updates, Added sd_benchmark * Add docs for llm_diffusion_serving_app * Apply suggestions from code review Co-authored-by: Ankith Gunapal <[email protected]> * Update llm_diffusion_serving_app, fix linter issues * Update img, add assets * update readme --------- Co-authored-by: likholat <[email protected]> Co-authored-by: likholat <[email protected]> Co-authored-by: suryasidd <[email protected]> Co-authored-by: Ankith Gunapal <[email protected]>
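One of the items above mentions running the LLM with torch.compile. A generic sketch of that step, where the model, shapes, and backend are placeholders rather than the app's actual configuration:

```python
import torch

# Illustration of the torch.compile step; the default "inductor" backend
# is used, and the tiny model here stands in for the app's real LLM.
model = torch.nn.Linear(512, 512).eval()
compiled = torch.compile(model)

with torch.inference_mode():
    out = compiled(torch.randn(1, 512))  # first call triggers compilation
print(out.shape)
```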
* Add AMD backend support * Add AMD frontend support * Add Dockerfile.rocm Co-authored-by: Samu Tamminen <[email protected]> * Add AMD documentation * Fix null pointer bug with populateAccelerators trying to get null AppleUtil GPU env value * Fix formatting --------- Co-authored-by: Rony Leppänen <[email protected]> Co-authored-by: Anders Smedegaard Pedersen <[email protected]> Co-authored-by: Samu Tamminen <[email protected]>
* Add Apple system metrics support Co-authored-by: Bipradip Chowdhury <[email protected]> Co-authored-by: Rony Leppänen <[email protected]> Co-authored-by: Anders Smedegaard Pedersen <[email protected]> * Fix ModelServerTest.testMetricManager for other HW vendors * Add GPUUtilization as expect metric --------- Co-authored-by: Bipradip Chowdhury <[email protected]> Co-authored-by: Rony Leppänen <[email protected]> Co-authored-by: Anders Smedegaard Pedersen <[email protected]>
Description
Please read our CONTRIBUTING.md prior to creating your first pull request.
Please include a summary of the feature or issue being fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes #(issue)
Type of change
Please delete options that are not relevant.
Feature/Issue validation/testing
Please describe the unit or integration tests you ran to verify your changes, and summarize the relevant results. Provide instructions so they can be reproduced.
Please also list any relevant details for your test configuration.
Test A
Logs for Test A
Test B
Logs for Test B
Checklist: