testing branches #1

Open · wants to merge 47 commits into master
Conversation

@udaij12 (Owner) commented Aug 14, 2024

Description

Please read our CONTRIBUTING.md prior to creating your first pull request.

Please include a summary of the feature or issue being fixed, along with relevant motivation and context. List any dependencies that are required for this change.

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the unit or integration tests that you ran to verify your changes, and summarize the relevant results. Provide instructions so the tests can be reproduced.
Please also list any relevant details of your test configuration.

  • Test A
    Logs for Test A

  • Test B
    Logs for Test B

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

namannandan and others added 30 commits July 23, 2024 18:07
* Implement stateful inference session timeout

* Fix Instant.MAX to Millis conversion bug

* Add regression tests for sequence continuous batching session timeouts

* Lint test file

* Update documentation

* Update examples/stateful/sequence_batching/Readme.md

Co-authored-by: Matthias Reso <[email protected]>

* Update examples/stateful/sequence_batching/Readme.md

Co-authored-by: Matthias Reso <[email protected]>

* Update examples/stateful/sequence_continuous_batching/Readme.md

Co-authored-by: Matthias Reso <[email protected]>

---------

Co-authored-by: Matthias Reso <[email protected]>
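
The session-timeout commits above concern TorchServe's sequence (stateful) batching, where requests carrying the same sequence id are routed to the same worker and idle sessions are expired after a timeout. A minimal client-side sketch, assuming a locally running TorchServe with a stateful model registered; the `ts_request_sequence_id` header name is taken from the stateful example and should be treated as an assumption here:

    import requests

    BASE = "http://localhost:8080"   # default TorchServe inference port
    MODEL = "stateful"               # hypothetical model name

    # The first request opens a session; the server hands back a sequence id
    # in a response header so follow-up requests can join the same session.
    first = requests.post(f"{BASE}/predictions/{MODEL}", data=b"1")
    seq_id = first.headers["ts_request_sequence_id"]  # assumed header name

    # Later requests reuse the id. If the client idles past the configured
    # session timeout, the server closes the session and frees its slot.
    second = requests.post(
        f"{BASE}/predictions/{MODEL}",
        data=b"2",
        headers={"ts_request_sequence_id": seq_id},
    )
    print(second.text)
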
* Leave response and sendError when request is canceled

* Add comment referencing the issue.

* fix: catch canceled job before closing too
…on (#3276)

* RAG based LLM usecase

* RAG based LLM usecase

* Changes for deploying RAG

* Updated README

* Added main blog

* Added main blog assets

* Added main blog assets

* Added use case to index html

* Added benchmark config

* Minor edits to README

* Added new MD for Gen AI usecases

* Added link to GV3 tutorial

* Addressed review comments

* Update examples/usecases/RAG_based_LLM_serving/README.md

Co-authored-by: Matthias Reso <[email protected]>

* Update examples/usecases/RAG_based_LLM_serving/README.md

Co-authored-by: Matthias Reso <[email protected]>

* Addressed review comments

---------

Co-authored-by: Matthias Reso <[email protected]>
* Add some hints for java devs

* Fix spell check
* add kserve gpu tests

* add a check to validate gpu mem usage

* fix typos

* fix typo
* configurable startup time, minimum working example

* remove startuptimeout from async worker for now before I confirm what model_load_timeout is

* doc updates

* remove extra spaces in model manager

* apply formatting

* remove worker command logging

* add tests for long startup timeout

* worker thread: log response timeout if worker state isn't worker_started

* add startuptimeout to registerWorkflow function

* add startuptimeout to the spellchecker wordlist

* working example

* small refactor

* small refactor

* added default value for model status

* Update ts_scripts/spellcheck_conf/wordlist.txt

* Fix java unit tests

* Fix regression test

* add startup_timeout for test to cast it to int

---------

Co-authored-by: Matthias Reso <[email protected]>
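
The startup-timeout commits above make the time TorchServe waits for a worker to come up configurable, separately from the response timeout. A sketch of registering a slow-loading model with a longer startup timeout via the management API; the `startup_timeout` query parameter name is inferred from the commit messages and is an assumption:

    import requests

    MGMT = "http://localhost:8081"  # default TorchServe management port

    # Register a model that is slow to load (e.g. a large LLM) and give its
    # worker 15 minutes to start. The commit "add startup_timeout for test
    # to cast it to int" suggests the value is an integer number of seconds.
    resp = requests.post(
        f"{MGMT}/models",
        params={
            "url": "my_model.mar",    # hypothetical model archive
            "initial_workers": 1,
            "synchronous": "true",
            "startup_timeout": 900,   # assumed: seconds to wait for startup
        },
    )
    print(resp.status_code, resp.text)
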
…pos (#3291)

* Add REPO_URL in Dockerfile

* Add repo url to docker files and build_image.sh

* Enable repo url in docker ci workflow

* Update docs for repo url
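
The REPO_URL commits above let the Docker images be built from a fork rather than the upstream repository. A sketch of scripting such a build from Python, assuming `REPO_URL` is consumed as an ordinary Docker build argument (the tag and fork URL below are illustrative):

    import subprocess

    # Build the TorchServe image from a fork by overriding the REPO_URL
    # build argument; equivalent to passing --build-arg on the CLI.
    subprocess.run(
        [
            "docker", "build",
            "--build-arg", "REPO_URL=https://github.com/udaij12/serve.git",
            "-t", "torchserve:fork-test",  # hypothetical image tag
            ".",
        ],
        check=True,  # raise if the build fails
    )
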
* upgrade to PyTorch 2.4

* remove text classifier models from sanity

* update aoti test

* added deprecation warning for examples

* Update requirements/torch_darwin.txt

Co-authored-by: Matthias Reso <[email protected]>

---------

Co-authored-by: Matthias Reso <[email protected]>
* add TorchServe with Intel® Extension for PyTorch* guidance

* update the content with an existing doc in torchserve example

---------

Co-authored-by: Ankith Gunapal <[email protected]>
* Replace git: in correct variable...

* Use env.FORK for CI like #3266 did

* DEBUG: dump github context

* DEBUG: dump github context

* Get repo url from head information
* Make repo url work in merge queue ci, which is not a pull request...

* dump context
* Forward additional url segments as url_paths in request header to model

* Fix vllm test and clean preproc

* First attempt to enable OpenAI API for models served via vllm

* fix streaming in openai api

* Add OpenAIServingCompletion usage example

* Add lora modules to vllm engine

* Finish openai completion integration; removed req openai client; updated lora example to llama 3.1

* fix lint

* Update mistral + llama3 vllm example

* Remove openai client from url path test

* Add openai chat api to vllm example

* Added v1/models endpoint for vllm example

* Remove accidental breakpoint()

* Add comment to new url_path
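
The commits above forward any URL segments after the model name to the handler as a url_path entry in the request header, which is what lets the vllm handler expose OpenAI-style routes such as v1/completions under TorchServe. A client-side sketch, assuming a model named "model" registered with the vllm handler; the endpoint shape is inferred from the commit messages, so treat it as an assumption:

    import json
    import requests

    BASE = "http://localhost:8080"

    # The trailing "v1/completions" segments are not a TorchServe route of
    # their own; they are forwarded to the handler, which dispatches to its
    # OpenAI-compatible completion logic.
    resp = requests.post(
        f"{BASE}/predictions/model/1.0/v1/completions",
        json={
            "model": "model",
            "prompt": "Hello, my name is",
            "max_tokens": 32,
        },
    )
    print(json.dumps(resp.json(), indent=2))
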
…xample (#3300)

* Update quickstart llm docker readme; added ts.llm_launcher example

* fix wording
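
ts.llm_launcher, referenced in the quickstart commit above, is TorchServe's one-command entry point for serving an LLM. A usage sketch; the exact flags vary by release, so the ones below (--model_id, --disable_token_auth) are assumptions based on the quickstart docs:

    import subprocess

    # Launch TorchServe's LLM quickstart in one command; flags are assumed
    # from the quickstart docs and may differ by version.
    subprocess.run(
        [
            "python", "-m", "ts.llm_launcher",
            "--model_id", "meta-llama/Meta-Llama-3-8B-Instruct",
            "--disable_token_auth",
        ],
        check=True,
    )
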
Typo fixes in docstrings and comments.
* testing on graviton

* testing on graviton

* testing on graviton

* checking python

* rmv python

* changing back to python

* testing cpu instead

* adding torchtext

* adding torchtext

* testing torchtext

* removing two tests

* removing pytorch test

* adding numpy upgrade

* adding numpy upgrade

* testing full ci

* testing full ci

* testing full ci

* skipping grpc

* adding graviton ci

* adding graviton ci

* adding ci cpu graviton

* adding ci cpu graviton

* adding env

* skipping a test for now

* fixing env variable

* removing scripted 3&4

* small changes

* fixing lint

* fixing lint

* fixing lint

* removing torchtext

---------

Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Ankith Gunapal <[email protected]>
* Use spawn instead of fork method for vllm

* Fix lint
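
Switching vllm workers from fork to spawn matters because a forked child inherits the parent's CUDA state, which CUDA does not support re-initializing; spawn starts a fresh interpreter instead. The general pattern uses only the standard library (this is ordinary multiprocessing usage, not TorchServe-specific code):

    import multiprocessing as mp

    def worker(rank: int) -> None:
        # In a spawned process, imports and library state initialize fresh,
        # so CUDA (or anything else that dislikes fork) starts cleanly.
        print(f"worker {rank} running")

    if __name__ == "__main__":
        # "spawn" starts a new interpreter per child instead of fork()ing
        # the current one; it is the safe choice alongside CUDA.
        ctx = mp.get_context("spawn")
        procs = [ctx.Process(target=worker, args=(i,)) for i in range(2)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
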
* Example to demonstrate building a custom endpoint plugin

* Fix linter errors

* Update readme to include two models

* Update Readme and handle case of no registered models
* fixing upload version

* fixing upload version

* fixing upload version
Signed-off-by: Emmanuel Ferdman <[email protected]>
Co-authored-by: Matthias Reso <[email protected]>
Co-authored-by: Ankith Gunapal <[email protected]>
* adding graviton docker image

* testing multiplatform ci

* testing multiplatform ci

* testing multiplatform ci

* adding new builder

* removing arm

* removing arm

* testing arm

* tests

* testing driver command

* testing driver command

* testing on newer instance

* testing on newer instance

* testing newer

* rm command

* changing platform

* testing only amd

* testing both arch

* testing both arch

* testing both

* remove builder

* remove builder

* adding amd

* building cache

* cache 3

* cache 4

* cache 4

* final test

* reverting temp changes

* testing official release

* testing official release

* testing official release

* adding kserve changes

* kserve nightly

---------

Co-authored-by: Ankith Gunapal <[email protected]>
* adding graviton docker image

* testing multiplatform ci

* testing multiplatform ci

* testing multiplatform ci

* adding new builder

* removing arm

* removing arm

* testing arm

* tests

* testing driver command

* testing driver command

* testing on newer instance

* testing on newer instance

* testing newer

* rm command

* changing platform

* testing only amd

* testing both arch

* testing both arch

* testing both

* remove builder

* remove builder

* adding amd

* building cache

* cache 3

* cache 4

* cache 4

* final test

* reverting temp changes

* testing official release

* testing official release

* testing official release

* adding kserve changes

* kserve nightly

* adding build context

---------

Co-authored-by: Ankith Gunapal <[email protected]>
* TRT LLM Integration with LORA

* TRT LLM Integration with LORA

* TRT LLM Integration with LORA

* TRT LLM Integration with LORA

* Added launcher support for trt_llm

* updated README

* updated README

* Using the API that supports async generate

* Review comments

* Apply suggestions from code review

Co-authored-by: Matthias Reso <[email protected]>

* addressed review comments

* Addressed review comments

* Updated the async logic based on review comments

* Made max_batch_size and kv_cache size configurable for the launcher

* fixing lint

---------

Co-authored-by: Matthias Reso <[email protected]>
dependabot[bot] and others added 17 commits September 17, 2024 23:17
* Bump vllm from 0.5.0 to 0.5.5 in /examples/large_models/vllm

Bumps [vllm](https://github.com/vllm-project/vllm) from 0.5.0 to 0.5.5.
- [Release notes](https://github.com/vllm-project/vllm/releases)
- [Commits](vllm-project/vllm@v0.5.0...v0.5.5)

---
updated-dependencies:
- dependency-name: vllm
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* Update examples/large_models/vllm/requirements.txt

Update vllm dependency to match PT version

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias Reso <[email protected]>
)

* Use startup time in async worker thread instead of worker timeout

* Fix lint

* Update yaml files to use startupTimeout

* Update vllm/lora readme
* adding graviton docker image

* testing multiplatform ci

* testing multiplatform ci

* testing multiplatform ci

* adding new builder

* removing arm

* removing arm

* testing arm

* tests

* testing driver command

* testing driver command

* testing on newer instance

* testing on newer instance

* testing newer

* rm command

* changing platform

* testing only amd

* testing both arch

* testing both arch

* testing both

* remove builder

* remove builder

* adding amd

* building cache

* cache 3

* cache 4

* cache 4

* final test

* reverting temp changes

* testing official release

* testing official release

* testing official release

* adding kserve changes

* kserve nightly

* adding build context

* changing nightly push

* remove push on official

* remove push on official

---------

Co-authored-by: Ankith Gunapal <[email protected]>
Co-authored-by: Boyu Chen <[email protected]>
Co-authored-by: Naman Nandan <[email protected]>
* testing adding startup

* testing adding default

* testing newer docker

* adding default values for model

* testing specific docker

* testing specific docker image

* fixed format

* merge ready

* adding source of truth

* adding source of truth

* fixing tests

* format

* removing defaults

---------

Co-authored-by: Ankith Gunapal <[email protected]>
* set model_snapshot_path to None to prevent unbound local error

* address PR comments with pythonic usage, fix README

* small change

* revert formatting changes
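
The model_snapshot_path fix above is the classic UnboundLocalError pattern: a variable assigned only inside a conditional (or try block) is later read on a path where the assignment never ran. Initializing it to None up front makes every later read well-defined. A generic sketch of the fix, with hypothetical names:

    def load_snapshot(use_snapshot: bool) -> None:
        # Before the fix, model_snapshot_path was assigned only inside the
        # branch, so reading it when use_snapshot was False raised
        # UnboundLocalError. Binding it to None first fixes that.
        model_snapshot_path = None
        if use_snapshot:
            model_snapshot_path = "/tmp/snapshot.pt"  # hypothetical path

        if model_snapshot_path is not None:
            print(f"loading from {model_snapshot_path}")
        else:
            print("no snapshot; starting fresh")

    load_snapshot(False)  # no longer raises
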
Bumps [onnx](https://github.com/onnx/onnx) from 1.16.0 to 1.17.0.
- [Release notes](https://github.com/onnx/onnx/releases)
- [Changelog](https://github.com/onnx/onnx/blob/main/docs/Changelog-ml.md)
- [Commits](onnx/onnx@v1.16.0...v1.17.0)

---
updated-dependencies:
- dependency-name: onnx
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Added OV SDXL registration to chat_bot app

* sdxl image generation

* pass model params

* fixes

* fixes

* llm-sd pipeline

* store images

* need to fix sd_xl checkbox

* fix for num_of_img==1

* fix for 1 img, total time

* perf fixes

* fixes

* llm with torch.compile

* fixed token auth issue, UI fixes

* gpt fast version, bad quality of output prompts

* rm extra files, updated readme

* added llama params, sd default res 768, better prompts

* fix, updated default workers num

* button for prompts generation

* fix

* fix

* Changed SDXL to LCM SDXL

* updated lcm example

* updated lcm example

* updated lcm example

* add llm_sd_app

* Updated llm_diffusion_serving_app

* Updated llm_diffusion_serving_app

* Update llm_diffusion_serving_app

* Update llm_diffusion_serving_app

* Update examples/usecases/llm_diffusion_serving_app/Readme.md

Co-authored-by: Ankith Gunapal <[email protected]>

* Update llm_diffusion_serving_app

* update llm_diffusion_serving_app

* update llm_diffusion_serving_app

* update llm_diffusion_serving_app

* Update llm_diffusion_serving_app

* Update llm_diffusion_serving_app

* Minor Updates, Added sd_benchmark

* Add docs for llm_diffusion_serving_app

* Apply suggestions from code review

Co-authored-by: Ankith Gunapal <[email protected]>

* Update llm_diffusion_serving_app, fix linter issues

* Update img, add assets

* update readme

---------

Co-authored-by: likholat <[email protected]>
Co-authored-by: likholat <[email protected]>
Co-authored-by: suryasidd <[email protected]>
Co-authored-by: Ankith Gunapal <[email protected]>
* Add AMD backend support

* Add AMD frontend support

* Add Dockerfile.rocm

Co-authored-by: Samu Tamminen <[email protected]>

* Add AMD documentation

* Fix null pointer bug with populateAccelerators trying to get null AppleUtil GPU env value

* Fix formatting

---------

Co-authored-by: Rony Leppänen <[email protected]>
Co-authored-by: Anders Smedegaard Pedersen <[email protected]>
Co-authored-by: Samu Tamminen <[email protected]>
* Add Apple system metrics support

Co-authored-by: Bipradip Chowdhury <[email protected]>
Co-authored-by: Rony Leppänen <[email protected]>
Co-authored-by: Anders Smedegaard Pedersen <[email protected]>

* Fix ModelServerTest.testMetricManager for other HW vendors

* Add GPUUtilization as expected metric

---------

Co-authored-by: Bipradip Chowdhury <[email protected]>
Co-authored-by: Rony Leppänen <[email protected]>
Co-authored-by: Anders Smedegaard Pedersen <[email protected]>
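
The AMD and Apple metrics commits above extend the hardware statistics (e.g. GPUUtilization) that TorchServe reports. TorchServe exposes metrics in Prometheus format on a dedicated metrics port; a sketch for spot-checking the new GPU metrics, assuming the default port 8082:

    import requests

    # TorchServe serves Prometheus-format metrics on port 8082 by default.
    body = requests.get("http://localhost:8082/metrics").text

    # Print only GPU-related samples, e.g. the GPUUtilization metric added
    # above, skipping the # HELP / # TYPE comment lines.
    for line in body.splitlines():
        if "GPU" in line and not line.startswith("#"):
            print(line)
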