forked from pytorch/serve
testing branches #1
Open
udaij12 wants to merge 47 commits into udaij12:master from pytorch:master
Conversation
* Implement stateful inference session timeout * Fix Instant.MAX to Millis conversion bug * Add regression tests for sequence continuous batching session timeouts * Lint test file * Update documentation * Update examples/stateful/sequence_batching/Readme.md Co-authored-by: Matthias Reso <[email protected]> * Update examples/stateful/sequence_batching/Readme.md Co-authored-by: Matthias Reso <[email protected]> * Update examples/stateful/sequence_continuous_batching/Readme.md Co-authored-by: Matthias Reso <[email protected]> --------- Co-authored-by: Matthias Reso <[email protected]>
* Leave response and sendError when request is canceled * Add comment referencing the issue. * fix: catch canceled job before closing too
Co-authored-by: Ankith Gunapal <[email protected]>
Co-authored-by: Ankith Gunapal <[email protected]>
…on (#3276) * RAG based LLM usecase * RAG based LLM usecase * Changes for deploying RAG * Updated README * Added main blog * Added main blog assets * Added main blog assets * Added use case to index html * Added benchmark config * Minor edits to README * Added new MD for Gen AI usecases * Added link to GV3 tutorial * Addressed review comments * Update examples/usecases/RAG_based_LLM_serving/README.md Co-authored-by: Matthias Reso <[email protected]> * Update examples/usecases/RAG_based_LLM_serving/README.md Co-authored-by: Matthias Reso <[email protected]> * Addressed review comments --------- Co-authored-by: Matthias Reso <[email protected]>
* Add some hints for java devs * Fix spell check
* add kserve gpu tests * add a check to validate gpu mem usage * fix typos * fix typo
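The commit above adds a check that validates GPU memory usage during the kserve tests. A minimal sketch of such a check in Python, assuming PyTorch is available; the threshold and device index are placeholders, not the actual values used by the test:

```python
import torch

# Hedged illustration of a GPU memory usage check; the 90% threshold and
# device 0 are assumptions, not the test's real configuration.
if torch.cuda.is_available():
    free_bytes, total_bytes = torch.cuda.mem_get_info(0)
    used_fraction = 1.0 - free_bytes / total_bytes
    assert used_fraction < 0.9, f"GPU memory usage too high: {used_fraction:.0%}"
    print(f"GPU memory in use: {used_fraction:.0%}")
else:
    print("No GPU available; skipping memory check")
```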
* configurable start up time, minimum working example * remove startuptimeout from async worker for now before I confirm what model_load_timeout is * doc updates * remove extra spaces in model manager * apply formatting * remove worker command logging * add tests for long startup timeout * worker thread add logging response timeout if worker state isn't worker_started * add startuptimeout to registerWorkflow function * add startuptimeout to the correct word in spellchecker * working example * small refactor * small refactor * added default value for model status * Update ts_scripts/spellcheck_conf/wordlist.txt * Fix java unit tests * Fix regression test * add startup_timeout for test to cast it to int --------- Co-authored-by: Matthias Reso <[email protected]>
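The configurable startup timeout described above would typically be exercised when registering a model through the management API. A minimal sketch in Python, assuming the option is exposed as a `startup_timeout` query parameter in seconds (the parameter name, port, and default are assumptions drawn from this commit log, not confirmed API details):

```python
import requests

# Register a model and give it a longer startup window than the default.
# "startup_timeout" is the parameter name suggested by the commit messages;
# adjust to match the actual management API if it differs.
resp = requests.post(
    "http://localhost:8081/models",
    params={
        "url": "my_model.mar",      # model archive reachable by the server
        "initial_workers": 1,
        "startup_timeout": 600,     # allow slow-loading models 10 minutes to start
    },
)
print(resp.status_code, resp.text)
```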
…pos (#3291) * Add REPO_URL in Dockerfile * Add repo url to docker files and build_image.sh * Enable repo url in docker ci workflow * Update docs for repo url

* upgrade to PyTorch 2.4 * remove text classifier models from sanity * update aoti test * added deprecation warning for examples * Update requirements/torch_darwin.txt Co-authored-by: Matthias Reso <[email protected]> --------- Co-authored-by: Matthias Reso <[email protected]>
* add TorchServe with Intel® Extension for PyTorch* guidance * update the content with an existing doc in torchserve example --------- Co-authored-by: Ankith Gunapal <[email protected]>
* Replace git: in correct variable... * Use env.FORK for CI like #3266 did * DEBUG: dump github context * DEBUG: dump github context * Get repo url from head information
* Make repo url work in merge queue ci, which is not a pull request.... * dump context
* Forward additional url segments as url_paths in request header to model * Fix vllm test and clean preproc * First attempt to enable OpenAI api for models served via vllm * fix streaming in openai api * Add OpenAIServingCompletion usage example * Add lora modules to vllm engine * Finish openai completion integration; removed req openai client; updated lora example to llama 3.1 * fix lint * Update mistral + llama3 vllm example * Remove openai client from url path test * Add openai chat api to vllm example * Added v1/models endpoint for vllm example * Remove accidental breakpoint() * Add comment to new url_path
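With additional URL segments forwarded to the model, an OpenAI-style completion request can be sent to the served vllm model with any HTTP client. A minimal sketch, where the URL path and model name are hypothetical placeholders; check the vllm example README for the exact route your deployment exposes:

```python
import requests

# Hypothetical OpenAI-compatible completion call against a TorchServe-hosted
# vllm model. Path and model name are assumptions for illustration only.
payload = {
    "model": "llama3.1-8b",
    "prompt": "Write a haiku about GPUs.",
    "max_tokens": 64,
    "stream": False,
}
resp = requests.post(
    "http://localhost:8080/predictions/llama3.1-8b/1.0/v1/completions",
    json=payload,
)
print(resp.json())
```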
…xample (#3300) * Update quickstart llm docker readme; added ts.llm_launcher example * fix wording
Typo fixes in docstrings and comments.
* testing on graviton * testing on graviton * testing on graviton * checking python * rmv python * changing back to python * testing cpu instead * adding torchtext * adding torchtext * testing torchtext * removing two tests * removing pytorch test * adding numpy upgrade * adding numpy upgrade * testing full ci * testing full ci * testing full ci * skipping grpc * adding graviton ci * adding graviton ci * adding ci cpu graviton * adding ci cpu graviton * adding env * skipping a test for now * fixing env variable * removing scripted 3&4 * small changes * fixing lint * fixing lint * fixing lint * removing torchtext --------- Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Ankith Gunapal <[email protected]>
* Use spawn instead of fork method for vllm * Fix lint
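Switching from the fork to the spawn start method matters because fork copies the parent's state (including an already-initialized CUDA context), which can deadlock or crash multi-threaded runtimes like vllm. A generic Python illustration of selecting spawn, not the actual TorchServe code:

```python
import multiprocessing as mp

def worker(rank: int) -> None:
    # CUDA and other thread-heavy runtimes are initialized here, inside the
    # child process, which is safe under the "spawn" start method.
    print(f"worker {rank} starting")

if __name__ == "__main__":
    # "spawn" starts a fresh interpreter instead of copying the parent process.
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=worker, args=(i,)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```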
* Example to demonstrate building a custom endpoint plugin * Fix linter errors * Update readme to include two models * Update Readme and handle case of no registered models
* fixing upload version * fixing upload version * fixing upload version
Signed-off-by: Emmanuel Ferdman <[email protected]>
Co-authored-by: Matthias Reso <[email protected]> Co-authored-by: Ankith Gunapal <[email protected]>
* adding graviton docker image * testing multiplatform ci * testing multiplatform ci * testing multiplatform ci * adding new builder * removing arm * removing arm * testing arm * tests * testing driver command * testing driver command * testing on newer instance * testing on newer instance * testing newer * rm command * changing platform * testing only amd * testing both arch * testing both arch * testing both * remove builder * remove builder * adding amd * building cache * cache 3 * cache 4 * cache 4 * final test * reverting temp changes * testing official release * testing official release * testing official release * adding kserve changes * kserve nightly --------- Co-authored-by: Ankith Gunapal <[email protected]>
* adding graviton docker image * testing multiplatform ci * testing multiplatform ci * testing multiplatform ci * adding new builder * removing arm * removing arm * testing arm * tests * testing driver command * testing driver command * testing on newer instance * testing on newer instance * testing newer * rm command * changing platform * testing only amd * testing both arch * testing both arch * testing both * remove builder * remove builder * adding amd * building cache * cache 3 * cache 4 * cache 4 * final test * reverting temp changes * testing official release * testing official release * testing official release * adding kserve changes * kserve nightly * adding build context --------- Co-authored-by: Ankith Gunapal <[email protected]>
* TRT LLM Integration with LORA * TRT LLM Integration with LORA * TRT LLM Integration with LORA * TRT LLM Integration with LORA * Added launcher support for trt_llm * updated README * updated README * Using the API that supports async generate * Review comments * Apply suggestions from code review Co-authored-by: Matthias Reso <[email protected]> * addressed review comments * Addressed review comments * Updated the async logic based on review comments * Made max_batch_size and kv_cache size configurable for the launcher * fixing lint --------- Co-authored-by: Matthias Reso <[email protected]>
* Bump vllm from 0.5.0 to 0.5.5 in /examples/large_models/vllm Bumps [vllm](https://github.com/vllm-project/vllm) from 0.5.0 to 0.5.5. - [Release notes](https://github.com/vllm-project/vllm/releases) - [Commits](vllm-project/vllm@v0.5.0...v0.5.5) --- updated-dependencies: - dependency-name: vllm dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> * Update examples/large_models/vllm/requirements.txt Update vllm dependency to match PT version --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Matthias Reso <[email protected]>
* adding graviton docker image * testing multiplatform ci * testing multiplatform ci * testing multiplatform ci * adding new builder * removing arm * removing arm * testing arm * tests * testing driver command * testing driver command * testing on newer instance * testing on newer instance * testing newer * rm command * changing platform * testing only amd * testing both arch * testing both arch * testing both * remove builder * remove builder * adding amd * building cache * cache 3 * cache 4 * cache 4 * final test * reverting temp changes * testing official release * testing official release * testing official release * adding kserve changes * kserve nightly * adding build context * changing nightly push * remove push on official * remove push on official --------- Co-authored-by: Ankith Gunapal <[email protected]>
Co-authored-by: Boyu Chen <[email protected]> Co-authored-by: Naman Nandan <[email protected]>
Co-authored-by: Matthias Reso <[email protected]> Co-authored-by: Ankith Gunapal <[email protected]>
* testing adding startup * testing adding default * testing newer docker * adding default values for model * testing specific docker * testing specific docker image * fixed format * merge ready * adding source of truth * adding source of truth * fixing tests * format * removing defaults --------- Co-authored-by: Ankith Gunapal <[email protected]>
* set model_snapshot_path to None to prevent unbound local error * address PR comments with pythonic usage, fix README * small change * revert formatting changes
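The fix above initializes `model_snapshot_path` before the conditional that may or may not assign it. A generic Python illustration of the pattern and the `UnboundLocalError` it prevents; the function and variable below are placeholders, not the project's actual code:

```python
def resolve_snapshot(use_snapshot: bool):
    # Without this initialization, the return statement raises
    # UnboundLocalError whenever use_snapshot is False, because the name
    # is only assigned inside the conditional branch.
    model_snapshot_path = None
    if use_snapshot:
        model_snapshot_path = "/tmp/model_snapshot"
    return model_snapshot_path

print(resolve_snapshot(False))  # prints None instead of raising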
Co-authored-by: Ankith Gunapal <[email protected]>
Bumps [onnx](https://github.com/onnx/onnx) from 1.16.0 to 1.17.0. - [Release notes](https://github.com/onnx/onnx/releases) - [Changelog](https://github.com/onnx/onnx/blob/main/docs/Changelog-ml.md) - [Commits](onnx/onnx@v1.16.0...v1.17.0) --- updated-dependencies: - dependency-name: onnx dependency-type: direct:development ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Added OV SDXL registration to chat_bot app * sdxl image generation * pass model params * fixes * fixes * llm-sd pipeline * store images * need to fix sd_xl checkbox * fix for num_of_img==1 * fix for 1 img, total time * perf fixes * fixes * llm with torch.compile * fixed token auth issue, ui fixes * gpt fast version, bad quality of output prompts * rm extra files, updated readme * added llama params, sd default res 768, better prompts * fix, updated default workers num * button for prompts generation * fix * fix * Changed SDXL to LCM SDXL * updated lcm example * updated lcm example * updated lcm example * add llm_sd_app * Updated llm_diffusion_serving_app * Updated llm_diffusion_serving_app * Update llm_diffusion_serving_app * Update llm_diffusion_serving_app * Update examples/usecases/llm_diffusion_serving_app/Readme.md Co-authored-by: Ankith Gunapal <[email protected]> * Update llm_diffusion_serving_app * update llm_diffusion_serving_app * update llm_diffusion_serving_app * update llm_diffusion_serving_app * Update llm_diffusion_serving_app * Update llm_diffusion_serving_app * Minor Updates, Added sd_benchmark * Add docs for llm_diffusion_serving_app * Apply suggestions from code review Co-authored-by: Ankith Gunapal <[email protected]> * Update llm_diffusion_serving_app, fix linter issues * Update img, add assets * update readme --------- Co-authored-by: likholat <[email protected]> Co-authored-by: likholat <[email protected]> Co-authored-by: suryasidd <[email protected]> Co-authored-by: Ankith Gunapal <[email protected]>
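One of the items above mentions running the LLM with torch.compile. A generic sketch of that step, where the model, shapes, and backend are placeholders rather than the app's actual configuration:

```python
import torch

# Illustration of the torch.compile step; the default "inductor" backend
# is used, and the tiny model here stands in for the app's real LLM.
model = torch.nn.Linear(512, 512).eval()
compiled = torch.compile(model)

with torch.inference_mode():
    out = compiled(torch.randn(1, 512))  # first call triggers compilation
print(out.shape)
```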
* Add AMD backend support * Add AMD frontend support * Add Dockerfile.rocm Co-authored-by: Samu Tamminen <[email protected]> * Add AMD documentation * Fix null pointer bug with populateAccelerators trying to get null AppleUtil GPU env value * Fix formatting --------- Co-authored-by: Rony Leppänen <[email protected]> Co-authored-by: Anders Smedegaard Pedersen <[email protected]> Co-authored-by: Samu Tamminen <[email protected]>
* Add Apple system metrics support Co-authored-by: Bipradip Chowdhury <[email protected]> Co-authored-by: Rony Leppänen <[email protected]> Co-authored-by: Anders Smedegaard Pedersen <[email protected]> * Fix ModelServerTest.testMetricManager for other HW vendors * Add GPUUtilization as expect metric --------- Co-authored-by: Bipradip Chowdhury <[email protected]> Co-authored-by: Rony Leppänen <[email protected]> Co-authored-by: Anders Smedegaard Pedersen <[email protected]>
Description
Please read our CONTRIBUTING.md prior to creating your first pull request.
Please include a summary of the feature or issue being fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes #(issue)
Type of change
Please delete options that are not relevant.
Feature/Issue validation/testing
Please describe the unit or integration tests you ran to verify your changes, and summarize the relevant results. Provide instructions so they can be reproduced.
Please also list any relevant details for your test configuration.
Test A
Logs for Test A
Test B
Logs for Test B
Checklist: