Issues: huggingface/text-generation-inference
Llama3.1-8b with LoRA: This model does not support adapter loading. (#2400, opened Aug 12, 2024 by ilyalasy)
torch.cuda.OutOfMemoryError: CUDA out of memory. Why isn't it handled by the queue system? (#2417, opened Aug 14, 2024 by JustAnotherVeryNormalDeveloper)
Running FP8 and INT4 on multiple AMDs fails with torch.cuda.OutOfMemoryError (#2434, opened Aug 19, 2024 by peterschmidt85)
A seeming typo in text_generation_server/utils/adapters.py (#2483, opened Sep 2, 2024 by sadra-barikbin)
RuntimeError: weight model.embed_tokens.weight does not exist (#2509, opened Sep 11, 2024 by jayus71)
Add response_format input parameter to v1/chat/completions endpoint (#2523, opened Sep 16, 2024 by ktrapeznikov)
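The `response_format` parameter requested here follows the OpenAI Chat Completions convention, which TGI's `v1/chat/completions` route mirrors. A minimal sketch of the kind of request body the issue asks the endpoint to accept (the model name and message content are placeholders, not from the issue):

```python
import json

# OpenAI-style chat request with the requested `response_format` field.
# {"type": "json_object"} asks the server to constrain output to valid JSON.
payload = {
    "model": "tgi",
    "messages": [
        {"role": "user", "content": "List three prime numbers as JSON."}
    ],
    "response_format": {"type": "json_object"},
}

# Serialized body that would be POSTed to /v1/chat/completions.
body = json.dumps(payload)
```

Clients using the OpenAI SDK against a TGI server would pass the same field as a keyword argument, which is why the issue asks for parity at the endpoint level.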
* HTTP 1.0, assume close after body < HTTP/1.0 503 Service Unavailable (#2526, opened Sep 17, 2024 by aditivw)
Support for returning a CompletionUsage object when streaming=True (#2531, opened Sep 17, 2024 by andrewrreed)
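`CompletionUsage` here refers to the token-accounting object from the OpenAI schema, which is normally only returned on non-streaming responses. A sketch of its shape, with attaching it to the final streamed chunk shown as one possible design rather than TGI's actual behavior:

```python
from dataclasses import dataclass

# Field names follow the OpenAI CompletionUsage schema.
@dataclass
class CompletionUsage:
    prompt_tokens: int      # tokens in the input messages
    completion_tokens: int  # tokens generated by the model
    total_tokens: int       # sum of the two

# Example values are illustrative; a server could emit this object
# alongside the last server-sent-events chunk of a streamed response.
usage = CompletionUsage(prompt_tokens=12, completion_tokens=34, total_tokens=46)
```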
Error: Backend(Warmup(Generation("Hidden size mismatch"))) when launching Mixtral-8x22B-v0.1 (#2543, opened Sep 21, 2024 by alexhegit)
Inconsistent Behavior with Multi-LoRA Deployment (#2559, opened Sep 24, 2024 by charlatan-101)
Passing an image_url to a text-only model should fail explicitly (#2565, opened Sep 25, 2024 by Wauplin)
Question: What is the preferred way to cite TGI/this repo? Didn't see a citation file. (#2569, opened Sep 26, 2024 by elegantmoose)
huggingface_hub.errors.GenerationError: Request failed during generation: Server error: (#2608, opened Oct 4, 2024 by ivanhe123)
TGI drops requests when 150 requests are sent continuously at a rate of 5 requests per second on 8x AMD MI300x with Llama 3.1 405B (#2635, opened Oct 11, 2024 by Bihan)
(Prefill) KV cache indexing error if multiple TGI servers are started concurrently (#2675, opened Oct 21, 2024 by nathan-az)
CUDA Error: No kernel image is available for execution on the device (#2703, opened Oct 28, 2024 by shubhamgajbhiye1994)