Issues: huggingface/text-generation-inference
Llama3.1-8b with LoRA: This model does not support adapter loading. (#2400, opened Aug 12, 2024 by ilyalasy)
torch.cuda.OutOfMemoryError: CUDA out of memory. Why isn't it handled by the queue system? (#2417, opened Aug 14, 2024 by JustAnotherVeryNormalDeveloper)
Running FP8 and INT4 on multiple AMDs fails with torch.cuda.OutOfMemoryError (#2434, opened Aug 19, 2024 by peterschmidt85)
A seeming typo in text_generation_server/utils/adapters.py (#2483, opened Sep 2, 2024 by sadra-barikbin)
RuntimeError: weight model.embed_tokens.weight does not exist (#2509, opened Sep 11, 2024 by jayus71)
Add response_format input parameter to v1/chat/completions endpoint (#2523, opened Sep 16, 2024 by ktrapeznikov)
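The `response_format` parameter requested here follows the OpenAI Chat Completions convention, which TGI's `v1/chat/completions` route mirrors. A minimal sketch of the kind of request body the issue asks the endpoint to accept (the model name and message content are placeholders, not from the issue):

```python
import json

# OpenAI-style chat request with the requested `response_format` field.
# {"type": "json_object"} asks the server to constrain output to valid JSON.
payload = {
    "model": "tgi",
    "messages": [
        {"role": "user", "content": "List three prime numbers as JSON."}
    ],
    "response_format": {"type": "json_object"},
}

# Serialized body that would be POSTed to /v1/chat/completions.
body = json.dumps(payload)
```

Clients using the OpenAI SDK against a TGI server would pass the same field as a keyword argument, which is why the issue asks for parity at the endpoint level.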
* HTTP 1.0, assume close after body < HTTP/1.0 503 Service Unavailable (#2526, opened Sep 17, 2024 by aditivw)
Support for returning a CompletionUsage object when streaming=True (#2531, opened Sep 17, 2024 by andrewrreed)
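`CompletionUsage` here refers to the token-accounting object from the OpenAI schema, which is normally only returned on non-streaming responses. A sketch of its shape, with attaching it to the final streamed chunk shown as one possible design rather than TGI's actual behavior:

```python
from dataclasses import dataclass

# Field names follow the OpenAI CompletionUsage schema.
@dataclass
class CompletionUsage:
    prompt_tokens: int      # tokens in the input messages
    completion_tokens: int  # tokens generated by the model
    total_tokens: int       # sum of the two

# Example values are illustrative; a server could emit this object
# alongside the last server-sent-events chunk of a streamed response.
usage = CompletionUsage(prompt_tokens=12, completion_tokens=34, total_tokens=46)
```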
Error: Backend(Warmup(Generation("Hidden size mismatch"))) when launching Mixtral-8x22B-v0.1 (#2543, opened Sep 21, 2024 by alexhegit)
Inconsistent Behavior with Multi-LoRA Deployment (#2559, opened Sep 24, 2024 by charlatan-101)
Passing an image_url to a text-only model should fail explicitly (#2565, opened Sep 25, 2024 by Wauplin)
Question: What is the preferred way to cite TGI/this repo? Didn't see a citation file. (#2569, opened Sep 26, 2024 by elegantmoose)
huggingface_hub.errors.GenerationError: Request failed during generation: Server error: (#2608, opened Oct 4, 2024 by ivanhe123)
TGI drops requests when 150 requests are sent continuously at a rate of 5 requests per second on 8x AMD MI300x with Llama 3.1 405B (#2635, opened Oct 11, 2024 by Bihan)
(Prefill) KV cache indexing error if multiple TGI servers are started concurrently (#2675, opened Oct 21, 2024 by nathan-az)
CUDA Error: No kernel image is available for execution on the device (#2703, opened Oct 28, 2024 by shubhamgajbhiye1994)