Issues: huggingface/transformers
Issues list
use_liger_kernel requires much more GPU memory during evaluation than training
bug
#35689
opened Jan 14, 2025 by
Smu-Tan
2 of 4 tasks
Some weights of the model checkpoint at /models/DeepSeek-V3_bf16 were not used when initializing DeepseekV3ForCausalLM
bug
#35688
opened Jan 14, 2025 by
Godlovecui
4 tasks
past_key_values cat out of model generate, output appear disorder
bug
Generation
#35684
opened Jan 14, 2025 by
lzlwakeup
2 of 4 tasks
Support LLMs With No Image Placeholder Embedding in LLava-based Models
Feature request
Request for a new feature
Multimodal
VLM
#35683
opened Jan 14, 2025 by
alex-jw-brooks
Improve Guidance for Using DDP in examples/pytorch
Feature request
Request for a new feature
#35667
opened Jan 13, 2025 by
caojiaolong
AttributeError: 'MERTConfig' object has no attribute 'conv_pos_batch_norm'
bug
#35656
opened Jan 13, 2025 by
JacopoMadaluni
2 of 4 tasks
Unnecessary KV Cache Updates During Training Mode
bug
#35648
opened Jan 13, 2025 by
Hannibal046
2 of 4 tasks
Will Qwen2VL support sequence classification head in the future?
Feature request
Request for a new feature
#35645
opened Jan 13, 2025 by
cv-nlp
tokenizer.decode() and tokenizer.convert_ids_to_tokens() return different results
bug
#35641
opened Jan 12, 2025 by
thangld201
4 tasks
Expected tensors and new_tensors to have the same type but found <class 'tuple'> and <class 'torch.Tensor'>
bug
#35640
opened Jan 12, 2025 by
Bruce-Azar-Wayne
4 tasks
set_initialized_submodules too slow when loading big model like DeepSeekV3
bug
#35635
opened Jan 12, 2025 by
hongchuan666
4 tasks
ValueError: MllamaForConditionalGeneration does not support Flash Attention 2.0 yet
bug
#35634
opened Jan 12, 2025 by
yxchng
4 tasks
static cache with mixtral will cause CUDA error: device-side assert triggered
bug
#35626
opened Jan 11, 2025 by
zyxiyy
1 of 4 tasks
Segmentation fault: address not mapped to object at address 0x100000007
bug
#35624
opened Jan 11, 2025 by
mrinaldi97
4 tasks done
Unsupported: hasattr SkipFunctionVariable when i compile the mixtral model with muti-gpus
bug
#35623
opened Jan 11, 2025 by
zyxiyy
4 tasks
The argument "dim" is gone from LlamaRotaryEmbedding initializer. Intentional?
bug
#35621
opened Jan 11, 2025 by
jeffhataws
4 tasks
from_pretrained fails to save weights.py and layers.py into cache, therefore fails to find them in cache
bug
#35619
opened Jan 11, 2025 by
openyk
4 tasks