Issues: huggingface/transformers
Issues list
use_liger_kernel requires much more GPU memory during evaluation than training
bug
#35689
opened Jan 14, 2025 by
Smu-Tan
2 of 4 tasks
Some weights of the model checkpoint at /models/DeepSeek-V3_bf16 were not used when initializing DeepseekV3ForCausalLM
bug
#35688
opened Jan 14, 2025 by
Godlovecui
4 tasks
past_key_values cat out of model generate, output appear disorder
bug
Generation
#35684
opened Jan 14, 2025 by
lzlwakeup
2 of 4 tasks
Support LLMs With No Image Placeholder Embedding in LLava-based Models
Feature request
Multimodal
VLM
#35683
opened Jan 14, 2025 by
alex-jw-brooks
Improve Guidance for Using DDP in examples/pytorch
Feature request
#35667
opened Jan 13, 2025 by
caojiaolong
AttributeError: 'MERTConfig' object has no attribute 'conv_pos_batch_norm'
bug
#35656
opened Jan 13, 2025 by
JacopoMadaluni
2 of 4 tasks
Will Qwen2VL support sequence classification head in the future?
Feature request
#35645
opened Jan 13, 2025 by
cv-nlp
tokenizer.decode() and tokenizer.convert_ids_to_tokens() return different results
bug
#35641
opened Jan 12, 2025 by
thangld201
4 tasks
Expected tensors and new_tensors to have the same type but found <class 'tuple'> and <class 'torch.Tensor'>
bug
#35640
opened Jan 12, 2025 by
Bruce-Azar-Wayne
4 tasks
set_initialized_submodules too slow when loading big model like DeepSeekV3
bug
#35635
opened Jan 12, 2025 by
hongchuan666
4 tasks
ValueError: MllamaForConditionalGeneration does not support Flash Attention 2.0 yet
bug
#35634
opened Jan 12, 2025 by
yxchng
4 tasks
static cache with mixtral will cause CUDA error: device-side assert triggered
bug
#35626
opened Jan 11, 2025 by
zyxiyy
1 of 4 tasks
Segmentation fault: address not mapped to object at address 0x100000007
bug
#35624
opened Jan 11, 2025 by
mrinaldi97
4 tasks done
Unsupported: hasattr SkipFunctionVariable when I compile the mixtral model with multi-gpus
bug
#35623
opened Jan 11, 2025 by
zyxiyy
4 tasks
from_pretrained fails to save weights.py and layers.py into cache, therefore fails to find them in cache
bug
#35619
opened Jan 11, 2025 by
openyk
4 tasks
Help Understanding Beam Search Scores in Hugging Face (LLaMA + LoRA)
bug
Generation
#35618
opened Jan 10, 2025 by
pratcooper
2 of 4 tasks
Better handling of hardcoded component in PretrainedModel.from_pretrained.
bug
#35617
opened Jan 10, 2025 by
princethewinner
1 of 4 tasks
Trainer: TensorBoardCallback not working for "on_save" and "on_save_end" events
bug
#35612
opened Jan 10, 2025 by
vecorro
2 of 4 tasks
Trainer sets state.best_model_checkpoint even when it doesn't save there; leads to training crash
bug
#35609
opened Jan 10, 2025 by
tomaarsen
2 of 4 tasks
Prompt_ids feature causing repetitions and hallucinations
bug
#35603
opened Jan 10, 2025 by
vchagari
4 tasks