Suppress warnings from LUKE for unexpected keys #24703
Conversation
…ghts by specifying _keys_to_ignore_on_load_unexpected
I believe this should not be done this way. These keys should be used only if the default behavior in the modeling code produces different keys than the canonical (original) checkpoints on the Hub. But before further discussion, let's check one thing first: the config of the checkpoint has `"use_entity_aware_attention": true`. Are you sure this is the checkpoint that causes confusion ..?
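For reference, a minimal way to check the config flag mentioned above (the checkpoint name is the one used in the reproduction later in this thread):

```python
from transformers import AutoConfig

# Inspect the flag referred to above; prints True or False for this checkpoint.
config = AutoConfig.from_pretrained("studio-ousia/mluke-base-lite")
print(config.use_entity_aware_attention)
```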
I have run

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("studio-ousia/mluke-base-lite")
```

but didn't receive any warning.
Thanks @ydshieh for taking a look at the PR!
When I look at the latest version of the config on the following models, I find
The following Google Colab notebook shows the warning.
I believe that this PR is similar to the second point mentioned above. The HF checkpoint is derived from the original checkpoint generated by the original repository. The checkpoint contains additional keys. I would like to address the problem of the confusing and overwhelming warnings even when this is the default behavior.
OK I see. We have to use …
We can't change these kinds of keys just because a Hub model repo author uploaded problematic weights/config files. If we change them the way this PR does, we won't get any warning when a real problem occurs, and bugs won't be detected.
I didn't check the original repo (it was not me who added that model to `transformers`), but the config has `"use_entity_aware_attention": true`. Also, the default value in …
Let me share more context on this problem. The weights uploaded on the HF repo are supposed to work whether entity-aware attention is enabled or not. I am from the same group as the author of LukeModel, and the HF weights were uploaded by me, so I am sure they follow the intention of the original model. In summary, when some weights should be ignored as the correct behavior, what is the right way to handle that?
I understand that this is a risk, but couldn't that be mitigated by specifying the correct regex?
The problem here is that the config and the model weights on the Hub have inconsistent values. If the model were created with that value set to false, there would not be those extra keys in the model. It is unclear how the Hub author ended up with such an inconsistency. The fix should happen there. Hope this explanation makes things clear. But thank you for your willingness to fix things and help make transformers better ❤️
I believe there is still some misunderstanding.
The inconsistency is intended, as having optional extra weights is part of the model's features.
Those extra keys (weights) are optional.
To be clearer, the extra weights are in this part: `transformers/src/transformers/models/luke/modeling_luke.py`, lines 523 to 526 (at abaca9f).
These weights are NOT used at pretraining time, but can optionally be introduced at fine-tuning time.
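For context, here is a paraphrased sketch of the optional projections those lines define (attribute names follow `modeling_luke.py`; shapes and surrounding code are simplified, so this is not the library's exact implementation):

```python
import torch.nn as nn


class LukeSelfAttentionSketch(nn.Module):
    """Simplified illustration of where the optional extra weights live."""

    def __init__(self, config):
        super().__init__()
        # Standard self-attention projections, trained during pretraining.
        self.query = nn.Linear(config.hidden_size, config.hidden_size)
        self.key = nn.Linear(config.hidden_size, config.hidden_size)
        self.value = nn.Linear(config.hidden_size, config.hidden_size)
        if config.use_entity_aware_attention:
            # Extra query projections used only for entity-aware attention:
            # word-to-entity, entity-to-word and entity-to-entity queries.
            self.w2e_query = nn.Linear(config.hidden_size, config.hidden_size)
            self.e2w_query = nn.Linear(config.hidden_size, config.hidden_size)
            self.e2e_query = nn.Linear(config.hidden_size, config.hidden_size)
```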
I apologize for any confusion caused by my previous explanation, but I would like to request @NielsRogge's opinion on how to handle these warnings. He helped introduce LUKE into transformers.
So those weights are not even trained during pretraining ..? I am a bit confused here. Or are they trained for LUKE but not mLUKE?
In this case, the original model weights (the checkpoint on the Hub repo …
I am wondering what prevents you from removing those extra weights on …
Thank you for your patience.

What is entity-aware attention?
LUKE and mLUKE take word tokens as well as entity tokens. At fine-tuning time, we can optionally add entity-aware attention. See `transformers/src/transformers/models/luke/convert_luke_original_pytorch_checkpoint_to_pytorch.py`, lines 61 to 67 (at abaca9f).
So, the checkpoints include these copied weights regardless of whether users enable entity-aware attention at fine-tuning time.
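A rough sketch of the copying step the conversion script performs around the referenced lines (key names follow `modeling_luke.py`; the exact loop in the script may differ):

```python
def copy_query_weights_for_entity_aware_attention(state_dict, num_hidden_layers):
    """Initialize the three entity-aware query projections from the pretrained
    self-attention query weights, so they start from trained values rather
    than random ones."""
    for layer_index in range(num_hidden_layers):
        prefix = f"encoder.layer.{layer_index}.attention.self."
        for suffix in ("query.weight", "query.bias"):
            source = state_dict[prefix + suffix]
            state_dict[prefix + "w2e_" + suffix] = source.clone()
            state_dict[prefix + "e2w_" + suffix] = source.clone()
            state_dict[prefix + "e2e_" + suffix] = source.clone()
    return state_dict
```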
Both LUKE and mLUKE are pretrained without entity-aware attention, but they can still use entity-aware attention by initializing new weights with the corresponding pretrained ones.

Why is the default value of `use_entity_aware_attention` set to `True`?
Hi @ryokan0123. Thank you for the detailed information. Looking at the following 3 points you mentioned: … To make sure, are those extra weights in …
What you described (point 3) could be easily achieved for a user by just specifying …
And this (different) warning makes sense and should be kept. Let me know if you have further questions about the above suggested way to (optionally) use/enable non-trained …
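If I follow the suggestion correctly, the user-side flow would look roughly like this (checkpoint name assumed from earlier in the thread); the extra query projections are then created by the model and, if missing from the checkpoint, randomly initialized, which triggers the "newly initialized" warning mentioned above:

```python
from transformers import LukeModel

# Force-enable entity-aware attention at load time via a config override.
model = LukeModel.from_pretrained(
    "studio-ousia/mluke-base-lite",
    use_entity_aware_attention=True,
)
```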
Yes, I know that is possible. But by randomly initializing the new weights, the model performance would degrade, as the model has to learn how to attend to other tokens from scratch during fine-tuning. See `transformers/src/transformers/models/luke/convert_luke_original_pytorch_checkpoint_to_pytorch.py`, lines 61 to 67 (at abaca9f).
So, to achieve this and suppress warnings, I think there are some options🤔
Ok, thank you for the detailed information. I finally understand why you need those weights in the checkpoint, as they are copied from trained weights. I will have to think a bit more, but I feel the best option is to add an extra log message to explain the situation. I will come back to you.
Could you take a look at the following and see if you have any comments? I tried to make it short, but still needed to explain things 🙏

Summary:
Two suggested actions:
The second approach may not be worth the effort (too much work). The first one isn't really good as …
Note that on main, the code sample provided at the beginning does not issue any warnings (just infos) since the class used (LukeModel) is not the same as the class of the checkpoint (LukeForMaskedLM). It's only when loading a model …

As for how to deal with this, the checkpoint mentioned does not use those extra weights (as seen in the config), so it should probably not have them in the state dict. You can use the `variant` parameter …
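For illustration, a hedged sketch of how the `variant` mechanism could be used here (local paths, the variant string, and which copy keeps the extra weights are assumptions, not necessarily the exact proposal above):

```python
from transformers import LukeModel

# Load without entity-aware attention so the extra query projections are
# dropped, then save that copy under a variant name; the weights are stored
# as pytorch_model.<variant>.bin (or the safetensors equivalent).
model = LukeModel.from_pretrained(
    "studio-ousia/mluke-base-lite", use_entity_aware_attention=False
)
model.save_pretrained("local-mluke-base-lite", variant="no_entity_aware")

# Loading that variant explicitly avoids the unexpected-keys warning.
model = LukeModel.from_pretrained(
    "local-mluke-base-lite", variant="no_entity_aware"
)
```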
I see, it seems the sample code only issues warnings in Colab notebooks. Thank you, @sgugger, for the suggested solution. Using the `variant` parameter seems like a better solution.
What does this PR do?
Suppress the warnings when instantiating the LUKE models by adding `_keys_to_ignore_on_load_unexpected`.

Problem
Currently, when you instantiate certain LUKE models from the Hugging Face Hub, such as `studio-ousia/mluke-base-lite`, you receive a warning indicating that a bunch of weights were not loaded.
This seems to depend on the logging setting and is observed in Google Colab notebooks.
https://colab.research.google.com/drive/1kYN3eGhx5tzEMnGkUz2jPsdmFyEBwxFA?usp=sharing
This behavior is expected since these weights are optional and only loaded when `use_entity_aware_attention` is set to `True`. However, it has caused confusion among users, as evidenced by the following issues:
studio-ousia/luke#174
https://huggingface.co/studio-ousia/mluke-base/discussions/2#63be8cc6c26a8a4d713ee08a
Solution
I added `_keys_to_ignore_on_load_unexpected` to `LukePreTrainedModel` to ignore some unexpected keys in the pretrained weights.
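For readers of the thread, a hedged sketch of what such a change could look like (the regex patterns are illustrative guesses based on the entity-aware attention weight names discussed above, not necessarily the exact ones in this PR):

```python
from transformers import LukeConfig, PreTrainedModel


class LukePreTrainedModel(PreTrainedModel):
    config_class = LukeConfig
    base_model_prefix = "luke"

    # Checkpoint keys matching these regexes are skipped without the
    # "unexpected keys" warning when the instantiated model does not use them.
    _keys_to_ignore_on_load_unexpected = [
        r"w2e_query",
        r"e2w_query",
        r"e2e_query",
    ]
```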