entity-aware self-attention used in mLUKE #174

Open
chantera opened this issue Jun 9, 2023 · 5 comments
@chantera (Contributor) commented Jun 9, 2023

According to the mLUKE paper, mLUKE does not use entity-aware self-attention:

The word and entity tokens equally undergo self-attention computation (i.e., no entity-aware self-attention in Yamada et al. (2020)) after embedding layers.

However, the following code produces a warning message. (The message can be suppressed by passing use_entity_aware_attention=True.)

>>> import transformers
>>> model = transformers.AutoModel.from_pretrained("studio-ousia/mluke-base-lite")
Some weights of the model checkpoint at studio-ousia/mluke-base-lite were not used when initializing LukeModel: [
'luke.encoder.layer.0.attention.self.w2e_query.weight', 'luke.encoder.layer.0.attention.self.w2e_query.bias', 
'luke.encoder.layer.0.attention.self.e2w_query.weight', 'luke.encoder.layer.0.attention.self.e2w_query.bias', 
'luke.encoder.layer.0.attention.self.e2e_query.weight', 'luke.encoder.layer.0.attention.self.e2e_query.bias', 
...]

>>> model.config.use_entity_aware_attention
False
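
For completeness, a minimal sketch of how the flag can be passed (this assumes from_pretrained forwards unrecognized keyword arguments to the config, which recent transformers versions do):

>>> import transformers
>>> model = transformers.AutoModel.from_pretrained(
...     "studio-ousia/mluke-base-lite", use_entity_aware_attention=True
... )
>>> model.config.use_entity_aware_attention
True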

In fact, the public model contains weights for entity-aware self-attention.

>>> import torch
>>> state_dict = torch.load("pytorch_model.bin")
>>> list(state_dict.keys())
[..., 
'luke.encoder.layer.0.attention.self.w2e_query.weight', 'luke.encoder.layer.0.attention.self.w2e_query.bias', 
'luke.encoder.layer.0.attention.self.e2w_query.weight', 'luke.encoder.layer.0.attention.self.e2w_query.bias', 
'luke.encoder.layer.0.attention.self.e2e_query.weight', 'luke.encoder.layer.0.attention.self.e2e_query.bias', 
...]

Could you clarify whether mLUKE uses entity-aware self-attention?

If it does, config.json should specify use_entity_aware_attention: true; if it does not, pytorch_model.bin should be updated to drop the weights for entity-aware self-attention.

@ryokan0123 (Contributor)

In our pretraining and fine-tuning experiments, the mLUKE models did not use entity-aware self-attention.

The attention weights related to entity-aware self-attention (EASA) (e.g., w2e..., e2w..., e2e...) are included in the published model weights for users who want to try entity-aware self-attention at fine-tuning time.
The values of the EASA weights are identical to the corresponding w2w... weights, which is how these weights are initialized in the LUKE paper.
By setting use_entity_aware_attention: true, these weights are loaded into the model.
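
You can check this directly on the checkpoint. A quick sketch, assuming the plain query.* parameters are the w2w weights (the other key names are taken from the warning above):

>>> import torch
>>> sd = torch.load("pytorch_model.bin", map_location="cpu")
>>> prefix = "luke.encoder.layer.0.attention.self."
>>> torch.equal(sd[prefix + "query.weight"], sd[prefix + "w2e_query.weight"])
True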

By default, use_entity_aware_attention is set to false and the EASA weights are ignored, because that is the setting described in the mLUKE paper.

The warning is somewhat disturbing... but it is expected behavior.

@chantera (Contributor, Author) commented Jun 9, 2023

The values of the EASA weights are identical to the corresponding w2w... weights, which is how these weights are initialized in the LUKE paper.

I appreciate the clarification. Now I understand.
So use_entity_aware_attention: false is not a misconfiguration; it simply causes the unused weights to be ignored.

The warning is somewhat disturbing

To fix this confusing behavior, how about equipping the model with the EASA weights regardless of the use_entity_aware_attention setting? In that case, the model would simply skip the computation involving the EASA weights when use_entity_aware_attention=False, as in the sketch below.
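
A rough illustration of the idea (not the actual LukeSelfAttention code in transformers; names and shapes are simplified):

import torch.nn as nn

class SketchSelfAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.use_entity_aware_attention = config.use_entity_aware_attention
        hidden = config.hidden_size
        self.query = nn.Linear(hidden, hidden)  # the w2w query
        # The EASA projections are always created, so the checkpoint
        # weights load without "unused weights" warnings.
        self.w2e_query = nn.Linear(hidden, hidden)
        self.e2w_query = nn.Linear(hidden, hidden)
        self.e2e_query = nn.Linear(hidden, hidden)

    def forward(self, hidden_states):
        if self.use_entity_aware_attention:
            # compute queries with query/w2e_query/e2w_query/e2e_query,
            # one per word/entity token-pair type (LUKE-style EASA)
            ...
        else:
            # plain self-attention: only self.query is used and the
            # EASA modules are never touched
            ...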

@ryokan0123 (Contributor)

how about equipping a model with EASA weights regardless of the setting of use_entity_aware_attention?

Thank you for your suggestion.
That could be an option, but I'm concerned it would add an unnecessary memory footprint for weights that are never used.

Other solutions I can think of are:

  • Add a custom warning message to LukeModel to clarify what happens when users set use_entity_aware_attention: false.
  • Add an explanation to the model card on the Hugging Face Hub.

As this is not the first time this kind of confusion has occurred, I will definitely do the second one soon.
Then I will probably send a PR to Hugging Face for the first option.

@chantera (Contributor, Author) commented Jun 9, 2023

I think we can use PreTrainedModel._keys_to_ignore_on_load_unexpected (in transformers/modeling_utils.py).
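
For example, something like the following on the model class might work (a sketch; _keys_to_ignore_on_load_unexpected holds regex patterns matched against unexpected checkpoint keys, and the exact patterns here are illustrative):

class LukeModel(LukePreTrainedModel):
    # Checkpoint keys matching these patterns are silently skipped when
    # the instantiated architecture does not contain them. When
    # use_entity_aware_attention=True the keys are expected, so these
    # patterns never apply.
    _keys_to_ignore_on_load_unexpected = [
        r"encoder\.layer\.\d+\.attention\.self\.w2e_query",
        r"encoder\.layer\.\d+\.attention\.self\.e2w_query",
        r"encoder\.layer\.\d+\.attention\.self\.e2e_query",
    ]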

@ryokan0123 (Contributor)

Thank you for the pointer! That is a clean solution.
I will send a PR to the transformers library to add options to the LUKE model.
