entity-aware self-attention used in mLUKE #174

Open
chantera opened this issue Jun 9, 2023 · 5 comments
@chantera (Contributor) commented Jun 9, 2023

According to the mLUKE paper, mLUKE does not use entity-aware self-attention:

The word and entity tokens equally undergo self-attention computation (i.e., no entity-aware self-attention in Yamada et al. (2020)) after embedding layers.

However, the following code produces a warning message. (The message can be suppressed by passing use_entity_aware_attention=True.)

>>> import transformers
>>> model = transformers.AutoModel.from_pretrained("studio-ousia/mluke-base-lite")
Some weights of the model checkpoint at studio-ousia/mluke-base-lite were not used when initializing LukeModel: [
'luke.encoder.layer.0.attention.self.w2e_query.weight', 'luke.encoder.layer.0.attention.self.w2e_query.bias', 
'luke.encoder.layer.0.attention.self.e2w_query.weight', 'luke.encoder.layer.0.attention.self.e2w_query.bias', 
'luke.encoder.layer.0.attention.self.e2e_query.weight', 'luke.encoder.layer.0.attention.self.e2e_query.bias', 
...]

>>> model.config.use_entity_aware_attention
False
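
For completeness, a minimal sketch of how the flag can be passed (this assumes from_pretrained forwards unrecognized keyword arguments to the config, which recent transformers versions do):

>>> import transformers
>>> model = transformers.AutoModel.from_pretrained(
...     "studio-ousia/mluke-base-lite", use_entity_aware_attention=True
... )
>>> model.config.use_entity_aware_attention
True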

In fact, the public model contains weights for entity-aware self-attention.

>>> import torch
>>> state_dict = torch.load("pytorch_model.bin")
>>> list(state_dict.keys())
[..., 
'luke.encoder.layer.0.attention.self.w2e_query.weight', 'luke.encoder.layer.0.attention.self.w2e_query.bias', 
'luke.encoder.layer.0.attention.self.e2w_query.weight', 'luke.encoder.layer.0.attention.self.e2w_query.bias', 
'luke.encoder.layer.0.attention.self.e2e_query.weight', 'luke.encoder.layer.0.attention.self.e2e_query.bias', 
...]

Could you clarify whether mLUKE uses entity-aware self-attention?

If it does, config.json should specify use_entity_aware_attention: true; if it does not, pytorch_model.bin should be updated to drop the weights for entity-aware self-attention.

@ryokan0123 (Contributor)

In our pretraining and fine-tuning experiments, the mLUKE models did not use entity-aware self-attention.

The attention weights related to entity-aware self-attention (EASA) (e.g., w2e..., e2w..., e2e...) are included in the published model weights for users who want to try entity-aware self-attention at fine-tuning time.
The values of the EASA weights are identical to the corresponding w2w... weights, which is how these weights are initialized in the LUKE paper.
By setting use_entity_aware_attention: true, these weights are loaded into the model.
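
You can check this directly on the checkpoint. A quick sketch, assuming the plain query.* parameters are the w2w weights (the other key names are taken from the warning above):

>>> import torch
>>> sd = torch.load("pytorch_model.bin", map_location="cpu")
>>> prefix = "luke.encoder.layer.0.attention.self."
>>> torch.equal(sd[prefix + "query.weight"], sd[prefix + "w2e_query.weight"])
True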

By default, use_entity_aware_attention is set to false and the EASA weights are ignored, because that is the setting described in the mLUKE paper.

The warning is somewhat disturbing... but it is expected behavior.

@chantera (Contributor, Author) commented Jun 9, 2023

The values of the EASA weights are identical to the corresponding w2w... weights, which is how these weights are initialized in the LUKE paper.

I appreciate the clarification. Now I understand.
So use_entity_aware_attention: false is not a misconfiguration; it simply causes the unused weights to be ignored.

The warning is somewhat disturbing

To fix this confusing behavior, how about equipping the model with the EASA weights regardless of the use_entity_aware_attention setting? In that case, the model would simply skip the computation involving the EASA weights when use_entity_aware_attention=False, as in the sketch below.
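
A rough illustration of the idea (not the actual LukeSelfAttention code in transformers; names and shapes are simplified):

import torch.nn as nn

class SketchSelfAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.use_entity_aware_attention = config.use_entity_aware_attention
        hidden = config.hidden_size
        self.query = nn.Linear(hidden, hidden)  # the w2w query
        # The EASA projections are always created, so the checkpoint
        # weights load without "unused weights" warnings.
        self.w2e_query = nn.Linear(hidden, hidden)
        self.e2w_query = nn.Linear(hidden, hidden)
        self.e2e_query = nn.Linear(hidden, hidden)

    def forward(self, hidden_states):
        if self.use_entity_aware_attention:
            # compute queries with query/w2e_query/e2w_query/e2e_query,
            # one per word/entity token-pair type (LUKE-style EASA)
            ...
        else:
            # plain self-attention: only self.query is used and the
            # EASA modules are never touched
            ...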

@ryokan0123 (Contributor)

how about equipping a model with EASA weights regardless of the setting of use_entity_aware_attention?

Thank you for your suggestion.
That could be an option, but I'm concerned it would add an unnecessary memory footprint for weights that are never used.

Other solutions I can think of are:

  • Add a custom warning message to LukeModel to clarify what happens when users set use_entity_aware_attention: false.
  • Add an explanation to the model card on the Hugging Face Hub.

As this is not the first time this kind of confusion has occurred, I will definitely do the second one soon.
Then I will probably send a PR to Hugging Face for the first option.

@chantera (Contributor, Author) commented Jun 9, 2023

I think we can use PreTrainedModel._keys_to_ignore_on_load_unexpected (in transformers/modeling_utils.py).
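
For example, something like the following on the model class might work (a sketch; _keys_to_ignore_on_load_unexpected holds regex patterns matched against unexpected checkpoint keys, and the exact patterns here are illustrative):

class LukeModel(LukePreTrainedModel):
    # Checkpoint keys matching these patterns are silently skipped when
    # the instantiated architecture does not contain them. When
    # use_entity_aware_attention=True the keys are expected, so these
    # patterns never apply.
    _keys_to_ignore_on_load_unexpected = [
        r"encoder\.layer\.\d+\.attention\.self\.w2e_query",
        r"encoder\.layer\.\d+\.attention\.self\.e2w_query",
        r"encoder\.layer\.\d+\.attention\.self\.e2e_query",
    ]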

@ryokan0123 (Contributor)

Thank you for the pointer! That is a clean solution.
I will send a PR to the transformers library to add options to the LUKE model.
