feat: Add gradient testing for Flash Attention 2 #35780

Open · wants to merge 2 commits into main
Conversation


@crStiv commented Jan 20, 2025

This commit adds gradient testing for Flash Attention 2 (FA2) implementation.
The test ensures that gradients computed with FA2 match those computed with
the eager implementation within specified tolerance thresholds.

Key changes:

  • Added gradient testing in train mode
  • Compared gradients between the eager and FA2 implementations
  • Used the same tolerance thresholds as the forward-pass comparison
  • Properly cleaned up gradients and restored eval mode

This addresses the TODO comment in test_modeling_common.py and improves
test coverage for FA2 implementation.
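
For context, a minimal sketch of what such a gradient check might look like. The model id, tolerances, and helper name below are illustrative, not taken from the PR:

```python
import torch
from transformers import AutoModelForCausalLM


def check_fa2_gradients(model_id, atol=4e-2, rtol=4e-2):
    """Compare backward-pass gradients between eager and FA2 attention."""
    input_ids = torch.randint(0, 100, (1, 16), device="cuda")
    grads = {}
    for impl in ("eager", "flash_attention_2"):
        # FA2 requires fp16/bf16, so load both variants in bfloat16 for a fair comparison.
        model = AutoModelForCausalLM.from_pretrained(
            model_id, attn_implementation=impl, torch_dtype=torch.bfloat16
        ).to("cuda")
        model.train()  # gradient testing runs in train mode
        loss = model(input_ids, labels=input_ids).loss
        loss.backward()
        grads[impl] = {
            name: p.grad.clone()
            for name, p in model.named_parameters()
            if p.grad is not None
        }
        model.zero_grad(set_to_none=True)  # properly clean up gradients
        model.eval()  # restore eval mode

    # Gradients from the two backends should agree within the tolerance thresholds.
    for name, eager_grad in grads["eager"].items():
        torch.testing.assert_close(
            grads["flash_attention_2"][name], eager_grad, atol=atol, rtol=rtol
        )
```

In the PR itself the check lives in the common model tests and reuses the existing forward-pass tolerance logic; the sketch above only illustrates the core backward comparison.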

@ArthurZucker (Collaborator) left a comment


Nice! I think the test can be simplified: it no longer needs two attention models, as you can just set config._attn_implementation for newer models!
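
A rough sketch of that simplification, continuing the example above and assuming a model whose attention dispatch reads config._attn_implementation at forward time (as newer transformers models do):

```python
# Instead of instantiating two models, load one and switch the attention
# backend via the config between runs (hypothetical sketch).
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda")

for impl in ("eager", "flash_attention_2"):
    model.config._attn_implementation = impl  # dispatch is read from the config on newer models
    model.train()
    model(input_ids, labels=input_ids).loss.backward()
    # ... collect and compare gradients as above, then clean up ...
    model.zero_grad(set_to_none=True)
    model.eval()
```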
