Flash Attention for Neuron #939
base: main
Conversation
Maybe wait until this PR is checked in. From what I can tell, your PR still has the unfixed remat bug. #942 (review)
def _mha_forward(query, key, value, bias, causal, softmax_scale, dropout_rate):
    # Get the batch size, sequence lengths, number of heads, and hidden dimension
Nit: end comments with . (here and everywhere)
    key: Tensor,
    value: Tensor,
    bias: Tensor,
    causal: bool = False,
Can we support segment IDs? A more general masking fn (with optimized handling) would be even better.
If not, I am fine with leaving a TODO here, but it is a hard blocker for enabling it for our internal training.
Can we do segment IDs in a separate PR? That involves non-trivial work and needs some time.
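For context, segment IDs typically induce a block-diagonal attention mask (each position may attend only within its own segment). A minimal NumPy sketch of that mask, with `segment_id_mask` as a hypothetical helper name not taken from this PR:

```python
import numpy as np

def segment_id_mask(q_segment_ids, kv_segment_ids):
    # Hypothetical helper (not in this PR): q_segment_ids has shape
    # [batch, q_len], kv_segment_ids has shape [batch, kv_len]. A query
    # position may attend to a key position only when their segment IDs
    # match, which yields a block-diagonal boolean mask.
    return q_segment_ids[:, :, None] == kv_segment_ids[:, None, :]

# Two segments of length 2 each in a sequence of length 4.
q_ids = np.array([[1, 1, 2, 2]])
kv_ids = np.array([[1, 1, 2, 2]])
mask = segment_id_mask(q_ids, kv_ids)  # shape (1, 4, 4)
```

A fused kernel would not materialize this mask; it would skip or mask blocks inside the kernel loop, which is part of why optimized handling is non-trivial.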
Thanks for all the reviews @ruomingp @kelvin-zou. I have resolved all the comments; please let me know if any further changes are needed.
This PR adds support for a flash attention kernel for Neuron, implemented through the Neuron Kernel Interface (NKI).
The flash attention kernel works with TRN1 and TRN2.
This PR is a newer version of #883 from a different fork; all comments from the previous PR are addressed here, and it adds dropout support. Segment ID support in the flash attention kernel is in progress and will be available at a later date.
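As background, the dense attention semantics that a fused flash attention kernel is expected to reproduce can be sketched in NumPy. This is illustrative only: the argument names mirror the PR's `_mha_forward` signature, but the body, the `-1e9` masking constant, and the layout convention are assumptions, not the kernel's implementation.

```python
import numpy as np

def reference_mha(query, key, value, bias=None, causal=False, softmax_scale=1.0):
    # Illustrative reference semantics (not the NKI kernel).
    # query/key/value: [batch, seq, heads, head_dim].
    logits = np.einsum("bqhd,bkhd->bhqk", query, key) * softmax_scale
    if bias is not None:
        logits = logits + bias
    if causal:
        q_len, k_len = logits.shape[-2:]
        # Lower-triangular mask: query i attends to keys j <= i.
        mask = np.tril(np.ones((q_len, k_len), dtype=bool))
        logits = np.where(mask, logits, -1e9)  # assumed masking constant
    # Numerically stable softmax over the key axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    return np.einsum("bhqk,bkhd->bqhd", probs, value)
```

A flash attention kernel computes the same result tile by tile with an online softmax, never materializing the full `[q_len, k_len]` logits matrix; dropout (when enabled) is applied to `probs` before the final contraction.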