clean up and make TMA, scheduling autotunable (#54)
Summary:
Variants:
- triton_tutorial_flash_v2_opt: no TMA, with computation pipelining
- triton_tutorial_flash_v2_tma: TMA, with computation pipelining
- triton_tutorial_flash_v2_tma_ws: TMA, with computation pipelining and Warp Spec
- triton_tutorial_flash_v2_ws: no TMA, with computation pipelining and Warp Spec

Pull Request resolved: #54

Test Plan:
```
CUDA_VISIBLE_DEVICES=5 TORCH_CUDA_ARCH_LIST=9.0a python run.py --op flash_attention --only triton_tutorial_flash_v2_opt,triton_tutorial_flash_v2_tma,triton_tutorial_flash_v2 --num-inputs 1 --seq-len 13 --metrics tflops --batch 8 --n-heads 16 --d-head 128

CUDA_VISIBLE_DEVICES=5 TORCH_CUDA_ARCH_LIST=9.0a python run.py --op flash_attention --only triton_tutorial_flash_v2_opt,triton_tutorial_flash_v2_tma,triton_tutorial_flash_v2 --num-inputs 1 --seq-len 13 --metrics accuracy --batch 8 --n-heads 16 --d-head 128 --baseline triton_tutorial_flash_v2

On a compiler supporting WarpSpec:

CUDA_VISIBLE_DEVICES=5 TORCH_CUDA_ARCH_LIST=9.0a python run.py --op flash_attention --only triton_tutorial_flash_v2_ws,triton_tutorial_flash_v2_tma_ws,triton_tutorial_flash_v2 --num-inputs 1 --seq-len 13 --metrics tflops --batch 8 --n-heads 16 --d-head 128

CUDA_VISIBLE_DEVICES=5 TORCH_CUDA_ARCH_LIST=9.0a python run.py --op flash_attention --only triton_tutorial_flash_v2_ws,triton_tutorial_flash_v2_tma_ws,triton_tutorial_flash_v2 --num-inputs 1 --seq-len 13 --metrics accuracy --batch 8 --n-heads 16 --d-head 128 --baseline triton_tutorial_flash_v2
```

Reviewed By: htyu

Differential Revision: D66109428

Pulled By: manman-ren

fbshipit-source-id: 52d89e555ae717f2258dddfc17b4011414ef0e83
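For context on what "autotunable" means here: in Triton, scheduling knobs such as num_stages (software-pipelining depth) and num_warps are typically exposed to the autotuner through the @triton.autotune decorator, so the kernel sweeps those configs at first launch. The sketch below is illustrative only; the kernel, its parameters, and the config values are hypothetical and are not the actual TritonBench flash_attention code.

```
# Minimal, hypothetical example of autotuning scheduling knobs in Triton.
import torch
import triton
import triton.language as tl


@triton.autotune(
    configs=[
        # num_stages / num_warps are the scheduling knobs swept by the autotuner.
        triton.Config({"BLOCK_SIZE": 128}, num_stages=2, num_warps=4),
        triton.Config({"BLOCK_SIZE": 128}, num_stages=4, num_warps=4),
        triton.Config({"BLOCK_SIZE": 256}, num_stages=3, num_warps=8),
    ],
    key=["n_elements"],  # re-tune when the problem size changes
)
@triton.jit
def scaled_copy_kernel(x_ptr, y_ptr, scale, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance copies and scales one BLOCK_SIZE-wide slice.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    tl.store(y_ptr + offsets, x * scale, mask=mask)


def scaled_copy(x: torch.Tensor, scale: float) -> torch.Tensor:
    y = torch.empty_like(x)
    n_elements = x.numel()
    # Grid size depends on the BLOCK_SIZE picked by the autotuner.
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    scaled_copy_kernel[grid](x, y, scale, n_elements)
    return y
```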