Add more doc details
msaroufim authored Oct 15, 2024
1 parent fd93e82 commit a17eb3a
Showing 1 changed file with 27 additions and 1 deletion.
28 changes: 27 additions & 1 deletion README.md
@@ -7,6 +7,18 @@ Please make sure your Triton compiler is v2.1 or later, and is from the OpenAI T
You can also install in your python venv using latest wheels:
`pip install --pre pytorch-triton-rocm torch --index-url https://download.pytorch.org/whl/nightly/rocm6.2`

## How to add your own benchmark

This repo is configured with a custom GitHub runner donated by AMD. You can queue jobs on this runner either by merging your code or by opening a pull request, so we don't need to merge your code for you to run benchmarks.

The main things you need to run your own benchmark:
1. In `kernels/`, create a new file whose name starts with `test_`. This is required because we use `pytest` to discover your kernel.
2. If you want your benchmark results to persist in a GitHub Artifact, we recommend using the built-in Triton `benchmark.run(save_path="./perf-artifacts/your_kernel", show_plots=True, print_data=True)`.

Have fun! We intend for this to be a social repo. If you have any other requests for things we could do better, please let us know!

## Existing benchmarks

## `test_flash-attention.py`

This script contains the Flash Attention kernel with the following support
@@ -37,4 +49,18 @@ Kernel that implements RMS Norm over a row of tensor.
Kernel that implements Layer Normalization over a row of a tensor

## `test_dotproduct.py`
Kernel that implements the dot product of two vectors

# Dev roadmap

CI changes
* [ ] Running the full benchmark suite on each PR doesn't make sense; run only the changed files instead
* [ ] Considering we have a node, running the tests sequentially is a missed opportunity; each test should instead be allocated to a free GPU. Investigate tools like `pytest-xdist`
* [ ] Setting up the Triton environment takes a few minutes; we should cache it since it almost never changes
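
The first two CI items could be approached along the lines of the sketch below. Nothing here exists in the repo yet: the function names, the `origin/main` default, and the round-robin GPU mapping are assumptions for illustration. The sketch relies on `pytest-xdist` exporting a `PYTEST_XDIST_WORKER` environment variable (`gw0`, `gw1`, ...) in each worker process.

```python
import fnmatch
import os
import posixpath
import subprocess


def changed_files(base_ref="origin/main"):
    """List files changed between base_ref and HEAD (base_ref is an assumed default)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base_ref}...HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def select_kernel_tests(paths, kernels_dir="kernels"):
    """Keep only pytest-discoverable benchmark files directly under kernels/."""
    selected = []
    for path in paths:
        directory, name = posixpath.split(path)  # git prints forward slashes
        if directory == kernels_dir and fnmatch.fnmatch(name, "test_*.py"):
            selected.append(path)
    return selected


def gpu_for_worker(worker_id, num_gpus):
    """Map a pytest-xdist worker id such as "gw3" to a GPU index, round-robin."""
    if worker_id == "master":  # xdist is not active; use the first GPU
        return 0
    return int(worker_id[len("gw"):]) % num_gpus


def current_worker_gpu(num_gpus):
    """GPU index for this process, based on the variable set by pytest-xdist."""
    return gpu_for_worker(os.environ.get("PYTEST_XDIST_WORKER", "master"), num_gpus)
```

A CI step could pass `select_kernel_tests(changed_files())` to `pytest -n auto`, and a `conftest.py` could export the result of `current_worker_gpu(...)` as `HIP_VISIBLE_DEVICES` before `torch` is imported. Both of these are suggestions, not tested CI configuration.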

UX changes
Instead of submitting jobs via GitHub, we could do it via Discord. The UX would be:
* [ ] A user submits a `kernel.py` in the #rocm channel on discord.gg/gpumode, and it gets picked up by a Discord bot
* [ ] Given a script, use the bot to automatically open a PR for benchmarking. This can be done thanks to tools like https://github.com/PyGithub/PyGithub
* [ ] Once the triggered GitHub Action is complete, the bot can reply to the original user message with a link to the generated GitHub Artifact. If the job fails, the bot should link to the failed GitHub Action
* [ ] A nice-to-have would be to give users a sense of their position in the queue
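
The PR-opening step could be sketched with PyGithub roughly as follows. This is only an illustration under stated assumptions: the `bench/<user>/<name>` branch scheme, the function names, the `main` base branch, and the `repo_name` and `token` parameters are hypothetical, and the Discord side (receiving the attachment, replying with links) is left out entirely.

```python
import posixpath


def plan_submission(username, filename, kernels_dir="kernels"):
    """Derive a branch name and a pytest-discoverable path for a submitted script."""
    name = posixpath.basename(filename)
    if not name.startswith("test_"):
        name = "test_" + name  # CI discovers benchmarks through the test_ prefix
    stem = name[: -len(".py")] if name.endswith(".py") else name
    return {
        "branch": f"bench/{username}/{stem}",  # hypothetical naming scheme
        "path": posixpath.join(kernels_dir, name),
    }


def open_benchmark_pr(token, repo_name, username, filename, source, base="main"):
    """Commit the script to a new branch and open a PR so CI benchmarks it."""
    from github import Auth, Github  # PyGithub, imported lazily

    plan = plan_submission(username, filename)
    repo = Github(auth=Auth.Token(token)).get_repo(repo_name)

    # Branch off the current tip of the base branch.
    base_sha = repo.get_branch(base).commit.sha
    repo.create_git_ref(ref=f"refs/heads/{plan['branch']}", sha=base_sha)

    # Add the submitted kernel file on the new branch, then open the PR.
    repo.create_file(
        plan["path"], f"Add benchmark from {username}", source, branch=plan["branch"]
    )
    pr = repo.create_pull(
        title=f"Benchmark {plan['path']}",
        body=f"Submitted via Discord by {username}.",
        head=plan["branch"],
        base=base,
    )
    return pr.html_url  # the bot could post this link back to the user
```

A real bot would also need to sanitize the Discord username before using it in a branch name and handle the case where the branch or file already exists.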
