Add int8 to gemm w/ addmatrix #3040

Merged
merged 7 commits into main from alex/block_adds_benchmark
Jan 10, 2025

Conversation

alexbaden (Contributor) commented Dec 18, 2024

Update the gemm addmatrix benchmark to support int8 inputs as well as bfloat16.

The int8 benchmark is pretty slow - not because Triton performance is bad (it is at least on par with bfloat16), but because PyTorch does not support int8 matmul on GPU, so we have to do the matmul on the CPU. This makes the benchmark something like 20x slower. To fix that, I changed the PyTorch accuracy check to only run for a few shapes instead of all of them - I tried to pick shapes that I thought were representative of the different cases, but am open to suggestions. Now the benchmark runs in a reasonable time.
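
For context, a minimal sketch of the constraint described above (the helper name and the int32 upcast are my own illustration, not the PR's code):

    import torch

    def matmul_reference(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Hypothetical helper: PyTorch rejects int8 matmul on the GPU, so the
        # reference result is computed on the CPU (upcast to int32 to avoid
        # overflow) and moved back. This CPU round-trip is what makes the
        # int8 accuracy check roughly 20x slower.
        if a.dtype == torch.int8:
            return torch.matmul(a.cpu().to(torch.int32), b.cpu().to(torch.int32)).to(a.device)
        return torch.matmul(a, b)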

A few open items need to be addressed:

  • for int8 we want a separate geomean (i.e. a geomean for bfloat16 and a geomean for int8). What's the best way to keep int8 and bfloat16 separate? I could introduce an environment variable and run the benchmark twice - once with only bfloat16, once with only int8. Open to other suggestions.
  • for the onednn comparison, bfloat16 is no problem, but there is no support for GPU matmul w/ int8. I don't think we want to run the comparison vs CPU (it takes too long and gives us no information), so I might need to introduce the env variable anyway to do one run with bfloat16 w/ onednn and another with int8 w/out onednn.

cc #3014

Egor-Krivov (Contributor) commented Dec 18, 2024

I think we should treat them as 2 separate benchmarks in terms of reporting (so we have 2 lines: --benchmark gemm-postop-addmatrix & --benchmark gemm-postop-addmatrix-int8). Then all geomeans will work as intended. Otherwise we'll have to introduce some geomean groups in our database.

Then we either run the benchmark script twice with different dtypes to generate 2 separate report files (I'd prefer this), or modify our report script to add some filtering capability.
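
A minimal sketch of the two-name idea, with an illustrative config layout (the repo's actual reporting plumbing may differ):

    # Registering int8 under its own benchmark name keeps the existing charts
    # and geomeans continuous; the dict layout here is illustrative only.
    BENCHMARK_CONFIGS = {
        'gemm-postop-addmatrix': {'dtype': 'bfloat16'},
        'gemm-postop-addmatrix-int8': {'dtype': 'int8'},
    }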

About onednn and int8 support: do we want to measure onednn at all? If not, and we run it only for validation, maybe we could run it in another precision, like fp32 or bf16, just to validate the output.

Egor-Krivov (Contributor) commented:

My only issue with this PR right now is that all charts and geomeans for the addmatrix benchmark will be discontinued due to the new parameters. Hence my suggestion to introduce a separate benchmark for int8.

vlad-penkin linked an issue on Dec 18, 2024 that may be closed by this pull request: Add onednn to gemm benchmarks
alexbaden force-pushed the alex/block_adds_benchmark branch from 164fd00 to 3cc0b6e on January 6, 2025
alexbaden changed the title from "Add int8 to gemm w/ addmatrix and consider onednn provider" to "Add int8 to gemm w/ addmatrix" on Jan 6, 2025
alexbaden (Contributor, Author):

I removed the onednn-related bits because for onednn we only measure kernel time, and the add step is not fused into the main gemm kernel, so the comparison would not be appropriate.

I modified the int8 code to run only as a separate benchmark step, controlled by an environment variable, so the existing bfloat16 timings should not be affected. The default run mode is bfloat16, with an optional int8 mode or an all-dtypes mode for local runs (see the sketch below).
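
A minimal sketch of that run-mode selection, with an assumed env-var name (the variable the PR actually uses may differ):

    import os

    import torch

    # Assumed variable name and values: default to bfloat16, with opt-in
    # int8 and all-dtypes modes for local runs, as described above.
    _MODE = os.getenv('BENCHMARKING_DTYPE', 'bfloat16')
    DTYPES = {
        'bfloat16': [torch.bfloat16],
        'int8': [torch.int8],
        'all': [torch.bfloat16, torch.int8],
    }[_MODE]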

alexbaden marked this pull request as ready for review on January 6, 2025
    if dtype.is_floating_point or [B, M, N, K] in [[1, 1024, 1024, 1024], [1, 2048, 2048, 2048],
                                                   [1, 512, 8192, 32768], [4, 32768, 4096, 128]]:
        # torch int8 matmul on GPU is not supported. only check a few int8 shapes to reduce runtime
        benchmark_suit.assert_close(triton_fn(), torch_fn(), atol=1e-4, rtol=rtol, err_msg='triton to torch')
Contributor:

Should we have an env var for all benchmarks to control whether we verify the result? I don't think we should skip checking correctness for some shapes.

alexbaden (Contributor, Author):

The correctness checks are very slow for int8 because they have to be done on the CPU. I don't think it is worth (at least) doubling the benchmark's runtime to do correctness checks for all shapes. I tried to pick a small sampling of different shapes (batched and not batched) so we have some int8 correctness checks.

Contributor:

In general, I agree this is a good compromise given that torch int8 matmul is not supported on GPU, but my concern is that it is not obvious to users that some int8 shapes are not tested for correctness. I don't have a better suggestion anyway.

alexbaden (Contributor, Author):

If we wanted to do a CI run I would add the env variable, but a user running the benchmarks locally can just comment out the if (and unindent the line after it). A sketch of the env-variable approach follows.
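
A sketch of that escape hatch with a hypothetical variable name, reusing the names from the snippet quoted above (the PR as merged keeps the plain if):

    import os

    # Hypothetical: let a local run opt back in to checking every int8 shape,
    # while CI keeps the sampled subset. The env-var name is illustrative.
    check_all = os.getenv('BENCHMARKING_VERIFY_ALL_SHAPES', '0') == '1'
    sampled_int8_shapes = [[1, 1024, 1024, 1024], [1, 2048, 2048, 2048],
                           [1, 512, 8192, 32768], [4, 32768, 4096, 128]]
    if dtype.is_floating_point or check_all or [B, M, N, K] in sampled_int8_shapes:
        benchmark_suit.assert_close(triton_fn(), torch_fn(), atol=1e-4, rtol=rtol,
                                    err_msg='triton to torch')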

alexbaden force-pushed the alex/block_adds_benchmark branch 2 times, most recently from 1146674 to 530d13c on January 9, 2025
alexbaden force-pushed the alex/block_adds_benchmark branch from 530d13c to 72dfa2a on January 9, 2025
alexbaden merged commit e27d722 into main on Jan 10, 2025
6 checks passed
alexbaden deleted the alex/block_adds_benchmark branch on January 10, 2025