Merge OpenAI Triton commit 4dac289 #265

Merged: 13 commits, Jan 16, 2024


@whitneywhtsang (Contributor) commented Jan 16, 2024

This PR changes the Triton base from bbfdc0d to 4dac289 (Jan 11).

Please do not squash and merge this PR.

ptillet and others added 12 commits January 9, 2024 22:21
AMD is enabled by default, but it is not yet ready for use (untested). A lot of
work will be necessary to make everything robust and maintainable.
Solves triton-lang/triton#2898.

With the [MLIR VS
Code](https://marketplace.visualstudio.com/items?itemName=llvm-vs-code-extensions.vscode-mlir)
plugin, here is what the result looks like:

<img width="1195" alt="image"
src="https://github.com/openai/triton/assets/23236638/529c02a0-6448-4221-90fc-78d5d416356e">

Further work is needed to make the file extension `.mlir` rather
than `.ttlr`.

…tup.py (#2906)

Initialize the submodule before checking whether something is in it.

…ods of defining target link libraries (#2907)

CMake requires that all target_link_libraries calls for a given target
either use the PUBLIC/PRIVATE keywords or omit them; mixing the two
signatures is not supported.

#2908)

* Adding a new `tl.clamp(x, min, max, propagate_nan)` function to the Triton
language. It is lowered to the sequence minimum(maximum(x, min), max)
in the general case, and to `min.xorsign.abs` inline assembly when
`clamp(x, -limit, limit)` is detected.
* Refactoring the `tl.PropagateNan` enum so that it is defined directly in
MLIR and exported to the Python frontend.
* New tests for clamp and symmetric clamp

These tests are deprecated, since we now have comprehensive
test_conversions.

…t now (#2911)

PR triton-lang/triton#2887 removed `third_party/triton_shared`, so the
corresponding test should be removed as well. Otherwise it fails all the
CI tests (as it currently does).

On Hopper, when storing an MMA tensor to shared memory, we can use stmatrix
to reduce the number of store instructions. This gives a very small
improvement to the epilogue for fp16 output. It will later be combined
with cp.async.bulk to improve performance further.

`DistributedEncodingTrait::getCTAOrder()` returns a SmallVector by
value. That temporary is destroyed as soon as it is assigned to `ref`,
leaving `ref` a dangling reference.

To prevent this, we now store the result in a vector instead of an array reference.
@whitneywhtsang whitneywhtsang self-assigned this Jan 16, 2024
@whitneywhtsang whitneywhtsang force-pushed the whitneywhtsang/merge branch 2 times, most recently from e505031 to 7e28281 January 16, 2024 03:41
@whitneywhtsang whitneywhtsang changed the title Merge OpenAI Triton commit 9a38395 Merge OpenAI Triton commit 4dac289 Jan 16, 2024
@whitneywhtsang whitneywhtsang merged commit 7f911ad into llvm-target Jan 16, 2024
2 of 3 checks passed
@whitneywhtsang whitneywhtsang deleted the whitneywhtsang/merge branch January 16, 2024 15:46
8 participants