Merge OpenAI Triton commit 3bac3be #3142

Merged
merged 10 commits into from
Jan 11, 2025

@whitneywhtsang whitneywhtsang commented Jan 11, 2025

This PR changes the Triton base from 2b41842 to 3bac3be (Jan 10).
Pass rate: 99.89%->97.63% (#3141)

Please do not squash and merge this PR.

Jokeren and others added 10 commits January 10, 2025 12:02
### Introduce the inclusive/exclusive/property metrics system

This commit introduces a new metrics classification system with three
types:
- Inclusive metrics: Can be aggregated and propagated to parent
- Exclusive metrics: Can be aggregated but not propagated
- Property metrics: Cannot be aggregated or propagated
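The three classes can be illustrated with a small call-tree sketch. Everything in it (the `Kind` enum, the `Node` class, `add_metric`, `total`, and the `"name (suffix)"` key layout) is hypothetical and only mirrors the semantics listed above; it is not Proton's implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    INCLUSIVE = "inc"  # aggregated, and propagated to parents
    EXCLUSIVE = "exc"  # aggregated, but stays on its own node
    PROPERTY = "pty"   # neither aggregated nor propagated


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    metrics: dict = field(default_factory=dict)  # "name (suffix)" -> value

    def add_metric(self, name, value, kind):
        key = f"{name} ({kind.value})"
        if kind is Kind.PROPERTY:
            # Properties are not aggregated: the latest value replaces the old one.
            self.metrics[key] = value
        else:
            # Inclusive and exclusive metrics accumulate on the node.
            self.metrics[key] = self.metrics.get(key, 0) + value

    def total(self, key):
        """Value of `key` at this node; only inclusive metrics add in the children."""
        own = self.metrics.get(key, 0)
        if key.endswith("(inc)"):
            return own + sum(child.total(key) for child in self.children)
        return own


root = Node("root")
kernel = Node("kernel")
root.children.append(kernel)

kernel.add_metric("time_ns", 100, Kind.INCLUSIVE)
kernel.add_metric("time_ns", 50, Kind.INCLUSIVE)
kernel.add_metric("cpu_time_ns", 7, Kind.EXCLUSIVE)
kernel.add_metric("num_sms", 108, Kind.PROPERTY)
```

In this sketch `root.total("time_ns (inc)")` sees the kernel's accumulated time, while the exclusive and property values remain visible only on the `kernel` node.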

Changes include:

- Update Metric class to support new metric types
- Add metric naming convention with `(inc)`, `(exc)`, `(pty)` suffixes
- Remove aggregable parameter from addMetrics API
- Update documentation in `README.md`
- Add unit tests for new metric types
- Update viewer to handle different metric types

### Add `cpu_timed_scope` to measure CPU time of scoped operations

Changes include:

- Records CPU time as an exclusive metric at the exit of a scope
- Updates Python API to expose `cpu_timed_scope`
- Adds documentation and tests
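The behavior described above can be sketched with a plain context manager. This is a hypothetical stand-in, not the Proton API: `cpu_timed_scope_sketch`, the `recorded` dictionary, the metric key, and the choice of `time.perf_counter_ns` as the clock are all assumptions made for illustration.

```python
import time
from contextlib import contextmanager

recorded = {}  # scope name -> {metric name: value}; stand-in for profile data


@contextmanager
def cpu_timed_scope_sketch(name):
    """Hypothetical stand-in: time the body on the host and record at scope exit."""
    start = time.perf_counter_ns()
    try:
        yield
    finally:
        elapsed = time.perf_counter_ns() - start
        metrics = recorded.setdefault(name, {})
        # Exclusive metric: accumulated on this scope only, never added to parents.
        key = "cpu_time_ns (exc)"
        metrics[key] = metrics.get(key, 0) + elapsed


with cpu_timed_scope_sketch("preprocess"):
    sum(i * i for i in range(10_000))
```

Recording in the `finally` clause means the time is still captured when the scoped body raises, which matches the "at the exit of a scope" wording.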

### Improve thread safety and context management

We simplify and fix the thread model of proton.

Changes include:

- Replace `shared_mutex` with mutex for simplicity
- Update the `shadow` context source to use thread-local context stacks
and allow threads to inherit and shadow the main context stack.
- Add documentation for thread safety considerations
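A minimal sketch of the inherit-and-shadow idea, assuming each thread seeds its own stack from a copy of the main thread's stack. The names `main_stack`, `context_stack`, and `worker` are hypothetical, and the sketch omits the locking a real implementation would need around the copy.

```python
import threading

main_stack = ["main", "train_step"]  # context stack owned by the main thread
_local = threading.local()           # per-thread storage


def context_stack():
    """Return this thread's stack, seeding it from a snapshot of the main stack."""
    if not hasattr(_local, "stack"):
        _local.stack = list(main_stack)  # inherit by copy
    return _local.stack


def worker(results):
    stack = context_stack()
    stack.append("worker_scope")  # shadows: visible only in this thread
    results.append(list(stack))
    stack.pop()


results = []
thread = threading.Thread(target=worker, args=(results,))
thread.start()
thread.join()
```

The worker sees the inherited contexts plus its own scope, while `main_stack` itself is never modified by the worker.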

### Refactor scope and operation handling

Changes include:

- Rename `addScope` to `addOp` in Data for clarity
- Add clear operation for data cleanup to save CPU DRAM
- Modify scope interface implementation
- Update tests and examples

### Fix profiler and session initializations

Changes include:

- Replace `initializedCount` with a boolean started flag and consider
the number of registered data instead. Previously, deactivating a
session did not decrement `initializedCount`, so the profiler kept
running even when there was no registered data
- Update session activation/deactivation by ordering interfaces based on
their dependency
- Update test cases

AMD CDNA3 architectures do not have native bf16 VALU instructions so
doing bf16 scaling can be expensive.

This commit prototypes upcasting to fp16 for computation. This means
relaxing the dot_scaled frontend and the upcast_mxfp op definition to
support fp16.

Right now the fp16 path is turned on if one input is fp16 for
prototyping. A more explicit way might be introduced in the future.

The logic is incorrect; we previously missed it due to test skip
conditions disabling the tests. So this commit also restructures the
test skip conditions.

These have been contributed upstream. Also switch a few `std::` usages
to `llvm::`.

Generalize unit tests for different backends, for example by not
hard-coding `device` as `cuda`.

---------

Signed-off-by: Whitney Tsang <[email protected]>
@whitneywhtsang whitneywhtsang self-assigned this Jan 11, 2025
@whitneywhtsang whitneywhtsang marked this pull request as ready for review January 11, 2025 15:05
@whitneywhtsang whitneywhtsang merged commit 16a54d6 into main Jan 11, 2025
5 checks passed
@whitneywhtsang whitneywhtsang deleted the whitneywhtsang/merge branch January 11, 2025 15:09