Merge OpenAI Triton commit 3bac3be
#3142
Merged
…he NVIDIA backend (#5553)
### Introduce the inclusive/exclusive/property metrics system

This commit introduces a new metrics classification system with three types:

- Inclusive metrics: Can be aggregated and propagated to parent
- Exclusive metrics: Can be aggregated but not propagated
- Property metrics: Cannot be aggregated or propagated

Changes include:

- Update Metric class to support new metric types
- Add metric naming convention with `(inc)`, `(exc)`, `(pty)` suffixes
- Remove aggregable parameter from addMetrics API
- Update documentation in `README.md`
- Add unit tests for new metric types
- Update viewer to handle different metric types

### Add `cpu_timed_scope` to measure CPU time of scoped operations

Changes include:

- Records CPU time as an exclusive metric at the exit of a scope
- Updates Python API to expose `cpu_timed_scope`
- Adds documentation and tests

### Improve thread safety and context management

We simplify and fix the thread model of proton.

Changes include:

- Replace `shared_mutex` with mutex for simplicity
- Update the `shadow` context source to use thread-local context stacks and allow threads to inherit and shadow main context stack.
- Add documentation for thread safety considerations

### Refactor scope and operation handling

Changes include:

- Rename `addScope` to `addOp` in Data for clarity
- Add clear operation for data cleanup to save CPU DRAM
- Modify scope interface implementation
- Update tests and examples

### Fix profiler and session initializations

Changes include:

- Replace `initializedCount` with a boolean started flag and consider the number of registered data instead. Previously if we deactivate a session, the `initializedCount` doesn't get decreased. As a result, the profiler keeps running even if there's no registered data
- Update session activation/deactivation by ordering interfaces based on their dependency
- Update test cases
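To make the inclusive/exclusive/property distinction from the first section concrete, here is a minimal Python sketch. It is not Proton's implementation: `MetricKind`, `Node`, and `add_metric` are hypothetical names, the `"name (suffix)"` key format is an assumption based on the suffix convention mentioned above, and treating a property metric as a plain overwrite is also an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class MetricKind(Enum):
    """Hypothetical enum; values mirror the `(inc)`, `(exc)`, `(pty)` suffixes."""
    INCLUSIVE = "inc"  # aggregated and propagated to the parent
    EXCLUSIVE = "exc"  # aggregated, but not propagated
    PROPERTY = "pty"   # neither aggregated nor propagated


@dataclass
class Node:
    """Hypothetical call-tree node holding metrics keyed by suffixed name."""
    name: str
    parent: Optional["Node"] = None
    metrics: dict = field(default_factory=dict)

    def add_metric(self, name, value, kind):
        key = f"{name} ({kind.value})"  # assumed key format
        if kind is MetricKind.PROPERTY:
            # Assumption: a property is simply overwritten, never summed.
            self.metrics[key] = value
            return
        # Inclusive and exclusive metrics are both aggregated on this node.
        self.metrics[key] = self.metrics.get(key, 0) + value
        # Only inclusive metrics are also propagated up the tree.
        if kind is MetricKind.INCLUSIVE and self.parent is not None:
            self.parent.add_metric(name, value, kind)


root = Node("root")
kernel = Node("kernel", parent=root)
kernel.add_metric("time", 5, MetricKind.INCLUSIVE)
kernel.add_metric("time", 3, MetricKind.INCLUSIVE)
kernel.add_metric("cpu_time", 2, MetricKind.EXCLUSIVE)
kernel.add_metric("device", "gpu0", MetricKind.PROPERTY)
```

In this sketch the inclusive `time` values are summed on `kernel` and also appear on `root`, while the exclusive `cpu_time` and the property `device` stay on `kernel` only. A CPU time recorded at scope exit, as `cpu_timed_scope` is described as doing, would fall into the exclusive case.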
AMD CDNA3 architectures do not have native bf16 VALU instructions, so bf16 scaling can be expensive. This commit prototypes upcasting to fp16 for the computation, which means relaxing the `dot_scaled` frontend and the `upcast_mxfp` op definitions to support fp16. For prototyping, the fp16 path is currently turned on when one input is fp16. A more explicit mechanism might be introduced in the future.
The logic is incorrect; we previously missed it due to test skip conditions disabling the tests. So this commit also restructures the test skip conditions.
These have been contributed upstream. Also switch a few `std::` usages to `llvm::`.
Generalize unit tests for different backends, for example by not hard-coding `device` as `cuda`.

Signed-off-by: Whitney Tsang <[email protected]>
pbchekin approved these changes on Jan 11, 2025
This PR changes the Triton base from 2b41842 to 3bac3be (Jan 10).
Pass rate: 99.89%->97.63% (#3141)
Please do not squash and merge this PR.