Merge OpenAI Triton commit `3bac3be` #3142

whitneywhtsang · 2025-01-11T05:28:44Z

This PR changes the Triton base from 2b41842 to 3bac3be (Jan 10).
Pass rate: 99.89%->97.63% (#3141)

Please do not squash and merge this PR.

### Introduce the inclusive/exclusive/property metrics system

This commit introduces a new metrics classification system with three types:

- Inclusive metrics: Can be aggregated and propagated to parent
- Exclusive metrics: Can be aggregated but not propagated
- Property metrics: Cannot be aggregated or propagated

Changes include:

- Update Metric class to support new metric types
- Add metric naming convention with `(inc)`, `(exc)`, `(pty)` suffixes
- Remove aggregable parameter from addMetrics API
- Update documentation in `README.md`
- Add unit tests for new metric types
- Update viewer to handle different metric types

### Add `cpu_timed_scope` to measure CPU time of scoped operations

Changes include:

- Records CPU time as an exclusive metric at the exit of a scope
- Updates Python API to expose `cpu_timed_scope`
- Adds documentation and tests

### Improve thread safety and context management

We simplify and fix the thread model of proton. Changes include:

- Replace `shared_mutex` with mutex for simplicity
- Update the `shadow` context source to use thread-local context stacks and allow threads to inherit and shadow main context stack.
- Add documentation for thread safety considerations

### Refactor scope and operation handling

Changes include:

- Rename `addScope` to `addOp` in Data for clarity
- Add clear operation for data cleanup to save CPU DRAM
- Modify scope interface implementation
- Update tests and examples

### Fix profiler and session initializations

Changes include:

- Replace `initializedCount` with a boolean started flag and consider the number of registered data instead. Previously if we deactivate a session, the `initializedCount` doesn't get decreased. As a result, the profiler keeps running even if there's no registered data
- Update session activation/deactivation by ordering interfaces based on their dependency
- Update test cases
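To make the three metric types concrete, here is a minimal Python sketch of the semantics described in the commit message. It is not Proton's implementation: `Scope`, `record`, and `rollup` are hypothetical names, "aggregated" is assumed to mean summing repeated records on the same scope, and a property metric is assumed to keep its last written value. Only the `(inc)`, `(exc)`, `(pty)` suffixes come from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    """Hypothetical node in a profile tree (not a Proton class)."""
    name: str
    metrics: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def record(scope, name, value):
    """Record a metric on one scope; the name suffix selects the behaviour."""
    if name.endswith("(pty)"):
        # Property: never summed. Assumption: the last write wins.
        scope.metrics[name] = value
    elif name.endswith("(inc)") or name.endswith("(exc)"):
        # Inclusive and exclusive metrics are both aggregated on the scope.
        # Assumption: aggregation is a plain sum.
        scope.metrics[name] = scope.metrics.get(name, 0) + value
    else:
        raise ValueError(f"metric name needs an (inc)/(exc)/(pty) suffix: {name}")


def rollup(scope):
    """Return the scope's metrics plus inclusive metrics from all descendants."""
    totals = dict(scope.metrics)
    for child in scope.children:
        for name, value in rollup(child).items():
            # Only inclusive metrics propagate to the parent.
            if name.endswith("(inc)"):
                totals[name] = totals.get(name, 0) + value
    return totals


root = Scope("root")
kernel = Scope("kernel")
root.children.append(kernel)

record(kernel, "time (inc)", 5)
record(kernel, "time (inc)", 3)      # aggregated on the same scope
record(kernel, "cpu_time (exc)", 2)
record(kernel, "cpu_time (exc)", 1)  # aggregated, but stays on this scope
record(kernel, "device (pty)", "gpu0")
record(root, "time (inc)", 1)

kernel_totals = rollup(kernel)
root_totals = rollup(root)
```

In this sketch the parent sees only the inclusive `time (inc)` total from its child, while the exclusive and property metrics stay on the scope where they were recorded.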

AMD CDNA3 architectures do not have native bf16 VALU instructions, so bf16 scaling can be expensive. This commit prototypes upcasting to fp16 for the computation, which means relaxing the `dot_scaled` frontend and the `upcast_mxfp` op definition to support fp16. For prototyping, the fp16 path is currently turned on if one input is fp16. A more explicit way might be introduced in the future.

Jokeren and others added 10 commits January 10, 2025 12:02

[BACKEND] Remove all layout conversion decomposition functions from t…

37c7fc4

…he NVIDIA backend (#5553)

[AMD] Disable FP8E4M3FN to FP16 upcast (#5575)

110b66e

The logic is incorrect; we previously missed it due to test skip conditions disabling the tests. So this commit also restructures the test skip conditions.

[Triton] Remove unused min/max_element helpers (NFC) (#5573)

74de6b4

These have been contributed upstream. Also switch a few `std::` usages to `llvm::`.

Generalize unit tests for different backends (#5576)

2efb067

Generalize unit tests for different backends, for example by not hard-coding `device` as `cuda`.

Signed-off-by: Whitney Tsang <[email protected]>

[FRONTEND] rename nv_override_capability -> arch (#5579)

3bac3be

Merge commit '3bac3be56609c8f7286a244d4622ea72a2fc4402'

5bab216

[AccelerateMatmul] Sync from upstream

5f5feb0

Signed-off-by: Whitney Tsang <[email protected]>

[TEST] Fix test_typeconvert_upcast

16a54d6

Signed-off-by: Whitney Tsang <[email protected]>

whitneywhtsang requested a review from pbchekin January 11, 2025 05:28

whitneywhtsang self-assigned this Jan 11, 2025

pbchekin approved these changes Jan 11, 2025

whitneywhtsang marked this pull request as ready for review January 11, 2025 15:05

whitneywhtsang merged commit 16a54d6 into main Jan 11, 2025
5 checks passed

whitneywhtsang deleted the whitneywhtsang/merge branch January 11, 2025 15:09

whitneywhtsang mentioned this pull request Jan 17, 2025

Merge OpenAI Triton till Jan 18th #3091

Closed
