[4.4.01] Patches to 4.4.01 (#2327)
* Restore size_t as default offset, in Tribits builds (#2313)

If building KokkosKernels standalone, leave int as the default offset
(this was the case since #2140). But if building KokkosKernels as a
Trilinos/Tribits package, then make size_t the default offset because
this is what the Tpetra stack currently uses.

Signed-off-by: Brian Kelley <bmkelle@sandia.gov>

* Improve crs/bsr sorting performance (#2293)

* CRS sorting improvements

- Wrote bulk sort/permutation based sorting for CRS graph, matrix, and
  BSR matrix (bulk = one large sort of all the entries, using row-major
  dense index as keys)
  - This is more performant for imbalanced entries per row
- If matrix dimensions are too large to do bulk sort, fall back to
  sorting within each row with a thread.
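
The bulk strategy described above can be sketched on the host in plain C++. This is a serial illustration only, not the Kokkos implementation from #2293: the `Crs` struct, `bulk_keys_fit`, and `sort_crs` are hypothetical names, and `std::stable_sort` stands in for whatever parallel sort the library uses. Each entry gets the key `row * numCols + col`, so a single sort over all keys orders every row at once while leaving the row offsets unchanged. When `numRows * numCols` would overflow 64 bits, the sketch falls back to sorting each row separately.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical host-side CRS container, for illustration only.
struct Crs {
  std::uint64_t numCols;
  std::vector<std::size_t> rowmap;     // numRows + 1 offsets into entries/values
  std::vector<std::uint64_t> entries;  // column index of each stored entry
  std::vector<double> values;          // value of each stored entry
};

// True if the largest key, numRows * numCols - 1, fits in 64 bits.
bool bulk_keys_fit(std::uint64_t numRows, std::uint64_t numCols) {
  if (numRows == 0 || numCols == 0) return true;
  return numRows <= std::numeric_limits<std::uint64_t>::max() / numCols;
}

void sort_crs(Crs& A) {
  const std::size_t numRows = A.rowmap.empty() ? 0 : A.rowmap.size() - 1;
  const std::size_t nnz = A.entries.size();

  // perm[i] = original position of the entry that ends up at position i.
  std::vector<std::size_t> perm(nnz);
  std::iota(perm.begin(), perm.end(), std::size_t(0));

  if (bulk_keys_fit(numRows, A.numCols)) {
    // Bulk path: row-major dense index as key, one sort over all entries.
    std::vector<std::uint64_t> keys(nnz);
    for (std::size_t r = 0; r < numRows; ++r)
      for (std::size_t j = A.rowmap[r]; j < A.rowmap[r + 1]; ++j)
        keys[j] = static_cast<std::uint64_t>(r) * A.numCols + A.entries[j];
    std::stable_sort(perm.begin(), perm.end(),
                     [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
  } else {
    // Fallback path: keys would overflow, so sort each row's slice on its own.
    for (std::size_t r = 0; r < numRows; ++r) {
      auto first = perm.begin() + static_cast<std::ptrdiff_t>(A.rowmap[r]);
      auto last = perm.begin() + static_cast<std::ptrdiff_t>(A.rowmap[r + 1]);
      std::stable_sort(first, last, [&](std::size_t a, std::size_t b) {
        return A.entries[a] < A.entries[b];
      });
    }
  }

  // Apply the permutation to both column indices and values.
  std::vector<std::uint64_t> sortedEntries(nnz);
  std::vector<double> sortedValues(nnz);
  for (std::size_t i = 0; i < nnz; ++i) {
    sortedEntries[i] = A.entries[perm[i]];
    sortedValues[i] = A.values[perm[i]];
  }
  A.entries.swap(sortedEntries);
  A.values.swap(sortedValues);
}
```

In the bulk path the cost depends only on the total number of entries, not on how they are distributed across rows, which is consistent with the note above that it helps when entries per row are imbalanced.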

* Add perf test for sort_crs_matrix
* sort_crs: improve parallel labels
* Work around kokkos issue 7036
* sort_crs: replace radix sort lambda with functor
(Lambda segfaults with nvcc+openmp)
---------
Signed-off-by: Brian Kelley <bmkelle@sandia.gov>

* SpAdd handle: delete sort_option getter/setter (#2296)

SpAdd handle was originally a copy-paste of the spgemm
handle way back in #122, and included get_sort_option() and
set_sort_option() from spgemm. But these try to use the member
bool sort_option, which doesn't exist. Somehow these functions never
produced compile errors until someone tried to call them.
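
One plausible explanation for the "somehow" above, offered as an assumption rather than something the commit states: member functions of a class template are only instantiated when they are used, so a body that refers to a missing member can sit unnoticed until the first call. A minimal sketch with hypothetical names (`Handle`, `NoSortOptions`, and `use_handle` are not the SpAdd handle's real types):

```cpp
// Hypothetical options type that has no sort_option member.
struct NoSortOptions {
  int nnz;
};

// Hypothetical handle. The accessor bodies depend on the template parameter,
// so the compiler checks them only when a given member is actually used.
template <typename Options>
class Handle {
 public:
  explicit Handle(Options o) : opts_(o) {}

  int get_nnz() const { return opts_.nnz; }

  // Invalid for Options = NoSortOptions, but harmless as long as nobody calls them.
  bool get_sort_option() const { return opts_.sort_option; }
  void set_sort_option(bool b) { opts_.sort_option = b; }

 private:
  Options opts_;
};

int use_handle() {
  Handle<NoSortOptions> h(NoSortOptions{42});
  // h.get_sort_option();  // uncommenting this line triggers the compile error
  return h.get_nnz();
}
```

An explicit instantiation such as `template class Handle<NoSortOptions>;` would instantiate every member and surface the error immediately, which is one way to catch dead accessors like these.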

* Improve GH action to produce release artifacts (#2312)

* coo2csr: add parens to function calls (#2318)

* Update changelog

* Update master_history.txt

* .github/workflows: Group jobs under common github-AT2 name (#2320)

* Update master_history.txt

---------

Signed-off-by: Brian Kelley <bmkelle@sandia.gov>
Co-authored-by: brian-kelley <bmkelle@sandia.gov>
Co-authored-by: Damien L-G <dalg24@gmail.com>
Co-authored-by: Carl Pearson <cwpearson@users.noreply.github.com>
Co-authored-by: Evan Harvey <57234914+e10harvey@users.noreply.github.com>
5 people authored Sep 12, 2024
1 parent ea2c3ff commit 336ee5f
Showing 18 changed files with 848 additions and 594 deletions.
29 changes: 29 additions & 0 deletions .github/workflows/at2.yml
@@ -0,0 +1,29 @@
name: github-AT2

on:
pull_request:
paths-ignore:
- '**/*.rst'
- '**/*.md'
- '**/requirements.txt'
- '**/*.py'
- 'docs/**'
types: [ opened, reopened, synchronize ]

permissions:
contents: none

# Cancels any in progress 'workflow' associated with this PR
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
mi210:
uses: ./.github/workflows/mi210.yml
h100:
uses: ./.github/workflows/h100.yml
bdw:
uses: ./.github/workflows/bdw.yml
#spr:
#uses: ./.github/workflows/spr.yml
22 changes: 2 additions & 20 deletions .github/workflows/bdw.yml
@@ -1,25 +1,7 @@
name: github-BDW
name: Reusable BDW workflow

on:
pull_request:
paths-ignore:
- '**/*.rst'
- '**/*.md'
- '**/requirements.txt'
- '**/*.py'
- 'docs/**'
types: [ opened, reopened, synchronize ]
pull_request_review:
types:
- submitted

permissions:
contents: none

# Cancels any in progress 'workflow' associated with this PR
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
workflow_call

jobs:
# PR_BDW_GNU1020_OPENMP_LEFT_REL_NOETI:
23 changes: 2 additions & 21 deletions .github/workflows/h100.yml
@@ -1,26 +1,7 @@
name: github-H100
name: Reusable H100 workflow

# Only allow manual runs until at2 runners are available.
on:
pull_request:
paths-ignore:
- '**/*.rst'
- '**/*.md'
- '**/requirements.txt'
- '**/*.py'
- 'docs/**'
types: [ opened, reopened, synchronize ]
pull_request_review:
types:
- submitted

permissions:
contents: none

# Cancels any in progress 'workflow' associated with this PR
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
workflow_call

jobs:
PR_HOPPER90_CUDA1180_CUDA_LEFT_RIGHT_REL:
24 changes: 3 additions & 21 deletions .github/workflows/mi210.yml
@@ -1,25 +1,7 @@
name: github-MI210
name: Reusable MI210 workflow

on:
pull_request:
paths-ignore:
- '**/*.rst'
- '**/*.md'
- '**/requirements.txt'
- '**/*.py'
- 'docs/**'
types: [ opened, reopened, synchronize ]
pull_request_review:
types:
- submitted

permissions:
contents: none

# Cancels any in progress 'workflow' associated with this PR
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
on:
workflow_call

jobs:
# PR_VEGA90A_ROCM561_HIP_SERIAL_LEFT_REL:
36 changes: 12 additions & 24 deletions .github/workflows/release.yml
@@ -13,33 +13,26 @@ jobs:
hashes: ${{ steps.hash.outputs.hashes }}
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- name: Build artifacts
run: |
git archive -o kokkos-kernels-${{ github.ref_name }}.zip HEAD
git archive -o kokkos-kernels-${{ github.ref_name }}.tar.gz HEAD
git archive --prefix=kokkos-kernels-${{ github.ref_name }}/ -o kokkos-kernels-${{ github.ref_name }}.zip HEAD
git archive --prefix=kokkos-kernels-${{ github.ref_name }}/ -o kokkos-kernels-${{ github.ref_name }}.tar.gz HEAD
- name: Generate hashes
shell: bash
id: hash
run: |
# sha256sum generates sha256 hash for all artifacts.
# base64 -w0 encodes to base64 and outputs on a single line.
echo "hashes=$(sha256sum kokkos-kernels-${{ github.ref_name }}.zip kokkos-kernels-${{ github.ref_name }}.tar.gz | base64 -w0)" >> "$GITHUB_OUTPUT"
sha256sum kokkos-kernels-${{ github.ref_name }}.zip kokkos-kernels-${{ github.ref_name }}.tar.gz > kokkos-kernels-${{ github.ref_name }}-SHA-256.txt
echo "hashes=$(base64 -w0 kokkos-kernels-${{ github.ref_name }}-SHA-256.txt)" >> "$GITHUB_OUTPUT"
- name: Upload source code (zip)
uses: actions/upload-artifact@89ef406dd8d7e03cfd12d9e0a4a378f454709029 # v4.3.5
- name: Upload artifacts
uses: actions/upload-artifact@50769540e7f4bd5e21e526ee35c689e35e0d6874 # v4.4.0
with:
name: kokkos-kernels-${{ github.ref_name }}.zip
path: kokkos-kernels-${{ github.ref_name }}.zip
if-no-files-found: error
retention-days: 5

- name: Upload source code (tar.gz)
uses: actions/upload-artifact@89ef406dd8d7e03cfd12d9e0a4a378f454709029 # v4.3.5
with:
name: kokkos-kernels-${{ github.ref_name }}.tar.gz
path: kokkos-kernels-${{ github.ref_name }}.tar.gz
name: release-artifacts
path: kokkos-kernels-${{ github.ref_name }}*
if-no-files-found: error
retention-days: 5

@@ -65,19 +58,14 @@ jobs:
runs-on: ubuntu-latest
if: startsWith(github.ref, 'refs/tags/')
steps:
- name: Download kokkos-kernels-${{ github.ref_name }}.zip
- name: Download artifacts
uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
with:
name: kokkos-kernels-${{ github.ref_name }}.zip

- name: Download kokkos-kernels-${{ github.ref_name }}.tar.gz
uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
with:
name: kokkos-kernels-${{ github.ref_name }}.tar.gz

name: release-artifacts
- name: Upload assets
uses: softprops/action-gh-release@c062e08bd532815e2082a85e87e3ef29c3e6d191 # v2.0.8
with:
files: |
kokkos-kernels-${{ github.ref_name }}.zip
kokkos-kernels-${{ github.ref_name }}.tar.gz
kokkos-kernels-${{ github.ref_name }}-SHA-256.txt
25 changes: 3 additions & 22 deletions .github/workflows/spr.yml
@@ -1,26 +1,7 @@
name: github-SPR
name: Reusable SPR workflow

# Only allow manual runs until at2 runners are available.
on: workflow_dispatch
#pull_request:
# paths-ignore:
# - '**/*.rst'
# - '**/*.md'
# - '**/requirements.txt'
# - '**/*.py'
# - 'docs/**'
# types: [ opened, reopened, synchronize ]
#pull_request_review:
# types:
# - submitted

permissions:
contents: none

# Cancels any in progress 'workflow' associated with this PR
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
on:
workflow_call

jobs:
PR_SPR_ONEAPI202310_OPENMP_LEFT_MKLBLAS_MKLLAPACK_REL:
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,19 @@
# Change Log

## [4.4.01](https://github.com/kokkos/kokkos-kernels/tree/4.4.01)
[Full Changelog](https://github.com/kokkos/kokkos-kernels/compare/4.4.00...4.4.01)

### Build System:
- Restore size_t as default offset, in Tribits builds [\#2313](https://github.com/kokkos/kokkos-kernels/pull/2313)

### Enhancements:
- Improve crs/bsr sorting performance [\#2293](https://github.com/kokkos/kokkos-kernels/pull/2293)

### Bug Fixes:
- SpAdd handle: delete sort_option getter/setter [\#2296](https://github.com/kokkos/kokkos-kernels/pull/2296)
- Improve GH action to produce release artifacts [\#2312](https://github.com/kokkos/kokkos-kernels/pull/2312)
- coo2csr: add parens to function calls [\#2318](https://github.com/kokkos/kokkos-kernels/pull/2318)

## [4.4.00](https://github.com/kokkos/kokkos-kernels/tree/4.4.00)
[Full Changelog](https://github.com/kokkos/kokkos-kernels/compare/4.3.01...4.4.00)

18 changes: 14 additions & 4 deletions cmake/kokkoskernels_eti_offsets.cmake
@@ -1,5 +1,15 @@
SET(KOKKOSKERNELS_INST_OFFSET_SIZE_T_DEFAULT OFF)
SET(KOKKOSKERNELS_INST_OFFSET_INT_DEFAULT ${KOKKOSKERNELS_ADD_DEFAULT_ETI})
IF(KOKKOSKERNELS_HAS_TRILINOS)
# In a Trilinos build, size_t is the default offset because this is what Tpetra uses
# TODO: update this when Tpetra can use different offsets
SET(KOKKOSKERNELS_INST_OFFSET_SIZE_T_DEFAULT ${KOKKOSKERNELS_ADD_DEFAULT_ETI})
SET(KOKKOSKERNELS_INST_OFFSET_INT_DEFAULT OFF)
ELSE()
# But in a standalone KokkosKernels build, int is the default offset type
# This provides the maximum TPL compatibility
SET(KOKKOSKERNELS_INST_OFFSET_SIZE_T_DEFAULT OFF)
SET(KOKKOSKERNELS_INST_OFFSET_INT_DEFAULT ${KOKKOSKERNELS_ADD_DEFAULT_ETI})
ENDIF()

SET(OFFSETS
OFFSET_INT
OFFSET_SIZE_T
@@ -12,14 +22,14 @@ KOKKOSKERNELS_ADD_OPTION(
INST_OFFSET_INT
${KOKKOSKERNELS_INST_OFFSET_INT_DEFAULT}
BOOL
"Whether to pre instantiate kernels for the offset type int. This option is KokkosKernels_INST_OFFSET_INT=OFF by default. Default: OFF"
"Whether to pre instantiate kernels for the offset type int. This option is KokkosKernels_INST_OFFSET_INT=OFF by default. Default: ${KOKKOSKERNELS_INST_OFFSET_INT_DEFAULT}"
)

KOKKOSKERNELS_ADD_OPTION(
INST_OFFSET_SIZE_T
${KOKKOSKERNELS_INST_OFFSET_SIZE_T_DEFAULT}
BOOL
"Whether to pre instantiate kernels for the offset type size_t. This option is KokkosKernels_INST_OFFSET_SIZE_T=ON by default. Default: ON"
"Whether to pre instantiate kernels for the offset type size_t. This option is KokkosKernels_INST_OFFSET_SIZE_T=ON by default. Default: ${KOKKOSKERNELS_INST_OFFSET_SIZE_T_DEFAULT}"
)

IF (KOKKOSKERNELS_INST_OFFSET_INT)
20 changes: 15 additions & 5 deletions common/src/KokkosKernels_SimpleUtils.hpp
@@ -358,13 +358,19 @@ struct ReduceMaxFunctor {
};

template <typename view_type, typename MyExecSpace>
void kk_view_reduce_max(size_t num_elements, view_type view_to_reduce,
void kk_view_reduce_max(const MyExecSpace &exec, size_t num_elements, view_type view_to_reduce,
typename view_type::non_const_value_type &max_reduction) {
typedef Kokkos::RangePolicy<MyExecSpace> my_exec_space;
Kokkos::parallel_reduce("KokkosKernels::Common::ReduceMax", my_exec_space(0, num_elements),
typedef Kokkos::RangePolicy<MyExecSpace> policy_t;
Kokkos::parallel_reduce("KokkosKernels::Common::ReduceMax", policy_t(exec, 0, num_elements),
ReduceMaxFunctor<view_type>(view_to_reduce), max_reduction);
}

template <typename view_type, typename MyExecSpace>
void kk_view_reduce_max(size_t num_elements, view_type view_to_reduce,
typename view_type::non_const_value_type &max_reduction) {
kk_view_reduce_max(MyExecSpace(), num_elements, view_to_reduce, max_reduction);
}

// xorshift hash/pseudorandom function (supported for 32- and 64-bit integer
// types only)
template <typename Value>
@@ -429,10 +435,14 @@ struct SequentialFillFunctor {
val_type start;
};

template <typename ExecSpace, typename V>
void sequential_fill(const ExecSpace &exec, const V &v, typename V::non_const_value_type start = 0) {
Kokkos::parallel_for(Kokkos::RangePolicy<ExecSpace>(exec, 0, v.extent(0)), SequentialFillFunctor<V>(v, start));
}

template <typename V>
void sequential_fill(const V &v, typename V::non_const_value_type start = 0) {
Kokkos::parallel_for(Kokkos::RangePolicy<typename V::execution_space>(0, v.extent(0)),
SequentialFillFunctor<V>(v, start));
sequential_fill(typename V::execution_space(), v, start);
}

} // namespace Impl
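
The hunks above add overloads that take an execution space instance and turn the old signatures into thin wrappers that pass a default-constructed instance. A rough sketch of that delegation pattern in plain C++, without Kokkos: `MockExec`, `mock_parallel_for`, and `last_exec_id` are hypothetical stand-ins so that the example compiles on its own, and `std::vector` replaces the view type.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an execution space instance.
struct MockExec {
  int id;  // 0 plays the role of the default instance
  explicit MockExec(int id_ = 0) : id(id_) {}
};

// Records which instance ran the most recent loop (illustration only).
static int last_exec_id = -1;

// Serial stand-in for a parallel_for launched on a specific instance.
template <typename Exec, typename F>
void mock_parallel_for(const Exec& exec, std::size_t n, F f) {
  last_exec_id = exec.id;
  for (std::size_t i = 0; i < n; ++i) f(i);
}

// New-style overload: the caller chooses the instance.
template <typename Exec, typename T>
void sequential_fill(const Exec& exec, std::vector<T>& v, T start = T()) {
  mock_parallel_for(exec, v.size(),
                    [&](std::size_t i) { v[i] = start + static_cast<T>(i); });
}

// Old signature kept for existing callers: forwards a default instance.
template <typename T>
void sequential_fill(std::vector<T>& v, T start = T()) {
  sequential_fill(MockExec(), v, start);
}
```

Keeping the old signature as a forwarding wrapper means existing call sites compile unchanged, while new call sites can pass a specific instance.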
6 changes: 6 additions & 0 deletions common/src/KokkosKernels_Utils.hpp
@@ -1076,6 +1076,12 @@ void view_reduce_max(size_t num_elements, view_type view_to_reduce,
kk_view_reduce_max<view_type, MyExecSpace>(num_elements, view_to_reduce, max_reduction);
}

template <typename view_type, typename MyExecSpace>
void view_reduce_max(const MyExecSpace &exec, size_t num_elements, view_type view_to_reduce,
typename view_type::non_const_value_type &max_reduction) {
kk_view_reduce_max<view_type, MyExecSpace>(exec, num_elements, view_to_reduce, max_reduction);
}

template <typename size_type>
struct ReduceRowSizeFunctor {
const size_type *rowmap_view_begins;
1 change: 1 addition & 0 deletions master_history.txt
@@ -27,3 +27,4 @@ tag: 4.2.01 date: 01/30/2024 master: f429f6ec release: bcf9854b
tag: 4.3.00 date: 04/03/2024 master: afd65f03 release: ebbf4b78
tag: 4.3.01 date: 05/07/2024 master: 1b0a15f5 release: 58785c1b
tag: 4.4.00 date: 08/08/2024 master: d1a91b8a release: 1145f529
tag: 4.4.01 date: 09/05/2024 master: 0608a337 release: a360d003
9 changes: 9 additions & 0 deletions perf_test/sparse/CMakeLists.txt
@@ -116,6 +116,15 @@ KOKKOSKERNELS_ADD_EXECUTABLE(
SOURCES KokkosSparse_mdf.cpp
)

# For the sake of build times, don't build this CRS sorting perf test by default.
# It can be enabled if needed by setting -DKokkosKernels_ENABLE_SORT_CRS_PERFTEST=ON.
if (KokkosKernels_ENABLE_SORT_CRS_PERFTEST)
KOKKOSKERNELS_ADD_EXECUTABLE(
sparse_sort_crs
SOURCES KokkosSparse_sort_crs.cpp
)
endif ()

if (KokkosKernels_ENABLE_BENCHMARK)
KOKKOSKERNELS_ADD_BENCHMARK(
sparse_par_ilut