Release 4.5.00 #2427

Merged
merged 480 commits into from
Nov 25, 2024


ndellingwood
Contributor

No description provided.

ndellingwood and others added 30 commits March 6, 2024 10:03
* Fix kokkos#2130

- Do not call BsrMatrix spmv impl if block size is 1
- Instead, convert it to unmanaged CrsMatrix and call spmv again
  - cuSPARSE returned an error code in this case
  - Better performance

* Formatting

* Remove redundant remove_pointer_t

Handle is already a non-pointer type
…s#2135)

This could be further automated to run on matrix from suite sparse
…okkos#2133)

Since we are now in the 4.2 series, we only support Kokkos Core
versions back to 4.1.00. Older versions of Kokkos Core will require
older versions of Kokkos Kernels for compatibility. Once 4.3.00 is out
we will drop support for the 4.1 series and only keep the 4.2 and 4.3 series.
* ODE: adding BDF algorithms

Implementing BDF formula for stiff ODEs.
Orders 1 to 5 are available and tested.
The integrators can be called on GPU to
solve multiple systems in parallel.
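
As a rough illustration of why implicit BDF formulas suit stiff problems, here is a minimal sketch of the order-1 case (BDF1, i.e. backward Euler) on the linear test equation y' = -lambda * y. The function names `bdf1_linear_decay` and `forward_euler_step` are hypothetical and are not part of the Kokkos Kernels ODE interface; the real integrators solve a nonlinear system with a Newton solver, whereas this linear case can be solved in closed form.

```cpp
#include <cassert>

// Hypothetical helper, NOT the Kokkos Kernels API: integrates
// y' = -lambda * y with BDF1 (backward Euler) and a fixed step dt.
// The implicit update y_new = y_old + dt * (-lambda * y_new) is linear,
// so it is solved exactly as y_new = y_old / (1 + lambda * dt).
double bdf1_linear_decay(double lambda, double y0, double dt, int num_steps) {
  assert(lambda >= 0.0 && dt > 0.0 && num_steps >= 0);
  double y = y0;
  for (int step = 0; step < num_steps; ++step) {
    y = y / (1.0 + lambda * dt);
  }
  return y;
}

// One explicit (forward) Euler step on the same problem, for contrast.
double forward_euler_step(double lambda, double y, double dt) {
  return y + dt * (-lambda * y);
}
```

With lambda = 1000 and dt = 0.1, the implicit step damps the solution toward zero, while a single explicit step overshoots to a large negative value. This is the stability gap that motivates implicit schemes for stiff systems.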

* ODE: fixing storage handling for start-up RK stack

* ODE: clang-format

* ODE: first adaptive version of BDF

The current implementation only allows for adaptivity in time.
At this point the BDF step actually converges as expected with
first order integration!

* ODE: fixing issues with adaptive BDF

The unit-test BDF_adaptive now shows the integration
of the logistic equation using adaptive time steps and
increasing integration order from 1 to 5.

* ODE: running BDF on StiffChemistry problem

The problem runs fine and is solved but there are oscillations
while the behavior of the solution is smooth. More investigation
is needed...

* BDF: fixing types and template parameters in batched calls

Basically we need template parameters to be more versatile
and cannot assume that all rank1 views will have the exact
same underlying type, for instance layouts can be different.

* More fixes for GPUs only in tests this time.

* ODE: BDF adaptive, fix small bug

After adding rhs and update vectors to temp the subviews taken for
other variables need to be offset appropriately...

* Revert "More fixes for GPUs only in tests this time."

This reverts commit 2f70432.

* Revert "Revert "More fixes for GPUs only in tests this time.""

This reverts commit 836012b.

* ODE: BDF small change to temporarily avoid compile time issue

True fix involving a KOKKOS_VERSION check is upcoming after more
tests on GPU side...

* ODE: BDF fix for some printf statements that will go away soon...

* ODE: adding benchmark for BDF

The benchmark helps us monitor the performance of the BDF
implementation across multiple platforms as well as impact of
changes over time.

* ODE: improve benchmark interface...

* ODE: BDF changes to use RMS norm and change some default values

Small changes to compare more closely with reference implementation.
Some of these might be reverted eventually but that's fine for now.
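
For reference, a minimal sketch of a plain (unweighted) RMS norm is shown below. The name `rms_norm` and the use of `std::vector` are assumptions made for illustration; the BDF implementation works on Kokkos views and may apply error weights that are not shown here.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: root-mean-square norm sqrt(sum(v_i^2) / n).
// Unlike the 2-norm, it does not grow with the vector length when all
// entries have the same magnitude.
double rms_norm(const std::vector<double>& v) {
  assert(!v.empty());
  double sum_sq = 0.0;
  for (double x : v) {
    sum_sq += x * x;
  }
  return std::sqrt(sum_sq / static_cast<double>(v.size()));
}
```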

* ODE: BDF convergence more stable and results look pretty good now!

Changing the Newton solver convergence criteria as well as changing
a few default input parameters leads to a more stable algorithm,
which can now integrate the stiff Henderson autocatalytic example
well in 66 time steps instead of 200k for fixed order integration...

* ODE: BDF fix bug in initial time step calculation

The initial step routine was overwriting the initial right hand side
which led to obvious issues further down the road... now things should
work fine. Need to figure out if I can re-initialize the variables in
the perf test while excluding that time from each iteration.

* ODE: BDF removing bad print statement...

std::cout in device code

* ODE - BDF: improving perf test

Basically adding new untimed setup within the main loop of the
benchmark to reset the initial conditions, buffers and vectors
ahead of each iteration.

* Modifying unit-test to catch proper return type

* Applying clang-format
add rocm/5.6.1 and rocm/6.0.0, and openblas/0.3.23 as tpl
…2134)

* Sparse MKL: changing the location of the MKL_SAFE_CALL macro

Moving the macro outside of namespaces to ensure that it will be
interpreted correctly when called from any other location in the
library.

It does not make much sense to guard Impl code in the Experimental
namespace and in this case it cleans up a problem with namespace
disambiguation for the compiler...

* Sparse BsrSpMV: removing Experimental namespace from Impl namespace

* Applying clang-format

* Sparse SpMV: fixing more namespace issues!
…ia-caraway

cm_test_all_sandia: update caraway compilers
…kos#2140)

This change makes it easier for customers to leverage TPL support
which almost always requires offset=int, ordinal=int to be enabled
meaning that no TPL support is available with our default ETI...
Resolve compilation errors in nightly cuda/12.2 A100 build
…ssing_descriptor

Spmv bsr matrix fix missing matrix descriptor (rocsparse)
Temporary objects like "A()" get destructed immediately.
For the object to have scope lifetime, it needs a name, like "A a;" (note that "A a();" would declare a function).
This was causing cusparse/rocsparse spmv to always execute on the default stream,
causing incorrect timing in the spmv perf test.
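
The lifetime difference described above can be sketched with a small RAII type. `ScopedMarker` and both functions are hypothetical stand-ins written for this illustration; they do not reproduce the actual spmv handle or stream code.

```cpp
#include <string>

// Hypothetical RAII guard that records when it is constructed and destroyed.
struct ScopedMarker {
  std::string& log;
  explicit ScopedMarker(std::string& l) : log(l) { log += "begin;"; }
  ~ScopedMarker() { log += "end;"; }
};

// Unnamed temporary: the guard is destroyed at the end of its own
// statement, so the "work" happens after it is already gone.
std::string with_unnamed_temporary() {
  std::string log;
  {
    ScopedMarker{log};
    log += "work;";
  }
  return log;
}

// Named object: the guard lives until the closing brace of the scope,
// so the "work" happens while it is still alive.
std::string with_named_object() {
  std::string log;
  {
    ScopedMarker marker{log};
    log += "work;";
  }
  return log;
}
```

The first function yields `begin;end;work;` and the second yields `begin;work;end;`, which mirrors how an unnamed guard stops being in effect before the code it was meant to cover.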
It actually is part of the public interface
…-namespacing

KokkosSparse_spmv_bsrmatrix_spec: fix Bsr_TC_Precision namespacing
* Spmv perf test improvements

- Add option to flush caches by filling a dummy buffer between
iterations
- Add option to call the non-reuse interface instead of handle/reuse
interface
- Fix modes T, H in nonsquare case (make x,y the correct length)

* Fix mode help text
One of the overloads requires an unused template; remove that
extraneous template and simplify how that function is called in
a second overload.
Co-authored-by: brian-kelley <[email protected]>
module updates post TOSS upgrade
This is only hit when spmv is called with integer scalars,
which doesn't happen in our CI but does often in Tpetra.
…ia-solo

cm_test_all_sandia: solo updates
* SPMV tpl fixes, workaround

* Avoid possible integer conversion warnings

* Document cusparseSpMM algos that were tested
KokkosKernels Utils: cleaning the zero_vector interface
Now a declaration like CrsMatrix<Scalar, Ordinal, Device>
will by default use an ETI'd type combination (as int is the default
ETI'd offset)
dependabot bot and others added 18 commits October 28, 2024 09:26
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.2.1 to 4.2.2.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@eef6144...11bd719)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
In favor of KOKKOSBATCHED_IMPL_ENABLE_INTEL_MKL

Signed-off-by: Carl Pearson <[email protected]>
* fix include path of Impl

Signed-off-by: Yuuichi Asahi <[email protected]>

* improve batched serial laswp tests

Signed-off-by: Yuuichi Asahi <[email protected]>

* fix comments in Test_Batched_SerialLaswp.hpp

Signed-off-by: Yuuichi Asahi <[email protected]>

---------

Signed-off-by: Yuuichi Asahi <[email protected]>
Co-authored-by: Yuuichi Asahi <[email protected]>
* implement batched serial iamax

Signed-off-by: Yuuichi Asahi <[email protected]>

* Add missing static_assertion in iamax

Signed-off-by: Yuuichi Asahi <[email protected]>

* fix: CodeQL

Signed-off-by: Yuuichi Asahi <[email protected]>

* fix: reintroduce RealType in impl_test_batched_iamax

Signed-off-by: Yuuichi Asahi <[email protected]>

* fix: use view size_type as a return type of iamax

Signed-off-by: Yuuichi Asahi <[email protected]>

---------

Signed-off-by: Yuuichi Asahi <[email protected]>
Co-authored-by: Yuuichi Asahi <[email protected]>
* CodeQL: trying to fix issues with multiplication results conversion

This avoids potential overflow when low precision data is multiplied
and then stored in a higher precision variable: size_t = int * int
Focusing on issues in the library for now, unit-tests will be fixed
later.
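
A minimal sketch of the pattern, assuming a 64-bit `size_t`; the helper name `entry_count` is hypothetical and does not appear in the library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of entries in a dense rows x cols buffer.
std::size_t entry_count(int rows, int cols) {
  assert(rows >= 0 && cols >= 0);
  // Writing `std::size_t n = rows * cols;` would perform the
  // multiplication in int and could overflow before the result is
  // widened. Converting the operands first makes the multiplication
  // happen in size_t.
  return static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols);
}
```

For example, `entry_count(70000, 70000)` is 4,900,000,000, which does not fit in a 32-bit `int` but is representable once the operands are widened.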

Signed-off-by: Luc Berger-Vergiat <[email protected]>

* Applying clang-format

Signed-off-by: Luc Berger-Vergiat <[email protected]>

* Switching a few static_cast to size_t for clarity

After discussion in the PR, these changes should not result in issues
when passed to the view constructors and improve clarity for future
maintenance.

Signed-off-by: Luc Berger-Vergiat <[email protected]>

---------

Signed-off-by: Luc Berger-Vergiat <[email protected]>
In favor of KOKKOSBATCHED_IMPL_ENABLE_INTEL_MKL_BATCHED

Signed-off-by: Carl Pearson <[email protected]>
Let's set a good example in our examples

Signed-off-by: Carl Pearson <[email protected]>
Just like the previous round of fixes related to multiplication
overflowing when result type has wider range, this should get
CodeQL to be a little happier.

Signed-off-by: Luc Berger-Vergiat <[email protected]>
Last one of a series of fixes to clean-up the CodeQL
safety issues, after that we should be all clean!

Signed-off-by: Luc Berger-Vergiat <[email protected]>
* Add address sanitizer and most of undefined sanitizer.

Exclude vptr due to Preconditioner visibility.
Exclude signed integer overflow because we do this all over the place.

Signed-off-by: Carl Pearson <[email protected]>

* Reducing ETI scope a lot to improve build size and time

This is not a permanent fix, we probably need to set this build on a different platform but should be enough to get one set of results and observe how good/bad we are doing...

Signed-off-by: Carl Pearson <[email protected]>

* ci: osx-ci -> ubuntu-asan-ubsan-ci

Signed-off-by: Carl Pearson <[email protected]>

* ci: drop compiler warnings on ci sanitizers build

Signed-off-by: Carl Pearson <[email protected]>

* ci: Kokkos_DIR -> Kokkos_ROOT

Signed-off-by: Carl Pearson <[email protected]>

* ci: ditch relative paths and working directories

Signed-off-by: Carl Pearson <[email protected]>

* ci: drop Kokkos_ENABLE_DEPRECATED_CODE_3

Signed-off-by: Carl Pearson <[email protected]>

* ci: fix kokkos kernels source path

Signed-off-by: Carl Pearson <[email protected]>

* ci: add UBSAN_OPTIONS to get stack trace

Signed-off-by: Carl Pearson <[email protected]>

---------

Signed-off-by: Carl Pearson <[email protected]>
Co-authored-by: Luc Berger <[email protected]>
Bumps [actions/dependency-review-action](https://github.com/actions/dependency-review-action) from 4.3.5 to 4.4.0.
- [Release notes](https://github.com/actions/dependency-review-action/releases)
- [Commits](actions/dependency-review-action@a6993e2...4081bf9)

---
updated-dependencies:
- dependency-name: actions/dependency-review-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [softprops/action-gh-release](https://github.com/softprops/action-gh-release) from 2.0.8 to 2.0.9.
- [Release notes](https://github.com/softprops/action-gh-release/releases)
- [Changelog](https://github.com/softprops/action-gh-release/blob/master/CHANGELOG.md)
- [Commits](softprops/action-gh-release@c062e08...e7a8f85)

---
updated-dependencies:
- dependency-name: softprops/action-gh-release
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* ODE - RK: fixing small issues reported by Yaro

1. fix integer division to floating point division
2. fix evaluation of max scaled error
3. increase or decrease time step using uniform formula
4. use num_steps instead of max_steps for dt calculation
5. add a time step when using constant dt to avoid issues with round-off errors
6. fixing exponent and moving adaptivity computation out of RKStep
7. adding time step counter
8. adding more tests and keep track of time steps if wanted
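
Items 1 and 4 can be illustrated with a short sketch. The function names below are hypothetical and only show the integer-versus-floating-point division pitfall and a constant step derived from `num_steps`; they are not the actual RK implementation.

```cpp
#include <cassert>

// Hypothetical helper: constant step size derived from the number of
// steps the user asked for.
double constant_step(double t_start, double t_end, int num_steps) {
  assert(num_steps > 0);
  return (t_end - t_start) / static_cast<double>(num_steps);
}

// Hypothetical helper: fraction of the integration completed so far.
// `step / num_steps` with two ints would truncate toward zero, so both
// operands are converted to double before dividing.
double step_fraction(int step, int num_steps) {
  assert(num_steps > 0);
  return static_cast<double>(step) / static_cast<double>(num_steps);
}
```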

Signed-off-by: Luc Berger-Vergiat <[email protected]>

* RK: fixing variable name after rebase

Signed-off-by: Luc Berger-Vergiat <[email protected]>

* RK: enabling most methods after fixing test related issues

Signed-off-by: Luc Berger-Vergiat <[email protected]>

* RK: passing new unit-tests

Signed-off-by: Luc Berger-Vergiat <[email protected]>

* Applying clang-format

Signed-off-by: Luc Berger-Vergiat <[email protected]>

* RK: fix bad subview creation

Signed-off-by: Luc Berger-Vergiat <[email protected]>

* RK: fix bug that computes the initial step size for non-adaptive case

This prevents having the user defined time step and leads to
wrong results. The rate of convergence tests are now passing!

Signed-off-by: Luc Berger-Vergiat <[email protected]>

* clang-format...

Signed-off-by: Luc Berger-Vergiat <[email protected]>

* RK: tweaking the tolerances a bit

On GPU the lowest order method (RK1-2) accumulates slightly more
error than on CPU. This is only an issue when comparing values to zero,
where the absolute tolerance is needed to detect good convergence.

Signed-off-by: Luc Berger-Vergiat <[email protected]>

* Adding reference for some implementation details and heuristic values

Signed-off-by: Luc Berger-Vergiat <[email protected]>

---------

Signed-off-by: Luc Berger-Vergiat <[email protected]>
Signed-off-by: Nathan Ellingwood <[email protected]>
* Update changelog for 4.5.00

Signed-off-by: Nathan Ellingwood <[email protected]>

* Update CHANGELOG.md

Grouping some work for identifier redefinition, atomic API update.
Moving SVD from ODE to LAPACK
Adding ODE PR

---------

Signed-off-by: Nathan Ellingwood <[email protected]>
Co-authored-by: Luc Berger <[email protected]>
* ODE: skipping autocatalytic test on SYCL

For the time being it is unclear why this particular case
leads to a runtime error from the SYCL API.

Signed-off-by: Luc Berger-Vergiat <[email protected]>

* ODE: formatting

Signed-off-by: Luc Berger-Vergiat <[email protected]>

* ODE: forgot to check if the SYCL space is enabled in Kokkos

Signed-off-by: Luc Berger-Vergiat <[email protected]>

---------

Signed-off-by: Luc Berger-Vergiat <[email protected]>
Part of Kokkos C++ Performance Portability Programming EcoSystem 4.5

Signed-off-by: Nathan Ellingwood <[email protected]>
Signed-off-by: Nathan Ellingwood <[email protected]>
@ndellingwood
Contributor Author

Trilinos snapshot PR: trilinos/Trilinos#13589
Please do not merge until confirmation that PR has passed testing and is approved

@ndellingwood ndellingwood added the AT2-SPECIAL-APPROVAL Mark .github changes as approved. label Nov 21, 2024
@ndellingwood
Contributor Author

PR: trilinos/Trilinos#13589 passed, ready for final review

@lucbv lucbv merged commit 957ac84 into kokkos:master Nov 25, 2024
14 of 19 checks passed