Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve batched serial gemm #2469

Merged
merged 7 commits into from
Jan 9, 2025

Conversation

yasahi-hpc
Copy link
Contributor

This PR aims at improving the implementation and testing of serial Gemm.

  • Moving implementation details into Impl namespace.
  • Moving OpID and OpConj into blas/impl/KokkosBlas_util.hpp
  • Add all the specializations of the combinations of Non-Trans/Trans/ConjTrans
  • Add a check function of input to force A, B, C are rank 0, 1, 2 Kokkos::View
  • Covering all the unit-tests for all the combinations of Non-Trans/Trans/ConjTrans
  • Deprecated warnings for the older implementation details under KokkosBatched namespace

Yuuichi Asahi added 7 commits January 7, 2025 14:35
Signed-off-by: Yuuichi Asahi <[email protected]>
Signed-off-by: Yuuichi Asahi <[email protected]>
Signed-off-by: Yuuichi Asahi <[email protected]>
Signed-off-by: Yuuichi Asahi <[email protected]>
Signed-off-by: Yuuichi Asahi <[email protected]>
Signed-off-by: Yuuichi Asahi <[email protected]>
@yasahi-hpc yasahi-hpc force-pushed the improve-batched-serial-gemm branch from c701a6c to 198a7af Compare January 7, 2025 07:39
@yasahi-hpc
Copy link
Contributor Author

@cwpearson I have removed the global namespace following your suggestions for #2331
I have also confirmed that Trilinos does not rely on the implementation details directly.
I am worried about the potential usage of the APIs with DynRankViews though

@yasahi-hpc yasahi-hpc requested a review from cwpearson January 7, 2025 09:01

eps *= std::is_same<value_type, Kokkos::Experimental::half_t>::value ||
std::is_same<value_type, Kokkos::Experimental::bhalf_t>::value
eps *= std::is_same_v<ValueType, Kokkos::Experimental::half_t> ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definition of epsilon worries me a bit, we should revisit to actually compute the expected accuracy based on the compute parameters chosen for the test...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you.
It may be that we need to pay a bit more attention to the value range for fp16 test cases.

Copy link
Contributor

@lucbv lucbv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look fine to me, let's run this through the auto-tester.

@lucbv lucbv added the AT2-CI-APPROVAL Approve CI to run at SNL label Jan 7, 2025
@cwpearson
Copy link
Contributor

@cwpearson I have removed the global namespace following your suggestions for #2331 I have also confirmed that Trilinos does not rely on the implementation details directly. I am worried about the potential usage of the APIs with DynRankViews though

There's Kokkos::is_dyn_rank_view_v if you want to modify the static asserts, or are you worried for another reason?

@yasahi-hpc
Copy link
Contributor Author

@cwpearson I have removed the global namespace following your suggestions for #2331 I have also confirmed that Trilinos does not rely on the implementation details directly. I am worried about the potential usage of the APIs with DynRankViews though

There's Kokkos::is_dyn_rank_view_v if you want to modify the static asserts, or are you worried for another reason?

I personally prefer not to allow DynRankViews because we cannot check ranks of DynRankViews at compile time.

@lucbv lucbv merged commit f638914 into kokkos:develop Jan 9, 2025
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AT2-CI-APPROVAL Approve CI to run at SNL enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants