Nightly Cuda failures with Cusparse enabled #2025

Closed
ndellingwood opened this issue Oct 30, 2023 · 2 comments

Comments

@ndellingwood
Contributor

ndellingwood commented Oct 30, 2023

Nightly CUDA builds with cuSPARSE enabled (cuda/11.4.2, cuda/11.8.0) are failing to compile with the following output:

00:06:12 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_Cuda_11_8_0_Gcc_11_3_0_Hopper90-cusparse-cublas/kokkos-kernels/sparse/impl/KokkosSparse_sptrsv_cuSPARSE_impl.hpp(336): error: class "Kokkos::Serial" has no member "cuda_stream"
00:06:12           detected during:
00:06:12             instantiation of "void KokkosSparse::Impl::sptrsvcuSPARSE_solve(ExecutionSpace &, KernelHandle *, KernelHandle::nnz_lno_t, ain_row_index_view_type, ain_nonzero_index_view_type, ain_values_scalar_view_type, b_values_scalar_view_type, x_values_scalar_view_type, __nv_bool) [with ExecutionSpace=Kokkos::Serial::execution_space, KernelHandle=KokkosSparse::Experimental::SPTRSVHandle<const size_t, const int, const double, Kokkos::Serial::execution_space, Kokkos::HostSpace::memory_space, Kokkos::HostSpace::memory_space>, ain_row_index_view_type=Kokkos::View<const size_t *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<3U>>, ain_nonzero_index_view_type=Kokkos::View<const int *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<3U>>, ain_values_scalar_view_type=Kokkos::View<const double *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<3U>>, b_values_scalar_view_type=Kokkos::View<const double *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<3U>>, x_values_scalar_view_type=Kokkos::View<double *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<1U>>]" 
00:06:12 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_Cuda_11_8_0_Gcc_11_3_0_Hopper90-cusparse-cublas/kokkos-kernels/sparse/src/KokkosSparse_sptrsv.hpp(402): here
00:06:12             instantiation of "void KokkosSparse::Experimental::sptrsv_solve(ExecutionSpace &, KernelHandle *, lno_row_view_t_, lno_nnz_view_t_, scalar_nnz_view_t_, BType, XType) [with ExecutionSpace=Kokkos::Serial::execution_space, KernelHandle=KokkosKernels::Experimental::KokkosKernelsHandle<const size_t, const int, const double, Kokkos::Serial, Kokkos::HostSpace, Kokkos::HostSpace>, lno_row_view_t_=Kokkos::View<const size_t *, Kokkos::Cuda::array_layout, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<0U>>, lno_nnz_view_t_=Kokkos::View<int *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<0U>>, scalar_nnz_view_t_=Kokkos::View<double *, Kokkos::LayoutRight, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, void>, BType=Kokkos::View<double *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<1U>>, XType=Kokkos::View<double *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<1U>>]" 
00:06:12 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_Cuda_11_8_0_Gcc_11_3_0_Hopper90-cusparse-cublas/kokkos-kernels/sparse/src/KokkosSparse_sptrsv.hpp(439): here
00:06:12             instantiation of "void KokkosSparse::Experimental::sptrsv_solve(KernelHandle *, lno_row_view_t_, lno_nnz_view_t_, scalar_nnz_view_t_, BType, XType) [with KernelHandle=KokkosKernels::Experimental::KokkosKernelsHandle<const size_t, const int, const double, Kokkos::Serial, Kokkos::HostSpace, Kokkos::HostSpace>, lno_row_view_t_=Kokkos::View<const size_t *, Kokkos::Cuda::array_layout, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<0U>>, lno_nnz_view_t_=Kokkos::View<int *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<0U>>, scalar_nnz_view_t_=Kokkos::View<double *, Kokkos::LayoutRight, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, void>, BType=Kokkos::View<double *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<1U>>, XType=Kokkos::View<double *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<1U>>]" 
00:06:12 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_Cuda_11_8_0_Gcc_11_3_0_Hopper90-cusparse-cublas/kokkos-kernels/sparse/impl/KokkosSparse_twostage_gauss_seidel_impl.hpp(985): here
00:06:12             instantiation of "void KokkosSparse::Impl::TwostageGaussSeidel<HandleType, input_row_map_view_t, input_entries_view_t, input_values_view_t>::apply(x_value_array_type, y_value_array_type, __nv_bool, int, KokkosSparse::Impl::TwostageGaussSeidel<HandleType, input_row_map_view_t, input_entries_view_t, input_values_view_t>::scalar_t, __nv_bool, __nv_bool, __nv_bool) [with HandleType=KokkosKernels::Experimental::KokkosKernelsHandle<const size_t, const int, const double, Kokkos::Serial, Kokkos::HostSpace, Kokkos::HostSpace>, input_row_map_view_t=Kokkos::View<const size_t *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<1U>>, input_entries_view_t=Kokkos::View<const int *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<1U>>, input_values_view_t=Kokkos::View<const double *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<1U>>, x_value_array_type=Kokkos::View<double **, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<1U>>, y_value_array_type=Kokkos::View<const double **, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<1U>>]" 
00:06:12 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_Cuda_11_8_0_Gcc_11_3_0_Hopper90-cusparse-cublas/kokkos-kernels/sparse/impl/KokkosSparse_gauss_seidel_spec.hpp(325): here
00:06:12             instantiation of "void KokkosSparse::Impl::GAUSS_SEIDEL_APPLY<ExecSpaceIn, KernelHandle, format, a_size_view_t_, a_lno_view_t, a_scalar_view_t, x_scalar_view_t, y_scalar_view_t, false, true>::gauss_seidel_apply(const ExecSpaceIn &, KernelHandle *, KernelHandle::const_nnz_lno_t, KernelHandle::const_nnz_lno_t, a_size_view_t_, a_lno_view_t, a_scalar_view_t, x_scalar_view_t, y_scalar_view_t, __nv_bool, __nv_bool, KernelHandle::nnz_scalar_t, int, __nv_bool, __nv_bool) [with ExecSpaceIn=Kokkos::Serial, KernelHandle=KokkosKernels::Experimental::KokkosKernelsHandle<const size_t, const int, const double, Kokkos::Serial, Kokkos::HostSpace, Kokkos::HostSpace>, format=KokkosSparse::SparseMatrixFormat::BSR, a_size_view_t_=Kokkos::View<const size_t *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<1U>>, a_lno_view_t=Kokkos::View<const int *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<1U>>, a_scalar_view_t=Kokkos::View<const double *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<1U>>, x_scalar_view_t=Kokkos::View<double **, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<1U>>, y_scalar_view_t=Kokkos::View<const double **, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::HostSpace::execution_space, Kokkos::HostSpace::memory_space>, Kokkos::MemoryTraits<1U>>]" 
00:06:12 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_Cuda_11_8_0_Gcc_11_3_0_Hopper90-cusparse-cublas/Build/sparse/eti/generated_specializations_cpp/gauss_seidel_apply/Sparse_gauss_seidel_apply_eti_DOUBLE_ORDINAL_INT_OFFSET_SIZE_T_LAYOUTLEFT_EXECSPACE_SERIAL_MEMSPACE_HOSTSPACE.cpp(24): here
00:06:12 
00:06:13 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_Cuda_11_8_0_Gcc_11_3_0_Hopper90-cusparse-cublas/kokkos-kernels/sparse/impl/KokkosSparse_sptrsv_cuSPARSE_impl.hpp(65): error: class "Kokkos::Serial" has no member "cuda_stream"
00:06:13           detected during:
...

https://jenkins-son.sandia.gov/job/KokkosKernels_Nightly_Blake_Cuda_11_8_0_Gcc_11_3_0_Hopper90-cusparse-cublas/

The compilation errors began after the commits listed below were merged. Based on the error message, the likely cause is the merge of #1982. @lucbv @e10harvey, can you advise?

Add sptrsv execution space overloads (detail)
Address CI build errors (detail)
hide native merge-path SpMV behind "native-merge" (detail)
test native-merge algorithm (detail)
Quick fix for night compilation with Trilinos (detail)
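
For readers unfamiliar with this class of error: the log shows a function template that calls `cuda_stream()` being instantiated with `Kokkos::Serial`, which has no such member. One common way to keep a host instantiation from touching a device-only member is to guard the call with `if constexpr`. The sketch below illustrates that pattern only; it is not taken from #1982 or from any fix, and `FakeSerial`, `FakeCuda`, and `solve_path` are hypothetical stand-ins rather than Kokkos or KokkosKernels names. It assumes C++17.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-ins for execution spaces (NOT the Kokkos classes).
struct FakeSerial {};  // host space: deliberately has no cuda_stream()
struct FakeCuda {
  int cuda_stream() const { return 7; }  // pretend stream handle
};

// Reports which code path is selected for a given execution space.
template <class ExecutionSpace>
std::string solve_path(const ExecutionSpace& space) {
  if constexpr (std::is_same_v<ExecutionSpace, FakeCuda>) {
    // This branch is discarded for every other type, so cuda_stream()
    // is never looked up on FakeSerial.
    return "tpl on stream " + std::to_string(space.cuda_stream());
  } else {
    (void)space;  // parameter is unused on the fallback path
    return "native fallback";
  }
}
```

With this guard, `solve_path(FakeSerial{})` compiles and takes the fallback branch, whereas an unguarded `space.cuda_stream()` would fail to compile for `FakeSerial` in the same way the log above reports for `Kokkos::Serial`.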

Reproducer (Blake H100 queue):

salloc -N 1 -p H100

module load cmake gcc/11.3.0 cuda/11.8.0 git openblas/0.3.23

$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-cuda --arch=HOPPER90 --compiler=$KOKKOS_PATH/bin/nvcc_wrapper --kokkos-path=$KOKKOS_PATH --with-tpls=cusparse,cublas

make -j16
@lucbv
Contributor

lucbv commented Oct 30, 2023

I have started looking at this and can probably have a PR ready by the end of the day, unless someone else is already working on it?

@lucbv
Contributor

lucbv commented Oct 31, 2023

Let's see whether PR #2026 has properly fixed this issue; it worked locally, at least.
