Nightly test failure, Sycl backend on PVC, sycl_test.RK_Count: runtime error #2414

Closed
ndellingwood opened this issue Nov 5, 2024 · 2 comments

Comments

@ndellingwood
Contributor

Nightly SYCL builds on PVC fail in the sycl_test.RK_Count subtest:

23:43:24 [ RUN      ] sycl_test.RK_Count
23:43:24 terminate called after throwing an instance of 'std::runtime_error'
23:43:24   what():  There was a synchronous SYCL error:
23:43:24 Native API failed. Native API returns: -1 (PI_ERROR_DEVICE_NOT_FOUND) -1 (PI_ERROR_DEVICE_NOT_FOUND)

This follows the merge of #2229. @lucbv, can you take a look?

Reproducer: (blake PV queue)

source /projects/x86-64-icelake-rocky8/spack-config/blake-setup-user-module-env.sh
module purge
module load cmake intel-oneapi-compilers/2023.1.0 intel-oneapi-dpl/2022.1.0 git

# Required for the hashmap accumulator
export ZES_ENABLE_SYSMAN=1

$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-sycl --arch=INTEL_PVC --compiler=icpx --cxxflags="-fp-model=precise" --shared --kokkos-cmake-flags=-DKokkos_ENABLE_ONEDPL=OFF --kokkos-path=$KOKKOS_PATH
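
Before running the test binary, it may help to confirm that the environment is set up as the reproducer expects and that the SYCL runtime can see a device at all. This is a minimal sketch: `check_sysman` and `list_sycl_devices` are hypothetical helper names, and it assumes `sycl-ls` (shipped with the oneAPI DPC++ compiler) is on `PATH` after the module load. Nothing in this thread establishes the cause of the error, so a missing device in the listing is only one possibility worth ruling out.

```shell
#!/usr/bin/env bash
# Pre-flight check (hypothetical helpers, not part of KokkosKernels).

# Prints "ok" if ZES_ENABLE_SYSMAN is set to 1, "missing" otherwise.
check_sysman() {
  if [ "${ZES_ENABLE_SYSMAN:-}" = "1" ]; then
    echo "ok"
  else
    echo "missing"
  fi
}

# Lists the devices the SYCL runtime can see, if sycl-ls is available.
list_sycl_devices() {
  if command -v sycl-ls >/dev/null 2>&1; then
    sycl-ls
  else
    echo "sycl-ls not found; load the oneAPI compiler module first"
  fi
}

echo "ZES_ENABLE_SYSMAN: $(check_sysman)"
list_sycl_devices
```

If the listing shows no GPU entry on the compute node, the failure is likely environmental rather than specific to RK_Count.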
@lucbv
Contributor

lucbv commented Nov 5, 2024

Yeah, I'll have a look. After the release we should add a PV build to CI to avoid this kind of issue.
This looks like a fun SYCL-specific problem, since we did not observe any issues with CUDA and HIP.

@masterleinad
Contributor

Can't reproduce on Aurora:

[==========] Running 18 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 18 tests from sycl_test
[ RUN      ] sycl_test.RKSolve_serial
[       OK ] sycl_test.RKSolve_serial (545 ms)
[ RUN      ] sycl_test.RK_conv_rate
[       OK ] sycl_test.RK_conv_rate (37 ms)
[ RUN      ] sycl_test.RK_adaptivity
[       OK ] sycl_test.RK_adaptivity (1 ms)
[ RUN      ] sycl_test.RK_chem_models
[       OK ] sycl_test.RK_chem_models (281 ms)
[ RUN      ] sycl_test.RK_Count
[       OK ] sycl_test.RK_Count (97532 ms)
[ RUN      ] sycl_test.Newton_status_float
KokkosBatched::gesv: the currently implemented static pivoting failed.
NewtonFunctor: Linear solve gesv returned failure! 
[       OK ] sycl_test.Newton_status_float (6 ms)
[ RUN      ] sycl_test.Newton_status_double
KokkosBatched::gesv: the currently implemented static pivoting failed.
NewtonFunctor: Linear solve gesv returned failure! 
[       OK ] sycl_test.Newton_status_double (4 ms)
[ RUN      ] sycl_test.Newton_simple_float
[       OK ] sycl_test.Newton_simple_float (5 ms)
[ RUN      ] sycl_test.Newton_simple_double
[       OK ] sycl_test.Newton_simple_double (6 ms)
[ RUN      ] sycl_test.Newton_system_float
[       OK ] sycl_test.Newton_system_float (2 ms)
[ RUN      ] sycl_test.Newton_system_double
[       OK ] sycl_test.Newton_system_double (3 ms)
[ RUN      ] sycl_test.Newton_parallel_float
[       OK ] sycl_test.Newton_parallel_float (3 ms)
[ RUN      ] sycl_test.Newton_parallel_double
[       OK ] sycl_test.Newton_parallel_double (2 ms)
[ RUN      ] sycl_test.BDF_Logistic_serial
[       OK ] sycl_test.BDF_Logistic_serial (123 ms)
[ RUN      ] sycl_test.BDF_LotkaVolterra_serial
[       OK ] sycl_test.BDF_LotkaVolterra_serial (41 ms)
[ RUN      ] sycl_test.BDF_StiffChemistry_serial
[       OK ] sycl_test.BDF_StiffChemistry_serial (9516 ms)
[ RUN      ] sycl_test.BDF_Nordsieck
compute_coeffs
compute_coeffs
R: 
{ 1, 1, }
{ 0, -0.8, }
D before update:
  { 1, 0, 0 }
  { -0.0001, 0.0001, 0 }
compute_coeffs
SerialGemm
compute_coeffs
SerialGemm
D after update:
  { 1, 0, 0 }
  { -8e-05, 8e-05, 0 }
[       OK ] sycl_test.BDF_Nordsieck (0 ms)
[ RUN      ] sycl_test.BDF_StiffChemistry_adaptive
Stiff Chemistry solution at t=500: {0.462966, 3.42699e-06, 0.53703}
[       OK ] sycl_test.BDF_StiffChemistry_adaptive (12 ms)
[----------] 18 tests from sycl_test (108127 ms total)

[----------] Global test environment tear-down
[==========] 18 tests from 1 test suite ran. (108127 ms total)
[  PASSED  ] 18 tests.
