From f6b022d91b37400b2502bfde55f5ca91712bbca2 Mon Sep 17 00:00:00 2001 From: Baptiste Legouix Date: Fri, 10 Nov 2023 22:19:34 +0100 Subject: [PATCH 1/3] Splines release (#201) * wip * wip * wip * wip * wip on serial version of sll in ddc * compile but tests fail * clangd-format * fix virtual/ovveride keywords, ctest pass * operator << inside matrix class * Create function_basis_specs * Update function_basis_specs * First PETSc KSP solve (but of the wrong problem) * fix petsc things, matrix correctly stored (+ restore original solve method call) * wip * ksp solve (with random matrix) works in sequential, gpu wip * petsc ksp works with kokkos on cpu but still not on gpu * working on gpu ? * setup for theards-based batching -> PETSc error * solve on gpu with ginkgo (non-batched) * clean cmake for ginkgo link * Coo -> Csr * batch solve on gpu * wip * matrix-matrix * minors * sparse csr with 97% sparsity * logger * convergence logger * Matrix_Sparse class (with krylov solver) * wip * first working version of 1D splines ! * gko::Executor based on Kokkos:ExecutorSpace * rsync from gysela/sll (new changes) + fix kokkos<->ginkgo link * wip * batched spline_builder 1D tested on cpu, only with spline computation along first dimension * works on GPU * Begin templating + works on CPU and GPU * ddc policy obtained from Kokkos ExecSpace * policy from for_each.hpp * nbatch != ncells * remove zeros from csr and externalize preconditionner * Cleaning, several Kokkos::view replaced by Chunks * wip (not in compile state) * TypeSeqReplace * still wip (no compilation) * replace_dim_of (not tested yet) * wip * wip * Works for X and Y interest dimensions! Fully templated * Update tests/type_seq.cpp Co-authored-by: Thomas Padioleau * Update include/ddc/detail/type_seq.hpp Co-authored-by: Thomas Padioleau * fix * wip * wip * wip * Merge * wip * cleaning * wip * Generalize also DiscreteDomain constructor * clang-format * 3D batched_spline_builder working! * handle 1D case (but problem with sfinae and nvcc) * replace sfinae with an if constexpr for nvcc compatibility * handle 1D case * (huge) cleaning * clangd-format * Kokkos_malloc->Kokkos::View and remove_zeros in factorization_method() * wip * wip * chunks of B&X to be trated by ginkgo (parallel not working) * fix * remove a deepcopy + rename * not working * fix * 1D case naturally managed * 4D tests * remove MemorySpace from template * wip * batched evaluator (segfault on gpu) * wip * wip * bug commit * wip * working on gpu BUT there is a bypass of Chunkspan[] on gpu which makes it wrong! * cleaning * clang-format * determine matrix_sparse slicing according to backend * comment non-batched cmake * minor change to reduce number of warnings * wip * wip * wip * ddc namespace for ex-sll & clang-format * reactivate periodic test * changes for gysela * wip * other changes for gysela * minor fix * wip * recover non_periodic 1D test * recover 2D_spline test * recover other 2d test * prepare non_periodic batched splines * wip on non-periodic * restore non-batched Lapack method with a kwargs to choose between ginkgo and lapack * builder_2d based on Lapack + fix + clang-format * clang-format * wip * minor fix * kokkos::finalize() when sigint * OpenMP second loop in spline_builder_2d (but non_periodic test dont works with LAPACK, which is a bit weird) * main loops of spline_builder_2D parallelized but LAPACK not working in non_periodic case * 2D exec_spaces based on 1D spline_builders * wip * improve slicing * workaround to issue #193 * call of operator[] inside spline_evaluator * optimize allocation * fix * fix * ginkgo inplace solve * spline_evalutor derivates * wip * integrals * cleaning + comments * wip on release (removing files) * reactivate few tests * wip on splines-release * wip on splines-release * wip on splines-release * wip on splines-release * wip on splines-release * wip on splines-release * clang-format * cmake changes * minor * remove SLL_ prefix in tests * clean ddc.hpp * forgotten in merge * revert kokkos-finalize-sigint merge * cleaning * fix on GPU * throw runtime error when MatrixSparse::get_element() is called * periodicity test * fix periodicity * clean spline_evaluator * fix * remove GInkgo * fix on GPU * clang-format * characteristics_advection example * improve advection test * add test matrix_sparse * splines.hpp * simplify includes * fix * benchmark (but problem) * wip on benchmark * benchmark working * memory occupancy benchmark * plotter * improve * improve plotter * changes from Thomas' review * changes from Thomas' review on matrix_sparse * changes from Thomas' review * changes from Thomas' review in spline_evaluator_batched * remove bernstein * remove gauss_legendre * remove chunk * remove non_periodic * LICENSE * changes in cmake, ddc core and kernels managed separatly * remove LAPACK load (added by mistake) * changes from Thomas' review * remove logger * simplify tests headers * externalize ginkgo_exec + remove cmake options * clang-format * ginkgo solver_factory in constructor * optimize allocation * change omp params slicing * typo * clang-format * fix alloc * fix alloc again * tune omp ginkgo * several benchmarks * clang-format * add missing py * remove unused loggers * tuning * reference tuning in the benchmark * clang-format * tuning * minor changes for benchmark & tuning * several improvements benchmarking * fix * tuning * tuning serial * clang-format * Merge branch 'main' into splines * Remove Virginie from authors (because Gauss-Legendre) * fix test (but relax on tol is weird...) * clang-tidy * clang-tidy * remove deprecated test * move ginkgo solvers to reduce number of instances * try to fix CI * test common main * remove C-style arrays * fix * changes in the test * typo * fix for hip ci * Update tests.yml (#217) * fix MatrixSizesFixture.Sparse test * fix * fixes for warnings * revert an optimization which was actually slower * fix * Move LD_LIBRARY_PATH to Docker build (#219) * Update include/ddc/discrete_element.hpp * Update include/ddc/ddc.hpp * Update ginkgo_executors.hpp --------- Co-authored-by: Thomas Padioleau Co-authored-by: Julien Bigot --- CMakeLists.txt | 14 +- benchmarks/CMakeLists.txt | 13 +- benchmarks/splines.cpp | 245 +++++++ benchmarks/splines_plot.py | 139 ++++ docker/doxygen/bash_run | 2 + docker/latest/Dockerfile | 5 + docker/latest/bash_run | 2 + docker/oldest/Dockerfile | 5 +- docker/oldest/bash_run | 2 + examples/CMakeLists.txt | 10 +- examples/characteristics_advection.cpp | 302 +++++++++ include/ddc/ddc.hpp | 4 + include/ddc/kernels/splines.hpp | 19 + include/ddc/kernels/splines/AUTHORS | 16 + include/ddc/kernels/splines/bspline.hpp | 7 + .../kernels/splines/bsplines_non_uniform.hpp | 533 +++++++++++++++ .../ddc/kernels/splines/bsplines_uniform.hpp | 490 ++++++++++++++ .../splines/greville_interpolation_points.hpp | 196 ++++++ .../splines/knots_as_interpolation_points.hpp | 51 ++ include/ddc/kernels/splines/math_tools.hpp | 111 ++++ include/ddc/kernels/splines/matrix.hpp | 107 +++ include/ddc/kernels/splines/matrix_maker.hpp | 25 + include/ddc/kernels/splines/matrix_sparse.hpp | 325 ++++++++++ .../kernels/splines/null_boundary_value.hpp | 24 + .../splines/spline_boundary_conditions.hpp | 64 ++ .../kernels/splines/spline_boundary_value.hpp | 30 + .../ddc/kernels/splines/spline_builder.hpp | 611 ++++++++++++++++++ .../splines/spline_builder_batched.hpp | 217 +++++++ .../ddc/kernels/splines/spline_evaluator.hpp | 210 ++++++ .../splines/spline_evaluator_batched.hpp | 253 ++++++++ include/ddc/kernels/splines/view.hpp | 146 +++++ include/ddc/misc/ginkgo_executors.hpp | 45 ++ tests/CMakeLists.txt | 15 +- tests/splines/CMakeLists.txt | 102 +++ tests/splines/batched_spline_builder.cpp | 514 +++++++++++++++ tests/splines/bsplines.cpp | 91 +++ tests/splines/cosine_evaluator.hpp | 76 +++ tests/splines/matrix.cpp | 97 +++ tests/splines/periodic_spline_builder.cpp | 146 +++++ ...periodic_spline_builder_ordered_points.cpp | 61 ++ tests/splines/periodicity_spline_builder.cpp | 219 +++++++ tests/splines/polynomial_evaluator.hpp | 96 +++ tests/splines/spline_error_bounds.hpp | 97 +++ tests/splines/test_utils.hpp | 102 +++ tests/splines/view.cpp | 30 + 45 files changed, 5860 insertions(+), 9 deletions(-) create mode 100644 benchmarks/splines.cpp create mode 100644 benchmarks/splines_plot.py create mode 100644 examples/characteristics_advection.cpp create mode 100644 include/ddc/kernels/splines.hpp create mode 100644 include/ddc/kernels/splines/AUTHORS create mode 100644 include/ddc/kernels/splines/bspline.hpp create mode 100644 include/ddc/kernels/splines/bsplines_non_uniform.hpp create mode 100644 include/ddc/kernels/splines/bsplines_uniform.hpp create mode 100644 include/ddc/kernels/splines/greville_interpolation_points.hpp create mode 100644 include/ddc/kernels/splines/knots_as_interpolation_points.hpp create mode 100644 include/ddc/kernels/splines/math_tools.hpp create mode 100644 include/ddc/kernels/splines/matrix.hpp create mode 100644 include/ddc/kernels/splines/matrix_maker.hpp create mode 100644 include/ddc/kernels/splines/matrix_sparse.hpp create mode 100644 include/ddc/kernels/splines/null_boundary_value.hpp create mode 100644 include/ddc/kernels/splines/spline_boundary_conditions.hpp create mode 100644 include/ddc/kernels/splines/spline_boundary_value.hpp create mode 100644 include/ddc/kernels/splines/spline_builder.hpp create mode 100644 include/ddc/kernels/splines/spline_builder_batched.hpp create mode 100644 include/ddc/kernels/splines/spline_evaluator.hpp create mode 100644 include/ddc/kernels/splines/spline_evaluator_batched.hpp create mode 100644 include/ddc/kernels/splines/view.hpp create mode 100644 include/ddc/misc/ginkgo_executors.hpp create mode 100644 tests/splines/CMakeLists.txt create mode 100644 tests/splines/batched_spline_builder.cpp create mode 100644 tests/splines/bsplines.cpp create mode 100644 tests/splines/cosine_evaluator.hpp create mode 100644 tests/splines/matrix.cpp create mode 100644 tests/splines/periodic_spline_builder.cpp create mode 100644 tests/splines/periodic_spline_builder_ordered_points.cpp create mode 100644 tests/splines/periodicity_spline_builder.cpp create mode 100644 tests/splines/polynomial_evaluator.hpp create mode 100644 tests/splines/spline_error_bounds.hpp create mode 100644 tests/splines/test_utils.hpp create mode 100644 tests/splines/view.cpp diff --git a/CMakeLists.txt b/CMakeLists.txt index a1bb03126..24db769b9 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -5,6 +5,8 @@ project(DDC VERSION 0.0.0 LANGUAGES CXX) # List of options +option(BUILD_FFT_KERNEL "Build DDC kernel for FFT" ON) +option(BUILD_SPLINES_KERNEL "Build DDC kernel for splines" ON) option(BUILD_BENCHMARKS "Build DDC benchmarks." OFF) option(BUILD_DOCUMENTATION "Build DDC documentation/website" OFF) option(BUILD_EXAMPLES "Build DDC examples" ON) @@ -94,12 +96,12 @@ endif() # FFTW list( APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake" ) # Maybe not specific to FFTW -if(NOT FFTW_FOUND) +if("${BUILD_FFT_KERNEL}" AND NOT FFTW_FOUND) find_package( FFTW MODULE REQUIRED ) endif() ## CUDA + CUDAToolkit -if("${Kokkos_ENABLE_CUDA}" AND NOT("${HIP_FOR_NVIDIA}")) +if("${BUILD_FFT_KERNEL}" AND "${Kokkos_ENABLE_CUDA}" AND NOT("${HIP_FOR_NVIDIA}")) find_package( CUDAToolkit MODULE REQUIRED ) if( NOT(CUDAToolkit_FOUND) ) message(FATAL_ERROR "CUDAToolkit not found." ) @@ -237,7 +239,7 @@ if("${Kokkos_ENABLE_HIP}") target_compile_definitions(DDC INTERFACE hipfft_AVAIL) endif() -if("${HIP_FOR_NVIDIA}") +if("${BUILD_FFT_KERNEL}" AND "${HIP_FOR_NVIDIA}") find_package( HIP REQUIRED ) target_include_directories(DDC @@ -261,6 +263,12 @@ if("${HIP_FOR_NVIDIA}") endif() endif() +if("${BUILD_SPLINES_KERNEL}") + # Ginkgo + find_package(Ginkgo 1.6.0 EXACT REQUIRED) + target_link_libraries(DDC INTERFACE Ginkgo::ginkgo) + target_compile_definitions(DDC INTERFACE ginkgo_AVAIL) +endif() ## The PDI wrapper diff --git a/benchmarks/CMakeLists.txt b/benchmarks/CMakeLists.txt index 9b95a95ab..117769288 100644 --- a/benchmarks/CMakeLists.txt +++ b/benchmarks/CMakeLists.txt @@ -1,8 +1,17 @@ # SPDX-License-Identifier: MIT -add_executable(ddc_benchmarks deepcopy.cpp) -target_link_libraries(ddc_benchmarks +add_executable(ddc_benchmark_deepcopy deepcopy.cpp) +target_link_libraries(ddc_benchmark_deepcopy PUBLIC benchmark::benchmark DDC::DDC ) + +if("${BUILD_SPLINES_KERNEL}") + add_executable(ddc_benchmark_splines splines.cpp) + target_link_libraries(ddc_benchmark_splines + PUBLIC + benchmark::benchmark + DDC::DDC + ) +endif() diff --git a/benchmarks/splines.cpp b/benchmarks/splines.cpp new file mode 100644 index 000000000..24eb6ce0f --- /dev/null +++ b/benchmarks/splines.cpp @@ -0,0 +1,245 @@ +// SPDX-License-Identifier: MIT +#include +#include +#include +#include +#include + +#include +#include + +#include + +namespace { + +static constexpr std::size_t s_degree_x = 3; + +struct X +{ + static constexpr bool PERIODIC = true; +}; + +using BSplinesX = ddc::UniformBSplines; +using GrevillePoints = ddc:: + GrevilleInterpolationPoints; +using DDimX = GrevillePoints::interpolation_mesh_type; + +struct Y; +using DDimY = ddc::UniformPointSampling; + + +} // namespace + +// Function to monitor GPU memory asynchronously +void monitorMemoryAsync(std::mutex& mutex, bool& monitorFlag, size_t& maxUsedMem) +{ + size_t freeMem = 0; + size_t totalMem = 0; + while (monitorFlag) { + std::this_thread::sleep_for(std::chrono::milliseconds(1)); // Adjust the interval as needed + + // Acquire a lock to ensure thread safety when accessing CUDA functions + std::lock_guard lock(mutex); + +#if defined(__CUDACC__) + cudaMemGetInfo(&freeMem, &totalMem); +#endif + maxUsedMem = std::max(maxUsedMem, totalMem - freeMem); + } +} + +static void characteristics_advection(benchmark::State& state) +{ + size_t freeMem = 0; + size_t totalMem = 0; +#if defined(__CUDACC__) + cudaMemGetInfo(&freeMem, &totalMem); +#endif + size_t initUsedMem + = totalMem + - freeMem; // cudaMemGetInfo gives GPU total memory occupation, we consider that other users of the GPU have constant occupancy and substract it. + size_t maxUsedMem = initUsedMem; + + bool monitorFlag = true; + std::mutex mutex; + // Create a thread to monitor GPU memory asynchronously + std::thread monitorThread( + monitorMemoryAsync, + std::ref(mutex), + std::ref(monitorFlag), + std::ref(maxUsedMem)); + + ddc::init_discrete_space< + BSplinesX>(ddc::Coordinate(-1.), ddc::Coordinate(1.), state.range(0)); + ddc::init_discrete_space(ddc::GrevilleInterpolationPoints< + BSplinesX, + ddc::BoundCond::PERIODIC, + ddc::BoundCond::PERIODIC>::get_sampling()); + ddc::DiscreteDomain y_domain + = ddc::init_discrete_space(DDimY:: + init(ddc::Coordinate(-1.), + ddc::Coordinate(1.), + ddc::DiscreteVector(state.range(1)))); + + auto const x_domain = ddc::GrevilleInterpolationPoints< + BSplinesX, + ddc::BoundCond::PERIODIC, + ddc::BoundCond::PERIODIC>::get_domain(); + ddc::Chunk density_alloc( + ddc::DiscreteDomain(x_domain, y_domain), + ddc::DeviceAllocator()); + ddc::ChunkSpan const density = density_alloc.span_view(); + // Initialize the density on the main domain + ddc::DiscreteDomain x_mesh + = ddc::DiscreteDomain(x_domain, y_domain); + ddc::for_each( + ddc::policies::parallel_device, + x_mesh, + DDC_LAMBDA(ddc::DiscreteElement const ixy) { + double const x = ddc::coordinate(ddc::select(ixy)); + double const y = ddc::coordinate(ddc::select(ixy)); + density(ixy) = 9.999 * Kokkos::exp(-(x * x + y * y) / 0.1 / 2); + // initial_density(ixy) = 9.999 * ((x * x + y * y) < 0.25); + }); + ddc::SplineBuilderBatched< + ddc::SplineBuilder< + Kokkos::DefaultExecutionSpace, + Kokkos::DefaultExecutionSpace::memory_space, + BSplinesX, + DDimX, + ddc::BoundCond::PERIODIC, + ddc::BoundCond::PERIODIC>, + DDimX, + DDimY> + spline_builder(x_mesh, state.range(2), state.range(3), state.range(4)); + ddc::SplineEvaluatorBatched< + ddc::SplineEvaluator< + Kokkos::DefaultExecutionSpace, + Kokkos::DefaultExecutionSpace::memory_space, + BSplinesX, + DDimX>, + DDimX, + DDimY> + spline_evaluator( + spline_builder.spline_domain(), + ddc::g_null_boundary, + ddc::g_null_boundary); + ddc::Chunk coef_alloc( + spline_builder.spline_domain(), + ddc::KokkosAllocator()); + ddc::ChunkSpan coef = coef_alloc.span_view(); + ddc::Chunk feet_coords_alloc( + spline_builder.vals_domain(), + ddc::KokkosAllocator< + ddc::Coordinate, + Kokkos::DefaultExecutionSpace::memory_space>()); + ddc::ChunkSpan feet_coords = feet_coords_alloc.span_view(); + + for (auto _ : state) { + Kokkos::Profiling::pushRegion("FeetCharacteristics"); + ddc::for_each( + ddc::policies::parallel_device, + feet_coords.domain(), + DDC_LAMBDA(ddc::DiscreteElement const e) { + feet_coords(e) = ddc::Coordinate( + ddc::coordinate(ddc::select(e)) + - ddc::Coordinate(0.0176429863), + ddc::coordinate(ddc::select(e))); + }); + Kokkos::Profiling::popRegion(); + Kokkos::Profiling::pushRegion("SplineBuilder"); + spline_builder(coef, density); + Kokkos::Profiling::popRegion(); + Kokkos::Profiling::pushRegion("SplineEvaluator"); + spline_evaluator(density, feet_coords.span_cview(), coef.span_cview()); + Kokkos::Profiling::popRegion(); + } + monitorFlag = false; + monitorThread.join(); + state.SetBytesProcessed( + int64_t(state.iterations()) + * int64_t(state.range(0) * state.range(1) * sizeof(double))); + state.counters["gpu_mem_occupancy"] = maxUsedMem - initUsedMem; + //////////////////////////////////////////////////// + /// --------------- HUGE WARNING --------------- /// + /// The following lines are forbidden in a prod- /// + /// uction code. It is a necessary workaround /// + /// which must be used ONLY for Google Benchmark./// + /// The reason is it acts on underlying global /// + /// variables, which is always a bad idea. /// + //////////////////////////////////////////////////// + ddc::detail::g_discrete_space_dual.reset(); + ddc::detail::g_discrete_space_dual.reset(); + ddc::detail::g_discrete_space_dual.reset(); + ddc::detail::g_discrete_space_dual.reset(); + //////////////////////////////////////////////////// +} + +// Tuning : 512 cols and 8 precond on CPU, 16384 cols and 1 precond on GPU + +#ifdef KOKKOS_ENABLE_CUDA +std::string chip = "gpu"; +int cols_per_par_chunk_ref = 1024; +int par_chunks_per_seq_chunk_ref = 160; +unsigned int preconditionner_max_block_size_ref = 1u; +#elif defined(KOKKOS_ENABLE_OPENMP) +std::string chip = "cpu"; +int cols_per_par_chunk_ref = 512; +int par_chunks_per_seq_chunk_ref = 160; +unsigned int preconditionner_max_block_size_ref = 8u; +#elif defined(KOKKOS_ENABLE_SERIAL) +std::string chip = "cpu"; +int cols_per_par_chunk_ref = 512; +int par_chunks_per_seq_chunk_ref = 1; +unsigned int preconditionner_max_block_size_ref = 8u; +#endif + +BENCHMARK(characteristics_advection) + ->RangeMultiplier(2) + ->Ranges( + {{100, 1000}, + {100, 500000}, + {cols_per_par_chunk_ref, cols_per_par_chunk_ref}, + {par_chunks_per_seq_chunk_ref, par_chunks_per_seq_chunk_ref}, + {preconditionner_max_block_size_ref, preconditionner_max_block_size_ref}}) + ->MinTime(3); +/* +BENCHMARK(characteristics_advection) + ->RangeMultiplier(2) + ->Ranges({{100, 1000}, {100000, 100000}, {64,65535}, {par_chunks_per_seq_chunk_ref, par_chunks_per_seq_chunk_ref}, {preconditionner_max_block_size_ref, preconditionner_max_block_size_ref}}) + ->MinTime(3); +*/ +/* +BENCHMARK(characteristics_advection) + ->RangeMultiplier(2) + ->Ranges({{100, 1000}, {100000, 100000}, {cols_per_par_chunk_ref, cols_per_par_chunk_ref}, {1, 10000}, {preconditionner_max_block_size_ref, preconditionner_max_block_size_ref}}) + ->MinTime(3); +*/ +/* +BENCHMARK(characteristics_advection) + ->RangeMultiplier(2) + ->Ranges({{100, 1000}, {100000, 100000}, {cols_per_par_chunk_ref, cols_per_par_chunk_ref}, {par_chunks_per_seq_chunk_ref, par_chunks_per_seq_chunk_ref}, {1, 32}}) + ->MinTime(3); +*/ + +int main(int argc, char** argv) +{ + ::benchmark::Initialize(&argc, argv); + ::benchmark::AddCustomContext("chip", chip); + ::benchmark::AddCustomContext("cols_per_par_chunk_ref", std::to_string(cols_per_par_chunk_ref)); + ::benchmark::AddCustomContext( + "par_chunks_per_seq_chunk_ref", + std::to_string(par_chunks_per_seq_chunk_ref)); + ::benchmark::AddCustomContext( + "preconditionner_max_block_size_ref", + std::to_string(preconditionner_max_block_size_ref)); + if (::benchmark::ReportUnrecognizedArguments(argc, argv)) { + return 1; + } + { + ddc::ScopeGuard const guard; + ::benchmark::RunSpecifiedBenchmarks(); + } + ::benchmark::Shutdown(); + return 0; +} diff --git a/benchmarks/splines_plot.py b/benchmarks/splines_plot.py new file mode 100644 index 000000000..8f68f3157 --- /dev/null +++ b/benchmarks/splines_plot.py @@ -0,0 +1,139 @@ +# First execute : +# ./benchmarks/ddc_benchmark_splines --benchmark_format=json --benchmark_out=splines_bench.json +# then execute this code will be able to plot results: +# python3 splines_plot.py /path/to/splines_bench.json + +import argparse +import matplotlib.pyplot as plt +import json +import numpy as np + +parser = argparse.ArgumentParser(description="Plot bytes_per_second from a JSON file.") +parser.add_argument("json_file", help="Path to the JSON file") +args = parser.parse_args() + +with open(args.json_file, 'r') as file: + data = json.load(file); + +# Extract the values at the end of "name" and corresponding "bytes_per_second" +nx_values = sorted(set(int(benchmark["name"].split("/")[1]) for benchmark in data["benchmarks"])) +data_groups = {nx: {"ny": [], "cols_per_par_chunk": [], "par_chunks_per_seq_chunk": [], "preconditionner_max_block_size": [], "bytes_per_second": [], "gpu_mem_occupancy": []} for nx in nx_values} + +for benchmark in data["benchmarks"]: + nx = int(benchmark["name"].split("/")[1]) + data_groups[nx]["ny"].append(int(benchmark["name"].split("/")[2])) + data_groups[nx]["cols_per_par_chunk"].append(int(benchmark["name"].split("/")[3])) + data_groups[nx]["par_chunks_per_seq_chunk"].append(int(benchmark["name"].split("/")[4])) + data_groups[nx]["preconditionner_max_block_size"].append(int(benchmark["name"].split("/")[5])) + data_groups[nx]["bytes_per_second"].append(benchmark["bytes_per_second"]) + data_groups[nx]["gpu_mem_occupancy"].append(benchmark["gpu_mem_occupancy"]) + +######## +## ny ## +######## + +# Plotting the data for each group +plt.figure(figsize=(8, 6)) +for nx, group_data in data_groups.items(): + ny = group_data["ny"] + bandwidth = [group_data["bytes_per_second"][i] for i in range(len(ny))] + plt.plot(ny, bandwidth, marker='o', markersize=5, label=f'nx={nx}') + +x = np.linspace(min(ny), 10*min(ny)) +plt.plot(x, np.mean([data_groups[nx]["bytes_per_second"][0] for nx in nx_values])/min(ny)*x, linestyle='--', color='black', label='perfect scaling') + +# Plotting the data +plt.grid() +plt.xscale("log") +plt.xlabel("ny") +plt.ylabel("Bandwidth [B/s]") +plt.title("Bandwidth on "+str.upper(data["context"]["chip"])); +plt.legend() +plt.savefig("bandwidth_ny.png") + +#gpu_mem +plt.figure(figsize=(8, 6)) +for nx, group_data in data_groups.items(): + ny = [group_data["ny"][i] for i in range(len(group_data["ny"])) if group_data["ny"][i]>=8e3] + gpu_mem_overhead = [(group_data["gpu_mem_occupancy"][i]-nx*group_data["ny"][i]*8)/(nx*group_data["ny"][i]*8)*100 for i in range(len(group_data["ny"])) if group_data["ny"][i]>=8e3] + plt.plot(ny, gpu_mem_overhead, marker='o', markersize=5, label=f'nx={nx}') + +# Plotting the data +plt.grid() +plt.xscale("log") +plt.xlabel("ny") +plt.ylabel("Relative memory overhead [%]") +plt.title("Relative memory occupancy overhead") +plt.legend() +plt.savefig("gpu_mem_occupancy.png") + +######################## +## cols_per_par_chunk ## +######################## + +# Plotting the data for each group +plt.figure(figsize=(8, 6)) +for nx, group_data in data_groups.items(): + cols_per_par_chunk = group_data["cols_per_par_chunk"] + bandwidth = [group_data["bytes_per_second"][i] for i in range(len(cols_per_par_chunk))] + plt.plot(cols_per_par_chunk, bandwidth, marker='o', markersize=5, label=f'nx={nx}') + +x = [(int)(data["context"]["cols_per_par_chunk_ref"]), (int)(data["context"]["cols_per_par_chunk_ref"])*1.001]; +plt.plot(x, [0.99*min([min(group_data["bytes_per_second"]) for nx, group_data in data_groups.items()]), 1.01*max([max(group_data["bytes_per_second"]) for nx, group_data in data_groups.items()])], linestyle='dotted', color='black', label='reference config') + +# Plotting the data +plt.grid() +plt.xscale("log") +plt.xlabel("cols_per_par_chunk") +plt.ylabel("Bandwidth [B/s]") +plt.title("Bandwidth on "+str.upper(data["context"]["chip"])+" (with ny=100000)"); +plt.legend() +plt.savefig("bandwidth_cols.png") + +############################## +## par_chunks_per_seq_chunk ## +############################## + +# Plotting the data for each group +plt.figure(figsize=(8, 6)) +for nx, group_data in data_groups.items(): + par_chunks_per_seq_chunk = group_data["par_chunks_per_seq_chunk"] + bandwidth = [group_data["bytes_per_second"][i] for i in range(len(par_chunks_per_seq_chunk))] + plt.plot(par_chunks_per_seq_chunk, bandwidth, marker='o', markersize=5, label=f'nx={nx}') + +x = [(int)(data["context"]["par_chunks_per_seq_chunk_ref"]), (int)(data["context"]["par_chunks_per_seq_chunk_ref"])*1.001]; +plt.plot(x, [0.99*min([min(group_data["bytes_per_second"]) for nx, group_data in data_groups.items()]), 1.01*max([max(group_data["bytes_per_second"]) for nx, group_data in data_groups.items()])], linestyle='dotted', color='black', label='reference config') + +# Plotting the data +plt.grid() +plt.xscale("log") +plt.xlabel("par_chunks_per_seq_chunk") +plt.ylabel("Bandwidth [B/s]") +plt.title("Bandwidth on "+str.upper(data["context"]["chip"])+" (with ny=100000)"); +plt.legend() +plt.savefig("bandwidth_par_chunks.png") + +##################### +## preconditionner ## +##################### + +# Plotting the data for each group +plt.figure(figsize=(8, 6)) +for nx, group_data in data_groups.items(): + preconditionner_max_block_size = group_data["preconditionner_max_block_size"] + bandwidth = [group_data["bytes_per_second"][i] for i in range(len(preconditionner_max_block_size))] + plt.plot(preconditionner_max_block_size, bandwidth, marker='o', markersize=5, label=f'nx={nx}') + +x = [(int)(data["context"]["preconditionner_max_block_size_ref"]), (int)(data["context"]["preconditionner_max_block_size_ref"])*1.001]; +plt.plot(x, [0.99*min([min(group_data["bytes_per_second"]) for nx, group_data in data_groups.items()]), 1.01*max([max(group_data["bytes_per_second"]) for nx, group_data in data_groups.items()])], linestyle='dotted', color='black', label='reference config') + +# Plotting the data +plt.grid() +plt.xscale("log") +plt.xlabel("preconditionner_max_block_size") +plt.ylabel("Bandwidth [B/s]") +plt.title("Bandwidth on "+str.upper(data["context"]["chip"])+" (with ny=100000)"); +plt.legend() +plt.savefig("bandwidth_precond.png") + +plt.close(); diff --git a/docker/doxygen/bash_run b/docker/doxygen/bash_run index 225b15815..e604ac39a 100644 --- a/docker/doxygen/bash_run +++ b/docker/doxygen/bash_run @@ -1,4 +1,6 @@ #!/bin/bash # SPDX-License-Identifier: MIT +. /etc/profile + exec "$@" diff --git a/docker/latest/Dockerfile b/docker/latest/Dockerfile index 07fb253dc..256d1a431 100644 --- a/docker/latest/Dockerfile +++ b/docker/latest/Dockerfile @@ -84,6 +84,9 @@ RUN chmod +x /bin/bash_run \ && if [ "xcuda" = "x${BACKEND}" ] \ ; then echo 'CUDA_GCC=gcc-10' > /etc/profile.d/ddc-cuda.sh \ ; echo 'CUDA_GXX=g++-10' >> /etc/profile.d/ddc-cuda.sh \ + ; fi \ + && if [ "xhip" = "x${BACKEND}" ] \ + ; then echo 'export LD_LIBRARY_PATH="/opt/rocm/lib"' > /etc/profile.d/10-rocm.sh \ ; fi USER ci:ci @@ -91,3 +94,5 @@ WORKDIR /data ENTRYPOINT ["/bin/bash_run"] CMD ["/bin/bash", "-li"] + + diff --git a/docker/latest/bash_run b/docker/latest/bash_run index 225b15815..e604ac39a 100644 --- a/docker/latest/bash_run +++ b/docker/latest/bash_run @@ -1,4 +1,6 @@ #!/bin/bash # SPDX-License-Identifier: MIT +. /etc/profile + exec "$@" diff --git a/docker/oldest/Dockerfile b/docker/oldest/Dockerfile index 7b2d81261..57a65c122 100644 --- a/docker/oldest/Dockerfile +++ b/docker/oldest/Dockerfile @@ -73,7 +73,10 @@ RUN chmod +x /bin/bash_run \ && apt-get clean -y \ && apt-get autoclean -y \ && rm -rf /var/lib/apt/lists/* \ - && useradd -d /data -m -U ci + && useradd -d /data -m -U ci \ + && if [ "xhip" = "x${BACKEND}" ] \ + ; then echo 'export LD_LIBRARY_PATH="/opt/rocm/lib"' > /etc/profile.d/10-rocm.sh \ + ; fi USER ci:ci WORKDIR /data diff --git a/docker/oldest/bash_run b/docker/oldest/bash_run index 225b15815..e604ac39a 100644 --- a/docker/oldest/bash_run +++ b/docker/oldest/bash_run @@ -1,4 +1,6 @@ #!/bin/bash # SPDX-License-Identifier: MIT +. /etc/profile + exec "$@" diff --git a/examples/CMakeLists.txt b/examples/CMakeLists.txt index 5033b3536..33545e293 100644 --- a/examples/CMakeLists.txt +++ b/examples/CMakeLists.txt @@ -2,7 +2,13 @@ add_executable(heat_equation heat_equation.cpp) target_link_libraries(heat_equation PUBLIC DDC::DDC) -add_executable(heat_equation_spectral heat_equation_spectral.cpp) -target_link_libraries(heat_equation_spectral PUBLIC DDC::DDC) +if("${BUILD_FFT_KERNEL}") + add_executable(heat_equation_spectral heat_equation_spectral.cpp) + target_link_libraries(heat_equation_spectral PUBLIC DDC::DDC) +endif() add_executable(game_of_life game_of_life.cpp) target_link_libraries(game_of_life PUBLIC DDC::DDC) +if("${BUILD_SPLINES_KERNEL}") + add_executable(characteristics_advection characteristics_advection.cpp) + target_link_libraries(characteristics_advection PUBLIC DDC::DDC) +endif() diff --git a/examples/characteristics_advection.cpp b/examples/characteristics_advection.cpp new file mode 100644 index 000000000..b25c54c71 --- /dev/null +++ b/examples/characteristics_advection.cpp @@ -0,0 +1,302 @@ +// SPDX-License-Identifier: MIT + +//! [includes] +#include +#include +#include +#include + +#include +#include + +#include +//! [includes] +static constexpr std::size_t s_degree_x = 3; + +//! [X-dimension] +/// Our first continuous dimension +struct X +{ + static constexpr bool PERIODIC = true; +}; +//! [X-dimension] + +//! [X-discretization] +/// A uniform discretization of X +using BSplinesX = ddc::UniformBSplines; +using GrevillePoints = ddc::GrevilleInterpolationPoints< + BSplinesX, + ddc::BoundCond::PERIODIC, + ddc::BoundCond::PERIODIC>; +using DDimX = GrevillePoints::interpolation_mesh_type; +//! [X-discretization] + +//! [Y-space] +// Our second continuous dimension +struct Y; +// Its uniform discretization +using DDimY = ddc::UniformPointSampling; +//! [Y-space] + +//! [time-space] +// Our simulated time dimension +struct T; +// Its uniform discretization +using DDimT = ddc::UniformPointSampling; +//! [time-space] + +//! [display] +/** A function to pretty print the density + * @param time the time at which the output is made + * @param density the density at this time-step + */ +template +void display(double time, ChunkType density) +{ + double const mean_density = ddc::transform_reduce( + density.domain(), + 0., + ddc::reducer::sum(), + density) + / density.domain().size(); + std::cout << std::fixed << std::setprecision(3); + std::cout << "At t = " << time << ",\n"; + std::cout << " * mean density = " << mean_density << "\n"; + // take a slice in the middle of the box + ddc::ChunkSpan density_slice = density + [ddc::get_domain(density).front() + + ddc::get_domain(density).size() / 2]; + std::cout << " * density[y:" + << ddc::get_domain(density).size() / 2 << "] = {"; + ddc::for_each( + ddc::policies::serial_host, + ddc::get_domain(density), + [=](ddc::DiscreteElement const ix) { + std::cout << std::setw(6) << density_slice(ix) << " "; + }); + std::cout << " }" << std::endl; +} +//! [display] + +//! [main-start] +int main(int argc, char** argv) +{ + ddc::ScopeGuard scope(argc, argv); + + // some parameters that would typically be read from some form of + // configuration file in a more realistic code + + //! [parameters] + // Start of the domain of interest in the X dimension + double const x_start = -1.; + // End of the domain of interest in the X dimension + double const x_end = 1.; + // Number of discretization points in the X dimension + size_t const nb_x_points = 100; + // Velocity along x dimension + double const vx = .2; + // Start of the domain of interest in the Y dimension + double const y_start = -1.; + // End of the domain of interest in the Y dimension + double const y_end = 1.; + // Number of discretization points in the Y dimension + size_t const nb_y_points = 100; + // Simulated time at which to start simulation + double const start_time = 0.; + // Simulated time to reach as target of the simulation + double const end_time = 10.; + // Number of time-steps between outputs + ptrdiff_t const t_output_period = 10; + // Maximum time-step + ddc::Coordinate const max_dt {0.1}; + //! [parameters] + + //! [main-start] + + //! [X-global-domain] + // Initialization of the global domain in X + ddc::init_discrete_space( + ddc::Coordinate(x_start), + ddc::Coordinate(x_end), + nb_x_points); + ddc::init_discrete_space( + ddc::GrevilleInterpolationPoints< + BSplinesX, + ddc::BoundCond::PERIODIC, + ddc::BoundCond::PERIODIC>::get_sampling()); + + auto const x_domain = ddc::GrevilleInterpolationPoints< + BSplinesX, + ddc::BoundCond::PERIODIC, + ddc::BoundCond::PERIODIC>::get_domain(); + //! [X-global-domain] + // Initialization of the global domain in Y + auto const y_domain = ddc::init_discrete_space( + DDimY:: + init(ddc::Coordinate(y_start), + ddc::Coordinate(y_end), + ddc::DiscreteVector(nb_y_points))); + + //! [time-domains] + + // number of time intervals required to reach the end time + ddc::DiscreteVector const nb_time_steps { + std::ceil((end_time - start_time) / max_dt) + .2}; + // Initialization of the global domain in time: + // - the number of discrete time-points is equal to the number of + // steps + 1 + ddc::DiscreteDomain const time_domain + = ddc::init_discrete_space( + DDimT:: + init(ddc::Coordinate(start_time), + ddc::Coordinate(end_time), + nb_time_steps + 1)); + //! [time-domains] + + //! [data allocation] + // Maps density into the full domain twice: + // - once for the last fully computed time-step + ddc::Chunk last_density_alloc( + ddc::DiscreteDomain(x_domain, y_domain), + ddc::DeviceAllocator()); + + // - once for time-step being computed + ddc::Chunk next_density_alloc( + ddc::DiscreteDomain(x_domain, y_domain), + ddc::DeviceAllocator()); + //! [data allocation] + + //! [initial-conditions] + ddc::ChunkSpan const initial_density + = last_density_alloc.span_view(); + // Initialize the density on the main domain + ddc::DiscreteDomain x_mesh + = ddc::DiscreteDomain(x_domain, y_domain); + ddc::for_each( + ddc::policies::parallel_device, + x_mesh, + DDC_LAMBDA(ddc::DiscreteElement const ixy) { + double const x + = ddc::coordinate(ddc::select(ixy)); + double const y + = ddc::coordinate(ddc::select(ixy)); + initial_density(ixy) + = 9.999 + * Kokkos::exp(-(x * x + y * y) / 0.1 / 2); + // initial_density(ixy) = 9.999 * ((x * x + y * y) < 0.25); + }); + //! [initial-conditions] + + ddc::Chunk host_density_alloc( + ddc::DiscreteDomain(x_domain, y_domain), + ddc::HostAllocator()); + + + //! [initial output] + // display the initial data + ddc::deepcopy(host_density_alloc, last_density_alloc); + display(ddc::coordinate(time_domain.front()), + host_density_alloc[x_domain][y_domain]); + // time of the iteration where the last output happened + ddc::DiscreteElement last_output = time_domain.front(); + //! [initial output] + + //! [instantiate solver] + ddc::SplineBuilderBatched< + ddc::SplineBuilder< + Kokkos::DefaultExecutionSpace, + Kokkos::DefaultExecutionSpace::memory_space, + BSplinesX, + DDimX, + ddc::BoundCond::PERIODIC, + ddc::BoundCond::PERIODIC>, + DDimX, + DDimY> + spline_builder(x_mesh); + ddc::SplineEvaluatorBatched< + ddc::SplineEvaluator< + Kokkos::DefaultExecutionSpace, + Kokkos::DefaultExecutionSpace::memory_space, + BSplinesX, + DDimX>, + DDimX, + DDimY> + spline_evaluator( + spline_builder.spline_domain(), + ddc::g_null_boundary, + ddc::g_null_boundary); + //! [instantiate solver] + + //! [instantiate intermediate chunks] + // Instantiate chunk of spline coefs to receive output of spline_builder + ddc::Chunk coef_alloc( + spline_builder.spline_domain(), + ddc::DeviceAllocator()); + ddc::ChunkSpan coef = coef_alloc.span_view(); + + // Instantiate chunk to receive feet coords + ddc::Chunk feet_coords_alloc( + spline_builder.vals_domain(), + ddc::DeviceAllocator>()); + ddc::ChunkSpan feet_coords = feet_coords_alloc.span_view(); + //! [instantiate intermediate chunks] + + + //! [time iteration] + for (auto const iter : + time_domain.remove_first(ddc::DiscreteVector(1))) { + //! [time iteration] + + //! [manipulated views] + // a span of the density at the time-step we + // will build + ddc::ChunkSpan const next_density { + next_density_alloc.span_view()}; + // a read-only view of the density at the previous time-step + ddc::ChunkSpan const last_density { + last_density_alloc.span_view()}; + //! [manipulated views] + + //! [numerical scheme] + // Stencil computation on the main domain + // Find the coordinates of the characteristics feet + ddc::for_each( + ddc::policies::parallel_device, + feet_coords.domain(), + DDC_LAMBDA(ddc::DiscreteElement const e) { + feet_coords(e) + = ddc::coordinate(ddc::select(e)) + - ddc::Coordinate( + vx * ddc::step()); + }); + // Interpolate the values at feets on the grid + spline_builder(coef, last_density); + spline_evaluator( + next_density, + feet_coords.span_cview(), + coef.span_cview()); + //! [numerical scheme] + + //! [output] + if (iter - last_output >= t_output_period) { + last_output = iter; + ddc::deepcopy(host_density_alloc, last_density_alloc); + display(ddc::coordinate(iter), + host_density_alloc[x_domain][y_domain]); + } + //! [output] + + //! [swap] + // Swap our two buffers + std::swap(last_density_alloc, next_density_alloc); + //! [swap] + } + + //! [final output] + if (last_output < time_domain.back()) { + ddc::deepcopy(host_density_alloc, last_density_alloc); + display(ddc::coordinate(time_domain.back()), + host_density_alloc[x_domain][y_domain]); + } + //! [final output] +} diff --git a/include/ddc/ddc.hpp b/include/ddc/ddc.hpp index aaeddfa76..e19ffb8b7 100644 --- a/include/ddc/ddc.hpp +++ b/include/ddc/ddc.hpp @@ -35,3 +35,7 @@ #if defined(DDC_BUILD_PDI_WRAPPER) #include "ddc/pdi.hpp" #endif + +#if ginkgo_AVAIL +#include "misc/ginkgo_executors.hpp" +#endif diff --git a/include/ddc/kernels/splines.hpp b/include/ddc/kernels/splines.hpp new file mode 100644 index 000000000..e58374a54 --- /dev/null +++ b/include/ddc/kernels/splines.hpp @@ -0,0 +1,19 @@ +#pragma once + +#include "splines/bspline.hpp" +#include "splines/bsplines_non_uniform.hpp" +#include "splines/bsplines_uniform.hpp" +#include "splines/greville_interpolation_points.hpp" +#include "splines/knots_as_interpolation_points.hpp" +#include "splines/math_tools.hpp" +#include "splines/matrix.hpp" +#include "splines/matrix_maker.hpp" +#include "splines/matrix_sparse.hpp" +#include "splines/null_boundary_value.hpp" +#include "splines/spline_boundary_conditions.hpp" +#include "splines/spline_boundary_value.hpp" +#include "splines/spline_builder.hpp" +#include "splines/spline_builder_batched.hpp" +#include "splines/spline_evaluator.hpp" +#include "splines/spline_evaluator_batched.hpp" +#include "splines/view.hpp" diff --git a/include/ddc/kernels/splines/AUTHORS b/include/ddc/kernels/splines/AUTHORS new file mode 100644 index 000000000..d8ec7de4d --- /dev/null +++ b/include/ddc/kernels/splines/AUTHORS @@ -0,0 +1,16 @@ +Multiple people have contributed to this splines solver. To show our appreciation for their +public spirit, we list here in alphabetical order a condensed list of their +contributions. + + +Julien Bigot +* Participation to DDC-based redisign + +Emily Bourne +* Work on splines and matrices + +Baptiste Legouix +* GPU port (batching & call to Ginkgo backend) + +Thomas Padioleau +* Redesign and reimplementation using DCC diff --git a/include/ddc/kernels/splines/bspline.hpp b/include/ddc/kernels/splines/bspline.hpp new file mode 100644 index 000000000..460c1dfbe --- /dev/null +++ b/include/ddc/kernels/splines/bspline.hpp @@ -0,0 +1,7 @@ +#pragma once +namespace ddc { +template +struct BSpline +{ +}; +} // namespace ddc diff --git a/include/ddc/kernels/splines/bsplines_non_uniform.hpp b/include/ddc/kernels/splines/bsplines_non_uniform.hpp new file mode 100644 index 000000000..d17b878e8 --- /dev/null +++ b/include/ddc/kernels/splines/bsplines_non_uniform.hpp @@ -0,0 +1,533 @@ +#pragma once + +#include +#include +#include +#include + +#include + +#include "bspline.hpp" +#include "view.hpp" + +namespace ddc { +/// NonUniformPointSampling specialization of BSplines +template +class NonUniformBSplines +{ + static_assert(D > 0, "Parameter `D` must be positive"); + +public: + // From nvcc: 'A type that is defined inside a class and has private or protected access cannot be used + // in the template argument type of a variable template instantiation' + template + struct InternalTagGenerator; + + /// An internal tag necessary to allocate an internal ddc::discrete_space function. + /// It must remain internal, for example it shall not be exposed when returning ddc::coordinates. Instead use `Tag` + using KnotDim = InternalTagGenerator; + + using mesh_type = ddc::NonUniformPointSampling; + + static KOKKOS_INLINE_FUNCTION ddc::Coordinate knot_from_coord( + ddc::Coordinate const& coord) + { + return ddc::Coordinate(ddc::get(coord)); + } + static KOKKOS_INLINE_FUNCTION ddc::Coordinate coord_from_knot( + ddc::Coordinate const& coord) + { + return ddc::Coordinate(ddc::get(coord)); + } + +public: + using tag_type = Tag; + + using continuous_dimension_type = BSpline; + + + using discrete_dimension_type = NonUniformBSplines; + + using discrete_element_type = ddc::DiscreteElement; + + using discrete_domain_type = ddc::DiscreteDomain; + + using discrete_vector_type = ddc::DiscreteVector; + +public: + static constexpr std::size_t rank() + { + return 1; + } + + static constexpr std::size_t degree() noexcept + { + return D; + } + + static constexpr bool is_periodic() noexcept + { + return Tag::PERIODIC; + } + + static constexpr bool is_radial() noexcept + { + return false; + } + + static constexpr bool is_uniform() noexcept + { + return false; + } + + template + class Impl + { + template + friend class Impl; + + private: + ddc::DiscreteDomain m_domain; + const int m_nknots; + + public: + using discrete_dimension_type = NonUniformBSplines; + + Impl() = default; + + template + explicit Impl(Impl const& impl) + : m_domain(impl.m_domain) + , m_nknots(impl.m_nknots) + { + } + + /// @brief Construct a `Impl` using a brace-list, i.e. `Impl bsplines({0., 1.})` + explicit Impl(std::initializer_list> breaks) + : Impl(breaks.begin(), breaks.end()) + { + } + + /// @brief Construct a `Impl` using a C++20 "common range". + explicit Impl(std::vector> const& breaks) + : Impl(breaks.begin(), breaks.end()) + { + } + + /// @brief Construct a `Impl` using a pair of iterators. + template + Impl(RandomIt breaks_begin, RandomIt breaks_end); + + Impl(Impl const& x) = default; + + Impl(Impl&& x) = default; + + ~Impl() = default; + + Impl& operator=(Impl const& x) = default; + + Impl& operator=(Impl&& x) = default; + + KOKKOS_INLINE_FUNCTION discrete_element_type + eval_basis(std::array& values, ddc::Coordinate const& x) const; + + KOKKOS_INLINE_FUNCTION discrete_element_type + eval_deriv(std::array& derivs, ddc::Coordinate const& x) const; + + KOKKOS_INLINE_FUNCTION discrete_element_type eval_basis_and_n_derivs( + ddc::DSpan2D derivs, + ddc::Coordinate const& x, + std::size_t n) const; + + template + KOKKOS_INLINE_FUNCTION ddc::ChunkSpan + integrals( + ddc::ChunkSpan int_vals) const; + + KOKKOS_INLINE_FUNCTION ddc::Coordinate get_knot(int knot_idx) const noexcept + { + return coord_from_knot( + ddc::coordinate(ddc::DiscreteElement(knot_idx + degree()))); + } + + KOKKOS_INLINE_FUNCTION ddc::Coordinate get_first_support_knot( + discrete_element_type const& ix) const + { + return coord_from_knot(ddc::coordinate(ddc::DiscreteElement(ix.uid()))); + } + + KOKKOS_INLINE_FUNCTION ddc::Coordinate get_last_support_knot( + discrete_element_type const& ix) const + { + return coord_from_knot( + ddc::coordinate(ddc::DiscreteElement(ix.uid() + degree() + 1))); + } + + KOKKOS_INLINE_FUNCTION ddc::Coordinate get_support_knot_n( + discrete_element_type const& ix, + int n) const + { + return coord_from_knot(ddc::coordinate(ddc::DiscreteElement(ix.uid() + n))); + } + + KOKKOS_INLINE_FUNCTION ddc::Coordinate rmin() const noexcept + { + return get_knot(0); + } + + KOKKOS_INLINE_FUNCTION ddc::Coordinate rmax() const noexcept + { + return get_knot(ncells()); + } + + KOKKOS_INLINE_FUNCTION double length() const noexcept + { + return rmax() - rmin(); + } + + KOKKOS_INLINE_FUNCTION std::size_t size() const noexcept + { + return degree() + ncells(); + } + + /// Returns the discrete domain including ghost bsplines + KOKKOS_INLINE_FUNCTION discrete_domain_type full_domain() const + { + return discrete_domain_type(discrete_element_type(0), discrete_vector_type(size())); + } + + KOKKOS_INLINE_FUNCTION std::size_t npoints() const noexcept + { + return m_nknots - 2 * degree(); + } + + KOKKOS_INLINE_FUNCTION std::size_t nbasis() const noexcept + { + return ncells() + !is_periodic() * degree(); + } + + KOKKOS_INLINE_FUNCTION std::size_t ncells() const noexcept + { + return npoints() - 1; + } + + private: + KOKKOS_INLINE_FUNCTION int find_cell(ddc::Coordinate const& x) const; + }; +}; + +template +template +template +NonUniformBSplines::Impl::Impl( + RandomIt const break_begin, + RandomIt const break_end) + : m_domain( + ddc::DiscreteElement(0), + ddc::DiscreteVector( + (break_end - break_begin) + + 2 * degree())) // Create a mesh including the eventual periodic point + , m_nknots((break_end - break_begin) + 2 * degree()) +{ + std::vector> knots((break_end - break_begin) + 2 * degree()); + // Fill the provided knots + int ii = 0; + for (RandomIt it = break_begin; it < break_end; ++it) { + knots[degree() + ii] = knot_from_coord(*it); + ++ii; + } + ddc::Coordinate const rmin = knots[degree()]; + ddc::Coordinate const rmax = knots[(break_end - break_begin) + degree() - 1]; + assert(rmin < rmax); + + // Fill out the extra knots + if constexpr (is_periodic()) { + double const period = rmax - rmin; + for (std::size_t i = 1; i < degree() + 1; ++i) { + knots[degree() + -i] + = ddc::Coordinate(knots[degree() + ncells() - i] - period); + knots[degree() + ncells() + i] = ddc::Coordinate(knots[degree() + i] + period); + } + } else // open + { + for (std::size_t i = 1; i < degree() + 1; ++i) { + knots[degree() + -i] = rmin; + knots[degree() + npoints() - 1 + i] = rmax; + } + } + ddc::init_discrete_space(knots); +} + +template +template +KOKKOS_INLINE_FUNCTION ddc::DiscreteElement> NonUniformBSplines:: + Impl::eval_basis( + std::array& values, + ddc::Coordinate const& x) const +{ + std::array left; + std::array right; + + assert(x >= rmin()); + assert(x <= rmax()); + assert(values.size() == degree() + 1); + + // 1. Compute cell index 'icell' + int const icell = find_cell(x); + + assert(icell >= 0); + assert(icell <= int(ncells() - 1)); + assert(get_knot(icell) <= x); + assert(get_knot(icell + 1) >= x); + + // 2. Compute values of B-splines with support over cell 'icell' + double temp; + values[0] = 1.0; + for (std::size_t j = 0; j < degree(); ++j) { + left[j] = x - get_knot(icell - j); + right[j] = get_knot(icell + j + 1) - x; + double saved = 0.0; + for (std::size_t r = 0; r < j + 1; ++r) { + temp = values[r] / (right[r] + left[j - r]); + values[r] = saved + right[r] * temp; + saved = left[j - r] * temp; + } + values[j + 1] = saved; + } + + return discrete_element_type(icell); +} + +template +template +KOKKOS_INLINE_FUNCTION ddc::DiscreteElement> NonUniformBSplines:: + Impl::eval_deriv( + std::array& derivs, + ddc::Coordinate const& x) const +{ + std::array left; + std::array right; + + assert(x >= rmin()); + assert(x <= rmax()); + assert(derivs.size() == degree() + 1); + + // 1. Compute cell index 'icell' + int const icell = find_cell(x); + + assert(icell >= 0); + assert(icell <= int(ncells() - 1)); + assert(get_knot(icell) <= x); + assert(get_knot(icell + 1) >= x); + + // 2. Compute values of derivatives of B-splines with support over cell 'icell' + + /* + * Compute nonzero basis functions and knot differences + * for splines up to degree degree-1 which are needed to compute derivative + * First part of Algorithm A3.2 of NURBS book + */ + double saved, temp; + derivs[0] = 1.0; + for (std::size_t j = 0; j < degree() - 1; ++j) { + left[j] = x - get_knot(icell - j); + right[j] = get_knot(icell + j + 1) - x; + saved = 0.0; + for (std::size_t r = 0; r < j + 1; ++r) { + temp = derivs[r] / (right[r] + left[j - r]); + derivs[r] = saved + right[r] * temp; + saved = left[j - r] * temp; + } + derivs[j + 1] = saved; + } + + /* + * Compute derivatives at x using values stored in bsdx and formula + * for spline derivative based on difference of splines of degree degree-1 + */ + saved = degree() * derivs[0] / (get_knot(icell + 1) - get_knot(icell + 1 - degree())); + derivs[0] = -saved; + for (std::size_t j = 1; j < degree(); ++j) { + temp = saved; + saved = degree() * derivs[j] + / (get_knot(icell + j + 1) - get_knot(icell + j + 1 - degree())); + derivs[j] = temp - saved; + } + derivs[degree()] = saved; + + return discrete_element_type(icell); +} + +template +template +KOKKOS_INLINE_FUNCTION ddc::DiscreteElement> NonUniformBSplines:: + Impl::eval_basis_and_n_derivs( + ddc::DSpan2D const derivs, + ddc::Coordinate const& x, + std::size_t const n) const +{ + std::array left; + std::array right; + + std::array a_ptr; + std::experimental:: + mdspan> const a( + a_ptr.data()); + + std::array ndu_ptr; + std::experimental::mdspan< + double, + std::experimental::extents> const + ndu(ndu_ptr.data()); + + assert(x >= rmin()); + assert(x <= rmax()); + // assert(n >= 0); as long as n is unsigned + assert(n <= degree()); + assert(derivs.extent(0) == 1 + degree()); + assert(derivs.extent(1) == 1 + n); + + // 1. Compute cell index 'icell' and x_offset + int const icell = find_cell(x); + + assert(icell >= 0); + assert(icell <= int(ncells() - 1)); + assert(get_knot(icell) <= x); + assert(get_knot(icell + 1) >= x); + + // 2. Compute nonzero basis functions and knot differences for splines + // up to degree (degree-1) which are needed to compute derivative + // Algorithm A2.3 of NURBS book + // + // 21.08.2017: save inverse of knot differences to avoid unnecessary + // divisions + // [Yaman Güçlü, Edoardo Zoni] + + double saved, temp; + ndu(0, 0) = 1.0; + for (std::size_t j = 0; j < degree(); ++j) { + left[j] = x - get_knot(icell - j); + right[j] = get_knot(icell + j + 1) - x; + saved = 0.0; + for (std::size_t r = 0; r < j + 1; ++r) { + // compute inverse of knot differences and save them into lower + // triangular part of ndu + ndu(r, j + 1) = 1.0 / (right[r] + left[j - r]); + // compute basis functions and save them into upper triangular part + // of ndu + temp = ndu(j, r) * ndu(r, j + 1); + ndu(j + 1, r) = saved + right[r] * temp; + saved = left[j - r] * temp; + } + ndu(j + 1, j + 1) = saved; + } + // Save 0-th derivative + for (std::size_t j = 0; j < degree() + 1; ++j) { + derivs(j, 0) = ndu(degree(), j); + } + + for (int r = 0; r < int(degree() + 1); ++r) { + int s1 = 0; + int s2 = 1; + a(0, 0) = 1.0; + for (int k = 1; k < int(n + 1); ++k) { + double d = 0.0; + int const rk = r - k; + int const pk = degree() - k; + if (r >= k) { + a(0, s2) = a(0, s1) * ndu(rk, pk + 1); + d = a(0, s2) * ndu(pk, rk); + } + int const j1 = rk > -1 ? 1 : (-rk); + int const j2 = (r - 1) <= pk ? k : (degree() - r + 1); + for (int j = j1; j < j2; ++j) { + a(j, s2) = (a(j, s1) - a(j - 1, s1)) * ndu(rk + j, pk + 1); + d += a(j, s2) * ndu(pk, rk + j); + } + if (r <= pk) { + a(k, s2) = -a(k - 1, s1) * ndu(r, pk + 1); + d += a(k, s2) * ndu(pk, r); + } + derivs(r, k) = d; + std::swap(s1, s2); + } + } + + int r = degree(); + for (int k = 1; k < int(n + 1); ++k) { + for (std::size_t i = 0; i < derivs.extent(0); i++) { + derivs(i, k) *= r; + } + r *= degree() - k; + } + + return discrete_element_type(icell); +} + +template +template +KOKKOS_INLINE_FUNCTION int NonUniformBSplines::Impl::find_cell( + ddc::Coordinate const& x) const +{ + if (x > rmax()) + return -1; + if (x < rmin()) + return -1; + + if (x == rmin()) + return 0; + if (x == rmax()) + return ncells() - 1; + + // Binary search + int low = 0, high = ncells(); + int icell = (low + high) / 2; + while (x < get_knot(icell) || x >= get_knot(icell + 1)) { + if (x < get_knot(icell)) { + high = icell; + } else { + low = icell; + } + icell = (low + high) / 2; + } + return icell; +} + +template +template +template +KOKKOS_INLINE_FUNCTION ddc:: + ChunkSpan>, Layout, MemorySpace2> + NonUniformBSplines::Impl::integrals( + ddc::ChunkSpan< + double, + ddc::DiscreteDomain>, + Layout, + MemorySpace2> int_vals) const +{ + if constexpr (is_periodic()) { + assert(int_vals.size() == nbasis() || int_vals.size() == size()); + } else { + assert(int_vals.size() == nbasis()); + } + + double const inv_deg = 1.0 / (degree() + 1); + + discrete_domain_type const dom_bsplines( + full_domain().take_first(discrete_vector_type {nbasis()})); + for (auto ix : dom_bsplines) { + int_vals(ix) = (get_last_support_knot(ix) - get_first_support_knot(ix)) * inv_deg; + } + + if constexpr (is_periodic()) { + if (int_vals.size() == size()) { + discrete_domain_type const dom_bsplines_wrap( + full_domain().take_last(discrete_vector_type {degree()})); + for (auto ix : dom_bsplines_wrap) { + int_vals(ix) = 0; + } + } + } + return int_vals; +} +} // namespace ddc diff --git a/include/ddc/kernels/splines/bsplines_uniform.hpp b/include/ddc/kernels/splines/bsplines_uniform.hpp new file mode 100644 index 000000000..b901e6554 --- /dev/null +++ b/include/ddc/kernels/splines/bsplines_uniform.hpp @@ -0,0 +1,490 @@ +#pragma once + +#include +#include +#include + +#include + +#include "bspline.hpp" +#include "math_tools.hpp" +#include "view.hpp" + +namespace ddc { +template +class UniformBSplines +{ + static_assert(D > 0, "Parameter `D` must be positive"); + +public: + // From nvcc: 'A type that is defined inside a class and has private or protected access cannot be used + // in the template argument type of a variable template instantiation' + template + struct InternalTagGenerator; + + /// An internal tag necessary to allocate an internal ddc::discrete_space function. + /// It must remain internal, for example it shall not be exposed when returning ddc::coordinates. Instead use `Tag` + using KnotDim = InternalTagGenerator; + + using mesh_type = ddc::UniformPointSampling; + + static KOKKOS_INLINE_FUNCTION ddc::Coordinate knot_from_coord( + ddc::Coordinate const& coord) + { + return ddc::Coordinate(ddc::get(coord)); + } + static KOKKOS_INLINE_FUNCTION ddc::Coordinate coord_from_knot( + ddc::Coordinate const& coord) + { + return ddc::Coordinate(ddc::get(coord)); + } + +public: + using tag_type = Tag; + + using continuous_dimension_type = BSpline; + + + using discrete_dimension_type = UniformBSplines; + + using discrete_element_type = ddc::DiscreteElement; + + using discrete_domain_type = ddc::DiscreteDomain; + + using discrete_vector_type = ddc::DiscreteVector; + +public: + static constexpr std::size_t rank() + { + return 1; + } + + static constexpr std::size_t degree() noexcept + { + return D; + } + + static constexpr bool is_periodic() noexcept + { + return Tag::PERIODIC; + } + + static constexpr bool is_radial() noexcept + { + return false; + } + + static constexpr bool is_uniform() noexcept + { + return true; + } + + template + class Impl + { + template + friend class Impl; + + private: + // In the periodic case, it contains twice the periodic point!!! + ddc::DiscreteDomain m_domain; + + public: + using discrete_dimension_type = UniformBSplines; + + Impl() = default; + + template + explicit Impl(Impl const& impl) : m_domain(impl.m_domain) + { + } + + /** Constructs a BSpline basis with n equidistant knots over \f$[a, b]\f$ + * + * @param rmin the real ddc::coordinate of the first knot + * @param rmax the real ddc::coordinate of the last knot + * @param n_knots the number of knots + */ + explicit Impl(ddc::Coordinate rmin, ddc::Coordinate rmax, std::size_t ncells) + : m_domain( + ddc::DiscreteElement(0), + ddc::DiscreteVector( + ncells + 1)) // Create a mesh including the eventual periodic point + { + assert(ncells > 0); + ddc::init_discrete_space(mesh_type:: + init(knot_from_coord(rmin), + knot_from_coord(rmax), + ddc::DiscreteVector(ncells + 1))); + } + + Impl(Impl const& x) = default; + + Impl(Impl&& x) = default; + + ~Impl() = default; + + Impl& operator=(Impl const& x) = default; + + Impl& operator=(Impl&& x) = default; + + KOKKOS_INLINE_FUNCTION discrete_element_type + eval_basis(std::array& values, ddc::Coordinate const& x) const + { + return eval_basis(values, x, degree()); + } + + KOKKOS_INLINE_FUNCTION discrete_element_type + eval_deriv(std::array& derivs, ddc::Coordinate const& x) const; + + KOKKOS_INLINE_FUNCTION discrete_element_type eval_basis_and_n_derivs( + ddc::DSpan2D derivs, + ddc::Coordinate const& x, + std::size_t n) const; + + template + KOKKOS_INLINE_FUNCTION ddc::ChunkSpan + integrals( + ddc::ChunkSpan int_vals) const; + + KOKKOS_INLINE_FUNCTION ddc::Coordinate get_knot(int idx) const noexcept + { + return ddc::Coordinate(rmin() + idx * ddc::step()); + } + + KOKKOS_INLINE_FUNCTION double get_first_support_knot(discrete_element_type const& ix) const + { + return get_knot(ix.uid() - degree()); + } + + KOKKOS_INLINE_FUNCTION double get_last_support_knot(discrete_element_type const& ix) const + { + return get_knot(ix.uid() + 1); + } + + KOKKOS_INLINE_FUNCTION double get_support_knot_n(discrete_element_type const& ix, int n) + const + { + return get_knot(ix.uid() + n - degree()); + } + + KOKKOS_INLINE_FUNCTION ddc::Coordinate rmin() const noexcept + { + return coord_from_knot(ddc::coordinate(m_domain.front())); + } + + KOKKOS_INLINE_FUNCTION ddc::Coordinate rmax() const noexcept + { + return coord_from_knot(ddc::coordinate(m_domain.back())); + } + + KOKKOS_INLINE_FUNCTION double length() const noexcept + { + return rmax() - rmin(); + } + + KOKKOS_INLINE_FUNCTION std::size_t size() const noexcept + { + return degree() + ncells(); + } + + /// Returns the discrete domain including ghost bsplines + KOKKOS_INLINE_FUNCTION discrete_domain_type full_domain() const + { + return discrete_domain_type(discrete_element_type(0), discrete_vector_type(size())); + } + + KOKKOS_INLINE_FUNCTION std::size_t nbasis() const noexcept + { + return ncells() + !is_periodic() * degree(); + } + + KOKKOS_INLINE_FUNCTION std::size_t ncells() const noexcept + { + return m_domain.size() - 1; + } + + private: + KOKKOS_INLINE_FUNCTION double inv_step() const noexcept + { + return 1.0 / ddc::step(); + } + + template + KOKKOS_INLINE_FUNCTION discrete_element_type eval_basis( + std::array& values, + ddc::Coordinate const& x, + std::size_t degree) const; + + KOKKOS_INLINE_FUNCTION void get_icell_and_offset( + int& icell, + double& offset, + ddc::Coordinate const& x) const; + }; +}; + +template +template +template +KOKKOS_INLINE_FUNCTION ddc::DiscreteElement> UniformBSplines:: + Impl::eval_basis( + std::array& values, + ddc::Coordinate const& x, + std::size_t const deg) const +{ + assert(values.size() == deg + 1); + + double offset; + int jmin; + // 1. Compute cell index 'icell' and x_offset + // 2. Compute index range of B-splines with support over cell 'icell' + get_icell_and_offset(jmin, offset, x); + + // 3. Compute values of aforementioned B-splines + double xx, temp, saved; + values[0] = 1.0; + for (std::size_t j = 1; j < deg + 1; ++j) { + xx = -offset; + saved = 0.0; + for (std::size_t r = 0; r < j; ++r) { + xx += 1; + temp = values[r] / j; + values[r] = saved + xx * temp; + saved = (j - xx) * temp; + } + values[j] = saved; + } + + return discrete_element_type(jmin); +} + +template +template +KOKKOS_INLINE_FUNCTION ddc::DiscreteElement> UniformBSplines::Impl< + MemorySpace>::eval_deriv(std::array& derivs, ddc::Coordinate const& x) + const +{ + assert(derivs.size() == degree() + 1); + + double offset; + int jmin; + // 1. Compute cell index 'icell' and x_offset + // 2. Compute index range of B-splines with support over cell 'icell' + get_icell_and_offset(jmin, offset, x); + + // 3. Compute derivatives of aforementioned B-splines + // Derivatives are normalized, hence they should be divided by dx + double xx, temp, saved; + derivs[0] = 1.0 / ddc::step(); + for (std::size_t j = 1; j < degree(); ++j) { + xx = -offset; + saved = 0.0; + for (std::size_t r = 0; r < j; ++r) { + xx += 1.0; + temp = derivs[r] / j; + derivs[r] = saved + xx * temp; + saved = (j - xx) * temp; + } + derivs[j] = saved; + } + + // Compute derivatives + double bjm1 = derivs[0]; + double bj = bjm1; + derivs[0] = -bjm1; + for (std::size_t j = 1; j < degree(); ++j) { + bj = derivs[j]; + derivs[j] = bjm1 - bj; + bjm1 = bj; + } + derivs[degree()] = bj; + + return discrete_element_type(jmin); +} + +template +template +KOKKOS_INLINE_FUNCTION ddc::DiscreteElement> UniformBSplines:: + Impl::eval_basis_and_n_derivs( + ddc::DSpan2D const derivs, + ddc::Coordinate const& x, + std::size_t const n) const +{ + std::array ndu_ptr; + std::experimental::mdspan< + double, + std::experimental::extents> const + ndu(ndu_ptr.data()); + std::array a_ptr; + std::experimental:: + mdspan> const a( + a_ptr.data()); + double offset; + int jmin; + + assert(x >= rmin()); + assert(x <= rmax()); + // assert(n >= 0); as long as n is unsigned + assert(n <= degree()); + assert(derivs.extent(0) == 1 + degree()); + assert(derivs.extent(1) == 1 + n); + + // 1. Compute cell index 'icell' and x_offset + // 2. Compute index range of B-splines with support over cell 'icell' + get_icell_and_offset(jmin, offset, x); + + // 3. Recursively evaluate B-splines (see + // "sll_s_uniform_BSplines_eval_basis") + // up to self%degree, and store them all in the upper-right triangle of + // ndu + double xx, temp, saved; + ndu(0, 0) = 1.0; + for (std::size_t j = 1; j < degree() + 1; ++j) { + xx = -offset; + saved = 0.0; + for (std::size_t r = 0; r < j; ++r) { + xx += 1.0; + temp = ndu(j - 1, r) / j; + ndu(j, r) = saved + xx * temp; + saved = (j - xx) * temp; + } + ndu(j, j) = saved; + } + for (std::size_t i = 0; i < ndu.extent(1); ++i) { + derivs(i, 0) = ndu(degree(), i); + } + + for (int r = 0; r < int(degree() + 1); ++r) { + int s1 = 0; + int s2 = 1; + a(0, 0) = 1.0; + for (int k = 1; k < int(n + 1); ++k) { + double d = 0.0; + int const rk = r - k; + int const pk = degree() - k; + if (r >= k) { + a(0, s2) = a(0, s1) / (pk + 1); + d = a(0, s2) * ndu(pk, rk); + } + int const j1 = rk > -1 ? 1 : (-rk); + int const j2 = (r - 1) <= pk ? k : (degree() - r + 1); + for (int j = j1; j < j2; ++j) { + a(j, s2) = (a(j, s1) - a(j - 1, s1)) / (pk + 1); + d += a(j, s2) * ndu(pk, rk + j); + } + if (r <= pk) { + a(k, s2) = -a(k - 1, s1) / (pk + 1); + d += a(k, s2) * ndu(pk, r); + } + derivs(r, k) = d; + std::swap(s1, s2); + } + } + + // Multiply result by correct factors: + // degree!/(degree-n)! = degree*(degree-1)*...*(degree-n+1) + // k-th derivatives are normalized, hence they should be divided by dx^k + double const inv_dx = inv_step(); + double d = degree() * inv_dx; + for (int k = 1; k < int(n + 1); ++k) { + for (std::size_t i = 0; i < derivs.extent(0); ++i) { + derivs(i, k) *= d; + } + d *= (degree() - k) * inv_dx; + } + + return discrete_element_type(jmin); +} + +template +template +KOKKOS_INLINE_FUNCTION void UniformBSplines::Impl::get_icell_and_offset( + int& icell, + double& offset, + ddc::Coordinate const& x) const +{ + assert(x >= rmin()); + assert(x <= rmax()); + + double const inv_dx = inv_step(); + if (x == rmin()) { + icell = 0; + offset = 0.0; + } else if (x == rmax()) { + icell = ncells() - 1; + offset = 1.0; + } else { + offset = (x - rmin()) * inv_dx; + icell = static_cast(offset); + offset = offset - icell; + + // When x is very close to xmax, round-off may cause the wrong answer + // icell=ncells and x_offset=0, which we convert to the case x=xmax: + if (icell == int(ncells()) && offset == 0.0) { + icell = ncells() - 1; + offset = 1.0; + } + } +} + +template +template +template +KOKKOS_INLINE_FUNCTION ddc::ChunkSpan< + double, + ddc::DiscreteDomain>, + Layout, + MemorySpace2> +UniformBSplines::Impl::integrals( + ddc::ChunkSpan>, Layout, MemorySpace2> + int_vals) const +{ + if constexpr (is_periodic()) { + assert(int_vals.size() == nbasis() || int_vals.size() == size()); + } else { + assert(int_vals.size() == nbasis()); + } + discrete_domain_type const full_dom_splines(full_domain()); + + if constexpr (is_periodic()) { + discrete_domain_type const dom_bsplines( + full_dom_splines.take_first(discrete_vector_type {nbasis()})); + for (auto ix : dom_bsplines) { + int_vals(ix) = ddc::step(); + } + if (int_vals.size() == size()) { + discrete_domain_type const dom_bsplines_repeated( + full_dom_splines.take_last(discrete_vector_type {degree()})); + for (auto ix : dom_bsplines_repeated) { + int_vals(ix) = 0; + } + } + } else { + discrete_domain_type const dom_bspline_entirely_in_domain + = full_dom_splines + .remove(discrete_vector_type(degree()), discrete_vector_type(degree())); + for (auto ix : dom_bspline_entirely_in_domain) { + int_vals(ix) = ddc::step(); + } + + std::array edge_vals_ptr; + std::experimental:: + mdspan> const + edge_vals(edge_vals_ptr.data()); + + eval_basis(edge_vals_ptr, rmin(), degree() + 1); + + double const d_eval = ddc::detail::sum(edge_vals); + + for (std::size_t i = 0; i < degree(); ++i) { + double const c_eval = ddc::detail::sum(edge_vals, 0, degree() - i); + + double const edge_value = ddc::step() * (d_eval - c_eval); + + int_vals(discrete_element_type(i)) = edge_value; + int_vals(discrete_element_type(nbasis() - 1 - i)) = edge_value; + } + } + return int_vals; +} +} // namespace ddc diff --git a/include/ddc/kernels/splines/greville_interpolation_points.hpp b/include/ddc/kernels/splines/greville_interpolation_points.hpp new file mode 100644 index 000000000..50ba18fd3 --- /dev/null +++ b/include/ddc/kernels/splines/greville_interpolation_points.hpp @@ -0,0 +1,196 @@ +#pragma once + +#include +#include +#include + +#include + +#include "spline_boundary_conditions.hpp" + +namespace ddc { +template +class GrevilleInterpolationPoints +{ + using tag_type = typename BSplines::tag_type; + + template > + static auto uniform_greville_points() + { + using Sampling = ddc::UniformPointSampling; + using SamplingImpl = typename Sampling::template Impl; + + double constexpr shift = (BSplines::degree() % 2 == 0) ? 0.5 : 0.0; + double dx + = (ddc::discrete_space().rmax() - ddc::discrete_space().rmin()) + / ddc::discrete_space().ncells(); + return SamplingImpl( + ddc::Coordinate(ddc::discrete_space().rmin() + shift * dx), + ddc::Coordinate(dx)); + } + + template > + static auto non_uniform_greville_points() + { + using Sampling = ddc::NonUniformPointSampling; + using SamplingImpl = typename Sampling::template Impl; + + std::vector greville_points(ddc::discrete_space().nbasis()); + ddc::DiscreteDomain bspline_domain + = ddc::discrete_space().full_domain().take_first( + ddc::DiscreteVector(ddc::discrete_space().nbasis())); + + ddc::for_each(bspline_domain, [&](ddc::DiscreteElement ib) { + // Define the Greville points from the bspline knots + greville_points[ib.uid()] = 0.0; + for (std::size_t i(0); i < BSplines::degree(); ++i) { + greville_points[ib.uid()] + += ddc::discrete_space().get_support_knot_n(ib, i + 1); + } + greville_points[ib.uid()] /= BSplines::degree(); + }); + + std::vector temp_knots(BSplines::degree()); + // Use periodicity to ensure all points are in the domain + if constexpr (U::is_periodic()) { + int npoints(0); + // Count the number of interpolation points that need shifting to preserve the ordering + while (greville_points[npoints] < ddc::discrete_space().rmin()) { + temp_knots[npoints] + = greville_points[npoints] + ddc::discrete_space().length(); + npoints++; + } + // Shift the points + for (std::size_t i = 0; i < ddc::discrete_space().nbasis() - npoints; ++i) { + greville_points[i] = greville_points[i + npoints]; + } + for (int i = 0; i < npoints; ++i) { + greville_points[ddc::discrete_space().nbasis() - npoints + i] + = temp_knots[i]; + } + } + + return SamplingImpl(greville_points); + } + + static constexpr int N_BE_MIN = n_boundary_equations(BcXmin, BSplines::degree()); + static constexpr int N_BE_MAX = n_boundary_equations(BcXmax, BSplines::degree()); + template + static constexpr bool is_uniform_mesh_v + = U::is_uniform() && ((N_BE_MIN != 0 && N_BE_MAX != 0) || U::is_periodic()); + +public: + template < + typename U = BSplines, + std::enable_if_t< + is_uniform_mesh_v, + bool> = true> // U must be in condition for SFINAE + static auto get_sampling() + { + return uniform_greville_points(); + } + + template < + typename U = BSplines, + std::enable_if_t< + !is_uniform_mesh_v, + bool> = true> // U must be in condition for SFINAE + static auto get_sampling() + { + using Sampling = ddc::NonUniformPointSampling; + using SamplingImpl = typename Sampling::template Impl; + if constexpr (U::is_uniform()) { + auto points_wo_bcs = uniform_greville_points(); + int const n_break_points = ddc::discrete_space().ncells() + 1; + int const npoints = ddc::discrete_space().nbasis() - N_BE_MIN - N_BE_MAX; + std::vector points_with_bcs(npoints); + + // Construct Greville-like points at the edge + if constexpr (BcXmin == ddc::BoundCond::GREVILLE) { + for (std::size_t i(0); i < BSplines::degree() / 2 + 1; ++i) { + points_with_bcs[i] + = (BSplines::degree() - i) * ddc::discrete_space().rmin(); + for (std::size_t j(0); j < i; ++j) { + points_with_bcs[i] += ddc::discrete_space().get_support_knot_n( + ddc::DiscreteElement(i), + BSplines::degree() - j); + } + points_with_bcs[i] /= BSplines::degree(); + } + } else { + points_with_bcs[0] = points_wo_bcs.coordinate( + ddc::DiscreteElement>(0)); + } + + int const n_start + = (BcXmin == ddc::BoundCond::GREVILLE) ? BSplines::degree() / 2 + 1 : 1; + int const domain_size = n_break_points - 2; + ddc::DiscreteDomain> const + domain(ddc::DiscreteElement>(1), + ddc::DiscreteVector>(domain_size)); + + // Copy central points + ddc::for_each(domain, [&](auto ip) { + points_with_bcs[ip.uid() + n_start - 1] = points_wo_bcs.coordinate(ip); + }); + + // Construct Greville-like points at the edge + if constexpr (BcXmax == ddc::BoundCond::GREVILLE) { + for (std::size_t i(0); i < BSplines::degree() / 2 + 1; ++i) { + points_with_bcs[npoints - 1 - i] + = (BSplines::degree() - i) * ddc::discrete_space().rmax(); + for (std::size_t j(0); j < i; ++j) { + points_with_bcs[npoints - 1 - i] + += ddc::discrete_space().get_support_knot_n( + ddc::DiscreteElement( + ddc::discrete_space().nbasis() - 1 - i), + j + 1); + } + points_with_bcs[npoints - 1 - i] /= BSplines::degree(); + } + } else { + points_with_bcs[npoints - 1] = points_wo_bcs.coordinate( + ddc::DiscreteElement>( + ddc::discrete_space().ncells() - 1 + + BSplines::degree() % 2)); + } + return SamplingImpl(points_with_bcs); + } else { + auto points_wo_bcs = non_uniform_greville_points(); + if constexpr (N_BE_MIN == 0 && N_BE_MAX == 0) { + return points_wo_bcs; + } else { + // All points are Greville points. Extract unnecessary points near the boundary + std::vector points_with_bcs(points_wo_bcs.size() - N_BE_MIN - N_BE_MAX); + int constexpr n_start = N_BE_MIN; + + using length = ddc::DiscreteVector>; + + ddc::DiscreteDomain> const + domain(ddc::DiscreteElement>( + n_start), + length(points_with_bcs.size())); + + points_with_bcs[0] = points_wo_bcs.coordinate(domain.front()); + ddc::for_each(domain.remove(length(1), length(1)), [&](auto ip) { + points_with_bcs[ip.uid() - n_start] = points_wo_bcs.coordinate(ip); + }); + points_with_bcs[points_with_bcs.size() - 1] + = points_wo_bcs.coordinate(domain.back()); + + return SamplingImpl(points_with_bcs); + } + } + } + + using interpolation_mesh_type = typename decltype(get_sampling())::discrete_dimension_type; + + static ddc::DiscreteDomain get_domain() + { + int const npoints = ddc::discrete_space().nbasis() - N_BE_MIN - N_BE_MAX; + return ddc::DiscreteDomain( + ddc::DiscreteElement(0), + ddc::DiscreteVector(npoints)); + } +}; +} // namespace ddc diff --git a/include/ddc/kernels/splines/knots_as_interpolation_points.hpp b/include/ddc/kernels/splines/knots_as_interpolation_points.hpp new file mode 100644 index 000000000..70563c07d --- /dev/null +++ b/include/ddc/kernels/splines/knots_as_interpolation_points.hpp @@ -0,0 +1,51 @@ +#pragma once + +#include + +#include + +#include "spline_boundary_conditions.hpp" + +namespace ddc { +template +class KnotsAsInterpolationPoints +{ + static_assert(BcXmin != ddc::BoundCond::GREVILLE); + static_assert(BcXmax != ddc::BoundCond::GREVILLE); + + using tag_type = typename BSplines::tag_type; + +public: + template + static auto get_sampling() + { + if constexpr (U::is_uniform()) { + using Sampling = ddc::UniformPointSampling; + using SamplingImpl = typename Sampling::template Impl; + return SamplingImpl( + ddc::discrete_space().rmin(), + ddc::discrete_space().rmax(), + ddc::DiscreteVector>( + ddc::discrete_space().ncells() + 1)); + } else { + using Sampling = ddc::NonUniformPointSampling; + using SamplingImpl = typename Sampling::template Impl; + std::vector knots(ddc::discrete_space().npoints()); + for (int i(0); i < ddc::discrete_space().npoints(); ++i) { + knots[i] = ddc::discrete_space().get_knot(i); + } + return SamplingImpl(knots); + } + } + + using interpolation_mesh_type = typename decltype(get_sampling())::discrete_dimension_type; + + static ddc::DiscreteDomain get_domain() + { + int const npoints = ddc::discrete_space().ncells() + !BSplines::is_periodic(); + return ddc::DiscreteDomain( + ddc::DiscreteElement(0), + ddc::DiscreteVector(npoints)); + } +}; +} // namespace ddc diff --git a/include/ddc/kernels/splines/math_tools.hpp b/include/ddc/kernels/splines/math_tools.hpp new file mode 100644 index 000000000..c051e97a9 --- /dev/null +++ b/include/ddc/kernels/splines/math_tools.hpp @@ -0,0 +1,111 @@ +#pragma once + +#include +#include + +#include + +namespace ddc::detail { +template +KOKKOS_INLINE_FUNCTION T sum(T* array, int size) +{ + T val(0.0); + for (int i(0); i < size; ++i) { + val += array[i]; + } + return val; +} + +template +KOKKOS_INLINE_FUNCTION ElementType sum(std::experimental::mdspan< + ElementType, + std::experimental::extents, + LayoutPolicy, + AccessorPolicy> const& array) +{ + ElementType val(0.0); + for (std::size_t i(0); i < array.extent(0); ++i) { + val += array[i]; + } + return val; +} + +template +KOKKOS_INLINE_FUNCTION ElementType +sum(std::experimental::mdspan< + ElementType, + std::experimental::extents, + LayoutPolicy, + AccessorPolicy> const& array, + int start, + int end) +{ + ElementType val(0.0); + for (int i(start); i < end; ++i) { + val += array[i]; + } + return val; +} + +template +KOKKOS_INLINE_FUNCTION T modulo(T x, T y) +{ + return x - y * Kokkos::floor(double(x) / y); +} + +KOKKOS_INLINE_FUNCTION double ipow(double a, std::size_t i) +{ + double r(1.0); + for (std::size_t j(0); j < i; ++j) { + r *= a; + } + return r; +} + +KOKKOS_INLINE_FUNCTION double ipow(double a, int i) +{ + double r(1.0); + if (i > 0) { + for (int j(0); j < i; ++j) { + r *= a; + } + } else if (i < 0) { + for (int j(0); j < -i; ++j) { + r *= a; + } + r = 1.0 / r; + } + return r; +} + +KOKKOS_INLINE_FUNCTION std::size_t factorial(std::size_t f) +{ + std::size_t r = 1; + for (std::size_t i(2); i < f + 1; ++i) { + r *= i; + } + return r; +} + +template +KOKKOS_INLINE_FUNCTION T dot_product(std::array const& a, std::array const& b) +{ + T result = 0; + for (std::size_t i(0); i < D; ++i) { + result += a[i] * b[i]; + } + return result; +} + +template +KOKKOS_INLINE_FUNCTION T min(T x, T y) +{ + return x < y ? x : y; +} + +template +KOKKOS_INLINE_FUNCTION T max(T x, T y) +{ + return x > y ? x : y; +} +} // namespace ddc::detail diff --git a/include/ddc/kernels/splines/matrix.hpp b/include/ddc/kernels/splines/matrix.hpp new file mode 100644 index 000000000..5d4f87583 --- /dev/null +++ b/include/ddc/kernels/splines/matrix.hpp @@ -0,0 +1,107 @@ +#pragma once +#include +#include +#include +#include +#include +#include + +#include +#include + +#include "view.hpp" + +namespace ddc::detail { +class Matrix +{ +public: + Matrix(const int mat_size) : n(mat_size) {} + virtual ~Matrix() = default; + int n; + virtual double get_element(int i, int j) const = 0; + virtual void set_element(int i, int j, double aij) = 0; + virtual void factorize() + { + int const info = factorize_method(); + + if (info < 0) { + std::cerr << -info << "-th argument had an illegal value" << std::endl; + // TODO: Add LOG_FATAL_ERROR + } else if (info > 0) { + std::cerr << "U(" << info << "," << info << ") is exactly zero."; + std::cerr << " The factorization has been completed, but the factor"; + std::cerr << " U is exactly singular, and division by zero will occur " + "if " + "it is used to"; + std::cerr << " solve a system of equations."; + // TODO: Add LOG_FATAL_ERROR + } + } + virtual ddc::DSpan1D solve_inplace(ddc::DSpan1D const b) const + { + assert(int(b.extent(0)) == n); + int const info = solve_inplace_method(b.data_handle(), 'N', 1); + + if (info < 0) { + std::cerr << -info << "-th argument had an illegal value" << std::endl; + // TODO: Add LOG_FATAL_ERROR + } + return b; + } + virtual ddc::DSpan1D solve_transpose_inplace(ddc::DSpan1D const b) const + { + assert(int(b.extent(0)) == n); + int const info = solve_inplace_method(b.data_handle(), 'T', 1); + + if (info < 0) { + std::cerr << -info << "-th argument had an illegal value" << std::endl; + // TODO: Add LOG_FATAL_ERROR + } + return b; + } + virtual ddc::DSpan2D solve_multiple_inplace(ddc::DSpan2D const bx) const + { + assert(int(bx.extent(1)) == n); + int const info = solve_inplace_method(bx.data_handle(), 'N', bx.extent(0)); + + if (info < 0) { + std::cerr << -info << "-th argument had an illegal value" << std::endl; + // TODO: Add LOG_FATAL_ERROR + } + return bx; + } + template + Kokkos::View solve_batch_inplace( + Kokkos::View const bx) const + { + assert(int(bx.extent(0)) == n); + int const info = solve_inplace_method(bx.data(), 'N', bx.extent(1)); + + if (info < 0) { + std::cerr << -info << "-th argument had an illegal value" << std::endl; + // TODO: Add LOG_FATAL_ERROR + } + return bx; + } + int get_size() const + { + return n; + } + + std::ostream& operator<<(std::ostream& os) + { + int const n = get_size(); + for (int i = 0; i < n; ++i) { + for (int j = 0; j < n; ++j) { + os << std::fixed << std::setprecision(3) << std::setw(10) << get_element(i, j); + } + os << std::endl; + } + return os; + } + +protected: + virtual int factorize_method() = 0; + virtual int solve_inplace_method(double* b, char transpose, int n_equations) const = 0; +}; +} // namespace ddc::detail diff --git a/include/ddc/kernels/splines/matrix_maker.hpp b/include/ddc/kernels/splines/matrix_maker.hpp new file mode 100644 index 000000000..8484d47ec --- /dev/null +++ b/include/ddc/kernels/splines/matrix_maker.hpp @@ -0,0 +1,25 @@ +#pragma once +#include + +#include "matrix_sparse.hpp" + + +namespace ddc::detail { +class MatrixMaker +{ +public: + template + static std::unique_ptr make_new_sparse( + int const n, + std::optional cols_per_par_chunk = std::nullopt, + std::optional par_chunks_per_seq_chunk = std::nullopt, + std::optional preconditionner_max_block_size = std::nullopt) + { + return std::make_unique>( + n, + cols_per_par_chunk, + par_chunks_per_seq_chunk, + preconditionner_max_block_size); + } +}; +} // namespace ddc::detail diff --git a/include/ddc/kernels/splines/matrix_sparse.hpp b/include/ddc/kernels/splines/matrix_sparse.hpp new file mode 100644 index 000000000..79975cce9 --- /dev/null +++ b/include/ddc/kernels/splines/matrix_sparse.hpp @@ -0,0 +1,325 @@ +#pragma once +#include +#include +#include +#include +#include +#include +#include + +#include + +#include + +#include "ginkgo/core/matrix/dense.hpp" + +#include "matrix.hpp" +#include "view.hpp" + +namespace ddc::detail { + +// Matrix class for Csr storage and iterative solve +template +class Matrix_Sparse : public Matrix +{ +private: + const int m_m; + const int m_n; + Kokkos::View m_rows; + Kokkos::View m_cols; + Kokkos::View m_data; + + std::unique_ptr< + gko::solver::Bicgstab::Factory, + std::default_delete::Factory>> + m_solver_factory; + + int m_cols_per_par_chunk; // Maximum number of columns of B to be passed to a Ginkgo solver + int m_par_chunks_per_seq_chunk; // Maximum number of teams to be executed in parallel + int m_preconditionner_max_block_size; // Maximum size of Jacobi-block preconditionner + +public: + // Constructor + explicit Matrix_Sparse( + const int mat_size, + std::optional cols_per_par_chunk = std::nullopt, + std::optional par_chunks_per_seq_chunk = std::nullopt, + std::optional preconditionner_max_block_size = std::nullopt) + : Matrix(mat_size) + , m_m(mat_size) + , m_n(mat_size) + , m_rows("rows", mat_size + 1) + , m_cols("cols", mat_size * mat_size) + , m_data("data", mat_size * mat_size) + { + // Fill the csr indexes as a dense matrix and initialize with zeros (zeros will be removed once non-zeros elements will be set) + for (int i = 0; i < m_m * m_n; i++) { + if (i < m_m + 1) { + m_rows(i) = i * m_n; //CSR + } + m_cols(i) = i % m_n; + m_data(i) = 0; + } + + if (cols_per_par_chunk.has_value()) { + m_cols_per_par_chunk = cols_per_par_chunk.value(); + } else { +#ifdef KOKKOS_ENABLE_SERIAL + if (std::is_same_v) { + m_cols_per_par_chunk = 512; + } +#endif +#ifdef KOKKOS_ENABLE_OPENMP + if (std::is_same_v) { + m_cols_per_par_chunk = 512; + } +#endif +#ifdef KOKKOS_ENABLE_CUDA + if (std::is_same_v) { + m_cols_per_par_chunk = 1024; + } +#endif +#ifdef KOKKOS_ENABLE_HIP + if (std::is_same_v) { + m_cols_per_par_chunk = 1024; + } +#endif + } + + if (par_chunks_per_seq_chunk.has_value()) { + m_par_chunks_per_seq_chunk = par_chunks_per_seq_chunk.value(); + } else { +#ifdef KOKKOS_ENABLE_SERIAL + if (std::is_same_v) { + m_par_chunks_per_seq_chunk = 1; + } +#endif +#ifdef KOKKOS_ENABLE_OPENMP + if (std::is_same_v) { + m_par_chunks_per_seq_chunk = Kokkos::DefaultHostExecutionSpace().concurrency(); + } +#endif +#ifdef KOKKOS_ENABLE_CUDA + if (std::is_same_v) { + m_par_chunks_per_seq_chunk = Kokkos::DefaultHostExecutionSpace().concurrency(); + } +#endif +#ifdef KOKKOS_ENABLE_HIP + if (std::is_same_v) { + m_par_chunks_per_seq_chunk = Kokkos::DefaultHostExecutionSpace().concurrency(); + } +#endif + } + + if (preconditionner_max_block_size.has_value()) { + m_preconditionner_max_block_size = preconditionner_max_block_size.value(); + } else { +#ifdef KOKKOS_ENABLE_SERIAL + if (std::is_same_v) { + m_preconditionner_max_block_size = 8u; + } +#endif +#ifdef KOKKOS_ENABLE_OPENMP + if (std::is_same_v) { + m_preconditionner_max_block_size = 8u; + } +#endif +#ifdef KOKKOS_ENABLE_CUDA + if (std::is_same_v) { + m_preconditionner_max_block_size = 1u; + } +#endif +#ifdef KOKKOS_ENABLE_HIP + if (std::is_same_v) { + m_preconditionner_max_block_size = 1u; + } +#endif + } + + // Create the solver factory + std::shared_ptr gko_exec; + if (false) { + } +#ifdef KOKKOS_ENABLE_OPENMP + else if (std::is_same_v) { + gko_exec = create_gko_exec(); + } +#endif + else { + gko_exec = create_gko_exec(); + } + std::shared_ptr::Factory> residual_criterion + = gko::stop::ResidualNorm<>::build().with_reduction_factor(1e-20).on(gko_exec); + m_solver_factory + = gko::solver::Bicgstab<>::build() + .with_preconditioner( + gko::preconditioner::Jacobi<>::build() + .with_max_block_size(m_preconditionner_max_block_size) + .on(gko_exec)) + .with_criteria( + residual_criterion, + gko::stop::Iteration::build().with_max_iters(1000u).on(gko_exec)) + .on(gko_exec); + } + + std::unique_ptr, std::default_delete>> to_gko_vec( + double* vec_ptr, + size_t n, + size_t n_equations, + std::shared_ptr gko_exec) const + { + auto v = gko::matrix::Dense<>:: + create(gko_exec, + gko::dim<2>(n, n_equations), + gko::array::view(gko_exec, n * n_equations, vec_ptr), + n_equations); + return v; + } + + std::unique_ptr, std::default_delete>> to_gko_mat( + double* mat_ptr, + size_t n_nonzero_rows, + size_t n_nonzeros, + std::shared_ptr gko_exec) const + { + auto M = gko::matrix::Csr<>:: + create(gko_exec, + gko::dim<2>(m_m, m_n), + gko::array::view(gko_exec, n_nonzeros, mat_ptr), + gko::array::view(gko_exec, n_nonzeros, m_cols.data()), + gko::array::view(gko_exec, n_nonzero_rows + 1, m_rows.data())); + return M; + } + + virtual double get_element(int i, int j) const override + { + throw std::runtime_error("MatrixSparse::get_element() is not implemented because no API is " + "provided by Ginkgo"); + } + + virtual void set_element(int i, int j, double aij) override + { + m_data(i * m_n + j) = aij; + } + + int factorize_method() override + { + std::shared_ptr gko_exec = create_gko_exec(); + // Remove zeros + auto data_mat + = gko::share(to_gko_mat(m_data.data(), m_m, m_m * m_n, gko_exec->get_master())); + auto data_mat_ = gko::matrix_data<>(gko::dim<2>(m_m, m_n)); + data_mat->write(data_mat_); + data_mat_.remove_zeros(); + data_mat->read(data_mat_); + + // Realloc Kokkos::Views without zeros + Kokkos::realloc(Kokkos::WithoutInitializing, m_cols, data_mat_.nonzeros.size()); + Kokkos::realloc(Kokkos::WithoutInitializing, m_data, data_mat_.nonzeros.size()); + Kokkos::deep_copy( + m_rows, + Kokkos::View< + int*, + Kokkos::HostSpace, + Kokkos::MemoryTraits< + Kokkos::Unmanaged>>(data_mat->get_row_ptrs(), m_m + 1)); + Kokkos::deep_copy( + m_cols, + Kokkos::View>( + data_mat->get_col_idxs(), + data_mat->get_num_stored_elements())); + Kokkos::deep_copy( + m_data, + Kokkos::View>( + data_mat->get_values(), + data_mat->get_num_stored_elements())); + + return 0; + } + + virtual int solve_inplace_method(double* b, char transpose, int n_equations) const override + { + std::shared_ptr gko_exec = create_gko_exec(); + auto data_mat = gko::share(to_gko_mat( + m_data.data(), + m_rows.size() - 1, + m_cols.size(), + gko_exec->get_master())); + auto data_mat_device = gko::share(gko::clone(gko_exec, data_mat)); + Kokkos::View< + double**, + Kokkos::LayoutRight, + ExecSpace, + Kokkos::MemoryTraits> + b_view(b, m_n, n_equations); + + const int n_seq_chunks + = n_equations / m_cols_per_par_chunk / m_par_chunks_per_seq_chunk + 1; + const int par_chunks_per_last_seq_chunk + = (n_equations % (m_cols_per_par_chunk * m_par_chunks_per_seq_chunk)) + / m_cols_per_par_chunk + + 1; + const int cols_per_last_par_chunk + = (n_equations % (m_cols_per_par_chunk * m_par_chunks_per_seq_chunk * n_seq_chunks)) + % m_cols_per_par_chunk; + + Kokkos::View b_buffer( + "b_buffer", + std::min(n_equations / m_cols_per_par_chunk, m_par_chunks_per_seq_chunk), + m_n, + std::min(n_equations, m_cols_per_par_chunk)); + // Last par_chunk of last seq_chunk do not have same number of columns than the others. To get proper layout (because we passe the pointers to Ginkgo), we need a dedicated allocation + Kokkos::View + b_last_buffer("b_last_buffer", m_n, cols_per_last_par_chunk); + + for (int i = 0; i < n_seq_chunks; i++) { + int n_par_chunks_in_seq_chunk = i < n_seq_chunks - 1 ? m_par_chunks_per_seq_chunk + : par_chunks_per_last_seq_chunk; + Kokkos::parallel_for( + Kokkos::RangePolicy< + Kokkos::DefaultHostExecutionSpace>(0, n_par_chunks_in_seq_chunk), + [&](int const j) { + int n_equations_in_par_chunk + = (i < n_seq_chunks - 1 || j < n_par_chunks_in_seq_chunk - 1) + ? m_cols_per_par_chunk + : cols_per_last_par_chunk; + if (n_equations_in_par_chunk != 0) { + auto solver = m_solver_factory->generate(data_mat_device); + std::pair par_chunk_window( + (i * m_par_chunks_per_seq_chunk + j) * m_cols_per_par_chunk, + (i * m_par_chunks_per_seq_chunk + j) * m_cols_per_par_chunk + + n_equations_in_par_chunk); + Kokkos::View b_par_chunk; + if (i < n_seq_chunks - 1 || j < n_par_chunks_in_seq_chunk - 1) { + b_par_chunk = Kokkos:: + subview(b_buffer, + j, + Kokkos::ALL, + std::pair(0, n_equations_in_par_chunk)); + } else { + b_par_chunk = Kokkos:: + subview(b_last_buffer, + Kokkos::ALL, + std::pair(0, n_equations_in_par_chunk)); + } + Kokkos::deep_copy( + b_par_chunk, + Kokkos::subview(b_view, Kokkos::ALL, par_chunk_window)); + auto b_vec_batch = to_gko_vec( + b_par_chunk.data(), + m_n, + n_equations_in_par_chunk, + gko_exec); + + solver->apply(b_vec_batch, b_vec_batch); // inplace solve + Kokkos::deep_copy( + Kokkos::subview(b_view, Kokkos::ALL, par_chunk_window), + b_par_chunk); + } + }); + } + return 1; + } +}; + +} // namespace ddc::detail diff --git a/include/ddc/kernels/splines/null_boundary_value.hpp b/include/ddc/kernels/splines/null_boundary_value.hpp new file mode 100644 index 000000000..03cbf1238 --- /dev/null +++ b/include/ddc/kernels/splines/null_boundary_value.hpp @@ -0,0 +1,24 @@ +#pragma once + +#include "spline_boundary_value.hpp" + +namespace ddc { +template +class NullBoundaryValue : public SplineBoundaryValue +{ +public: + NullBoundaryValue() = default; + + ~NullBoundaryValue() override = default; + + double operator()( + ddc::Coordinate, + ddc::ChunkSpan>) const final + { + return 0.0; + } +}; + +template +inline NullBoundaryValue const g_null_boundary; +} // namespace ddc diff --git a/include/ddc/kernels/splines/spline_boundary_conditions.hpp b/include/ddc/kernels/splines/spline_boundary_conditions.hpp new file mode 100644 index 000000000..70c142ea5 --- /dev/null +++ b/include/ddc/kernels/splines/spline_boundary_conditions.hpp @@ -0,0 +1,64 @@ +#pragma once + +#include +#include + +namespace ddc { +enum class BoundCond { + // Periodic boundary condition u(1)=u(n) + PERIODIC, + // Hermite boundary condition + HERMITE, + // Use Greville points instead of conditions on derivative for B-Spline + // interpolation + GREVILLE, + // Natural boundary condition + NATURAL +}; + +static inline std::ostream& operator<<(std::ostream& out, ddc::BoundCond const bc) +{ + switch (bc) { + case ddc::BoundCond::PERIODIC: + return out << "PERIODIC"; + case ddc::BoundCond::HERMITE: + return out << "HERMITE"; + case ddc::BoundCond::GREVILLE: + return out << "GREVILLE"; + case ddc::BoundCond::NATURAL: + return out << "NATURAL"; + default: + throw std::runtime_error("ddc::BoundCond not handled"); + } +} + +constexpr int n_boundary_equations(ddc::BoundCond const bc, std::size_t const degree) +{ + if (bc == ddc::BoundCond::PERIODIC) { + return 0; + } else if (bc == ddc::BoundCond::HERMITE) { + return degree / 2; + } else if (bc == ddc::BoundCond::GREVILLE) { + return 0; + } else if (bc == ddc::BoundCond::NATURAL) { + return degree / 2; + } else { + throw std::runtime_error("ddc::BoundCond not handled"); + } +} + +constexpr int n_user_input(ddc::BoundCond const bc, std::size_t const degree) +{ + if (bc == ddc::BoundCond::PERIODIC) { + return 0; + } else if (bc == ddc::BoundCond::HERMITE) { + return degree / 2; + } else if (bc == ddc::BoundCond::GREVILLE) { + return 0; + } else if (bc == ddc::BoundCond::NATURAL) { + return 0; + } else { + throw std::runtime_error("ddc::BoundCond not handled"); + } +} +} // namespace ddc diff --git a/include/ddc/kernels/splines/spline_boundary_value.hpp b/include/ddc/kernels/splines/spline_boundary_value.hpp new file mode 100644 index 000000000..63bdd03e8 --- /dev/null +++ b/include/ddc/kernels/splines/spline_boundary_value.hpp @@ -0,0 +1,30 @@ +#pragma once +#include + +#include + +namespace ddc { +template +class SplineBoundaryValue +{ +public: + virtual ~SplineBoundaryValue() = default; + + virtual double operator()( + ddc::Coordinate x, + ddc::ChunkSpan>) const = 0; +}; + +template +class SplineBoundaryValue2D +{ +public: + virtual ~SplineBoundaryValue2D() = default; + + virtual double operator()( + ddc::Coordinate x, + ddc::Coordinate y, + ddc::ChunkSpan>) const = 0; +}; + +} // namespace ddc diff --git a/include/ddc/kernels/splines/spline_builder.hpp b/include/ddc/kernels/splines/spline_builder.hpp new file mode 100644 index 000000000..f9363196d --- /dev/null +++ b/include/ddc/kernels/splines/spline_builder.hpp @@ -0,0 +1,611 @@ +#pragma once + +#include +#include +#include +#include +#include +#include + +#include + +#include "math_tools.hpp" +#include "matrix.hpp" +#include "matrix_maker.hpp" +#include "spline_boundary_conditions.hpp" +#include "view.hpp" + +namespace ddc { + +enum class SplineSolver { + GINKGO +}; // Only GINKGO available atm, other solvers will be implemented in the futur + +constexpr bool is_spline_interpolation_mesh_uniform( + bool const is_uniform, + ddc::BoundCond const BcXmin, + ddc::BoundCond const BcXmax, + int degree) +{ + int N_BE_MIN = n_boundary_equations(BcXmin, degree); + int N_BE_MAX = n_boundary_equations(BcXmax, degree); + bool is_periodic = (BcXmin == ddc::BoundCond::PERIODIC) && (BcXmax == ddc::BoundCond::PERIODIC); + return is_uniform && ((N_BE_MIN != 0 && N_BE_MAX != 0) || is_periodic); +} + +template < + class ExecSpace, + class MemorySpace, + class BSplines, + class interpolation_mesh_type, + ddc::BoundCond BcXmin, + ddc::BoundCond BcXmax, + SplineSolver Solver = ddc::SplineSolver::GINKGO> +class SplineBuilder +{ + static_assert( + (BSplines::is_periodic() && (BcXmin == ddc::BoundCond::PERIODIC) + && (BcXmax == ddc::BoundCond::PERIODIC)) + || (!BSplines::is_periodic() && (BcXmin != ddc::BoundCond::PERIODIC) + && (BcXmax != ddc::BoundCond::PERIODIC))); + static_assert(!BSplines::is_radial()); + +private: + using tag_type = typename interpolation_mesh_type::continuous_dimension_type; + +public: + using exec_space = ExecSpace; + + using memory_space = MemorySpace; + + using bsplines_type = BSplines; + + using mesh_type = interpolation_mesh_type; + + using interpolation_domain_type = ddc::DiscreteDomain; + +public: + static constexpr bool s_odd = BSplines::degree() % 2; + + static constexpr int s_nbe_xmin = n_boundary_equations(BcXmin, BSplines::degree()); + + static constexpr int s_nbe_xmax = n_boundary_equations(BcXmax, BSplines::degree()); + + static constexpr int s_nbc_xmin = n_user_input(BcXmin, BSplines::degree()); + + static constexpr int s_nbc_xmax = n_user_input(BcXmax, BSplines::degree()); + + static constexpr ddc::BoundCond s_bc_xmin = BcXmin; + static constexpr ddc::BoundCond s_bc_xmax = BcXmax; + + // interpolator specific + std::unique_ptr matrix; + +private: + const int m_offset; + + interpolation_domain_type m_interpolation_domain; + + double m_dx; // average cell size for normalization of derivatives + +public: + int compute_offset(interpolation_domain_type const& interpolation_domain); + + SplineBuilder( + interpolation_domain_type const& interpolation_domain, + std::optional cols_per_par_chunk = std::nullopt, + std::optional par_chunks_per_seq_chunk = std::nullopt, + std::optional preconditionner_max_block_size = std::nullopt) + : m_interpolation_domain(interpolation_domain) + , m_dx((ddc::discrete_space().rmax() - ddc::discrete_space().rmin()) + / ddc::discrete_space().ncells()) + , matrix(nullptr) + , m_offset(compute_offset(interpolation_domain)) + { + // Calculate block sizes + int lower_block_size, upper_block_size; + if constexpr (bsplines_type::is_uniform()) { + compute_block_sizes_uniform(lower_block_size, upper_block_size); + } else { + compute_block_sizes_non_uniform(lower_block_size, upper_block_size); + } + allocate_matrix( + lower_block_size, + upper_block_size, + cols_per_par_chunk, + par_chunks_per_seq_chunk, + preconditionner_max_block_size); + } + + SplineBuilder(SplineBuilder const& x) = delete; + + SplineBuilder(SplineBuilder&& x) = default; + + ~SplineBuilder() = default; + + SplineBuilder& operator=(SplineBuilder const& x) = delete; + + SplineBuilder& operator=(SplineBuilder&& x) = default; + + template + void operator()( + ddc::ChunkSpan, Layout, MemorySpace> spline, + ddc::ChunkSpan vals, + std::optional const derivs_xmin = std::nullopt, + std::optional const derivs_xmax = std::nullopt) const; + + interpolation_domain_type const& interpolation_domain() const noexcept + { + return m_interpolation_domain; + } + + int offset() const noexcept + { + return m_offset; + } + + ddc::DiscreteDomain spline_domain() const noexcept + { + return ddc::discrete_space().full_domain(); + } + + template + void compute_interpolant_degree1( // Seems to need to be public for GPU compiling + ddc::ChunkSpan, Layout, MemorySpace> spline, + ddc::ChunkSpan vals) const; + + +private: + void compute_block_sizes_uniform(int& lower_block_size, int& upper_block_size) const; + + void compute_block_sizes_non_uniform(int& lower_block_size, int& upper_block_size) const; + + void allocate_matrix( + int lower_block_size, + int upper_block_size, + std::optional cols_per_par_chunk = std::nullopt, + std::optional par_chunks_per_seq_chunk = std::nullopt, + std::optional preconditionner_max_block_size = std::nullopt); + + void build_matrix_system(); +}; + +template < + class ExecSpace, + class MemorySpace, + class BSplines, + class interpolation_mesh_type, + ddc::BoundCond BcXmin, + ddc::BoundCond BcXmax, + SplineSolver Solver> +int SplineBuilder< + ExecSpace, + MemorySpace, + BSplines, + interpolation_mesh_type, + BcXmin, + BcXmax, + Solver>::compute_offset(interpolation_domain_type const& interpolation_domain) +{ + int offset; + if constexpr (bsplines_type::is_periodic()) { + // Calculate offset so that the matrix is diagonally dominant + std::array values; + ddc::DiscreteElement start(interpolation_domain.front()); + auto jmin = ddc::discrete_space() + .eval_basis(values, ddc::coordinate(start + BSplines::degree())); + if constexpr (bsplines_type::degree() % 2 == 0) { + offset = jmin.uid() - start.uid() + bsplines_type::degree() / 2 - BSplines::degree(); + } else { + int const mid = bsplines_type::degree() / 2; + offset = jmin.uid() - start.uid() + (values[mid] > values[mid + 1] ? mid : mid + 1) + - BSplines::degree(); + } + } else { + offset = 0; + } + return offset; +} + +//------------------------------------------------------------------------------------------------- +/************************************************************************************ + * Compute interpolant functions * + ************************************************************************************/ + +template < + class ExecSpace, + class MemorySpace, + class BSplines, + class interpolation_mesh_type, + ddc::BoundCond BcXmin, + ddc::BoundCond BcXmax, + SplineSolver Solver> +template +void SplineBuilder< + ExecSpace, + MemorySpace, + BSplines, + interpolation_mesh_type, + BcXmin, + BcXmax, + Solver>:: + compute_interpolant_degree1( + ddc::ChunkSpan, Layout, MemorySpace> + spline, + ddc::ChunkSpan vals) const +{ + auto const& nbasis_proxy = ddc::discrete_space().nbasis(); + Kokkos::parallel_for( + Kokkos::RangePolicy(0, 1), + KOKKOS_LAMBDA(const int unused_index) { + for (std::size_t i = 0; i < nbasis_proxy; ++i) { + spline(ddc::DiscreteElement(i)) + = vals(ddc::DiscreteElement(i)); + } + }); + if constexpr (bsplines_type::is_periodic()) { + Kokkos::parallel_for( + Kokkos::RangePolicy(0, 1), + KOKKOS_LAMBDA(const int unused_index) { + spline(ddc::DiscreteElement(nbasis_proxy)) + = spline(ddc::DiscreteElement(0)); + }); + } +} + +//------------------------------------------------------------------------------------------------- + +template < + class ExecSpace, + class MemorySpace, + class BSplines, + class interpolation_mesh_type, + ddc::BoundCond BcXmin, + ddc::BoundCond BcXmax, + SplineSolver Solver> +template +void SplineBuilder< + ExecSpace, + MemorySpace, + BSplines, + interpolation_mesh_type, + BcXmin, + BcXmax, + Solver>:: +operator()( + ddc::ChunkSpan, Layout, MemorySpace> spline, + ddc::ChunkSpan vals, + std::optional const derivs_xmin, + std::optional const derivs_xmax) const +{ + assert(vals.template extent() + == ddc::discrete_space().nbasis() - s_nbe_xmin - s_nbe_xmax); + // assert(spline.belongs_to_space(ddc::discrete_space())); + // TODO: LOG Errors + if constexpr (bsplines_type::degree() == 1) + return compute_interpolant_degree1(spline, vals); + + assert((BcXmin == ddc::BoundCond::HERMITE) + != (!derivs_xmin.has_value() || derivs_xmin->extent(0) == 0)); + assert((BcXmax == ddc::BoundCond::HERMITE) + != (!derivs_xmax.has_value() || derivs_xmax->extent(0) == 0)); + + // Hermite boundary conditions at xmin, if any + // NOTE: For consistency with the linear system, the i-th derivative + // provided by the user must be multiplied by dx^i + if constexpr (BcXmin == ddc::BoundCond::HERMITE) { + assert(derivs_xmin->extent(0) == s_nbc_xmin); + for (int i = s_nbc_xmin; i > 0; --i) { + spline(ddc::DiscreteElement(s_nbc_xmin - i)) + = (*derivs_xmin)(i - 1) * ddc::detail::ipow(m_dx, i + s_odd - 1); + } + } + auto const& offset_proxy = offset(); + auto const& interp_size_proxy = interpolation_domain().extents(); + auto const& nbasis_proxy = ddc::discrete_space().nbasis(); + Kokkos::parallel_for( + Kokkos::RangePolicy(0, 1), + KOKKOS_LAMBDA(const int unused_index) { + for (int i = s_nbc_xmin; i < s_nbc_xmin + offset_proxy; ++i) { + spline(ddc::DiscreteElement(i)) = 0.0; + } + for (int i = 0; i < interp_size_proxy; ++i) { + spline(ddc::DiscreteElement(s_nbc_xmin + i + offset_proxy)) + = vals(ddc::DiscreteElement(i)); + } + }); + + // Hermite boundary conditions at xmax, if any + // NOTE: For consistency with the linear system, the i-th derivative + // provided by the user must be multiplied by dx^i + if constexpr (BcXmax == ddc::BoundCond::HERMITE) { + assert(derivs_xmax->extent(0) == s_nbc_xmax); + for (int i = 0; i < s_nbc_xmax; ++i) { + spline(ddc::DiscreteElement( + ddc::discrete_space().nbasis() - s_nbc_xmax + i)) + = (*derivs_xmax)(i)*ddc::detail::ipow(m_dx, i + s_odd); + } + } + + Kokkos::View bcoef_section( + spline.data_handle() + m_offset, + ddc::discrete_space().nbasis()); + matrix->solve_batch_inplace(bcoef_section); + + if constexpr (bsplines_type::is_periodic()) { + Kokkos::parallel_for( + Kokkos::RangePolicy(0, 1), + KOKKOS_LAMBDA(const int unused_index) { + if (offset_proxy != 0) { + for (int i = 0; i < offset_proxy; ++i) { + spline(ddc::DiscreteElement(i)) + = spline(ddc::DiscreteElement(nbasis_proxy + i)); + } + for (std::size_t i = offset_proxy; i < bsplines_type::degree(); ++i) { + spline(ddc::DiscreteElement(nbasis_proxy + i)) + = spline(ddc::DiscreteElement(i)); + } + } + }); + } +} + +//------------------------------------------------------------------------------------------------- +/************************************************************************************ + * Compute num diags functions * + ************************************************************************************/ + +template < + class ExecSpace, + class MemorySpace, + class BSplines, + class interpolation_mesh_type, + ddc::BoundCond BcXmin, + ddc::BoundCond BcXmax, + SplineSolver Solver> +void SplineBuilder< + ExecSpace, + MemorySpace, + BSplines, + interpolation_mesh_type, + BcXmin, + BcXmax, + Solver>::compute_block_sizes_uniform(int& lower_block_size, int& upper_block_size) const +{ + switch (BcXmin) { + case ddc::BoundCond::PERIODIC: + upper_block_size = (bsplines_type::degree()) / 2; + break; + case ddc::BoundCond::NATURAL: + case ddc::BoundCond::HERMITE: + upper_block_size = s_nbc_xmin; + break; + case ddc::BoundCond::GREVILLE: + upper_block_size = bsplines_type::degree() - 1; + break; + default: + throw std::runtime_error("ddc::BoundCond not handled"); + } + switch (BcXmax) { + case ddc::BoundCond::PERIODIC: + lower_block_size = (bsplines_type::degree()) / 2; + break; + case ddc::BoundCond::NATURAL: + case ddc::BoundCond::HERMITE: + lower_block_size = s_nbc_xmax; + break; + case ddc::BoundCond::GREVILLE: + lower_block_size = bsplines_type::degree() - 1; + break; + default: + throw std::runtime_error("ddc::BoundCond not handled"); + } +} + +//------------------------------------------------------------------------------------------------- + +template < + class ExecSpace, + class MemorySpace, + class BSplines, + class interpolation_mesh_type, + ddc::BoundCond BcXmin, + ddc::BoundCond BcXmax, + SplineSolver Solver> +void SplineBuilder< + ExecSpace, + MemorySpace, + BSplines, + interpolation_mesh_type, + BcXmin, + BcXmax, + Solver>::compute_block_sizes_non_uniform(int& lower_block_size, int& upper_block_size) const +{ + switch (BcXmin) { + case ddc::BoundCond::PERIODIC: + upper_block_size = bsplines_type::degree() - 1; + break; + case ddc::BoundCond::HERMITE: + upper_block_size = s_nbc_xmin + 1; + break; + case ddc::BoundCond::GREVILLE: + upper_block_size = bsplines_type::degree() - 1; + break; + default: + throw std::runtime_error("ddc::BoundCond not handled"); + } + switch (BcXmax) { + case ddc::BoundCond::PERIODIC: + lower_block_size = bsplines_type::degree() - 1; + break; + case ddc::BoundCond::HERMITE: + lower_block_size = s_nbc_xmax + 1; + break; + case ddc::BoundCond::GREVILLE: + lower_block_size = bsplines_type::degree() - 1; + break; + default: + throw std::runtime_error("ddc::BoundCond not handled"); + } +} + +//------------------------------------------------------------------------------------------------- +/************************************************************************************ + * Initialize matrix functions * + ************************************************************************************/ + +template < + class ExecSpace, + class MemorySpace, + class BSplines, + class interpolation_mesh_type, + ddc::BoundCond BcXmin, + ddc::BoundCond BcXmax, + SplineSolver Solver> +void SplineBuilder< + ExecSpace, + MemorySpace, + BSplines, + interpolation_mesh_type, + BcXmin, + BcXmax, + Solver>:: + allocate_matrix( + int lower_block_size, + int upper_block_size, + std::optional cols_per_par_chunk, + std::optional par_chunks_per_seq_chunk, + std::optional preconditionner_max_block_size) +{ + // Special case: linear spline + // No need for matrix assembly + if constexpr (bsplines_type::degree() == 1) + return; + + int upper_band_width; + if (bsplines_type::is_uniform()) { + upper_band_width = bsplines_type::degree() / 2; + } else { + upper_band_width = bsplines_type::degree() - 1; + } + + if constexpr (bsplines_type::is_periodic()) { + if (Solver == SplineSolver::GINKGO) { + matrix = ddc::detail::MatrixMaker::make_new_sparse( + ddc::discrete_space().nbasis(), + cols_per_par_chunk, + par_chunks_per_seq_chunk, + preconditionner_max_block_size); + } + } else { + if (Solver == SplineSolver::GINKGO) { + matrix = ddc::detail::MatrixMaker::make_new_sparse( + ddc::discrete_space().nbasis(), + cols_per_par_chunk, + par_chunks_per_seq_chunk, + preconditionner_max_block_size); + } + } + + build_matrix_system(); + + matrix->factorize(); +} + +//------------------------------------------------------------------------------------------------- + +template < + class ExecSpace, + class MemorySpace, + class BSplines, + class interpolation_mesh_type, + ddc::BoundCond BcXmin, + ddc::BoundCond BcXmax, + SplineSolver Solver> +void SplineBuilder< + ExecSpace, + MemorySpace, + BSplines, + interpolation_mesh_type, + BcXmin, + BcXmax, + Solver>::build_matrix_system() +{ + // Hermite boundary conditions at xmin, if any + if constexpr (BcXmin == ddc::BoundCond::HERMITE) { + std::array + derivs_ptr; + ddc::DSpan2D + derivs(derivs_ptr.data(), + bsplines_type::degree() + 1, + bsplines_type::degree() / 2 + 1); + ddc::discrete_space().eval_basis_and_n_derivs( + derivs, + ddc::discrete_space().rmin(), + s_nbc_xmin); + + // In order to improve the condition number of the matrix, we normalize + // all derivatives by multiplying the i-th derivative by dx^i + for (std::size_t i = 0; i < bsplines_type::degree() + 1; ++i) { + for (std::size_t j = 1; j < bsplines_type::degree() / 2 + 1; ++j) { + derivs(i, j) *= ddc::detail::ipow(m_dx, j); + } + } + + // iterate only to deg as last bspline is 0 + for (std::size_t i = 0; i < s_nbc_xmin; ++i) { + for (std::size_t j = 0; j < bsplines_type::degree(); ++j) { + matrix->set_element(i, j, derivs(j, s_nbc_xmin - i - 1 + s_odd)); + } + } + } + + // Interpolation points + std::array values; + int start = m_interpolation_domain.front().uid(); + ddc::for_each(m_interpolation_domain, [&](auto ix) { + auto jmin = ddc::discrete_space().eval_basis( + values, + ddc::coordinate(ddc::DiscreteElement(ix))); + for (std::size_t s = 0; s < bsplines_type::degree() + 1; ++s) { + int const j = ddc::detail:: + modulo(int(jmin.uid() - m_offset + s), + (int)ddc::discrete_space().nbasis()); + matrix->set_element(ix.uid() - start + s_nbc_xmin, j, values[s]); + } + }); + + // Hermite boundary conditions at xmax, if any + if constexpr (BcXmax == ddc::BoundCond::HERMITE) { + std::array + derivs_ptr; + std::experimental::mdspan< + double, + std::experimental::extents< + std::size_t, + bsplines_type::degree() + 1, + bsplines_type::degree() / 2 + 1>> const derivs(derivs_ptr.data()); + + ddc::discrete_space().eval_basis_and_n_derivs( + derivs, + ddc::discrete_space().rmax(), + s_nbc_xmax); + + // In order to improve the condition number of the matrix, we normalize + // all derivatives by multiplying the i-th derivative by dx^i + for (std::size_t i = 0; i < bsplines_type::degree() + 1; ++i) { + for (std::size_t j = 1; j < bsplines_type::degree() / 2 + 1; ++j) { + derivs(i, j) *= ddc::detail::ipow(m_dx, j); + } + } + + int const i0 = ddc::discrete_space().nbasis() - s_nbc_xmax; + int const j0 = ddc::discrete_space().nbasis() - bsplines_type::degree(); + for (std::size_t j = 0; j < bsplines_type::degree(); ++j) { + for (std::size_t i = 0; i < s_nbc_xmax; ++i) { + matrix->set_element(i0 + i, j0 + j, derivs(j + 1, i + s_odd)); + } + } + } +} +} // namespace ddc diff --git a/include/ddc/kernels/splines/spline_builder_batched.hpp b/include/ddc/kernels/splines/spline_builder_batched.hpp new file mode 100644 index 000000000..f581dc80c --- /dev/null +++ b/include/ddc/kernels/splines/spline_builder_batched.hpp @@ -0,0 +1,217 @@ +#pragma once +#include "ddc/chunk_span.hpp" +#include "ddc/deepcopy.hpp" +#include "ddc/discrete_domain.hpp" +#include "ddc/kokkos_allocator.hpp" + +#include "spline_builder.hpp" + +namespace ddc { +template +class SplineBuilderBatched +{ +private: + using tag_type = typename SplineBuilder::bsplines_type::tag_type; + +public: + using exec_space = typename SplineBuilder::exec_space; + + using memory_space = typename SplineBuilder::memory_space; + + using bsplines_type = typename SplineBuilder::bsplines_type; + + using builder_type = SplineBuilder; + + using interpolation_mesh_type = typename SplineBuilder::mesh_type; + + using interpolation_domain_type = ddc::DiscreteDomain; + + using vals_domain_type = ddc::DiscreteDomain; + + using batch_domain_type = + typename ddc::detail::convert_type_seq_to_discrete_domain, + ddc::detail::TypeSeq>>; + + template + using spline_dim_type + = std::conditional_t, bsplines_type, Tag>; + + using spline_domain_type = + typename ddc::detail::convert_type_seq_to_discrete_domain, + ddc::detail::TypeSeq, + ddc::detail::TypeSeq>>; + + using spline_tr_domain_type = + typename ddc::detail::convert_type_seq_to_discrete_domain, + ddc::type_seq_remove_t< + ddc::detail::TypeSeq, + ddc::detail::TypeSeq>>>; + + static constexpr ddc::BoundCond BcXmin = SplineBuilder::s_bc_xmin; + static constexpr ddc::BoundCond BcXmax = SplineBuilder::s_bc_xmax; + +private: + builder_type spline_builder; + const vals_domain_type m_vals_domain; + +public: + SplineBuilderBatched( + vals_domain_type const& vals_domain, + std::optional cols_per_par_chunk = std::nullopt, + std::optional par_chunks_per_seq_chunk = std::nullopt, + std::optional preconditionner_max_block_size = std::nullopt) + : spline_builder( + ddc::select(vals_domain), + cols_per_par_chunk, + par_chunks_per_seq_chunk, + preconditionner_max_block_size) + , m_vals_domain(vals_domain) + { + static_assert( + BcXmin == BoundCond::PERIODIC && BcXmax == BoundCond::PERIODIC, + "Boundary conditions other than PERIODIC are not supported yet in " + "SpSplineBuilderBatched"); + }; + + SplineBuilderBatched(SplineBuilderBatched const& x) = delete; + + SplineBuilderBatched(SplineBuilderBatched&& x) = default; + + ~SplineBuilderBatched() = default; + + SplineBuilderBatched& operator=(SplineBuilderBatched const& x) = delete; + + SplineBuilderBatched& operator=(SplineBuilderBatched&& x) = default; + + vals_domain_type const vals_domain() const noexcept + { + return m_vals_domain; + } + + interpolation_domain_type const interpolation_domain() const noexcept + { + return spline_builder.interpolation_domain(); + } + + batch_domain_type const batch_domain() const noexcept + { + return ddc::remove_dims_of(vals_domain(), interpolation_domain()); + } + + ddc::DiscreteDomain const bsplines_domain() const noexcept // TODO : clarify name + { + return ddc::discrete_space().full_domain(); + } + + spline_domain_type const spline_domain() const noexcept + { + return ddc::replace_dim_of< + interpolation_mesh_type, + bsplines_type>(vals_domain(), bsplines_domain()); + } + + spline_tr_domain_type const spline_tr_domain() const noexcept + { + return spline_tr_domain_type(bsplines_domain(), batch_domain()); + } + + int offset() const noexcept + { + return spline_builder.offset(); + } + + template + void operator()( + ddc::ChunkSpan spline, + ddc::ChunkSpan vals) const; +}; + +template +template +void SplineBuilderBatched::operator()( + ddc::ChunkSpan spline, + ddc::ChunkSpan vals) const +{ + const std::size_t nbc_xmin = spline_builder.s_nbc_xmin; + const std::size_t nbc_xmax = spline_builder.s_nbc_xmax; + using IMesh = ddc::DiscreteElement; + + // TODO : Consider optimizing + // Fill spline with vals (to work in spline afterward and preserve vals) + auto const& offset_proxy = spline_builder.offset(); + auto const& interp_size_proxy = interpolation_domain().extents(); + auto const& nbasis_proxy = ddc::discrete_space().nbasis(); + ddc::for_each( + ddc::policies::policy(exec_space()), + batch_domain(), + DDC_LAMBDA(typename batch_domain_type::discrete_element_type j) { + for (int i = nbc_xmin; i < nbc_xmin + offset_proxy; ++i) { + spline(ddc::DiscreteElement(i), j) = 0.0; + } + for (int i = 0; i < interp_size_proxy; ++i) { + spline(ddc::DiscreteElement(nbc_xmin + i + offset_proxy), j) + = vals(ddc::DiscreteElement(i), j); + } + }); + + // TODO : Consider optimizing + // Allocate and fill a transposed version of spline in order to get dimension of interest as last dimension (optimal for GPU, necessary for Ginkgo) + ddc::Chunk spline_tr_alloc(spline_tr_domain(), ddc::KokkosAllocator()); + ddc::ChunkSpan spline_tr = spline_tr_alloc.span_view(); + ddc::for_each( + ddc::policies::policy(exec_space()), + batch_domain(), + DDC_LAMBDA(typename batch_domain_type::discrete_element_type const j) { + for (int i = 0; i < nbasis_proxy; i++) { + spline_tr(ddc::DiscreteElement(i), j) + = spline(ddc::DiscreteElement(i + offset_proxy), j); + } + }); + // Create a 2D Kokkos::View to manage spline_tr as a matrix + Kokkos::View bcoef_section( + spline_tr.data_handle(), + ddc::discrete_space().nbasis(), + batch_domain().size()); + // Compute spline coef + spline_builder.matrix->solve_batch_inplace(bcoef_section); + // Transpose back spline_tr in spline + ddc::for_each( + ddc::policies::policy(exec_space()), + batch_domain(), + DDC_LAMBDA(typename batch_domain_type::discrete_element_type const j) { + for (int i = 0; i < nbasis_proxy; i++) { + spline(ddc::DiscreteElement(i + offset_proxy), j) + = spline_tr(ddc::DiscreteElement(i), j); + } + }); + + // Not sure yet of what this part do + if (bsplines_type::is_periodic()) { + ddc::for_each( + ddc::policies::policy(exec_space()), + batch_domain(), + DDC_LAMBDA(typename batch_domain_type::discrete_element_type const j) { + if (offset_proxy != 0) { + for (int i = 0; i < offset_proxy; ++i) { + spline(ddc::DiscreteElement(i), j) = spline( + ddc::DiscreteElement(nbasis_proxy + i), + j); + } + for (std::size_t i = offset_proxy; i < bsplines_type::degree(); ++i) { + spline(ddc::DiscreteElement(nbasis_proxy + i), j) + = spline(ddc::DiscreteElement(i), j); + } + } + for (std::size_t i(0); i < bsplines_type::degree(); ++i) { + const ddc::DiscreteElement i_start(i); + const ddc::DiscreteElement i_end(nbasis_proxy + i); + + spline(i_end, j) = spline(i_start, j); + } + }); + } +} +} // namespace ddc diff --git a/include/ddc/kernels/splines/spline_evaluator.hpp b/include/ddc/kernels/splines/spline_evaluator.hpp new file mode 100644 index 000000000..882fc9c4b --- /dev/null +++ b/include/ddc/kernels/splines/spline_evaluator.hpp @@ -0,0 +1,210 @@ +#pragma once + +#include +#include + +#include + +#include "spline_boundary_value.hpp" +#include "view.hpp" + +namespace ddc { +template +class SplineEvaluator +{ +private: + // Tags to determine what to evaluate + struct eval_type + { + }; + + struct eval_deriv_type + { + }; + +public: + using exec_space = ExecSpace; + + using memory_space = MemorySpace; + + using bsplines_type = BSplinesType; + + using tag_type = typename BSplinesType::tag_type; + + using mesh_type = interpolation_mesh_type; + + using interpolation_domain_type = ddc::DiscreteDomain; // Use ? + +private: + SplineBoundaryValue const& m_left_bc; + + SplineBoundaryValue const& m_right_bc; + +public: + SplineEvaluator() = delete; + + explicit SplineEvaluator( + SplineBoundaryValue const& left_bc, + SplineBoundaryValue const& right_bc) + : m_left_bc(left_bc) + , m_right_bc(right_bc) + { + } + + SplineEvaluator(SplineEvaluator const& x) = default; + + SplineEvaluator(SplineEvaluator&& x) = default; + + ~SplineEvaluator() = default; + + SplineEvaluator& operator=(SplineEvaluator const& x) = default; + + SplineEvaluator& operator=(SplineEvaluator&& x) = default; + + /* + SplineBoundaryValue left_bc() const noexcept + { + return m_left_bc; + } + + SplineBoundaryValue right_bc() const noexcept + { + return m_right_bc; + } + */ + + double operator()( + ddc::Coordinate const& coord_eval, + ddc::ChunkSpan> const spline_coef) const + { + std::array values; + ddc::DSpan1D const vals = as_span(values); + + return eval(coord_eval, spline_coef, vals); + } + + template + void operator()( + ddc::ChunkSpan const spline_eval, + ddc::ChunkSpan, Domain, Layout2, MemorySpace> const + coords_eval, + ddc::ChunkSpan< + double const, + ddc::DiscreteDomain, + Layout3, + MemorySpace> const spline_coef) const + { + for (auto i : coords_eval.domain()) { + spline_eval(i) = eval(coords_eval(i), spline_coef); + } + } + + template + double deriv( + ddc::Coordinate const& coord_eval, + ddc::ChunkSpan< + double const, + ddc::DiscreteDomain, + Layout, + MemorySpace> const spline_coef) const + { + std::array values; + ddc::DSpan1D const vals = as_span(values); + + return eval_no_bc(coord_eval, spline_coef, vals, eval_deriv_type()); + } + + template + void deriv( + ddc::ChunkSpan const spline_eval, + ddc::ChunkSpan, Domain, Layout2, MemorySpace> const + coords_eval, + ddc::ChunkSpan< + double const, + ddc::DiscreteDomain, + Layout3, + MemorySpace> const spline_coef) const + { + for (auto i : coords_eval.domain()) { + spline_eval(i) = eval_no_bc(coords_eval(i), spline_coef, eval_deriv_type()); + } + } + + template + double integrate(ddc::ChunkSpan< + double const, + ddc::DiscreteDomain, + Layout, + MemorySpace> const spline_coef) const + { + ddc::Chunk values(spline_coef.domain(), ddc::KokkosAllocator()); + + ddc::discrete_space().integrals(values.span_view()); + + return ddc::transform_reduce( + spline_coef.domain(), + 0.0, + ddc::reducer::sum(), + [&](ddc::DiscreteElement const ibspl) { + return spline_coef(ibspl) * values(ibspl); + }); + } + +private: + template + double eval( + ddc::Coordinate coord_eval, + ddc::ChunkSpan< + double const, + ddc::DiscreteDomain, + Layout, + MemorySpace> const spline_coef) const + { + if constexpr (bsplines_type::is_periodic()) { + if (coord_eval < ddc::discrete_space().rmin() + || coord_eval > ddc::discrete_space().rmax()) { + coord_eval -= std::floor( + (coord_eval - ddc::discrete_space().rmin()) + / ddc::discrete_space().length()) + * ddc::discrete_space().length(); + } + } else { + if (coord_eval < ddc::discrete_space().rmin()) { + return m_left_bc(coord_eval, spline_coef); + } + if (coord_eval > ddc::discrete_space().rmax()) { + return m_right_bc(coord_eval, spline_coef); + } + } + return eval_no_bc(coord_eval, spline_coef, eval_type()); + } + + template + double eval_no_bc( + ddc::Coordinate const& coord_eval, + ddc::ChunkSpan< + double const, + ddc::DiscreteDomain, + Layout, + MemorySpace> const spline_coef, + EvalType const) const + { + static_assert( + std::is_same_v || std::is_same_v); + ddc::DiscreteElement jmin; + + std::array vals; + if constexpr (std::is_same_v) { + jmin = ddc::discrete_space().eval_basis(vals, coord_eval); + } else if constexpr (std::is_same_v) { + jmin = ddc::discrete_space().eval_deriv(vals, coord_eval); + } + + double y = 0.0; + for (std::size_t i = 0; i < bsplines_type::degree() + 1; ++i) { + y += spline_coef(jmin + i) * vals[i]; + } + return y; + } +}; +} // namespace ddc diff --git a/include/ddc/kernels/splines/spline_evaluator_batched.hpp b/include/ddc/kernels/splines/spline_evaluator_batched.hpp new file mode 100644 index 000000000..83de0afd9 --- /dev/null +++ b/include/ddc/kernels/splines/spline_evaluator_batched.hpp @@ -0,0 +1,253 @@ +#pragma once + +#include + +#include + +#include "Kokkos_Macros.hpp" +#include "spline_boundary_value.hpp" +#include "spline_evaluator.hpp" +#include "view.hpp" + +namespace ddc { + +template +class SplineEvaluatorBatched +{ +private: + // Tags to determine what to evaluate + struct eval_type + { + }; + + struct eval_deriv_type + { + }; + + using tag_type = typename SplineEvaluator::tag_type; + +public: + using exec_space = typename SplineEvaluator::exec_space; + + using memory_space = typename SplineEvaluator::memory_space; + + using bsplines_type = typename SplineEvaluator::bsplines_type; + + using evaluator_type = SplineEvaluator; + + using interpolation_mesh_type = typename SplineEvaluator::mesh_type; + + using interpolation_domain_type = ddc::DiscreteDomain; + + using vals_domain_type = ddc::DiscreteDomain; + + using bsplines_domain_type = ddc::DiscreteDomain; + + using batch_domain_type = + typename ddc::detail::convert_type_seq_to_discrete_domain, + ddc::detail::TypeSeq>>; + + template + using spline_dim_type + = std::conditional_t, bsplines_type, Tag>; + + using spline_domain_type = + typename ddc::detail::convert_type_seq_to_discrete_domain, + ddc::detail::TypeSeq, + ddc::detail::TypeSeq>>; + + +private: + SplineEvaluator spline_evaluator; + const spline_domain_type m_spline_domain; // Necessary ? + + +public: + SplineEvaluatorBatched() = delete; + + explicit SplineEvaluatorBatched( + spline_domain_type const& spline_domain, + SplineBoundaryValue const& left_bc, + SplineBoundaryValue const& right_bc) + : spline_evaluator(left_bc, right_bc) + , m_spline_domain(spline_domain) // Necessary ? + { + } + + SplineEvaluatorBatched(SplineEvaluatorBatched const& x) = default; + + SplineEvaluatorBatched(SplineEvaluatorBatched&& x) = default; + + ~SplineEvaluatorBatched() = default; + + SplineEvaluatorBatched& operator=(SplineEvaluatorBatched const& x) = default; + + SplineEvaluatorBatched& operator=(SplineEvaluatorBatched&& x) = default; + + + + KOKKOS_FUNCTION spline_domain_type spline_domain() const noexcept + { + return m_spline_domain; + } + + KOKKOS_FUNCTION bsplines_domain_type bsplines_domain() const noexcept // TODO : clarify name + { + return ddc::discrete_space().full_domain(); + } + + KOKKOS_FUNCTION batch_domain_type batch_domain() const noexcept + { + return ddc::remove_dims_of(spline_domain(), bsplines_domain()); + } + + template + double operator()( + ddc::Coordinate const& coord_eval, + ddc::ChunkSpan const + spline_coef) const + { + return eval(coord_eval, spline_coef); + } + + template + void operator()( + ddc::ChunkSpan const spline_eval, + ddc::ChunkSpan< + ddc::Coordinate const, + vals_domain_type, + Layout2, + memory_space> const coords_eval, + ddc::ChunkSpan const + spline_coef) const + { + interpolation_domain_type const interpolation_domain(spline_eval.domain()); + ddc::for_each( + ddc::policies::policy(exec_space()), + batch_domain(), + KOKKOS_CLASS_LAMBDA(typename batch_domain_type::discrete_element_type const j) { + const auto spline_eval_1D = spline_eval[j]; + const auto coords_eval_1D = coords_eval[j]; + const auto spline_coef_1D = spline_coef[j]; + for (auto const i : interpolation_domain) { + spline_eval_1D(i) = eval(coords_eval_1D(i), spline_coef_1D); + } + }); + } + + template + double deriv( + ddc::Coordinate const& coord_eval, + ddc::ChunkSpan const + spline_coef) const + { + return eval_no_bc(coord_eval, spline_coef); + } + + template + void deriv( + ddc::ChunkSpan const spline_eval, + ddc::ChunkSpan< + ddc::Coordinate const, + vals_domain_type, + Layout2, + memory_space> const coords_eval, + ddc::ChunkSpan const + spline_coef) const + { + interpolation_domain_type const interpolation_domain(spline_eval.domain()); + ddc::for_each( + ddc::policies::policy(exec_space()), + batch_domain(), + KOKKOS_CLASS_LAMBDA(typename batch_domain_type::discrete_element_type const j) { + const auto spline_eval_1D = spline_eval[j]; + const auto coords_eval_1D = coords_eval[j]; + const auto spline_coef_1D = spline_coef[j]; + for (auto const i : interpolation_domain) { + spline_eval_1D(i) + = eval_no_bc(coords_eval_1D(i), spline_coef_1D); + } + }); + } + + template + void integrate( + ddc::ChunkSpan const integrals, + ddc::ChunkSpan const + spline_coef) const + { + ddc::Chunk values_alloc( + ddc::DiscreteDomain(spline_coef.domain()), + ddc::KokkosAllocator()); + ddc::ChunkSpan values = values_alloc.span_view(); + Kokkos::parallel_for( + Kokkos::RangePolicy(0, 1), + KOKKOS_LAMBDA(const int unused_index) { + ddc::discrete_space().integrals(values); + }); + + ddc::for_each( + ddc::policies::policy(exec_space()), + batch_domain(), + DDC_LAMBDA(typename batch_domain_type::discrete_element_type const j) { + integrals(j) = 0; + for (typename bsplines_domain_type::discrete_element_type const i : + values.domain()) { + integrals(j) += spline_coef(i, j) * values(i); + } + }); + } + +private: + template + KOKKOS_INLINE_FUNCTION double eval( + ddc::Coordinate const& coord_eval, + ddc::ChunkSpan const + spline_coef) const + { + ddc::Coordinate + coord_eval_interpolation + = ddc::select( + coord_eval); + if constexpr (bsplines_type::is_periodic()) { + if (coord_eval_interpolation < ddc::discrete_space().rmin() + || coord_eval_interpolation > ddc::discrete_space().rmax()) { + coord_eval_interpolation -= Kokkos::floor( + (coord_eval_interpolation + - ddc::discrete_space().rmin()) + / ddc::discrete_space().length()) + * ddc::discrete_space().length(); + } + } + return eval_no_bc(coord_eval_interpolation, spline_coef); + } + + template + KOKKOS_INLINE_FUNCTION double eval_no_bc( + ddc::Coordinate const& coord_eval, + ddc::ChunkSpan const + spline_coef) const + { + static_assert( + std::is_same_v || std::is_same_v); + ddc::DiscreteElement jmin; + std::array vals; + ddc::Coordinate + coord_eval_interpolation + = ddc::select( + coord_eval); + if constexpr (std::is_same_v) { + jmin = ddc::discrete_space().eval_basis(vals, coord_eval_interpolation); + } else if constexpr (std::is_same_v) { + jmin = ddc::discrete_space().eval_deriv(vals, coord_eval_interpolation); + } + double y = 0.0; + for (std::size_t i = 0; i < bsplines_type::degree() + 1; ++i) { + y += spline_coef(ddc::DiscreteElement(jmin + i)) * vals[i]; + } + return y; + } +}; +} // namespace ddc diff --git a/include/ddc/kernels/splines/view.hpp b/include/ddc/kernels/splines/view.hpp new file mode 100644 index 000000000..5ad3fc4ef --- /dev/null +++ b/include/ddc/kernels/splines/view.hpp @@ -0,0 +1,146 @@ +#pragma once + +#include +#include +#include + +#include + +namespace ddc::detail { + +template +struct ViewNDMaker; + +template +struct ViewNDMaker +{ + using type = std::experimental::mdspan< + ElementType, + std::experimental::dextents, + std::experimental::layout_right>; +}; + +template +struct ViewNDMaker +{ + using type = std::experimental::mdspan< + ElementType, + std::experimental::dextents, + std::experimental::layout_stride>; +}; + +/// Note: We use the comma operator to fill the input parameters +/// +/// If Is=[1, 2], `subspan(s, i0, (Is, all)...)` will be expanded as +/// `subspan(s, i0, (1, all), (2, all))` which is equivalent to +/// `subspan(s, i0, all, all)` +template < + class ElementType, + class Extents, + class Layout, + class Accessor, + std::size_t I0, + std::size_t... Is> +std::ostream& stream_impl( + std::ostream& os, + std::experimental::mdspan const& s, + std::index_sequence) +{ + if constexpr (sizeof...(Is) > 0) { + os << '['; + for (std::size_t i0 = 0; i0 < s.extent(I0); ++i0) { + stream_impl( + os, + std::experimental:: + submdspan(s, i0, ((void)Is, std::experimental::full_extent)...), + std::make_index_sequence()); + } + os << ']'; + } else { + os << '['; + for (std::size_t i0 = 0; i0 < s.extent(I0) - 1; ++i0) { + os << s(i0) << ','; + } + os << s(s.extent(I0) - 1) << ']'; + } + return os; +} + + +// template +// struct IsContiguous; +// template <> +// struct IsContiguous +// { +// static constexpr bool val = false; +// }; +// template <> +// struct IsContiguous +// { +// static constexpr bool val = true; +// }; +// template +// struct IsContiguous> +// { +// static constexpr bool val = IsContiguous::val; +// }; + +} // namespace ddc::detail + + +namespace ddc { +template +using SpanND = std::experimental::mdspan>; + +template +using ViewND = SpanND; + +template +using Span1D = SpanND<1, ElementType>; + +template +using Span2D = SpanND<2, ElementType>; + +template +using View1D = ViewND<1, ElementType>; + +template +using View2D = ViewND<2, ElementType>; + +template +Span1D as_span(std::array& arr) noexcept +{ + return Span1D(arr.data(), N); +} + +template +Span1D as_span(std::array const& arr) noexcept +{ + return Span1D(arr.data(), N); +} + +using DSpan1D = ddc::Span1D; + +using DSpan2D = ddc::Span2D; + +using CDSpan1D = ddc::Span1D; + +using CDSpan2D = ddc::Span2D; + +using DView1D = View1D; + +using DView2D = View2D; + +// template +// constexpr bool is_contiguous_v = detail::IsContiguous::val; + +/// Convenient function to dump a mdspan, it recursively prints all dimensions. +/// Disclaimer: use with caution for large arrays +template +std::ostream& operator<<( + std::ostream& os, + std::experimental::mdspan const& s) +{ + return ddc::detail::stream_impl(os, s, std::make_index_sequence()); +} +} // namespace ddc diff --git a/include/ddc/misc/ginkgo_executors.hpp b/include/ddc/misc/ginkgo_executors.hpp new file mode 100644 index 000000000..659126667 --- /dev/null +++ b/include/ddc/misc/ginkgo_executors.hpp @@ -0,0 +1,45 @@ +#pragma once + +#include +#include + +#include + +inline std::shared_ptr create_default_host_executor() +{ +#ifdef KOKKOS_ENABLE_SERIAL + if constexpr (std::is_same_v) { + return gko::ReferenceExecutor::create(); + } +#endif +#ifdef KOKKOS_ENABLE_OPENMP + if constexpr (std::is_same_v) { + return gko::OmpExecutor::create(); + } +#endif +} // Comes from "Basic Kokkos Extension" Ginkgo MR + +template +inline std::shared_ptr create_gko_exec() +{ +#ifdef KOKKOS_ENABLE_SERIAL + if (std::is_same_v) { + return gko::ReferenceExecutor::create(); + } +#endif +#ifdef KOKKOS_ENABLE_OPENMP + if (std::is_same_v) { + return gko::OmpExecutor::create(); + } +#endif +#ifdef KOKKOS_ENABLE_CUDA + if (std::is_same_v) { + return gko::CudaExecutor::create(0, create_default_host_executor()); + } +#endif +#ifdef KOKKOS_ENABLE_HIP + if (std::is_same_v) { + return gko::CudaExecutor::create(0, create_default_host_executor()); + } +#endif +} diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt index 2a4b8a315..be6fb2e72 100644 --- a/tests/CMakeLists.txt +++ b/tests/CMakeLists.txt @@ -21,7 +21,6 @@ add_executable(ddc_tests for_each.cpp fill.cpp discrete_element.cpp - fft.cpp discrete_vector.cpp ) target_compile_features(ddc_tests PUBLIC cxx_std_17) @@ -31,3 +30,17 @@ target_link_libraries(ddc_tests DDC::DDC ) gtest_discover_tests(ddc_tests) +if("${BUILD_FFT_KERNEL}") + add_executable(fft_tests main.cpp fft.cpp) + target_compile_features(fft_tests PUBLIC cxx_std_17) + target_link_libraries(fft_tests + PUBLIC + GTest::gtest + DDC::DDC + ) + gtest_discover_tests(fft_tests) +endif() +if("${BUILD_SPLINES_KERNEL}") + add_subdirectory(splines) +endif() + diff --git a/tests/splines/CMakeLists.txt b/tests/splines/CMakeLists.txt new file mode 100644 index 000000000..aaf35554f --- /dev/null +++ b/tests/splines/CMakeLists.txt @@ -0,0 +1,102 @@ +cmake_minimum_required(VERSION 3.15) + +include(GoogleTest) + +set(SPLINES_TEST_DEGREE_MIN 3 CACHE STRING "Minimum degree to test splines.") +set(SPLINES_TEST_DEGREE_MAX 3 CACHE STRING "Maximum degree to test splines.") + +add_executable(splines_tests + ../main.cpp + view.cpp +) + +target_compile_features(splines_tests PUBLIC cxx_std_17) +target_link_libraries(splines_tests + PUBLIC + GTest::gtest + DDC::DDC +) +gtest_discover_tests(splines_tests) + +add_executable(bsplines_tests + ../main.cpp + bsplines.cpp +) +target_compile_features(bsplines_tests PUBLIC cxx_std_17) +target_link_libraries(bsplines_tests + PUBLIC + GTest::gtest + DDC::DDC +) + +add_executable(matrix_tests + ../main.cpp + matrix.cpp +) +target_compile_features(matrix_tests PUBLIC cxx_std_17) +target_link_libraries(matrix_tests + PUBLIC + GTest::gtest + DDC::DDC +) +gtest_discover_tests(matrix_tests) + +foreach(DEGREE_X RANGE "${SPLINES_TEST_DEGREE_MIN}" "${SPLINES_TEST_DEGREE_MAX}") + foreach(BSPLINES_TYPE "BSPLINES_TYPE_UNIFORM" "BSPLINES_TYPE_NON_UNIFORM") + set(test_name "splines_tests_DEGREE_X_${DEGREE_X}_${BSPLINES_TYPE}") + add_executable("${test_name}" ../main.cpp periodic_spline_builder.cpp) + target_compile_features("${test_name}" PUBLIC cxx_std_17) + target_link_libraries("${test_name}" + PUBLIC + GTest::gtest + DDC::DDC + ) + target_compile_definitions("${test_name}" PUBLIC -DDEGREE_X=${DEGREE_X} -D${BSPLINES_TYPE}) + add_test("${test_name}" "${test_name}") + endforeach() +endforeach() + +foreach(DEGREE_X RANGE "${SPLINES_TEST_DEGREE_MIN}" "${SPLINES_TEST_DEGREE_MAX}") + foreach(BSPLINES_TYPE "BSPLINES_TYPE_UNIFORM" "BSPLINES_TYPE_NON_UNIFORM") + set(test_name "splines_tests_BATCHED_DEGREE_X_${DEGREE_X}_${BSPLINES_TYPE}") + add_executable("${test_name}" ../main.cpp batched_spline_builder.cpp) + target_compile_features("${test_name}" PUBLIC cxx_std_17) + target_link_libraries("${test_name}" + PUBLIC + GTest::gtest + DDC::DDC + ) + target_compile_definitions("${test_name}" PUBLIC -DDEGREE_X=${DEGREE_X} -D${BSPLINES_TYPE}) + # add_test("${test_name}" "${test_name}") + gtest_discover_tests("${test_name}") + endforeach() +endforeach() + +foreach(DEGREE_X RANGE "${SPLINES_TEST_DEGREE_MIN}" "${SPLINES_TEST_DEGREE_MAX}") + set(test_name "splines_ordered_points_DEGREE_X_${DEGREE_X}") + add_executable("${test_name}" ../main.cpp periodic_spline_builder_ordered_points.cpp) + target_compile_features("${test_name}" PUBLIC cxx_std_17) + target_link_libraries("${test_name}" + PUBLIC + GTest::gtest + DDC::DDC + ) + target_compile_definitions("${test_name}" PUBLIC -DDEGREE_X=${DEGREE_X}) + add_test("${test_name}" "${test_name}") +endforeach() + +foreach(DEGREE_X RANGE "${SPLINES_TEST_DEGREE_MIN}" "${SPLINES_TEST_DEGREE_MAX}") + foreach(BSPLINES_TYPE "BSPLINES_TYPE_UNIFORM" "BSPLINES_TYPE_NON_UNIFORM") + set(test_name "splines_tests_PERIODICITY_DEGREE_X_${DEGREE_X}_${BSPLINES_TYPE}") + add_executable("${test_name}" ../main.cpp periodicity_spline_builder.cpp) + target_compile_features("${test_name}" PUBLIC cxx_std_17) + target_link_libraries("${test_name}" + PUBLIC + GTest::gtest + DDC::DDC + ) + target_compile_definitions("${test_name}" PUBLIC -DDEGREE_X=${DEGREE_X} -D${BSPLINES_TYPE}) + # add_test("${test_name}" "${test_name}") + gtest_discover_tests("${test_name}") + endforeach() +endforeach() diff --git a/tests/splines/batched_spline_builder.cpp b/tests/splines/batched_spline_builder.cpp new file mode 100644 index 000000000..5ef527baf --- /dev/null +++ b/tests/splines/batched_spline_builder.cpp @@ -0,0 +1,514 @@ +#include +#include +#include +#include +#include + +#include + +#include +#include + +#include + +#include "ddc/coordinate.hpp" +#include "ddc/detail/macros.hpp" +#include "ddc/discrete_domain.hpp" +#include "ddc/for_each.hpp" +#include "ddc/uniform_point_sampling.hpp" + +#include "cosine_evaluator.hpp" +#include "polynomial_evaluator.hpp" +#include "spline_error_bounds.hpp" + +struct DimX +{ + static constexpr bool PERIODIC = true; +}; + +struct DimY +{ + static constexpr bool PERIODIC = true; +}; + +struct DimZ +{ + static constexpr bool PERIODIC = true; +}; + +struct DimT +{ + static constexpr bool PERIODIC = true; +}; + +static constexpr std::size_t s_degree_x = DEGREE_X; + +template +using GrevillePoints = ddc:: + GrevilleInterpolationPoints; + +#if defined(BSPLINES_TYPE_UNIFORM) +template +using BSplines = ddc::UniformBSplines; + +// Gives discrete dimension. In the dimension of interest, it is deduced from the BSplines type. In the other dimensions, it has to be newly defined. In practice both types coincide in the test, but it may not be the case. +template +using IDim = std::conditional_t< + std::is_same_v, + typename GrevillePoints>::interpolation_mesh_type, + ddc::UniformPointSampling>; + +#elif defined(BSPLINES_TYPE_NON_UNIFORM) +template +using BSplines = ddc::NonUniformBSplines; + +template +using IDim = std::conditional_t< + std::is_same_v, + typename GrevillePoints>::interpolation_mesh_type, + ddc::NonUniformPointSampling>; +#endif +template +using evaluator_type = CosineEvaluator::Evaluator; + +template +using Index = ddc::DiscreteElement; +template +using DVect = ddc::DiscreteVector; +template +using Coord = ddc::Coordinate; + +// Extract batch dimensions from IDim (remove dimension of interest). Usefull +template +using BatchDims = ddc::type_seq_remove_t, ddc::detail::TypeSeq>; + +// Templated function giving first coordinate of the mesh in given dimension. +template +static constexpr Coord x0() +{ + return Coord(0.); +} + +// Templated function giving last coordinate of the mesh in given dimension. +template +static constexpr Coord xN() +{ + return Coord(1.); +} + +// Templated function giving step of the mesh in given dimension. +template +static constexpr double dx(double ncells) +{ + return (xN() - x0()) / ncells; +} + +// Templated function giving break points of mesh in given dimension for non-uniform case. +template +static constexpr std::vector> breaks(double ncells) +{ + std::vector> out(ncells + 1); + for (int i(0); i < ncells + 1; ++i) { + out[i] = x0() + i * dx(ncells); + } + return out; +} + +// Helper to initialize space +template +struct DimsInitializer; + +template +struct DimsInitializer> +{ + void operator()(std::size_t const ncells) + { +#if defined(BSPLINES_TYPE_UNIFORM) + (ddc::init_discrete_space(IDimX:: + init(x0(), + xN(), + DVect(ncells))), + ...); + ddc::init_discrete_space>( + x0(), + xN(), + ncells); +#elif defined(BSPLINES_TYPE_NON_UNIFORM) + (ddc::init_discrete_space(breaks(ncells)), + ...); + ddc::init_discrete_space>( + breaks(ncells)); +#endif + ddc::init_discrete_space( + GrevillePoints< + BSplines>::get_sampling()); + } +}; + +// Checks that when evaluating the spline at interpolation points one +// recovers values that were used to build the spline +template +static void BatchedPeriodicSplineTest() +{ + // Instantiate execution spaces and initialize spaces + Kokkos::DefaultHostExecutionSpace host_exec_space = Kokkos::DefaultHostExecutionSpace(); + ExecSpace exec_space = ExecSpace(); + + std::size_t constexpr ncells = 10; + DimsInitializer, BatchDims, IDim...>> dims_initializer; + dims_initializer(ncells); + + // Create the values domain (mesh) + ddc::DiscreteDomain...> const dom_vals = ddc::DiscreteDomain...>( + (std::is_same_v + ? GrevillePoints>::get_domain() + : ddc::DiscreteDomain< + IDim>(Index>(0), DVect>(ncells)))...); + + // Create a SplineBuilderBatched over BSplines and batched along other dimensions using some boundary conditions + ddc::SplineBuilderBatched< + ddc::SplineBuilder< + ExecSpace, + MemorySpace, + BSplines, + IDim, + ddc::BoundCond::PERIODIC, + ddc::BoundCond::PERIODIC>, + IDim...> + spline_builder(dom_vals); + + // Compute usefull domains (dom_interpolation, dom_batch, dom_bsplines and dom_spline) + ddc::DiscreteDomain> const dom_interpolation = spline_builder.interpolation_domain(); + auto const dom_batch = spline_builder.batch_domain(); + ddc::DiscreteDomain> const dom_bsplines = spline_builder.bsplines_domain(); + auto const dom_spline = spline_builder.spline_domain(); + + // Allocate and fill a chunk containing values to be passed as input to spline_builder. Those are values of cosine along interest dimension duplicated along batch dimensions + ddc::Chunk vals1_cpu_alloc( + dom_interpolation, + ddc::KokkosAllocator()); + ddc::ChunkSpan vals1_cpu = vals1_cpu_alloc.span_view(); + evaluator_type> evaluator(dom_interpolation); + evaluator(vals1_cpu); + ddc::Chunk vals1_alloc(dom_interpolation, ddc::KokkosAllocator()); + ddc::ChunkSpan vals1 = vals1_alloc.span_view(); + ddc::deepcopy(vals1, vals1_cpu); + + ddc::Chunk vals_alloc(dom_vals, ddc::KokkosAllocator()); + ddc::ChunkSpan vals = vals_alloc.span_view(); + ddc::for_each( + ddc::policies::policy(exec_space), + vals.domain(), + DDC_LAMBDA(Index...> const e) { + vals(e) = vals1(ddc::select>(e)); + }); + + // Instantiate chunk of spline coefs to receive output of spline_builder + ddc::Chunk coef_alloc(dom_spline, ddc::KokkosAllocator()); + ddc::ChunkSpan coef = coef_alloc.span_view(); + + // Finally compute the spline by filling `coef` + spline_builder(coef, vals); + + // Instantiate a SplineEvaluator over interest dimension and batched along other dimensions + ddc::SplineEvaluatorBatched< + ddc::SplineEvaluator, IDim>, + IDim...> + spline_evaluator_batched( + coef.domain(), + ddc::g_null_boundary>, + ddc::g_null_boundary>); + + // Instantiate chunk of coordinates of dom_interpolation + ddc::Chunk coords_eval_alloc(dom_vals, ddc::KokkosAllocator, MemorySpace>()); + ddc::ChunkSpan coords_eval = coords_eval_alloc.span_view(); + ddc::for_each( + ddc::policies::policy(exec_space), + coords_eval.domain(), + DDC_LAMBDA(Index...> const e) { coords_eval(e) = ddc::coordinate(e); }); + + + // Instantiate chunks to receive outputs of spline_evaluator + ddc::Chunk spline_eval_alloc(dom_vals, ddc::KokkosAllocator()); + ddc::ChunkSpan spline_eval = spline_eval_alloc.span_view(); + ddc::Chunk spline_eval_deriv_alloc(dom_vals, ddc::KokkosAllocator()); + ddc::ChunkSpan spline_eval_deriv = spline_eval_deriv_alloc.span_view(); + ddc::Chunk spline_eval_integrals_alloc(dom_batch, ddc::KokkosAllocator()); + ddc::ChunkSpan spline_eval_integrals = spline_eval_integrals_alloc.span_view(); + + // Call spline_evaluator on the same mesh we started with + spline_evaluator_batched(spline_eval, coords_eval.span_cview(), coef.span_cview()); + spline_evaluator_batched.deriv(spline_eval_deriv, coords_eval.span_cview(), coef.span_cview()); + spline_evaluator_batched.integrate(spline_eval_integrals, coef.span_cview()); + + // Checking errors (we recover the initial values) + double max_norm_error = ddc::transform_reduce( + ddc::policies::policy(exec_space), + spline_eval.domain(), + 0., + ddc::reducer::max(), + DDC_LAMBDA(Index...> const e) { + return Kokkos::abs(spline_eval(e) - vals(e)); + }); + + double max_norm_error_diff = ddc::transform_reduce( + ddc::policies::policy(exec_space), + spline_eval_deriv.domain(), + 0., + ddc::reducer::max(), + DDC_LAMBDA(Index...> const e) { + Coord const x = ddc::coordinate(ddc::select>(e)); + return Kokkos::abs(spline_eval_deriv(e) - evaluator.deriv(x, 1)); + }); + double max_norm_error_integ = ddc::transform_reduce( + ddc::policies::policy(exec_space), + spline_eval_integrals.domain(), + 0., + ddc::reducer::max(), + DDC_LAMBDA(typename decltype(spline_builder)::batch_domain_type:: + discrete_element_type const e) { + return Kokkos::abs( + spline_eval_integrals(e) - evaluator.deriv(xN(), -1) + + evaluator.deriv(x0(), -1)); + }); + + double const max_norm = evaluator.max_norm(); + double const max_norm_diff = evaluator.max_norm(1); + double const max_norm_int = evaluator.max_norm(-1); + + SplineErrorBounds>> error_bounds(evaluator); + EXPECT_LE( + max_norm_error, + std::max(error_bounds.error_bound(dx(ncells), s_degree_x), 1.0e-14 * max_norm)); + EXPECT_LE( + max_norm_error_diff, + std:: + max(error_bounds.error_bound_on_deriv(dx(ncells), s_degree_x), + 1e-12 * max_norm_diff)); + EXPECT_LE( + max_norm_error_integ, + std:: + max(error_bounds.error_bound_on_int(dx(ncells), s_degree_x), + 1.0e-14 * max_norm_int)); +} + +TEST(BatchedPeriodicSplineHost, 1DX) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultHostExecutionSpace, + Kokkos::DefaultHostExecutionSpace::memory_space, + DimX, + DimX>(); +} + +TEST(BatchedPeriodicSplineDevice, 1DX) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultExecutionSpace, + Kokkos::DefaultExecutionSpace::memory_space, + DimX, + DimX>(); +} + +TEST(BatchedPeriodicSplineHost, 2DX) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultHostExecutionSpace, + Kokkos::DefaultHostExecutionSpace::memory_space, + DimX, + DimX, + DimY>(); +} + +TEST(BatchedPeriodicSplineHost, 2DY) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultHostExecutionSpace, + Kokkos::DefaultHostExecutionSpace::memory_space, + DimY, + DimX, + DimY>(); +} + +TEST(BatchedPeriodicSplineDevice, 2DX) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultExecutionSpace, + Kokkos::DefaultExecutionSpace::memory_space, + DimX, + DimX, + DimY>(); +} + +TEST(BatchedPeriodicSplineDevice, 2DY) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultExecutionSpace, + Kokkos::DefaultExecutionSpace::memory_space, + DimY, + DimX, + DimY>(); +} + +TEST(BatchedPeriodicSplineHost, 3DX) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultHostExecutionSpace, + Kokkos::DefaultHostExecutionSpace::memory_space, + DimX, + DimX, + DimY, + DimZ>(); +} + +TEST(BatchedPeriodicSplineHost, 3DY) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultHostExecutionSpace, + Kokkos::DefaultHostExecutionSpace::memory_space, + DimY, + DimX, + DimY, + DimZ>(); +} + +TEST(BatchedPeriodicSplineHost, 3DZ) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultHostExecutionSpace, + Kokkos::DefaultHostExecutionSpace::memory_space, + DimZ, + DimX, + DimY, + DimZ>(); +} + +TEST(BatchedPeriodicSplineDevice, 3DX) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultExecutionSpace, + Kokkos::DefaultExecutionSpace::memory_space, + DimX, + DimX, + DimY, + DimZ>(); +} + +TEST(BatchedPeriodicSplineDevice, 3DY) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultExecutionSpace, + Kokkos::DefaultExecutionSpace::memory_space, + DimY, + DimX, + DimY, + DimZ>(); +} + +TEST(BatchedPeriodicSplineDevice, 3DZ) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultExecutionSpace, + Kokkos::DefaultExecutionSpace::memory_space, + DimZ, + DimX, + DimY, + DimZ>(); +} + + +TEST(BatchedPeriodicSplineHost, 4DX) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultHostExecutionSpace, + Kokkos::DefaultHostExecutionSpace::memory_space, + DimX, + DimX, + DimY, + DimZ, + DimT>(); +} + +TEST(BatchedPeriodicSplineHost, 4DY) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultHostExecutionSpace, + Kokkos::DefaultHostExecutionSpace::memory_space, + DimY, + DimX, + DimY, + DimZ, + DimT>(); +} + +TEST(BatchedPeriodicSplineHost, 4DZ) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultHostExecutionSpace, + Kokkos::DefaultHostExecutionSpace::memory_space, + DimZ, + DimX, + DimY, + DimZ, + DimT>(); +} + +TEST(BatchedPeriodicSplineHost, 4DT) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultHostExecutionSpace, + Kokkos::DefaultHostExecutionSpace::memory_space, + DimT, + DimX, + DimY, + DimZ, + DimT>(); +} + +TEST(BatchedPeriodicSplineDevice, 4DX) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultExecutionSpace, + Kokkos::DefaultExecutionSpace::memory_space, + DimX, + DimX, + DimY, + DimZ, + DimT>(); +} + +TEST(BatchedPeriodicSplineDevice, 4DY) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultExecutionSpace, + Kokkos::DefaultExecutionSpace::memory_space, + DimY, + DimX, + DimY, + DimZ, + DimT>(); +} + +TEST(BatchedPeriodicSplineDevice, 4DZ) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultExecutionSpace, + Kokkos::DefaultExecutionSpace::memory_space, + DimZ, + DimX, + DimY, + DimZ, + DimT>(); +} + +TEST(BatchedPeriodicSplineDevice, 4DT) +{ + BatchedPeriodicSplineTest< + Kokkos::DefaultExecutionSpace, + Kokkos::DefaultExecutionSpace::memory_space, + DimT, + DimX, + DimY, + DimZ, + DimT>(); +} diff --git a/tests/splines/bsplines.cpp b/tests/splines/bsplines.cpp new file mode 100644 index 000000000..c998bcac6 --- /dev/null +++ b/tests/splines/bsplines.cpp @@ -0,0 +1,91 @@ +#include +#include + +#include +#include + +#include + +#include "test_utils.hpp" + +template +struct BSplinesFixture; + +template +struct BSplinesFixture, + std::integral_constant, + std::integral_constant>> : public testing::Test +{ + struct DimX + { + static constexpr bool PERIODIC = periodic; + }; + static constexpr std::size_t spline_degree = D; + static constexpr std::size_t ncells = Nc; +}; + +using degrees = std::integer_sequence; +using ncells = std::integer_sequence; +using periodicity = std::integer_sequence; + +using Cases = tuple_to_types_t>; + +TYPED_TEST_SUITE(BSplinesFixture, Cases); + +TYPED_TEST(BSplinesFixture, PartitionOfUnity_Uniform) +{ + std::size_t constexpr degree = TestFixture::spline_degree; + using DimX = typename TestFixture::DimX; + using CoordX = ddc::Coordinate; + static constexpr CoordX xmin = CoordX(0.0); + static constexpr CoordX xmax = CoordX(0.2); + static constexpr std::size_t ncells = TestFixture::ncells; + ddc::init_discrete_space>(xmin, xmax, ncells); + + std::array values; + + std::size_t const n_test_points = ncells * 30; + double const dx = (xmax - xmin) / (n_test_points - 1); + + for (std::size_t i(0); i < n_test_points; ++i) { + CoordX test_point(xmin + dx * i); + ddc::discrete_space>().eval_basis(values, test_point); + double sum = 0.0; + for (std::size_t j(0); j < degree + 1; ++j) { + sum += values[j]; + } + EXPECT_LE(fabs(sum - 1.0), 1.0e-15); + } +} + +TYPED_TEST(BSplinesFixture, PartitionOfUnity_NonUniform) +{ + std::size_t constexpr degree = TestFixture::spline_degree; + using DimX = typename TestFixture::DimX; + using CoordX = ddc::Coordinate; + static constexpr CoordX xmin = CoordX(0.0); + static constexpr CoordX xmax = CoordX(0.2); + static constexpr std::size_t ncells = TestFixture::ncells; + std::vector breaks(ncells + 1); + double dx = (xmax - xmin) / ncells; + for (std::size_t i(0); i < ncells + 1; ++i) { + breaks[i] = CoordX(xmin + i * dx); + } + ddc::init_discrete_space>(breaks); + + std::array values; + + std::size_t n_test_points = ncells * 30; + dx = (xmax - xmin) / (n_test_points - 1); + + for (std::size_t i(0); i < n_test_points; ++i) { + CoordX test_point(xmin + dx * i); + ddc::discrete_space>().eval_basis(values, test_point); + double sum = 0.0; + for (std::size_t j(0); j < degree + 1; ++j) { + sum += values[j]; + } + EXPECT_LE(fabs(sum - 1.0), 1.0e-15); + } +} diff --git a/tests/splines/cosine_evaluator.hpp b/tests/splines/cosine_evaluator.hpp new file mode 100644 index 000000000..e00f19d5a --- /dev/null +++ b/tests/splines/cosine_evaluator.hpp @@ -0,0 +1,76 @@ +#pragma once + +#include + +#include +#include + +struct CosineEvaluator +{ + template + class Evaluator + { + public: + using Dim = DDim; + + private: + static inline constexpr double s_2_pi = 2. * M_PI; + + private: + double m_c0; + + double m_c1; + + public: + template + Evaluator(Domain domain) : m_c0(1.0) + , m_c1(0.0) + { + } + + Evaluator(double c0, double c1) : m_c0(c0), m_c1(c1) {} + + KOKKOS_FUNCTION double operator()(double const x) const noexcept + { + return eval(x, 0); + } + + KOKKOS_FUNCTION void operator()( + ddc::ChunkSpan> chunk) const + { + auto const& domain = chunk.domain(); + + for (ddc::DiscreteElement const i : domain) { + chunk(i) = eval(ddc::coordinate(i), 0); + } + } + + KOKKOS_FUNCTION double deriv(double const x, int const derivative) const noexcept + { + return eval(x, derivative); + } + + KOKKOS_FUNCTION void deriv( + ddc::ChunkSpan> chunk, + int const derivative) const + { + auto const& domain = chunk.domain(); + + for (ddc::DiscreteElement const i : domain) { + chunk(i) = eval(ddc::coordinate(i), derivative); + } + } + + KOKKOS_FUNCTION double max_norm(int diff = 0) const + { + return ddc::detail::ipow(s_2_pi * m_c0, diff); + } + + private: + KOKKOS_FUNCTION double eval(double const x, int const derivative) const noexcept + { + return ddc::detail::ipow(s_2_pi * m_c0, derivative) + * Kokkos::cos(M_PI_2 * derivative + s_2_pi * (m_c0 * x + m_c1)); + } + }; +}; diff --git a/tests/splines/matrix.cpp b/tests/splines/matrix.cpp new file mode 100644 index 000000000..5618eaf46 --- /dev/null +++ b/tests/splines/matrix.cpp @@ -0,0 +1,97 @@ +#include +#include +#include + +#include +#include + +#include + +#include "ddc/kernels/splines/view.hpp" + +#include "test_utils.hpp" + +namespace { + +void fill_identity(ddc::DSpan2D mat) +{ + assert(mat.extent(0) == mat.extent(1)); + for (std::size_t i(0); i < mat.extent(0); ++i) { + for (std::size_t j(0); j < mat.extent(1); ++j) { + mat(i, j) = int(i == j); + } + } +} + +/* +void copy_matrix(ddc::DSpan2D copy, std::unique_ptr& mat) +{ + assert(mat->get_size() == int(copy.extent(0))); + assert(mat->get_size() == int(copy.extent(1))); + + for (std::size_t i(0); i < copy.extent(0); ++i) { + for (std::size_t j(0); j < copy.extent(1); ++j) { + copy(i, j) = mat->get_element(i, j); + } + } +} +*/ + +void check_inverse(ddc::DSpan2D matrix, ddc::DSpan2D inv) +{ + double TOL = 1e-10; + std::size_t N = matrix.extent(0); + + for (std::size_t i(0); i < N; ++i) { + for (std::size_t j(0); j < N; ++j) { + double id_val = 0.0; + for (std::size_t k(0); k < N; ++k) { + id_val += matrix(i, k) * inv(j, k); + } + EXPECT_NEAR(id_val, static_cast(i == j), TOL); + } + } +} +} // namespace + +class MatrixSizesFixture : public testing::TestWithParam> +{ +}; + +TEST_P(MatrixSizesFixture, Sparse) +{ + auto const [N, k] = GetParam(); + + + std::unique_ptr matrix + = ddc::detail::MatrixMaker::make_new_sparse(N); + + std::vector val_ptr(N * N); + ddc::DSpan2D val(val_ptr.data(), N, N); + for (int i(0); i < N; ++i) { + for (int j(0); j < N; ++j) { + if (i == j) { + matrix->set_element(i, j, 3. / 4); + val(i, j) = 3. / 4; + } else if (std::abs(j - i) <= k) { + matrix->set_element(i, j, -(1. / 4) / k); + val(i, j) = -(1. / 4) / k; + } else { + val(i, j) = 0.; + } + } + } + // copy_matrix(val, matrix); // copy_matrix is not available for sparse matrix because of a limitation of Ginkgo API (get_element is not implemented). The workaround is to fill val directly in the loop + + std::vector inv_ptr(N * N); + ddc::DSpan2D inv(inv_ptr.data(), N, N); + fill_identity(inv); + matrix->factorize(); + matrix->solve_multiple_inplace(inv); + check_inverse(val, inv); +} + +INSTANTIATE_TEST_SUITE_P( + MyGroup, + MatrixSizesFixture, + testing::Combine(testing::Values(10, 20), testing::Range(1, 7))); diff --git a/tests/splines/periodic_spline_builder.cpp b/tests/splines/periodic_spline_builder.cpp new file mode 100644 index 000000000..bb29c9066 --- /dev/null +++ b/tests/splines/periodic_spline_builder.cpp @@ -0,0 +1,146 @@ +#include +#include +#include +#include +#include + +#include + +#include +#include + +#include + +#include "cosine_evaluator.hpp" +#include "polynomial_evaluator.hpp" +#include "spline_error_bounds.hpp" + +struct DimX +{ + static constexpr bool PERIODIC = true; +}; + +static constexpr std::size_t s_degree_x = DEGREE_X; + +#if defined(BSPLINES_TYPE_UNIFORM) +using BSplinesX = ddc::UniformBSplines; +#elif defined(BSPLINES_TYPE_NON_UNIFORM) +using BSplinesX = ddc::NonUniformBSplines; +#endif + +using GrevillePoints = ddc:: + GrevilleInterpolationPoints; + +using IDimX = GrevillePoints::interpolation_mesh_type; + +using evaluator_type = CosineEvaluator::Evaluator; + +using IndexX = ddc::DiscreteElement; +using DVectX = ddc::DiscreteVector; +using BsplIndexX = ddc::DiscreteElement; +using CoordX = ddc::Coordinate; + +// Checks that when evaluating the spline at interpolation points one +// recovers values that were used to build the spline +TEST(PeriodicSplineBuilderTest, Identity) +{ + CoordX constexpr x0(0.); + CoordX constexpr xN(1.); + std::size_t constexpr ncells = 10; // TODO : restore 10 + + // 1. Create BSplines + { +#if defined(BSPLINES_TYPE_UNIFORM) + ddc::init_discrete_space(x0, xN, ncells); +#elif defined(BSPLINES_TYPE_NON_UNIFORM) + DVectX constexpr npoints(ncells + 1); + std::vector breaks(npoints); + double dx = (xN - x0) / ncells; + for (int i(0); i < npoints; ++i) { + breaks[i] = CoordX(x0 + i * dx); + } + ddc::init_discrete_space(breaks); +#endif + } + ddc::DiscreteDomain const dom_bsplines_x( + ddc::discrete_space().full_domain()); + + // 2. Create a Spline represented by a chunk over BSplines + // The chunk is filled with garbage data, we need to initialize it + ddc::Chunk coef(dom_bsplines_x, ddc::KokkosAllocator()); + + // 3. Create the interpolation domain + ddc::init_discrete_space(GrevillePoints::get_sampling()); + ddc::DiscreteDomain interpolation_domain(GrevillePoints::get_domain()); + + // 4. Create a SplineBuilder over BSplines using some boundary conditions + ddc::SplineBuilder< + Kokkos::DefaultHostExecutionSpace, + Kokkos::HostSpace, + BSplinesX, + IDimX, + ddc::BoundCond::PERIODIC, + ddc::BoundCond::PERIODIC> + spline_builder(interpolation_domain); + + // 5. Allocate and fill a chunk over the interpolation domain + ddc::Chunk yvals(interpolation_domain, ddc::KokkosAllocator()); + evaluator_type evaluator(interpolation_domain); + evaluator(yvals); + + // 6. Finally build the spline by filling `coef` + spline_builder(coef.span_view(), yvals.span_view()); + + // 7. Create a SplineEvaluator to evaluate the spline at any point in the domain of the BSplines + ddc::SplineEvaluator + spline_evaluator(ddc::g_null_boundary, ddc::g_null_boundary); + + ddc::Chunk> coords_eval(interpolation_domain); + for (IndexX const ix : interpolation_domain) { + coords_eval(ix) = ddc::coordinate(ix); + } + + ddc::Chunk spline_eval(interpolation_domain, ddc::KokkosAllocator()); + spline_evaluator(spline_eval.span_view(), coords_eval.span_cview(), coef.span_cview()); + + ddc::Chunk spline_eval_deriv( + interpolation_domain, + ddc::KokkosAllocator()); + spline_evaluator + .deriv(spline_eval_deriv.span_view(), coords_eval.span_cview(), coef.span_cview()); + + // 8. Checking errors + std::cout << "---------- TEST ----------\n"; + double max_norm_error = 0.; + double max_norm_error_diff = 0.; + for (IndexX const ix : interpolation_domain) { + CoordX const x = ddc::coordinate(ix); + + // Compute error + double const error = spline_eval(ix) - yvals(ix); + max_norm_error = std::fmax(max_norm_error, std::fabs(error)); + + // Compute error + double const error_deriv = spline_eval_deriv(ix) - evaluator.deriv(x, 1); + max_norm_error_diff = std::fmax(max_norm_error_diff, std::fabs(error_deriv)); + } + double const max_norm_error_integ = std::fabs( + spline_evaluator.integrate(coef.span_cview()) - evaluator.deriv(xN, -1) + + evaluator.deriv(x0, -1)); + + double const max_norm = evaluator.max_norm(); + double const max_norm_diff = evaluator.max_norm(1); + double const max_norm_int = evaluator.max_norm(-1); + + SplineErrorBounds error_bounds(evaluator); + const double h = (xN - x0) / ncells; + EXPECT_LE( + max_norm_error, + std::max(error_bounds.error_bound(h, s_degree_x), 1.0e-14 * max_norm)); + EXPECT_LE( + max_norm_error_diff, + std::max(error_bounds.error_bound_on_deriv(h, s_degree_x), 1e-12 * max_norm_diff)); + EXPECT_LE( + max_norm_error_integ, + std::max(error_bounds.error_bound_on_int(h, s_degree_x), 1.0e-14 * max_norm_int)); +} diff --git a/tests/splines/periodic_spline_builder_ordered_points.cpp b/tests/splines/periodic_spline_builder_ordered_points.cpp new file mode 100644 index 000000000..21e83d31c --- /dev/null +++ b/tests/splines/periodic_spline_builder_ordered_points.cpp @@ -0,0 +1,61 @@ +#include +#include +#include +#include +#include + +#include + +#include +#include + +#include + +struct DimX +{ + static constexpr bool PERIODIC = true; +}; + +static constexpr std::size_t s_degree_x = DEGREE_X; + +using BSplinesX = ddc::NonUniformBSplines; + +using GrevillePoints = ddc:: + GrevilleInterpolationPoints; + +using IDimX = GrevillePoints::interpolation_mesh_type; + +using IndexX = ddc::DiscreteElement; +using DVectX = ddc::DiscreteVector; +using BsplIndexX = ddc::DiscreteElement; +using SplineX = ddc::Chunk>; +using FieldX = ddc::Chunk>; +using CoordX = ddc::Coordinate; + +TEST(PeriodicSplineBuilderOrderTest, OrderedPoints) +{ + std::size_t constexpr ncells = 10; + + // 1. Create BSplines + int constexpr npoints(ncells + 1); + std::vector d_breaks({0, 0.01, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}); + std::vector breaks(npoints); + for (std::size_t i(0); i < npoints; ++i) { + breaks[i] = CoordX(d_breaks[i]); + } + ddc::init_discrete_space(breaks); + + // 2. Create the interpolation domain + ddc::init_discrete_space(GrevillePoints::get_sampling()); + ddc::DiscreteDomain interpolation_domain(GrevillePoints::get_domain()); + + double last(ddc::coordinate(interpolation_domain.front())); + double current; + for (IndexX const ix : interpolation_domain) { + current = ddc::coordinate(ix); + ASSERT_LE(current, ddc::discrete_space().rmax()); + ASSERT_GE(current, ddc::discrete_space().rmin()); + ASSERT_LE(last, current); + last = current; + } +} diff --git a/tests/splines/periodicity_spline_builder.cpp b/tests/splines/periodicity_spline_builder.cpp new file mode 100644 index 000000000..d016cc4f1 --- /dev/null +++ b/tests/splines/periodicity_spline_builder.cpp @@ -0,0 +1,219 @@ +#include +#include +#include +#include +#include + +#include + +#include +#include + +#include + +#include "cosine_evaluator.hpp" +#include "polynomial_evaluator.hpp" +#include "spline_error_bounds.hpp" + +struct DimX +{ + static constexpr bool PERIODIC = true; +}; + +static constexpr std::size_t s_degree_x = DEGREE_X; + +template +using GrevillePoints = ddc:: + GrevilleInterpolationPoints; + +#if defined(BSPLINES_TYPE_UNIFORM) +template +using BSplines = ddc::UniformBSplines; + +// Gives discrete dimension. In the dimension of interest, it is deduced from the BSplines type. In the other dimensions, it has to be newly defined. In practice both types coincide in the test, but it may not be the case. +template +using IDim = typename GrevillePoints>::interpolation_mesh_type; + +#elif defined(BSPLINES_TYPE_NON_UNIFORM) +template +using BSplines = ddc::NonUniformBSplines; + +template +using IDim = typename GrevillePoints>::interpolation_mesh_type; + +#endif +template +using evaluator_type = CosineEvaluator::Evaluator; + +template +using Index = ddc::DiscreteElement; +template +using DVect = ddc::DiscreteVector; +template +using Coord = ddc::Coordinate; + +// Templated function giving first coordinate of the mesh in given dimension. +template +static constexpr Coord x0() +{ + return Coord(0.); +} + +// Templated function giving last coordinate of the mesh in given dimension. +template +static constexpr Coord xN() +{ + return Coord(1.); +} + +// Templated function giving step of the mesh in given dimension. +template +static constexpr double dx(double ncells) +{ + return (xN() - x0()) / ncells; +} + +// Templated function giving break points of mesh in given dimension for non-uniform case. +template +static constexpr std::vector> breaks(double ncells) +{ + std::vector> out(ncells + 1); + for (int i(0); i < ncells + 1; ++i) { + out[i] = x0() + i * dx(ncells); + } + return out; +} + +// Helper to initialize space +template +struct DimsInitializer +{ + void operator()(std::size_t const ncells) + { +#if defined(BSPLINES_TYPE_UNIFORM) + ddc::init_discrete_space>( + x0(), + xN(), + ncells); +#elif defined(BSPLINES_TYPE_NON_UNIFORM) + ddc::init_discrete_space>( + breaks(ncells)); +#endif + ddc::init_discrete_space( + GrevillePoints< + BSplines>::get_sampling()); + } +}; + +// Checks that when evaluating the spline at interpolation points one +// recovers values that were used to build the spline +template +static void PeriodicitySplineBuilderTest() +{ + // Instantiate execution spaces and initialize spaces + Kokkos::DefaultHostExecutionSpace host_exec_space = Kokkos::DefaultHostExecutionSpace(); + ExecSpace exec_space = ExecSpace(); + + std::size_t constexpr ncells = 10; + DimsInitializer> dims_initializer; + dims_initializer(ncells); + + // Create the values domain (mesh) + ddc::DiscreteDomain> const dom_vals + = ddc::DiscreteDomain>(GrevillePoints>::get_domain()); + + // Create a SplineBuilderBatched over BSplines and batched along other dimensions using some boundary conditions + ddc::SplineBuilderBatched< + ddc::SplineBuilder< + ExecSpace, + MemorySpace, + BSplines, + IDim, + ddc::BoundCond::PERIODIC, + ddc::BoundCond::PERIODIC>, + IDim> + spline_builder(dom_vals); + + // Compute usefull domains (dom_interpolation, dom_batch, dom_bsplines and dom_spline) + ddc::DiscreteDomain> const dom_bsplines = spline_builder.bsplines_domain(); + + // Allocate and fill a chunk containing values to be passed as input to spline_builder. Those are values of cosine along interest dimension duplicated along batch dimensions + ddc::Chunk vals1_cpu_alloc( + dom_vals, + ddc::KokkosAllocator()); + ddc::ChunkSpan vals1_cpu = vals1_cpu_alloc.span_view(); + evaluator_type> evaluator(dom_vals); + evaluator(vals1_cpu); + ddc::Chunk vals_alloc(dom_vals, ddc::KokkosAllocator()); + ddc::ChunkSpan vals = vals_alloc.span_view(); + ddc::deepcopy(vals, vals1_cpu); + + // Instantiate chunk of spline coefs to receive output of spline_builder + ddc::Chunk coef_alloc(dom_bsplines, ddc::KokkosAllocator()); + ddc::ChunkSpan coef = coef_alloc.span_view(); + + // Finally compute the spline by filling `coef` + spline_builder(coef, vals); + + // Instantiate a SplineEvaluator over interest dimension and batched along other dimensions + ddc::SplineEvaluatorBatched< + ddc::SplineEvaluator, IDim>, + IDim> + spline_evaluator_batched( + coef.domain(), + ddc::g_null_boundary>, + ddc::g_null_boundary>); + + // Instantiate chunk of coordinates of dom_interpolation + ddc::Chunk coords_eval_alloc(dom_vals, ddc::KokkosAllocator, MemorySpace>()); + ddc::ChunkSpan coords_eval = coords_eval_alloc.span_view(); + ddc::for_each( + ddc::policies::policy(exec_space), + coords_eval.domain(), + DDC_LAMBDA(Index> const e) { + coords_eval(e) = ddc::coordinate(e) + Coord(1.5); + }); // Translate function 1.5x domain width to the right. + + + // Instantiate chunks to receive outputs of spline_evaluator + ddc::Chunk spline_eval_alloc(dom_vals, ddc::KokkosAllocator()); + ddc::ChunkSpan spline_eval = spline_eval_alloc.span_view(); + + // Call spline_evaluator on the same mesh we started with + spline_evaluator_batched(spline_eval, coords_eval.span_cview(), coef.span_cview()); + + // Checking errors (we recover the initial values) + double max_norm_error = ddc::transform_reduce( + ddc::policies::policy(exec_space), + spline_eval.domain(), + 0., + ddc::reducer::max(), + DDC_LAMBDA(Index> const e) { + return Kokkos::abs( + spline_eval(e) + - (-vals(e))); // Because function is even, we get f_eval = -f + }); + + double const max_norm = evaluator.max_norm(); + + SplineErrorBounds>> error_bounds(evaluator); + EXPECT_LE( + max_norm_error, + std::max(error_bounds.error_bound(dx(ncells), s_degree_x), 1.0e-14 * max_norm)); +} + +TEST(PeriodicitySplineBuilderHost, 1D) +{ + PeriodicitySplineBuilderTest< + Kokkos::DefaultHostExecutionSpace, + Kokkos::DefaultHostExecutionSpace::memory_space, + DimX>(); +} + +TEST(PeriodicitySplineBuilderDevice, 1D) +{ + PeriodicitySplineBuilderTest< + Kokkos::DefaultExecutionSpace, + Kokkos::DefaultExecutionSpace::memory_space, + DimX>(); +} diff --git a/tests/splines/polynomial_evaluator.hpp b/tests/splines/polynomial_evaluator.hpp new file mode 100644 index 000000000..32e0fa447 --- /dev/null +++ b/tests/splines/polynomial_evaluator.hpp @@ -0,0 +1,96 @@ +#pragma once + +#include +#include +#include +#include + +#include +#include + +struct PolynomialEvaluator +{ + template + class Evaluator + { + public: + using Dim = DDim; + + private: + std::array m_coeffs; + int const m_degree; + double const m_xN; + + public: + template + Evaluator(Domain domain) + : m_degree(Degree) + , m_xN(std::max(std::abs(rmin(domain)), std::abs(rmax(domain)))) + { + for (int i(0); i < m_degree + 1; ++i) { + m_coeffs[i] = double(rand() % 100) / 100.0; + } + } + + double operator()(double const x) const noexcept + { + return eval(x, 0); + } + + void operator()(ddc::ChunkSpan> chunk) const + { + auto const& domain = chunk.domain(); + + for (ddc::DiscreteElement const i : domain) { + chunk(i) = eval(ddc::coordinate(i), 0); + } + } + + double deriv(double const x, int const derivative) const noexcept + { + return eval(x, derivative); + } + + void deriv(ddc::ChunkSpan> chunk, int const derivative) + const + { + auto const& domain = chunk.domain(); + + for (ddc::DiscreteElement const i : domain) { + chunk(i) = eval(ddc::coordinate(i), derivative); + } + } + + double max_norm(int diff = 0) const + { + return std::abs(deriv(m_xN, diff)); + } + + private: + double eval(double const x, int const derivative) const + { + double result(0.0); + int start = derivative < 0 ? 0 : derivative; + for (int i(start); i < m_degree + 1; ++i) { + double v = double(falling_factorial(i, derivative)) * std::pow(x, i - derivative); + result += m_coeffs[i] * v; + } + return result; + } + + double falling_factorial(int i, int d) const + { + double c = 1.0; + if (d >= 0) { + for (int k(0); k < d; ++k) { + c *= (i - k); + } + } else { + for (int k(-1); k > d - 1; --k) { + c /= (i - k); + } + } + return c; + } + }; +}; diff --git a/tests/splines/spline_error_bounds.hpp b/tests/splines/spline_error_bounds.hpp new file mode 100644 index 000000000..7a9a59e9e --- /dev/null +++ b/tests/splines/spline_error_bounds.hpp @@ -0,0 +1,97 @@ +#pragma once + +#include +#include + +#include + +template +class SplineErrorBounds +{ +private: + static constexpr std::array tihomirov_error_bound_array = std::array( + {1.0 / 2.0, + 1.0 / 8.0, + 1.0 / 24.0, + 5.0 / 384.0, + 1.0 / 240.0, + 61.0 / 46080.0, + 17.0 / 40320.0, + 277.0 / 2064384.0, + 31.0 / 725760.0, + 50521.0 / 3715891200.0}); + +private: + Evaluator const& m_evaluator; + +private: + /******************************************************************************* + * Error bound in max norm for spline interpolation of periodic functions from: + * + * V M Tihomirov 1969 Math. USSR Sb. 9 275 + * https://doi.org/10.1070/SM1969v009n02ABEH002052 (page 286, bottom) + * + * Yu. S. Volkov and Yu. N. Subbotin + * https://doi.org/10.1134/S0081543815020236 (equation 14) + * + * Also applicable to first derivative by passing deg-1 instead of deg + * Volkov & Subbotin 2015, eq. 15 + *******************************************************************************/ + static double tihomirov_error_bound(double cell_width, int degree, double max_norm) + { + degree = std::min(degree, 9); + return tihomirov_error_bound_array[degree] * ddc::detail::ipow(cell_width, degree + 1) + * max_norm; + } + +public: + SplineErrorBounds(Evaluator const& evaluator) : m_evaluator(evaluator) {} + + double error_bound(double cell_width, int degree) + { + return tihomirov_error_bound(cell_width, degree, m_evaluator.max_norm(degree + 1)); + } + double error_bound(double cell_width1, double cell_width2, int degree1, int degree2) + { + double norm1 = m_evaluator.max_norm(degree1 + 1, 0); + double norm2 = m_evaluator.max_norm(0, degree2 + 1); + return tihomirov_error_bound(cell_width1, degree1, norm1) + + tihomirov_error_bound(cell_width2, degree2, norm2); + } + double error_bound_on_deriv(double cell_width, int degree) + { + return tihomirov_error_bound(cell_width, degree - 1, m_evaluator.max_norm(degree + 1)); + } + double error_bound_on_deriv_1(double cell_width1, double cell_width2, int degree1, int degree2) + { + double norm1 = m_evaluator.max_norm(degree1 + 1, 0); + double norm2 = m_evaluator.max_norm(0, degree2 + 1); + return tihomirov_error_bound(cell_width1, degree1 - 1, norm1) + + tihomirov_error_bound(cell_width2, degree2, norm2); + } + double error_bound_on_deriv_2(double cell_width1, double cell_width2, int degree1, int degree2) + { + double norm1 = m_evaluator.max_norm(degree1 + 1, 0); + double norm2 = m_evaluator.max_norm(0, degree2 + 1); + return tihomirov_error_bound(cell_width1, degree1, norm1) + + tihomirov_error_bound(cell_width2, degree2 - 1, norm2); + } + + /******************************************************************************* + * NOTE: The following estimates have no theoretical justification but capture + * the correct asympthotic rate of convergence. + * The error constant may be overestimated. + *******************************************************************************/ + double error_bound_on_deriv_12(double cell_width1, double cell_width2, int degree1, int degree2) + { + double norm1 = m_evaluator.max_norm(degree1 + 1, 1); + double norm2 = m_evaluator.max_norm(1, degree2 + 1); + return tihomirov_error_bound(cell_width1, degree1 - 1, norm1) + + tihomirov_error_bound(cell_width2, degree2 - 1, norm2); + } + + double error_bound_on_int(double cell_width, int degree) + { + return tihomirov_error_bound(cell_width, degree + 1, m_evaluator.max_norm(degree + 1)); + } +}; diff --git a/tests/splines/test_utils.hpp b/tests/splines/test_utils.hpp new file mode 100644 index 000000000..b24986bd6 --- /dev/null +++ b/tests/splines/test_utils.hpp @@ -0,0 +1,102 @@ +#pragma once + +#include +#include +#include + +#include + +/// Transform a sequence S to a tuple: +/// - a std::integer_sequence to a std::tuple...> +/// - a std::pair to a std::tuple +/// - identity otherwise (std::tuple) +template +struct to_tuple +{ + using type = S; +}; + +template +struct to_tuple> +{ + using type = std::tuple...>; +}; + +template +struct to_tuple> +{ + using type = std::tuple; +}; + +template +using to_tuple_t = typename to_tuple::type; + +template +struct for_each_tuple_cat; + +template +struct for_each_tuple_cat, Tuple> +{ + using type = std::tuple< + decltype(std::tuple_cat(std::declval(), std::declval()))...>; +}; + +/// Construct a tuple of tuples that is the result of the concatenation of the tuples in TupleOfTuples with Tuple. +template +using for_each_tuple_cat_t = typename for_each_tuple_cat::type; + +static_assert(std::is_same_v< + for_each_tuple_cat_t< + std::tuple, std::tuple>, + std::tuple>, + std::tuple, std::tuple>>); + +static_assert(std::is_same_v< + for_each_tuple_cat_t>, std::tuple>, + std::tuple>>); + +template +struct cartesian_product_impl; + +template +struct cartesian_product_impl, TailTuples...>, OutTupleOfTuples> + : cartesian_product_impl< + std::tuple, + decltype(std::tuple_cat( + std::declval< + for_each_tuple_cat_t>>()...))> +{ +}; + +template +struct cartesian_product_impl, OutTupleOfTuples> +{ + using type = OutTupleOfTuples; +}; + +/// Generate a std::tuple cartesian product from multiple tuple-like structures (std::tuple, std::integer_sequence and std::pair) +/// Do not rely on the ordering result. +template +using cartesian_product_t = typename cartesian_product_impl< + std::tuple...>, + std::tuple>>::type; + +static_assert(std::is_same_v< + cartesian_product_t, std::tuple>, + std::tuple, std::tuple>>); + +/// Transform a std::tuple to a testing::Types, identity otherwise +template +struct tuple_to_types +{ + using type = T; +}; + +template +struct tuple_to_types> +{ + using type = testing::Types; +}; + +template +using tuple_to_types_t = typename tuple_to_types::type; diff --git a/tests/splines/view.cpp b/tests/splines/view.cpp new file mode 100644 index 000000000..3c1945062 --- /dev/null +++ b/tests/splines/view.cpp @@ -0,0 +1,30 @@ +#include +#include +#include + +#include + +#include + +#include + +using namespace std; +using namespace std::experimental; + +TEST(View1DTest, Constructor) +{ + std::array x = {0}; + std::array cx = {0}; + std::array const ccx = {0}; + std::array const ccx2 = {0}; + ddc::Span1D xv(x.data(), x.size()); + [[maybe_unused]] ddc::Span1D xv_(xv); + [[maybe_unused]] ddc::Span1D xcv(xv); + [[maybe_unused]] ddc::Span1D xcv_(xcv); + ddc::Span1D cxcv(cx.data(), cx.size()); + [[maybe_unused]] ddc::Span1D cxcv_(cxcv); + ddc::Span1D ccxcv(ccx.data(), ccx.size()); + [[maybe_unused]] ddc::Span1D ccxcv_(cxcv); + ddc::Span1D ccx2cv(ccx2.data(), ccx2.size()); + [[maybe_unused]] ddc::Span1D ccx2cv_(ccx2cv); +} From 764bcc689ba850eba476488350db94bc52c77bf1 Mon Sep 17 00:00:00 2001 From: Julien Bigot Date: Sat, 11 Nov 2023 07:54:06 -0700 Subject: [PATCH 2/3] Update header.html (#220) --- docs/_template/header.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/_template/header.html b/docs/_template/header.html index 37cc1f313..deae9be63 100644 --- a/docs/_template/header.html +++ b/docs/_template/header.html @@ -23,5 +23,5 @@

a discrete About
  • Commented example
  • API reference -
  • Contribute on Github
  • +
  • Contribute on Github
  • From 06911af153f92703a60e26e8c5986da00dfa2c0b Mon Sep 17 00:00:00 2001 From: Thomas Padioleau Date: Sat, 11 Nov 2023 19:54:08 +0100 Subject: [PATCH 3/3] Remove HIP_FOR_NVIDIA option (#221) --- CMakeLists.txt | 30 +----------------------------- include/ddc/kernels/fft.hpp | 2 +- 2 files changed, 2 insertions(+), 30 deletions(-) diff --git a/CMakeLists.txt b/CMakeLists.txt index 24db769b9..7734ccbdc 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -12,10 +12,6 @@ option(BUILD_DOCUMENTATION "Build DDC documentation/website" OFF) option(BUILD_EXAMPLES "Build DDC examples" ON) option(DDC_BUILD_PDI_WRAPPER "Build DDC PDI wrapper" ON) option(DDC_ENABLE_DOUBLE "Build DDC with double precision support, float is used otherwise" ON) -option(HIP_FOR_NVIDIA "Use the HIP wrapper for CUDA on NVIDIA plateforms, for development purpose" OFF) -if(NOT(Kokkos_ENABLE_CUDA) AND HIP_FOR_NVIDIA) - message(FATAL_ERROR "Kokkos_ENABLE_CUDA has to be ON to use HIP wrapper on NVIDIA plateforms") -endif() # Dependencies @@ -101,7 +97,7 @@ if("${BUILD_FFT_KERNEL}" AND NOT FFTW_FOUND) endif() ## CUDA + CUDAToolkit -if("${BUILD_FFT_KERNEL}" AND "${Kokkos_ENABLE_CUDA}" AND NOT("${HIP_FOR_NVIDIA}")) +if("${BUILD_FFT_KERNEL}" AND "${Kokkos_ENABLE_CUDA}") find_package( CUDAToolkit MODULE REQUIRED ) if( NOT(CUDAToolkit_FOUND) ) message(FATAL_ERROR "CUDAToolkit not found." ) @@ -239,30 +235,6 @@ if("${Kokkos_ENABLE_HIP}") target_compile_definitions(DDC INTERFACE hipfft_AVAIL) endif() -if("${BUILD_FFT_KERNEL}" AND "${HIP_FOR_NVIDIA}") - find_package( HIP REQUIRED ) - - target_include_directories(DDC - SYSTEM INTERFACE - "$" - ) - target_compile_definitions(DDC INTERFACE hip_AVAIL) - target_compile_definitions(DDC INTERFACE HIP_FOR_NVIDIA) - - if( DEFINED HIP_TOOLKIT_PATH ) # Usually called ROCM_PATH in the HIP documentation. By default : /opt/rocm - list( APPEND CMAKE_PREFIX_PATH ${HIP_TOOLKIT_PATH}/hipfft/lib ) - find_library(HIPFFT_LIB hipfft REQUIRED) - target_link_libraries( DDC INTERFACE ${HIPFFT_LIB} ) - target_include_directories(DDC - SYSTEM INTERFACE - "$" - ) - target_compile_definitions(DDC INTERFACE hipfft_AVAIL) - else() - message( "HIP_TOOLKIT_PATH is not defined. Kernels functions may be unaccessible. To get them, add -DHIP_TOOLKIT_PATH=\"path_to_hip_toolkit\" in your cmake line" ) - endif() -endif() - if("${BUILD_SPLINES_KERNEL}") # Ginkgo find_package(Ginkgo 1.6.0 EXACT REQUIRED) diff --git a/include/ddc/kernels/fft.hpp b/include/ddc/kernels/fft.hpp index 5db1e2664..5199f2c4c 100644 --- a/include/ddc/kernels/fft.hpp +++ b/include/ddc/kernels/fft.hpp @@ -470,7 +470,7 @@ void core( // std::cout << "performed with cufft"; } #endif -#if hipfft_AVAIL && !HIP_FOR_NVIDIA +#if hipfft_AVAIL else if constexpr (std::is_same_v) { hipStream_t stream = execSpace.hip_stream();