WIP: Use vecLibFort instead of manual __ACCELERATE wrappers #650

Draft
wants to merge 7 commits into
base: develop
1 change: 1 addition & 0 deletions .github/workflows/testing-gcc.yml
@@ -34,6 +34,7 @@ jobs:

- name: Test
run: |
export LSAN_OPTIONS=suppressions=$PWD/tools/docker/lsan.supp
cd build
ctest --output-on-failure

20 changes: 9 additions & 11 deletions .github/workflows/testing-macos.yml
@@ -6,6 +6,10 @@ on:
- 'develop'
pull_request:

# Workaround issue in Xcode 14.1/2
env:
DEVELOPER_DIR: /Applications/Xcode_14.0.1.app/Contents/Developer

jobs:
build-and-test:
runs-on: macos-latest
@@ -16,7 +20,7 @@ jobs:
use_openmp: [OPENMP=ON]
use_smm: [SMM=blas]
blas_impl: [accelerate,openblas]
mpi_suffix: [openmpi,mpich]
mpi_suffix: [openmpi]
exclude:
- use_mpi: MPI=OFF
mpi_suffix: mpich
@@ -27,19 +31,14 @@ jobs:
fetch-depth: 0
submodules: true

- name: Install dependencies
- name: Install common dependencies
run: |
env HOMEBREW_NO_AUTO_UPDATE=1 brew install \
ninja \
openmpi

- name: Unlink OpenMPI
run: |
brew unlink openmpi
ninja

- name: Install MPICH
- name: Install ${{ matrix.mpi_suffix }}
run: |
env HOMEBREW_NO_AUTO_UPDATE=1 brew install mpich
env HOMEBREW_NO_AUTO_UPDATE=1 brew install ${{ matrix.mpi_suffix }}

- name: Configure
run: |
@@ -53,7 +52,6 @@ jobs:
-DUSE_${{ matrix.use_openmp }} \
-DUSE_${{ matrix.use_smm }} \
$([ "${{ matrix.blas_impl }}" = "openblas" ] && echo '-DCMAKE_PREFIX_PATH=/usr/local/opt/openblas') \
-DMPIEXEC_EXECUTABLE="$([ "${{ matrix.mpi_suffix }}" = "openmpi" ] && command -v /usr/local/Cellar/open-mpi/*/bin/mpiexec || command -v /usr/local/Cellar/mpich/*/bin/mpiexec)" \
-DMPIEXEC_PREFLAGS="$([ "${{ matrix.mpi_suffix }}" = "openmpi" ] && echo "-mca btl ^openib --allow-run-as-root")" \
-DTEST_MPI_RANKS=1 \
..
13 changes: 13 additions & 0 deletions .pre-commit-config.yaml
@@ -25,10 +25,18 @@ repos:
- id: check-yaml
- id: check-symlinks
- id: trailing-whitespace
exclude: >-
(?x)^(
tools/vecLibFort/.*|
)$
- repo: https://github.com/pseewald/fprettify
rev: v0.3.7
hooks:
- id: fprettify
exclude: >-
(?x)^(
tools/vecLibFort/.*|
)$
- repo: https://github.com/cheshirekow/cmake-format-precommit
rev: v0.6.13
hooks:
@@ -64,3 +72,8 @@ repos:
files: \.(c|cc|cxx|cpp|cl|frag|glsl|h|hpp|hxx|ih|ispc|ipp|java|js|m|mm|proto|textproto|vert)$
args: ['-i', '-fallback-style=none', '--style=file']
additional_dependencies: ['clang-format']
exclude: >-
(?x)^(
tools/vecLibFort/.*|
)$

17 changes: 17 additions & 0 deletions CMakeLists.txt
@@ -163,6 +163,23 @@ endif ()
find_package(LAPACK REQUIRED) # needed for some of the integrated test routines,
# also calls find_package(BLAS)

+if (APPLE
+AND (BLAS_LIBRARIES MATCHES "Accelerate"
+OR BLAS_LIBRARIES MATCHES "vecLib" # automated search
+OR BLA_VENDOR STREQUAL "Accelerate"
+OR BLA_VENDOR STREQUAL "NAS" # user override
+))
+message(CHECK_START "Looking for vecLibFort library")
+find_library(VECLIBFORT_LIBRARY vecLibFort)
+if (NOT VECLIBFORT_LIBRARY)
+message(CHECK_FAIL "not found, building it")
+add_subdirectory(tools/vecLibFort)
+set(VECLIBFORT_LIBRARY vecLibFort)
+else ()
+message(CHECK_PASS "found at " ${VECLIBFORT_LIBRARY})
+endif ()
+endif ()
+
# =================================== Python this module looks preferably for
# version 3 of Python. If not found, version 2 is searched. In CMake 3.15, if a
# python virtual environment is activated, it will search the virtual
8 changes: 6 additions & 2 deletions docs/guide/2-user-guide/1-installation/index.md
@@ -9,8 +9,12 @@ You need:
* [CMake](https://cmake.org/) (3.22+)
* GNU make or Ninja
* Fortran compiler which supports at least Fortran 2008 (including the TS 29113 when using the C-bindings)
-* BLAS+LAPACK implementation (reference, OpenBLAS and MKL have been tested. Note: DBCSR linked to OpenBLAS 0.3.6 gives wrong results on Power9 architectures.)
-* Python version installed (2.7 or 3.6+ have been tested)
+* BLAS+LAPACK implementation
+* Reference BLAS/LAPACK, OpenBLAS and MKL have been tested and can be considered supported.
+* On macOS [vecLibFort](https://github.com/mcg1969/vecLibFort) is required to use Accelerate and/or vecLib.
+The build system will automatically build a bundled version if not found on the system.
+* DBCSR linked to OpenBLAS 0.3.6 gives wrong results on Power9 architectures.
+* Python version installed (3.6+ have been tested)

Optional:

@@ -46,7 +46,6 @@ Assumed square matrix with 20x20 matrix with 5x5 blocks and a 2x2 processor grid
| `__NO_STATM_ACCESS`, `__STATM_RESIDENT` or `__STATM_TOTAL` | Toggle memory usage reporting between resident memory and total memory. In particular, macOS users must use `-D__NO_STATM_ACCESS` | Fortran |
| `__NO_ABORT` | Avoid calling abort, but STOP instead (useful for coverage testing, and to avoid core dumps on some systems) | Fortran |
| `__LIBXSMM` | Enable [LIBXSMM](https://github.com/hfp/libxsmm/) link for optimized small matrix multiplications on CPU | Fortran |
-| `__ACCELERATE` | Must be defined on macOS when Apple's Accelerate framework is used for BLAS and LAPACK (this is due to some interface incompatibilities between Accelerate and reference BLAS/LAPACK) | Fortran |
| `NDEBUG` | Assertions are stripped ("compiled out"), `NDEBUG` is the ANSI-conforming symbol name (not `__NDEBUG`). Regular release builds may carry assertions for safety | Fortran, C, C++ |
| `__CRAY_PM_ACCEL_ENERGY` or `__CRAY_PM_ENERGY` | Switch on collecting energy profiling on Cray systems | Fortran |
| `__DBCSR_ACC` | Enable Accelerator compilation | Fortran, C, C++ |
5 changes: 3 additions & 2 deletions src/CMakeLists.txt
@@ -176,8 +176,8 @@ if (APPLE)
# fix /proc/self/statm can not be opened on macOS
target_compile_definitions(dbcsr PRIVATE __NO_STATM_ACCESS)

-if (BLAS_LIBRARIES MATCHES "Accelerate")
-target_compile_definitions(dbcsr PRIVATE __ACCELERATE)
+if (VECLIBFORT_LIBRARY)
+target_link_libraries(dbcsr PRIVATE ${VECLIBFORT_LIBRARY})
endif ()
endif ()

@@ -243,6 +243,7 @@ if (USE_ACCEL)
target_link_libraries(
dbcsr
PRIVATE $<$<STREQUAL:${USE_ACCEL},cuda>:CUDA::cudart>
$<$<STREQUAL:${USE_ACCEL},cuda>:CUDA::cuda_driver>
$<$<STREQUAL:${USE_ACCEL},cuda>:CUDA::cublas>
$<$<STREQUAL:${USE_ACCEL},cuda>:CUDA::nvrtc>
$<$<BOOL:${WITH_CUDA_PROFILING}>:CUDA::nvToolsExt>
6 changes: 5 additions & 1 deletion src/acc/hip/acc_hip.h
@@ -12,7 +12,11 @@

#include <hip/hip_runtime.h>
#include <hip/hip_runtime_api.h>
-#include <hipblas.h>
+#if __has_include(<hipblas/hipblas.h>)
+# include <hipblas/hipblas.h>
+#else
+# include <hipblas.h>
+#endif
#include <hip/hiprtc.h>

#define ACC(x) hip##x
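Note on the include change in `acc_hip.h`: the probe-then-fall-back pattern can be tried without a ROCm installation. The sketch below is illustrative only. It substitutes standard C headers for the hipBLAS ones, and `SKETCH_HEADER` and `selected_header` are hypothetical names that do not appear in DBCSR. It also wraps the probe in `defined(__has_include)`, which the hunk in this PR does not do; that guard only matters for compilers that lack the operator.

```c
#include <assert.h>

/* Prefer one header when the compiler can find it, otherwise fall back.
 * <stdint.h> stands in for <hipblas/hipblas.h>, <limits.h> for <hipblas.h>. */
#if defined(__has_include)
#  if __has_include(<stdint.h>)
#    include <stdint.h>
#    define SKETCH_HEADER "stdint.h"
#  endif
#endif

/* Fallback branch: taken when the probe is unavailable or the header is missing. */
#ifndef SKETCH_HEADER
#  include <limits.h>
#  define SKETCH_HEADER "limits.h"
#endif

/* Report which header the preprocessor picked (hypothetical helper). */
const char *selected_header(void) {
    assert(sizeof(SKETCH_HEADER) > 1); /* a non-empty name was chosen */
    return SKETCH_HEADER;
}
```

On a hosted C toolchain `selected_header()` is expected to report the preferred header; the fallback name only appears when the probe cannot be evaluated or fails.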
4 changes: 0 additions & 4 deletions src/mm/dbcsr_mm_common.F
@@ -579,11 +579,7 @@ SUBROUTINE calc_norms_${nametype1}$ (norms, nblks, &
INTEGER :: blk, bp, bpe, row, col

REAL(KIND=real_8), EXTERNAL :: DDOT
-#if defined (__ACCELERATE)
-REAL(KIND=real_8), EXTERNAL :: SDOT
-#else
REAL(KIND=real_4), EXTERNAL :: SDOT
-#endif

! ---------------------------------------------------------------------------

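Note on the removed `__ACCELERATE` branches: declaring `SDOT` as `real_8` worked around Accelerate's Fortran-callable single-precision functions returning their result as a C `double` (the f2c convention), whereas gfortran expects a `float` from a `REAL(4)` function. A wrapper layer such as vecLibFort bridges that gap, so the standard declaration can be used everywhere. The sketch below shows the idea in plain C so that it builds without Accelerate. `f2c_style_sdot` is a hypothetical stand-in for the framework symbol and `wrapped_sdot` is a hypothetical wrapper; neither is vecLibFort's actual code.

```c
#include <assert.h>

/* Hypothetical stand-in for an f2c-convention BLAS entry point:
 * a single-precision dot product whose result comes back as double.
 * Assumes n >= 0 and positive increments to keep the sketch short. */
double f2c_style_sdot(const int *n, const float *x, const int *incx,
                      const float *y, const int *incy) {
    assert(*n >= 0 && *incx > 0 && *incy > 0);
    float acc = 0.0f;
    for (int i = 0; i < *n; ++i) {
        acc += x[i * *incx] * y[i * *incy];
    }
    return (double)acc; /* widened on return, as f2c-style interfaces do */
}

/* Hypothetical wrapper exposing the interface a REAL(4) Fortran function
 * is expected to have: same arguments, result narrowed back to float. */
float wrapped_sdot(const int *n, const float *x, const int *incx,
                   const float *y, const int *incy) {
    return (float)f2c_style_sdot(n, x, incx, y, incy);
}
```

Calling `wrapped_sdot` with x = (1, 2, 3), y = (4, 5, 6) and unit strides yields 32 as a `float`. With a wrapper of this kind linked in, the Fortran side can keep `REAL(KIND=real_4), EXTERNAL :: SDOT` on every platform.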
4 changes: 0 additions & 4 deletions src/mm/dbcsr_mm_multrec.F
@@ -707,11 +707,7 @@ SUBROUTINE multrec_filtering_${nametype1}$ (filter_eps, nblks, rowi, coli, blkp,
REAL(kind=real_8) :: nrm

REAL(KIND=real_8), EXTERNAL :: DZNRM2, DDOT
-#if defined (__ACCELERATE)
-REAL(KIND=real_8), EXTERNAL :: SCNRM2, SDOT
-#else
REAL(KIND=real_4), EXTERNAL :: SCNRM2, SDOT
-#endif

REAL(kind=real_8) :: filter_eps_opt

4 changes: 0 additions & 4 deletions src/ops/dbcsr_operations.F
@@ -1910,11 +1910,7 @@ SUBROUTINE dbcsr_filter_anytype(matrix, eps, method, &
TYPE(dbcsr_iterator) :: iter

REAL(KIND=real_8), EXTERNAL :: DZNRM2
-#if defined (__ACCELERATE)
-REAL(KIND=real_8), EXTERNAL :: SCNRM2
-#else
REAL(KIND=real_4), EXTERNAL :: SCNRM2
-#endif

! ---------------------------------------------------------------------------

12 changes: 4 additions & 8 deletions tests/CMakeLists.txt
@@ -95,23 +95,20 @@ set(dbcsr_unittest_common_SRCS dbcsr_test_add.F dbcsr_test_multiply.F)
# instead of building a full-blown lib, it would be better to simply build an
# OBJECT lib, but we would need cmake 3.12 to be able to specify
# target_link_libraries on those to get the proper compile flags
add_library(dbcsr_unittest_common STATIC ${dbcsr_unittest_common_SRCS})
add_library(dbcsr_unittest_common OBJECT ${dbcsr_unittest_common_SRCS})
target_link_libraries(dbcsr_unittest_common PUBLIC dbcsr)
target_link_libraries(dbcsr_unittest_common PUBLIC ${BLAS_LIBRARIES}
${LAPACK_LIBRARIES})
if (OpenMP_FOUND)
target_link_libraries(dbcsr_unittest_common PUBLIC OpenMP::OpenMP_Fortran)
endif ()

if (APPLE AND BLAS_LIBRARIES MATCHES "Accelerate")
target_compile_definitions(dbcsr_unittest_common PRIVATE __ACCELERATE)
endif ()
target_link_libraries(dbcsr_unittest_common PUBLIC dbcsr)

# Compile Fortran tests
foreach (dbcsr_test ${DBCSR_TESTS_FTN})
add_executable(${dbcsr_test} ${${dbcsr_test}_SRCS})
target_link_libraries(${dbcsr_test} dbcsr_unittest_common)
target_link_libraries(${dbcsr_test} PUBLIC dbcsr_unittest_common)
set_target_properties(${dbcsr_test} PROPERTIES LINKER_LANGUAGE Fortran)

# register unittest executable with CMake
if (USE_MPI)
separate_arguments(MPIEXEC_PREFLAGS)
@@ -124,7 +121,6 @@ foreach (dbcsr_test ${DBCSR_TESTS_FTN})
add_test(NAME ${dbcsr_test} COMMAND ${dbcsr_test})
endif ()
if (OpenMP_FOUND)
target_link_libraries(${dbcsr_test} OpenMP::OpenMP_Fortran)
set_tests_properties(
${dbcsr_test} PROPERTIES ENVIRONMENT OMP_NUM_THREADS=${TEST_OMP_THREADS})
endif ()
4 changes: 0 additions & 4 deletions tests/dbcsr_test_add.F
@@ -377,11 +377,7 @@ SUBROUTINE dbcsr_check_add(test_name, matrix_a, dense_a_dbcsr, dense_a, dense_b,

LOGICAL :: valid
REAL(real_4), ALLOCATABLE, DIMENSION(:) :: work_sp
-#if defined (__ACCELERATE)
-REAL(real_8), EXTERNAL :: clange, slamch, slange
-#else
REAL(real_4), EXTERNAL :: clange, slamch, slange
-#endif
REAL(real_8) :: a_norm_dbcsr, a_norm_in, a_norm_out, &
b_norm, eps, residual
REAL(real_8), ALLOCATABLE, DIMENSION(:) :: work
4 changes: 0 additions & 4 deletions tests/dbcsr_test_multiply.F
@@ -553,11 +553,7 @@ SUBROUTINE dbcsr_check_multiply(test_name, matrix_c, dense_c_dbcsr, dense_a, den

LOGICAL :: valid
REAL(real_4), ALLOCATABLE, DIMENSION(:) :: work_sp
-#if defined (__ACCELERATE)
-REAL(real_8), EXTERNAL :: clange, slamch, slange
-#else
REAL(real_4), EXTERNAL :: clange, slamch, slange
-#endif
REAL(real_8) :: a_norm, b_norm, c_norm_dbcsr, c_norm_in, &
c_norm_out, eps, eps_norm, residual
REAL(real_8), ALLOCATABLE, DIMENSION(:) :: work
2 changes: 2 additions & 0 deletions tools/docker/lsan.supp
@@ -1,3 +1,5 @@
# leak due to compiler bug triggered by combination of OOP and ALLOCATABLE
leak:__dbcsr_tensor_types_MOD___copy_dbcsr_tensor_types_Dbcsr_tas_dist_t
leak:__dbcsr_tensor_types_MOD___copy_dbcsr_tensor_types_Dbcsr_tas_blk_size_t
# similar case, for gcc-13+
leak:__dbcsr_tas_global_MOD___copy_dbcsr_tas_global_Dbcsr_tas_blk_size_arb
11 changes: 11 additions & 0 deletions tools/vecLibFort/CMakeLists.txt
@@ -0,0 +1,11 @@
add_library(vecLibFort STATIC vecLibFort.c)

if (CMAKE_C_COMPILER_ID STREQUAL "GNU")
target_compile_options(vecLibFort PRIVATE -flax-vector-conversions)
endif ()

install(
TARGETS vecLibFort
EXPORT DBCSRTargets
LIBRARY DESTINATION "${CMAKE_INSTALL_LIBDIR}"
ARCHIVE DESTINATION "${CMAKE_INSTALL_LIBDIR}")
23 changes: 23 additions & 0 deletions tools/vecLibFort/LICENSE
@@ -0,0 +1,23 @@
Boost Software License - Version 1.0 - August 17th, 2003

Permission is hereby granted, free of charge, to any person or organization
obtaining a copy of the software and accompanying documentation covered by
this license (the "Software") to use, reproduce, display, distribute,
execute, and transmit the Software, and to prepare derivative works of the
Software, and to permit third-parties to whom the Software is furnished to
do so, all subject to the following:

The copyright notices in the Software and this entire statement, including
the above license grant, this restriction and the following disclaimer,
must be included in all copies of the Software, in whole or in part, and
all derivative works of the Software, unless such copies or derivative
works are solely in the form of machine-executable object code generated by
a source language processor.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
49 changes: 49 additions & 0 deletions tools/vecLibFort/Makefile
@@ -0,0 +1,49 @@
PREFIX=/usr/local
LIBDIR=$(PREFIX)/lib

CFLAGS=-O

NAME=vecLibFort
SOURCE=$(NAME).c
OBJECT=$(NAME).o
LIBRARY=lib$(NAME)
STATIC=$(LIBRARY).a
DYNAMIC=$(LIBRARY).dylib
PRELOAD=$(LIBRARY)I.dylib
INCLUDES=cloak.h static.h
DEPEND=$(INCLUDES) Makefile

all: static dynamic preload
static: $(STATIC)
dynamic: $(DYNAMIC)
preload: $(PRELOAD)

$(OBJECT): $(DEPEND)

$(STATIC): $(OBJECT)
ar -cru $@ $^
ranlib $@

$(DYNAMIC): $(OBJECT)
clang -shared -o $@ $^ \
-Wl,-reexport_framework -Wl,Accelerate \
-install_name $(LIBDIR)/$@

$(PRELOAD): $(SOURCE) $(DEPEND)
clang -shared $(CFLAGS) -DVECLIBFORT_INTERPOSE -o $@ -O $(SOURCE) \
-Wl,-reexport_framework -Wl,Accelerate \
-install_name $(LIBDIR)/$@

install: all
mkdir -p $(LIBDIR)
cp -f $(STATIC) $(LIBDIR)
cp -f $(DYNAMIC) $(LIBDIR)
cp -f $(PRELOAD) $(LIBDIR)

clean:
rm -f $(OBJECT) $(STATIC) $(DYNAMIC) $(PRELOAD)

check: tester.f90 $(OBJECT)
gfortran -o tester -O $^ -framework Accelerate
./tester
