Commit
ocl: pointer-arithmetic for device-pointers
* Implemented pointer-arithmetic for device-pointers using Intel's USM as well as fallback code.
* Fallback to main-thread's stream (c_dbcsr_acc_opencl_stream_default).
* Fixed c_dbcsr_acc_opencl_stream_default and reduced one level of indirection.
* Reworked entire memory allocation (determining offsets).
* Consolidated compile-time decisions about LIBXSMM_VERSION_NUMBER.
* Removed runtime decisions accounting for pooled allocations.
* Removed support for performance estimation and suitability.
* Support older LIBXSMM (pooled memory allocations).
* Set ACC_OPENCL_ATOMIC_KIND to sequentially consistent; set ACC_OPENCL_NLOCKS=1.
* Complemented ACC_OPENCL_NLOCKS with environment variable.
* Introduced ACC_OPENCL_OMPLOCKS, ACC_OPENCL_MEM_DEBUG, ACC_OPENCL_EVENT_FLUSH.
* Implemented behavior of c_dbcsr_acc_opencl_stream_default already in c_dbcsr_acc_opencl_stream.
* Cache active device-ID to avoid determining context/properties (c_dbcsr_acc_set_active_device).
* Support event chain (dependency), improved error handling (c_dbcsr_acc_stream_wait_event).
* Support event chain (dependency), improved error handling (c_dbcsr_acc_event_record).
* Introduced lock-arguments (internal, e.g., c_dbcsr_acc_opencl_set_active_device).
* Consolidated domain-locks into c_dbcsr_acc_opencl_config.
* Made build-log available (c_dbcsr_acc_opencl_kernel).
* Reworked stream-registry and stream-info facility.
* Consolidated and updated tuned parameters.
* Use "int" instead of "cl_int" when taking the return-code.
* Consistently use EXIT_SUCCESS instead of CL_SUCCESS.
* Removed support for ACC_OPENCL_OVERMALLOC.
* Removed support for per-thread device.
* Removed ACC_OPENCL_EVENT_BARRIER.
* Introduced ACC_OPENCL_MEM_TLS (disabled).
* Simplified c_dbcsr_acc_opencl_memset.
* Support ACC_OPENCL_STREAM_NULL in event facility.
* Introduced assertion (dbcsr_acc_devmem.F).
* Fixed using size_t as kernel argument.
* Introduced UNROLL_AUTO.
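To illustrate the item about complementing ACC_OPENCL_NLOCKS with an environment variable, here is a minimal sketch of how a compile-time default can be overridden at runtime. It is not the code from this commit: the helper name `nlocks_from_env` is hypothetical, and the validation rule (accept only positive integers that fit an `int`, otherwise keep the default) is an assumption.

```c
#include <limits.h>
#include <stdlib.h>

/* Compile-time default; the commit message states ACC_OPENCL_NLOCKS=1. */
#if !defined(ACC_OPENCL_NLOCKS)
# define ACC_OPENCL_NLOCKS 1
#endif

/* Hypothetical helper (not part of DBCSR): returns the number of locks
 * given the raw value of the environment variable (may be NULL).
 * Invalid or non-positive input falls back to the compile-time default. */
static int nlocks_from_env(const char* value) {
  int nlocks = ACC_OPENCL_NLOCKS;
  if (NULL != value && '\0' != *value) {
    char* end = NULL;
    const long n = strtol(value, &end, 10);
    /* accept only a fully parsed, positive number that fits an int */
    if (end != value && '\0' == *end && 0 < n && n <= INT_MAX) {
      nlocks = (int)n;
    }
  }
  return nlocks;
}
```

A caller would pass the result of `getenv("ACC_OPENCL_NLOCKS")` once during initialization; whether the actual backend also caps the value at the compile-time setting is not stated in the commit message.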