0.4.1: support for half precision

lukstafi released this 12 Sep 10:10

f6d5821

In this release:

We pass the $CUDA_PATH/include path to the nvrtc compiler; without it, directives such as #include <cuda_fp16.h> do not work. Users could already pass this option themselves, but since we monitor the installation via conf-cuda, it is better to prepend the option automatically.
We work around ctypes not supporting the Float16 type.
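
To illustrate the first item, here is a minimal sketch of prepending an include-path option to a list of nvrtc options. The helper names `include_option` and `prepend_cuda_include` are hypothetical and are not taken from the library; the sketch assumes only that nvrtc accepts the standard `-I<dir>` option and that `CUDA_PATH` may or may not be set.

```ocaml
(* Hypothetical helpers, not the library's actual code. *)

(* Build the nvrtc include option for a given CUDA installation root. *)
let include_option cuda_path = "-I" ^ Filename.concat cuda_path "include"

(* Prepend the include option when a non-empty CUDA path is known;
   otherwise return the user's options unchanged. *)
let prepend_cuda_include ~cuda_path options =
  match cuda_path with
  | Some path when path <> "" -> include_option path :: options
  | _ -> options

let () =
  (* Read CUDA_PATH from the environment; it may be absent. *)
  let cuda_path = Sys.getenv_opt "CUDA_PATH" in
  let options = prepend_cuda_include ~cuda_path [ "--use_fast_math" ] in
  List.iter print_endline options
```

Prepending rather than appending keeps the user's own options after the automatic one, so any include paths the user already supplies are still passed through.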
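
The note does not say how the Float16 workaround is implemented. One general approach when a binding layer has no half-precision type is to move values as raw 16-bit patterns and interpret them on the OCaml side. The sketch below is an assumption for illustration only, not the library's code: the hypothetical function `float_of_half_bits` decodes an IEEE 754 binary16 bit pattern, held in an `int`, into an OCaml `float`.

```ocaml
(* Hypothetical helper, not the library's actual workaround. *)

(* Decode an IEEE 754 binary16 bit pattern (low 16 bits of [bits]). *)
let float_of_half_bits bits =
  let sign = if bits land 0x8000 <> 0 then -1.0 else 1.0 in
  let exponent = (bits lsr 10) land 0x1f in
  let mantissa = bits land 0x3ff in
  let magnitude =
    if exponent = 0 then
      (* Zero or subnormal: mantissa * 2^-24. *)
      ldexp (float_of_int mantissa) (-24)
    else if exponent = 0x1f then
      (* All-ones exponent: infinity or NaN. *)
      if mantissa = 0 then infinity else nan
    else
      (* Normal: (1024 + mantissa) * 2^(exponent - 25). *)
      ldexp (float_of_int (0x400 lor mantissa)) (exponent - 25)
  in
  sign *. magnitude

let () =
  (* 0x3c00 encodes 1.0 and 0xc000 encodes -2.0 in binary16. *)
  List.iter
    (fun bits -> Printf.printf "0x%04x -> %g\n" bits (float_of_half_bits bits))
    [ 0x3c00; 0xc000; 0x7bff ]
```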