- Documentation for `Deviceptr.mem_free`: mention it's safe to call multiple times on the same pointer.
- `Deviceptr.equal` and `Deviceptr.hash`.
- Debug number of total and unreleased events in a stream.
- Debug the total number of non-garbage-collected streams across all devices.
- Verifies that compilation options contain only characters from an allowed set: alphanumerics and a few punctuation marks.
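
The character check on compilation options could look roughly like the sketch below. This is only an illustration: `is_allowed_char`, `valid_option` and `check_options` are hypothetical names, not part of the library's interface, and the punctuation set shown is an assumption, since the changelog does not list the accepted characters.

```ocaml
(* Hypothetical helpers, not the library's API. *)
let is_allowed_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' -> true
  (* Assumed punctuation set; the actual accepted characters may differ. *)
  | '-' | '_' | '=' | '.' | ',' | '/' | ':' | '+' -> true
  | _ -> false

(* An option is valid when it is non-empty and every character is allowed. *)
let valid_option (opt : string) : bool =
  let ok = ref (opt <> "") in
  String.iter (fun c -> if not (is_allowed_char c) then ok := false) opt;
  !ok

(* Returns [Error opt] for the first offending option, [Ok ()] otherwise. *)
let check_options (opts : string list) : (unit, string) result =
  match List.find_opt (fun o -> not (valid_option o)) opts with
  | None -> Ok ()
  | Some bad -> Error bad
```

Rejecting whitespace and shell metacharacters up front means a malformed option fails with a clear message before it reaches the compiler.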
- Doc-comment typo.
- The flags `cu_event_wait_external` and `cu_event_wait_default` were switched around for `record ?external_` and `wait ?external_` event functions.
- Don't destroy released (destroyed) events in `Delimited_event.synchronize`.
- `get_free_and_total_mem`.
- Multiple missing `sexp_of` conversions.
- `cuda_call_hook` to help in debugging.
- `is_success` functions.
- `mem_alloc_async` i.e. `Stream.mem_alloc`, and `mem_free_async` i.e. `Stream.mem_free`.
- `Stream.mem_free` is attached as a finalizer by `Stream.mem_alloc` (with stream capture).
- Removed `Module.unload`, instead `Module.load_data_ex` attaches an unload as a finalizer (with context capture).
- `Deviceptr.mem_free` is attached as a finalizer, but still available for "tight" memory management.
- Now detecting use-after-free for device memory pointers.
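
The entries on finalizers, repeatable `mem_free` calls and use-after-free detection fit together as in the sketch below. It uses a mock pointer type rather than the real bindings: the record fields, the `Use_after_free` exception, the `releases` counter and the `read` function are all assumptions made for illustration, and only `Gc.finalise` is an actual standard-library call.

```ocaml
(* Mock device pointer: [address] stands in for a CUDA device address. *)
type deviceptr = { address : int; mutable freed : bool }

(* Hypothetical exception; the library may report the error differently. *)
exception Use_after_free of int

(* Counts mock releases so the behaviour is observable. *)
let releases = ref 0

(* Safe to call several times: only the first call releases anything. *)
let mem_free ptr =
  if not ptr.freed then begin
    ptr.freed <- true;
    (* A real binding would release the device memory here. *)
    incr releases
  end

(* Allocation registers [mem_free] as a finalizer, so forgetting the pointer
   still releases it, while an explicit early [mem_free] remains possible. *)
let mem_alloc address =
  let ptr = { address; freed = false } in
  Gc.finalise mem_free ptr;
  ptr

(* Every access checks the flag first, which turns use-after-free into an error. *)
let check_live ptr = if ptr.freed then raise (Use_after_free ptr.address)

let read ptr = check_live ptr; ptr.address
```

Because the finalizer and the explicit call share the same guarded function, an early manual release followed by a later garbage-collection pass does not free the memory twice.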
- CUDA events.
- Delimited events: they are owned by a stream they record, and are automatically destroyed after synchronization.
- Partitioned the API into modules.
- Removed `destroy` functions from the interface, attaching them as finalizers.
- Fixed broken types for `can_access_peer` and `get_p2p_attributes`.
- Pass the $CUDA_PATH/include path to the nvrtc compiler; otherwise it will not `#include` anything.
- Work around `Ctypes.bigarray_start` and `typ_of_bigarray_kind` because `ctypes` does not support half precision.
- Previously commented-out parts that require a newer version of the CUDA API.
- Interface file `cudajit.mli` with documentation.
- Expose context limits. Print default limits in `bin/properties`.
- `sexp_of_kernel_param`.
- Dropped `JIT_` prefix for `jit_option` values.
- Self-contained types in the interface, with some corrections and renaming.
- Formatting: line length 100.
- A major bug, exacerbated by the asynchronous functionality of v0.3: functions performing asynchronous calls should keep the call arguments alive; the user should only forget (or free) the arguments after the calls complete (e.g. after synchronizing a stream).
  - Only `launch_kernel` needed fixing as I don't think other async functions allocate passed arguments.
  - We handle this internally so no API change!
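
One way to keep call arguments alive until an asynchronous call completes is to have the stream hold references to them until it is synchronized. The sketch below shows that idea on a mock stream. The changelog does not describe the actual mechanism, so this is an assumption, and the `stream` record, `launch_async`, `synchronize` and `pending` are hypothetical names.

```ocaml
(* Mock stream: [in_flight] retains the argument buffers of pending calls,
   which keeps the garbage collector from reclaiming them too early. *)
type stream = { mutable in_flight : bytes list }

let create_stream () = { in_flight = [] }

(* Mock asynchronous launch: a real binding would enqueue work that reads
   [args] at some later point, so the stream keeps a reference to it. *)
let launch_async stream (args : bytes) =
  stream.in_flight <- args :: stream.in_flight

(* After synchronization no pending call can still read the arguments,
   so the references are dropped and the buffers become collectable. *)
let synchronize stream =
  (* A real binding would block on the stream here. *)
  stream.in_flight <- []

(* Number of argument buffers currently retained. *)
let pending stream = List.length stream.in_flight
```

Holding the references inside the stream keeps the user-facing signatures unchanged, which matches the note that no API change was needed.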
- Support for streams (except `cuStreamWaitEvent` and graph capture).
- Support for asynchronous copying, including `cuMemcpyPeerAsync`.
- Renamed `byte_size` to `size_in_bytes`.
- Support for peer-to-peer device-to-device copying.
- Support for context flags.
- `ctx_create` properly handles context flags.
- Continuous Integration on GitHub thanks to GitHub action Jimver/cuda-toolkit, but only PTX compilation.
- Test target should erase compiler versions.
- Initial stand-alone release. For earlier changes, see e.g. ocannl/cudajit @ 2 months ago
- To be defensive, pass `-I` and `-L` arguments to the compiler and linker with the default paths on linux-like systems.