cuda
Typed bindings to the CUDA Runtime and cuBLAS for NVIDIA GPU compute. This is plain C FFI over the vendor SDK: C+ stays a consumer of CUDA, with no kernel language and nothing to reimplement. For the wider GPU/numerics picture and the backend matrix, see GPU & numerics.
Two sub-modules:
- cuda/runtime: device management and memory. A DeviceBuffer owns device memory and frees it on Drop (cudaFree), so device allocations follow the same ownership rules as host memory.
- cuda/cublas: a cuBLAS Handle (created once, freed on Drop via cublasDestroy) exposing the dense Level-3 and Level-2 routines sgemm and sgemv, which operate on column-major data and match the cuBLAS ABI exactly.
```
import "cuda/runtime" as cuda;
import "cuda/cublas" as cublas;

guard let cublas::Handle::new() as h else { return 1; } // Drop = cublasDestroy

// Inputs live in DeviceBuffers; each frees its device memory on scope exit.
// a_host holds m*k floats (A) and b_host holds k*n floats (B), both column-major.
let dA: cuda::DeviceBuffer = cuda::DeviceBuffer::from_host(a_host);
let dB: cuda::DeviceBuffer = cuda::DeviceBuffer::from_host(b_host);
let dC: cuda::DeviceBuffer = cuda::DeviceBuffer::zeros(m * n); // C is m x n

// Column-major C = alpha*A*B + beta*C with A (m x k), B (k x n), C (m x n).
h.sgemm(m, n, k, 1.0f32, dA, dB, 0.0f32, dC);

// Copy the m*n result back into c_host.
dC.copy_to_host(c_host);
```
Because cuBLAS is column-major, store matrices column-major, with element (i, j)
of an m-row matrix at index i + j*m (or pass the transpose flags), exactly as you
would from C. The Handle and every DeviceBuffer release their resources
deterministically at scope exit, so there is no explicit teardown to forget.
Linking
The CUDA libraries usually live in /usr/local/cuda/lib64, outside the linker's
default search path. List that directory in your manifest so the libraries resolve
at both link time and run time without setting LD_LIBRARY_PATH:
```toml
[link]
search-paths = ["/usr/local/cuda/lib64"]
```
For the CPU fallback and a result-checking reference, see cblas and accelerate.