C+
Packages · v0.0.15

cuda

Typed bindings to the CUDA Runtime and cuBLAS for NVIDIA GPU compute. This is plain C FFI over the vendor SDK: C+ stays a consumer of CUDA, with no kernel language and nothing to reimplement. For the wider GPU/numerics picture and the backend matrix, see GPU & numerics.

Two sub-modules:

  • cuda/runtime — device management and memory. A DeviceBuffer owns device memory and frees it on Drop (cudaFree), so device allocations follow the same ownership rules as host memory.
  • cuda/cublas — a cuBLAS Handle (created once, freed on Drop via cublasDestroy) exposing the single-precision dense routines sgemm (Level 3, matrix-matrix) and sgemv (Level 2, matrix-vector). Both are column-major and match the cuBLAS ABI exactly.
import "cuda/runtime" as cuda;
import "cuda/cublas" as cublas;

guard let cublas::Handle::new() as h else { return 1; }   // Drop = cublasDestroy

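// Host data is column-major: a_host holds A (m*k floats), b_host holds B (k*n), c_host receives C (m*n).
// Operands live in DeviceBuffers; each frees its device memory on scope exit.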
let dA: cuda::DeviceBuffer = cuda::DeviceBuffer::from_host(a_host);
let dB: cuda::DeviceBuffer = cuda::DeviceBuffer::from_host(b_host);
let dC: cuda::DeviceBuffer = cuda::DeviceBuffer::zeros(m * n);

// Column-major C = alpha*A*B + beta*C; with alpha = 1 and beta = 0 this is C = A*B.
h.sgemm(m, n, k, 1.0f32, dA, dB, 0.0f32, dC);

dC.copy_to_host(c_host);
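The sgemm call above computes column-major C = alpha*A*B + beta*C. To make that index convention concrete, here is a CPU reference in plain C for it and for the Level-2 sgemv (y = alpha*A*x + beta*y). It is a sketch for checking your understanding, not the binding's or cuBLAS's implementation: `ref_sgemm` and `ref_sgemv` are hypothetical names, and it assumes no transposes and packed storage, so each matrix's leading dimension equals its row count.

```c
/* Hypothetical CPU references, not the C+ binding or cuBLAS.
   Column-major, no transposes, packed storage: element (i, j) of a
   matrix with `rows` rows is stored at index i + j * rows. */

/* C (m x n) = alpha * A (m x k) * B (k x n) + beta * C */
static void ref_sgemm(int m, int n, int k, float alpha, const float *A,
                      const float *B, float beta, float *C) {
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[i + p * m] * B[p + j * k];
            /* As in BLAS, C is not read when beta == 0, so it may hold garbage. */
            C[i + j * m] = (beta == 0.0f) ? alpha * acc
                                          : alpha * acc + beta * C[i + j * m];
        }
    }
}

/* y (m) = alpha * A (m x n) * x (n) + beta * y; y is not read when beta == 0 */
static void ref_sgemv(int m, int n, float alpha, const float *A,
                      const float *x, float beta, float *y) {
    for (int i = 0; i < m; ++i) {
        float acc = 0.0f;
        for (int j = 0; j < n; ++j)
            acc += A[i + j * m] * x[j];
        y[i] = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * y[i];
    }
}
```

For example, A = [[1, 2, 3], [4, 5, 6]] packs column-major as {1, 4, 2, 5, 3, 6} and B = [[7, 8], [9, 10], [11, 12]] as {7, 9, 11, 8, 10, 12}; ref_sgemm(2, 2, 3, 1.0f, A, B, 0.0f, C) leaves C = {58, 139, 64, 154}, which is [[58, 64], [139, 154]].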

Because cuBLAS is column-major, lay out matrices column-major (or pass the transpose flags), exactly as you would when calling cuBLAS from C. The Handle and every DeviceBuffer release their resources deterministically at scope exit, so there is no explicit teardown to forget.
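When host data is row-major, as C arrays usually are, a common alternative to transposing is to reinterpret the buffers: a row-major m×n matrix occupies memory exactly like its column-major n×m transpose. A column-major gemm asked for Cᵀ = Bᵀ·Aᵀ, with the operands swapped and m and n exchanged, therefore writes row-major C = A·B with no copies. The sketch below demonstrates the identity on the CPU; `colmajor_sgemm` and `rowmajor_sgemm_via_colmajor` are hypothetical names, and nothing here describes the C+ binding's own argument handling.

```c
/* Hypothetical column-major gemm (no transposes, packed storage):
   C (m x n) = A (m x k) * B (k x n), element (i, j) at [i + j * rows]. */
static void colmajor_sgemm(int m, int n, int k, const float *A,
                           const float *B, float *C) {
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[i + p * m] * B[p + j * k];
            C[i + j * m] = acc;
        }
}

/* Row-major C (m x n) = A (m x k) * B (k x n), obtained by asking the
   column-major routine for C^T (n x m) = B^T (n x k) * A^T (k x m).
   Each row-major buffer already is the column-major transpose, so only
   the operand order and the m/n arguments change. */
static void rowmajor_sgemm_via_colmajor(int m, int n, int k, const float *A,
                                        const float *B, float *C) {
    colmajor_sgemm(n, m, k, B, A, C);
}
```

With row-major A = {1, 2, 3, 4, 5, 6} (2×3) and B = {7, 8, 9, 10, 11, 12} (3×2), rowmajor_sgemm_via_colmajor(2, 2, 3, A, B, C) yields row-major C = {58, 64, 139, 154}. The same swap works with any column-major gemm whose leading dimensions are set to match; whether the C+ binding lets you set them is not covered on this page.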

Linking

The CUDA libraries usually live in /usr/local/cuda/lib64, outside the linker's default search path. List that directory in your manifest so the libraries resolve at both link time and run time, without setting LD_LIBRARY_PATH:

[link]
search-paths = ["/usr/local/cuda/lib64"]

For the CPU fallback and a result-checking reference, see cblas and accelerate.
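Whichever CPU path serves as the reference, comparing it with a GPU result calls for a tolerance rather than exact equality: the two sides may accumulate sums in different orders, and float rounding depends on that order. A minimal sketch in C of a combined absolute/relative check follows; `first_mismatch_f32` is a hypothetical name, and the atol/rtol values are the caller's judgment call, not something cblas or accelerate prescribe.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: returns the index of the first element where
   |got - want| > atol + rtol * |want|, or -1 if every element is close.
   NaN, or an infinity paired with anything but the same infinity,
   counts as a mismatch. */
static long first_mismatch_f32(const float *got, const float *want,
                               size_t len, float atol, float rtol) {
    for (size_t i = 0; i < len; ++i) {
        if (got[i] == want[i])               /* exact, incl. equal infinities */
            continue;
        if (!isfinite(got[i]) || !isfinite(want[i]))
            return (long)i;                  /* NaN or mismatched infinity */
        float diff = got[i] - want[i];
        if (diff < 0.0f) diff = -diff;
        float mag = want[i] < 0.0f ? -want[i] : want[i];
        if (diff > atol + rtol * mag)
            return (long)i;
    }
    return -1;
}
```

Returning the first failing index rather than a bool makes it easy to print the offending element when a check fails.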