CUDA Toolkit: features, tutorials and developer resources

The NVIDIA CUDA Toolkit provides a GPU development environment and tools for building, optimizing, and deploying GPU-accelerated applications. CUDA Toolkit 13.0 adds new programming-model and toolchain enhancements and explicit support for the NVIDIA Blackwell architecture.

The NVIDIA CUDA Toolkit offers a development environment for creating high-performance, GPU-accelerated applications across embedded systems, desktop workstations, data centers, cloud platforms, and supercomputers. The toolkit bundles GPU-accelerated libraries, debugging and optimization tools, a C/C++ compiler, and a runtime library; the page also features a Download Now call to action and links to join the NVIDIA Developer Program for free tools and training.
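As a minimal illustration of that workflow (not taken from the page itself), the sketch below compiles a vector-add kernel with the bundled nvcc compiler and uses the CUDA runtime library; the file name, problem size, and use of managed memory are arbitrary choices made for brevity.

    // vector_add.cu -- minimal sketch; build with: nvcc vector_add.cu -o vector_add
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // one element per thread
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        float *a, *b, *c;
        // Managed (unified) memory keeps the example short; explicit
        // cudaMalloc/cudaMemcpy is the other common pattern.
        cudaMallocManaged(&a, n * sizeof(float));
        cudaMallocManaged(&b, n * sizeof(float));
        cudaMallocManaged(&c, n * sizeof(float));
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        vectorAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
        cudaDeviceSynchronize();  // wait for the kernel before reading results on the host

        printf("c[0] = %f\n", c[0]);  // expect 3.0
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }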

CUDA Toolkit 13.0, now generally available, introduces foundational enhancements, including a tile-based programming model and a unified developer experience on Arm platforms. The release highlights improvements to NVIDIA Nsight developer tools, math libraries, the NVCC compiler, and Accelerated Python. The toolkit offers built-in capabilities for distributing computations across multi-GPU configurations, so applications can scale from single-GPU workstations to cloud installations with thousands of GPUs.
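A common starting point for that kind of scaling is simply enumerating the visible devices and giving each an independent slice of the work. The sketch below is an illustration under assumptions rather than anything from the page: it relies only on standard runtime calls (cudaGetDeviceCount, cudaSetDevice, cudaDeviceSynchronize), and the kernel and per-GPU chunk size are placeholders.

    // multi_gpu.cu -- sketch: split independent work across every visible GPU
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    __global__ void scale(float* data, int n, float factor) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main() {
        int deviceCount = 0;
        cudaGetDeviceCount(&deviceCount);            // GPUs visible to this process
        const int chunk = 1 << 20;                   // arbitrary per-GPU workload size
        std::vector<float*> buffers(deviceCount, nullptr);

        // Launch one independent kernel per device; launches are asynchronous,
        // so the loop returns quickly and all GPUs work concurrently.
        for (int dev = 0; dev < deviceCount; ++dev) {
            cudaSetDevice(dev);                      // subsequent calls target this GPU
            cudaMalloc(&buffers[dev], chunk * sizeof(float));
            scale<<<(chunk + 255) / 256, 256>>>(buffers[dev], chunk, 2.0f);
        }

        // Wait for every device to finish, then release its memory.
        for (int dev = 0; dev < deviceCount; ++dev) {
            cudaSetDevice(dev);
            cudaDeviceSynchronize();
            cudaFree(buffers[dev]);
        }
        printf("Ran on %d GPU(s)\n", deviceCount);
        return 0;
    }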

The toolkit adds explicit support for the NVIDIA Blackwell architecture, including next-generation Tensor Cores and the Transformer Engine, the high-speed NVIDIA NVLink Switch, and mixed-precision modes with support for FP4. It continues to support standard C++, Fortran, and Python parallel language constructs. NVIDIA also publishes tutorials and GTC Digital webinars covering compatibility, Jetson device upgrades, profiling and debugging, and how to write CUDA programs, alongside customer stories showcasing scientific and industry use cases.
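To illustrate the standard C++ side, the sketch below expresses a SAXPY-style update with an ISO C++ parallel algorithm. Note that GPU offload of such loops comes from NVIDIA's nvc++ compiler with -stdpar=gpu (distributed in the NVIDIA HPC SDK rather than the CUDA Toolkit itself), and the sizes and values here are arbitrary.

    // stdpar_saxpy.cpp -- ISO C++ parallel algorithm; e.g. nvc++ -stdpar=gpu stdpar_saxpy.cpp
    #include <algorithm>
    #include <execution>
    #include <vector>
    #include <cstdio>

    int main() {
        const std::size_t n = 1 << 20;
        std::vector<float> x(n, 1.0f), y(n, 2.0f);
        const float a = 3.0f;

        // y = a * x + y, written with a standard algorithm and a parallel policy;
        // built with -stdpar=gpu the compiler offloads this loop to the GPU,
        // while any conforming C++ compiler can still run it in parallel on the CPU.
        std::transform(std::execution::par_unseq,
                       x.begin(), x.end(), y.begin(), y.begin(),
                       [=](float xi, float yi) { return a * xi + yi; });

        std::printf("y[0] = %f\n", y[0]);  // expect 5.0
        return 0;
    }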

Resources linked from the CUDA page include comprehensive documentation and release notes, technical blogs explaining CUDA 13 features, and CUDA containers in the NGC catalog. The site directs developers to CUDA-X libraries for artificial intelligence, data science, and math, self-paced and instructor-led training via the Deep Learning Institute, Nsight developer tools, sample CUDA code on GitHub, developer forums for technical support, and a bug submission system. The page aggregates featured blogs and the latest news and emphasizes access to SDKs, training, and community connections for developers, researchers, and students.
