NVIDIA announces CUDA Tile in CUDA 13.1

CUDA 13.1 introduces CUDA Tile, a virtual instruction set for tile-based parallel programming that raises the programming abstraction above SIMT and abstracts tensor cores to support current and future tensor core architectures. The change targets workloads, notably Artificial Intelligence, in which tensors are a fundamental data type.

NVIDIA is introducing CUDA Tile as part of CUDA 13.1, calling it the largest advancement to the NVIDIA CUDA platform since the platform's introduction in 2006. CUDA Tile provides a virtual instruction set for tile-based parallel programming, enabling developers to express algorithms at a higher level than the traditional single-instruction, multiple-thread (SIMT) model. The announcement frames CUDA Tile as a way to reduce the complexity of writing high-performance GPU code across multiple architectures by abstracting low-level hardware details.
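For context, the SIMT model that CUDA Tile builds above is the familiar one-thread-per-element style of standard CUDA C++. The sketch below is not from the announcement; it is a minimal, conventional SAXPY kernel showing the index bookkeeping and launch configuration that the SIMT programmer manages directly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Classic SIMT style: each thread computes exactly one element, and the
// programmer spells out the thread/block index arithmetic by hand.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = a * x[i] + y[i];
    }
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Under SIMT, choosing the grid/block shape is also the programmer's job.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // 2*1 + 2 = 4
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

A tile-level instruction set aims to lift this per-thread view to whole-tile operations, so the equivalent bookkeeping is handled by the compiler and runtime.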

The article positions CUDA Tile alongside existing libraries such as NVIDIA CUDA-X and NVIDIA CUTLASS, which already help developers extract performance from GPUs. It explains that with evolving workloads, especially in Artificial Intelligence, tensors have become a fundamental data type and that NVIDIA has created specialized hardware to accelerate tensor operations. The hardware named in the article includes NVIDIA Tensor Cores (TC) and NVIDIA Tensor Memory Accelerators (TMA), both of which the article says are integral to every new GPU architecture. CUDA Tile aims to hide the specifics of these tensor hardware units so that code written with CUDA Tile remains compatible across current and future tensor core designs.
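The article does not show code, but the kind of specialized tensor core programming that CUDA Tile aims to hide can be seen in CUDA's existing warp-level WMMA API. The fragment below is an illustrative sketch using standard `nvcuda::wmma` calls (Volta and later, fixed 16x16x16 half-precision tiles): the tile shapes, fragment types, and layouts are all hardware-specific details the programmer must manage explicitly today.

```cuda
#include <mma.h>
using namespace nvcuda;

// One warp computes a single 16x16 output tile on the tensor cores.
// Fragment declarations encode the exact tile shape, element types, and
// memory layouts the hardware expects -- the low-level detail that a
// portable tile abstraction would take over.
__global__ void wmma_gemm_16x16(const half* a, const half* b, float* c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);
    wmma::load_matrix_sync(a_frag, a, 16);           // leading dimension = 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // tensor core MMA
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}
```

Because the supported shapes and layouts differ across tensor core generations, code written this way is tied to specific hardware, which is the compatibility problem the article says CUDA Tile addresses.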

The piece contrasts the flexibility and fine-grained control afforded by the SIMT programming model with the additional effort required to achieve good performance across architectures. By introducing a higher-level tile abstraction and a virtual instruction set, CUDA Tile is presented as a path to write algorithms without needing to manage specialized programming models for tensor cores directly. The article focuses on the compatibility benefits and the potential for simplified development workflows for tensor-heavy workloads, particularly those common in Artificial Intelligence.
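The contrast the piece draws can be made concrete with today's SIMT CUDA. In the standard tiled matrix multiply pattern below (an illustration, not CUDA Tile code; it assumes `n` is a multiple of the tile size), the thread block logically works on one tile at a time, yet every shared-memory stage, index calculation, and synchronization point must still be written out per thread, which is the effort a tile-level abstraction would absorb.

```cuda
#define TILE 16

// Tiled matrix multiply in plain SIMT CUDA: each block produces one
// TILE x TILE tile of C, staging tiles of A and B through shared memory.
__global__ void tiled_matmul(const float* A, const float* B, float* C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < n / TILE; ++t) {
        // Manual tile staging: under SIMT the programmer writes every
        // per-thread load and barrier; a tile-level instruction set would
        // express this step as a single tile load.
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * n + col] = acc;
}
```

The per-thread fine-grained control shown here is exactly the flexibility the SIMT model offers, and also the source of the cross-architecture tuning effort the article describes.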

Impact Score: 68

Europe’s Artificial Intelligence challenge is structural dependence

Europe has talent, research strength, and rising investment in Artificial Intelligence, but startups remain reliant on American infrastructure, platforms, and late-stage capital. The argument centers on digital sovereignty, interoperability, and ownership as the conditions for building durable European champions.

Community backlash slows Artificial Intelligence data center expansion

Political resistance, regulatory scrutiny, and rising energy and water concerns are complicating the build-out of large Artificial Intelligence data centers across the United States. The pressure is increasing costs, delaying projects, and adding fresh risks to the economics behind Generative Artificial Intelligence infrastructure.

House panel advances export controls after China report

The House Foreign Affairs Committee moved export control legislation after a House Select Committee report detailed China’s use of illegal means to build its Artificial Intelligence and semiconductor sectors. The measure is aimed at chip smuggling and Artificial Intelligence model theft.

Intel repurposes scrap dies to expand CPU supply

Intel is repurposing wafer-edge and lower-yield silicon that would normally be discarded into sellable CPUs as industry demand outpaces supply. The strategy reflects a market where customers are willing to buy lower-tier parts to secure any available capacity.
