DiffusionGemma rethinks text generation with diffusion

June 12, 2026

DiffusionGemma applies diffusion-style denoising to text, trading autoregressive token-by-token decoding for iterative canvas refinement. Its design combines encoder guidance, bidirectional denoising, scheduling, and entropy-based sampling.

DiffusionGemma shifts text generation away from the standard autoregressive pattern used by Large Language Models. Autoregressive Large Language Models generate one token at a time. Producing a single token is usually memory bound rather than compute bound, because the model weights must be read from memory at every step, so these models serve many users efficiently by batching them together to use the spare compute. DiffusionGemma instead starts from a sequence of 256 randomly initialized tokens, called a canvas, and tries to choose better tokens for the entire canvas at once. By predicting 256 tokens in parallel, it focuses the compute budget of 256 tokens on a single user instead of spreading it across many users.
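The memory-bound argument can be made concrete with a rough roofline-style estimate. The sketch below is a simplification, not a measurement: it assumes every weight is read once per forward pass, each weight contributes one multiply-add (2 FLOPs) per token processed, and weights are stored in bf16 (2 bytes each), and it ignores attention and KV-cache traffic. The function name is hypothetical. Under these assumptions, the FLOPs performed per byte of weights loaded grow linearly with the number of tokens handled in one pass.

```python
def weight_arithmetic_intensity(n_params: float, tokens: int,
                                bytes_per_param: float = 2.0) -> float:
    """FLOPs per byte of weight traffic for one forward pass over `tokens` tokens.

    Simplified model (an assumption): each weight is read once per pass and
    does one multiply-add (2 FLOPs) per token. Attention and KV-cache reads
    are ignored.
    """
    flops = 2.0 * n_params * tokens
    bytes_moved = n_params * bytes_per_param
    return flops / bytes_moved


if __name__ == "__main__":
    n = 1e9  # arbitrary parameter count; it cancels out of the ratio
    print(weight_arithmetic_intensity(n, 1))    # 1.0   (one token per pass, autoregressive decode)
    print(weight_arithmetic_intensity(n, 256))  # 256.0 (a 256-token canvas in one pass)
```

Under these simplifications, a 256-token canvas reaches the same arithmetic intensity as a batch of 256 users each decoding one token; the difference is that all of that work goes to a single request.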

The approach adapts diffusion, a process associated with image generation, to discrete text. Noise cannot be added to a token in the same continuous way as to pixels, so DiffusionGemma uses uniform state diffusion rather than relying on masked diffusion alone. In forward diffusion, tokens are replaced with random tokens from the vocabulary instead of a mask token, building a training dataset the same way masked diffusion does. In reverse diffusion, the model detects which tokens are noise, proposes replacements across the canvas, accepts confident positions, and re-noises low-probability positions so the canvas stays close to the distribution seen during training.
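The forward (noising) half of uniform state diffusion is simple enough to show directly. This is a minimal sketch, assuming each position is corrupted independently with probability `noise_level` and that the noise level for each training example is drawn uniformly from [0, 1); the article does not give DiffusionGemma's actual noise schedule, and both function names are hypothetical.

```python
import random

def uniform_state_noise(tokens, vocab_size, noise_level, rng):
    """Forward (noising) step of uniform-state diffusion, as an illustrative sketch.

    Each position is independently corrupted with probability `noise_level` by
    replacing it with a token drawn uniformly from the vocabulary. (Masked
    diffusion would instead write a dedicated [MASK] id.) A resampled token can
    occasionally equal the original, which is expected under uniform noise.
    Returns the noisy sequence and a flag per position marking corruption.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be in [0, 1]")
    noisy, corrupted = [], []
    for tok in tokens:
        if rng.random() < noise_level:
            noisy.append(rng.randrange(vocab_size))
            corrupted.append(True)
        else:
            noisy.append(tok)
            corrupted.append(False)
    return noisy, corrupted

def make_training_pair(clean, vocab_size, rng):
    """Hypothetical helper: sample a noise level, corrupt, return (input, target)."""
    t = rng.random()  # assumed schedule: noise level uniform in [0, 1)
    noisy, _ = uniform_state_noise(clean, vocab_size, t, rng)
    return noisy, list(clean)
```

The model is then trained to map the noisy sequence back to the clean one, which is what later lets it spot and replace noise tokens on the canvas during reverse diffusion.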

Rather than training from scratch, DiffusionGemma starts from an existing checkpoint, the Gemma 4 26B A4B model, a Mixture of Experts model that has already gone through extensive training and performs strongly. The architecture uses an encoder-denoiser patch that lets a decoder-only model switch between encoder mode, which processes the input query, and denoiser mode, which updates the canvas. In denoiser mode, causal attention is replaced with bidirectional attention so each token can attend to all other tokens in the sequence. The model also shares the encoder’s KV cache with the denoiser, allowing the denoising steps to reuse the prompt context without recomputing it.
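The difference between the two modes can be sketched with a toy single-head attention in pure Python. This illustrates the idea under stated simplifications and is not DiffusionGemma's implementation: the query, key, and value projections are taken as the identity, and the names `attend`, `encode`, and `denoise` are hypothetical. Encoder mode applies a causal mask over the prompt and keeps the prompt's keys and values as a cache; denoiser mode reuses that cache and lets every canvas position attend to every prompt and canvas position.

```python
import math

# Toy single-head attention. For brevity the q/k/v projections are the
# identity (each vector is its own query, key, and value); a real model
# applies learned projections in every layer.

def _softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values, causal):
    """Scaled dot-product attention. With causal=True, query i sees keys 0..i."""
    d = len(queries[0])
    out = []
    for i, q in enumerate(queries):
        visible = list(range(i + 1)) if causal else list(range(len(keys)))
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d) for j in visible]
        weights = _softmax(scores)
        out.append([sum(w * values[j][c] for w, j in zip(weights, visible))
                    for c in range(len(values[0]))])
    return out

def encode(prompt):
    """Encoder mode: causal attention over the prompt; returns output and KV cache."""
    cache = (list(prompt), list(prompt))  # (keys, values) kept for the denoiser
    return attend(prompt, prompt, prompt, causal=True), cache

def denoise(canvas, cache):
    """Denoiser mode: bidirectional attention; prompt keys/values come from the cache."""
    keys = cache[0] + list(canvas)
    values = cache[1] + list(canvas)
    return attend(canvas, keys, values, causal=False)
```

Because `denoise` only reads the cached prompt keys and values, repeated denoising steps over the same prompt never run the encoder again, and changing a later canvas position changes the output at an earlier one, which a causal mask would prevent.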

Inference combines iterative diffusion with autoregressive stitching. The canvas in DiffusionGemma holds only 256 tokens, so longer outputs are produced one canvas at a time. Each input, first the prompt and later every finished canvas, only needs to pass through the encoder once to build or extend the KV cache, while the denoiser takes a number of steps to fill the current canvas. When a canvas is finished, its 256 tokens are appended to the prompt and passed through the encoder to extend the KV cache before the next canvas is generated.

Scheduling controls the maximum number of denoising steps, the logits temperature, and adaptive stopping; in the DiffusionGemma configuration, the confidence threshold is 0.005 and the stability threshold is 1. The default Entropy Bounded Sampler initializes the canvas with uniformly drawn random tokens, accepts tokens whose entropy indicates sufficient confidence, and re-noises rejected tokens for later refinement.
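The inference loop and sampler can be sketched end to end. This is a minimal sketch under several assumptions: `predict` stands in for a hypothetical model call that returns logits for every canvas position, the encoder's KV cache is modeled as a growing token list, the `max_steps` and `temperature` defaults are made up, and the way the two thresholds are applied is one plausible reading rather than the published method. Here the confidence threshold bounds the summed entropy of the positions accepted in one step, in the spirit of entropy-bounded unmasking, and the stability threshold is treated as the number of consecutive steps with unchanged predictions after which the remaining positions are committed.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Schedule:
    # canvas_size and both thresholds come from the article; the max_steps and
    # temperature defaults are hypothetical, and how the thresholds are used
    # below is an assumption.
    canvas_size: int = 256
    max_steps: int = 64
    temperature: float = 1.0
    confidence_threshold: float = 0.005
    stability_threshold: int = 1

def softmax(logits, temperature):
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0.0)

def argmax(p):
    return max(range(len(p)), key=p.__getitem__)

def sample_canvas(predict, context, vocab_size, sched, rng):
    """Refine one canvas; `predict(context, canvas)` returns logits per position."""
    n = sched.canvas_size
    canvas = [rng.randrange(vocab_size) for _ in range(n)]  # uniform random init
    accepted = [False] * n
    prev_preds, stable = None, 0
    for _ in range(sched.max_steps):
        probs = [softmax(l, sched.temperature) for l in predict(context, canvas)]
        preds = [argmax(p) for p in probs]
        stable = stable + 1 if preds == prev_preds else 0
        prev_preds = preds
        pending = sorted((i for i in range(n) if not accepted[i]),
                         key=lambda i: entropy(probs[i]))
        if not pending or stable >= sched.stability_threshold:
            break  # adaptive stopping
        # Accept the lowest-entropy positions while their summed entropy stays
        # within the threshold; always accept one so every step makes progress.
        spent, accept = 0.0, set()
        for i in pending:
            spent += entropy(probs[i])
            if accept and spent > sched.confidence_threshold:
                break
            accept.add(i)
        for i in pending:
            if i in accept:
                canvas[i], accepted[i] = preds[i], True
            else:
                canvas[i] = rng.randrange(vocab_size)  # re-noise for a later step
    # Commit the latest predictions for any position still unaccepted.
    return [canvas[i] if accepted[i] else prev_preds[i] for i in range(n)]

def generate(predict, prompt, vocab_size, n_blocks, sched, rng):
    """Autoregressive stitching: each finished canvas is appended to the context
    (standing in for extending the encoder's KV cache)."""
    context = list(prompt)
    for _ in range(n_blocks):
        context += sample_canvas(predict, context, vocab_size, sched, rng)
    return context[len(prompt):]

def toy_predict(context, canvas):
    """Hypothetical stand-in model: strongly prefers token (len(context) + i) % 10
    at canvas position i, regardless of the current canvas contents."""
    return [[10.0 if v == (len(context) + i) % 10 else 0.0 for v in range(10)]
            for i in range(len(canvas))]

if __name__ == "__main__":
    sched = Schedule(canvas_size=4)
    print(generate(toy_predict, [1, 2, 3], 10, 2, sched, random.Random(0)))
    # [3, 4, 5, 6, 7, 8, 9, 0]
```

Accepting at least one position per step guarantees progress even when every candidate exceeds the entropy budget, while re-noising the rest keeps unaccepted positions looking like the uniform noise the model was trained to repair.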
