Google DiffusionGemma applies diffusion techniques to Artificial Intelligence text generation

Google DeepMind has introduced DiffusionGemma, an experimental open-weights model that generates text using diffusion methods more commonly associated with image generation. The model targets faster local inference on consumer hardware rather than enterprise deployment.

Google’s DeepMind team unveiled an experimental language model this week that uses techniques originally developed for Artificial Intelligence image generators to boost text generation speed by as much as 4x on resource-constrained consumer hardware. It is free to download and can run with just 18 GB of DRAM or VRAM. The model, codenamed DiffusionGemma, joins Google’s open-weights model family and is aimed at local deployment rather than cloud-scale inference.

But unlike Gemma 4, which launched this spring, DiffusionGemma, a 26-billion-parameter mixture-of-experts model, is not a large language model in the conventional sense. It is closer to image models such as Stable Diffusion or Flux because it does not generate tokens one after another in an autoregressive sequence. Instead, it generates entire paragraphs of tokens at the same time, starting with a canvas of random tokens and refining them through successive denoising steps until the final output is reached.
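To make the idea concrete, here is a minimal toy sketch of parallel, iterative refinement. It is not DiffusionGemma's algorithm: `toy_denoiser` is a hypothetical stand-in that simply knows the answer and attaches a random confidence score, and the schedule (commit the most confident positions first) is one assumption among several used in diffusion-style text models. It only shows the control flow of refining a whole canvas over a few passes.

```python
import random

def toy_denoiser(canvas, target, rng):
    """Hypothetical stand-in for a trained model (assumption, not Google's API).
    In one parallel pass it proposes a token and a confidence for every position."""
    return [(target[i], rng.random()) for i in range(len(canvas))]

def diffusion_generate(target, vocab, steps=4, seed=0):
    rng = random.Random(seed)
    n = len(target)
    canvas = [rng.choice(vocab) for _ in range(n)]  # canvas of random tokens
    fixed = [False] * n                             # positions already committed
    history = [list(canvas)]
    for step in range(1, steps + 1):
        proposals = toy_denoiser(canvas, target, rng)
        quota = (n * step) // steps                 # positions committed after this step
        need = quota - sum(fixed)
        # Commit the most confident still-open positions; the rest stay noisy.
        open_positions = [i for i in range(n) if not fixed[i]]
        open_positions.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in open_positions[:need]:
            canvas[i] = proposals[i][0]
            fixed[i] = True
        history.append(list(canvas))
    return canvas, history

if __name__ == "__main__":
    target = "the quick brown fox jumps over the lazy dog".split()
    vocab = sorted(set(target))
    final, history = diffusion_generate(target, vocab, steps=4)
    for step, snapshot in enumerate(history):
        print(step, " ".join(snapshot))
```

In this toy, nine tokens are produced in four model passes, whereas an autoregressive decoder would need one pass per token.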

Google is positioning the approach as a way to make better use of consumer hardware. Conventional large language models are memory-bandwidth bound because the active parameters need to be streamed from memory for every token generated, making VRAM capacity and memory bandwidth the major constraints. Diffusion models are described as more compute-bound, which can help high-end graphics cards put excess processing capacity to work and improve output performance for local inference.
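The bandwidth argument can be illustrated with a back-of-the-envelope estimate. The sketch below assumes that every forward pass streams all active parameters from memory once, and it ignores KV cache traffic, activations, and compute limits. All of the numbers are illustrative assumptions, not measurements of DiffusionGemma or any specific graphics card.

```python
def decode_tokens_per_second(bandwidth_bytes_per_s, active_params,
                             bytes_per_param, tokens_per_pass=1.0):
    """Upper bound on decode speed when weight streaming is the bottleneck.

    tokens_per_pass is 1 for plain autoregressive decoding; for a
    diffusion-style model it is (tokens in a block) / (denoising steps).
    """
    bytes_per_pass = active_params * bytes_per_param
    passes_per_second = bandwidth_bytes_per_s / bytes_per_pass
    return passes_per_second * tokens_per_pass

if __name__ == "__main__":
    # Illustrative assumptions: 1 TB/s memory bandwidth, 4 billion active
    # parameters, 2 bytes per parameter (16-bit weights).
    autoregressive = decode_tokens_per_second(1e12, 4e9, 2)
    # Assumed diffusion schedule: a 32-token block refined in 16 passes.
    diffusion = decode_tokens_per_second(1e12, 4e9, 2, tokens_per_pass=32 / 16)
    print(f"autoregressive bound: {autoregressive:.0f} tokens/s")
    print(f"diffusion bound:      {diffusion:.0f} tokens/s")
```

Under these assumptions the bandwidth ceiling rises in proportion to the tokens finalized per pass, at which point raw compute, rather than memory, becomes the limit.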

DiffusionGemma also reflects the tradeoffs seen in earlier diffusion language models. According to Google, the 26-billion-parameter model falls just behind Gemma 4 12B on the GPQA-Diamond benchmark, with its main advantage being output speed. Google's figures show a roughly 2.25x speedup for DiffusionGemma over the 12B-parameter large language model with speculative decoding enabled. Compared to Gemma 4 26B-A4B, the speedup is nearly 4x when running on a single Nvidia H100.

Google is releasing DiffusionGemma as an experimental model rather than an enterprise-focused one. The model is available on repositories including Hugging Face under the highly permissive Apache 2.0 license, with support already merged into vLLM, MLX, and Hugging Face Transformers, and llama.cpp support coming soon. Google has also been leaning more heavily on local inference, including a small language model shipped with Chrome in May, as a way to reduce cloud costs tied to Artificial Intelligence services.
