h100 articles — AI Newsroom

DiffusionGemma: 1,000+ Tokens/Sec Open-Weights Text Gen

Jun 15, 2026 · 15 min read · 8 sources

google-deepmind gemma diffusiongemma text-diffusion open-weights

On June 10, 2026, Google DeepMind released DiffusionGemma, a model based on Gemma 4 with 26B parameters (3.8B active) that uses a text-diffusion approach to denoise 256-token blocks in parallel. Throughput reaches 1,000+ tokens/sec on a single H100, roughly 4x faster than an equivalent autoregressive (AR) model. The model ships under the Apache 2.0 license with day-one support in vLLM, Hugging Face Transformers, Unsloth, and NeMo. It is explicitly labeled 'experimental': quality is below the standard of autoregressive Gemma 4, and speed is the point, not peak quality.
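To put the headline numbers in perspective, here is a back-of-envelope sketch in Python. It uses only the figures quoted above (1,000 tokens/sec taken as a round lower bound, a ~4x speedup, 256-token blocks). The function names are illustrative, and treating throughput as constant is a simplifying assumption: real numbers will vary with batch size, prompt length, and the number of denoising steps.

```python
# Back-of-envelope arithmetic from the article's quoted figures (illustrative only).
DIFFUSION_TOKENS_PER_SEC = 1_000  # "1,000+ tokens/sec" on one H100, taken as a round lower bound
SPEEDUP_VS_AR = 4.0               # "~4x faster" than an equivalent autoregressive model
BLOCK_SIZE = 256                  # tokens denoised in parallel per block


def implied_ar_throughput(diffusion_tps: float, speedup: float) -> float:
    """Tokens/sec of the autoregressive baseline implied by the quoted speedup."""
    return diffusion_tps / speedup


def seconds_per_block(tokens_per_sec: float, block_size: int) -> float:
    """Average wall-clock time to emit one block at a given sustained throughput."""
    return block_size / tokens_per_sec


ar_tps = implied_ar_throughput(DIFFUSION_TOKENS_PER_SEC, SPEEDUP_VS_AR)
print(f"Implied AR baseline: {ar_tps:.0f} tokens/sec")
print(f"Diffusion, one 256-token block: {seconds_per_block(DIFFUSION_TOKENS_PER_SEC, BLOCK_SIZE):.3f} s")
print(f"AR baseline, same 256 tokens:   {seconds_per_block(ar_tps, BLOCK_SIZE):.3f} s")
```

Under these assumptions, the equivalent autoregressive model would run at roughly 250 tokens/sec, so a 256-token block that takes about a quarter of a second with diffusion would take about one second autoregressively.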