
google/diffusiongemma-26B-A4B-it

Google's DiffusionGemma is a block-diffusion language model built on Gemma 4's MoE backbone (26B total / 4B active parameters). It generates tokens by iterative denoising over a fixed-length canvas rather than by left-to-right autoregressive decoding, so the tokens in a block are produced in parallel, which raises throughput.
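To make the canvas idea concrete, here is a toy sketch of iterative denoising over a fixed-length canvas. It is not DiffusionGemma's actual decoding algorithm: `toy_denoiser`, `denoise_canvas`, the confidence-ranked commit schedule, and the use of a known target sequence in place of a real model are all assumptions made for illustration only.

```python
import random

MASK = None  # placeholder for a canvas position that is not yet decided


def toy_denoiser(canvas, target, rng):
    """Hypothetical stand-in for the model.

    Proposes a token and a confidence score for every masked position.
    A real model would run one forward pass over the whole canvas here;
    this toy simply reads the answer from `target` and draws a random score.
    """
    proposals = {}
    for i, tok in enumerate(canvas):
        if tok is MASK:
            proposals[i] = (target[i], rng.random())
    return proposals


def denoise_canvas(target, steps=4, seed=0):
    """Fill a fixed-length canvas in at most `steps` parallel passes."""
    rng = random.Random(seed)
    canvas = [MASK] * len(target)
    history = []
    for step in range(steps):
        proposals = toy_denoiser(canvas, target, rng)
        if not proposals:
            break
        # Commit an equal share of the remaining positions on each pass,
        # most confident first; the final pass commits everything left.
        remaining_steps = steps - step
        quota = -(-len(proposals) // remaining_steps)  # ceiling division
        ranked = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for i, (tok, _conf) in ranked[:quota]:
            canvas[i] = tok
        history.append(list(canvas))
    return canvas, history


if __name__ == "__main__":
    words = "the quick brown fox jumps over the lazy dog".split()
    final, history = denoise_canvas(words, steps=3)
    for n, snapshot in enumerate(history, start=1):
        print(n, " ".join(w if w is not MASK else "_" for w in snapshot))
```

The point of the sketch is the shape of the loop: several positions are committed per pass, so a canvas of N tokens needs fewer passes than the N sequential steps of autoregressive decoding. It does not model or reproduce any measured throughput figure.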

Block-diffusion MoE — 26B total / 4B active, canvas-based parallel generation with ~1.9x throughput vs autoregressive baseline

moe | 26B / 4B | 262,144 ctx | vLLM day0-docker+ | multimodal | text
Guide