google/gemma-4-12B-it

Google's encoder-free, unified Gemma 4 dense model (12B) with native text, image, and audio support, plus a thinking mode and a tool-use protocol.

Encoder-free unified multimodal model with audio, structured thinking, and function calling; runs on a single 40 GB+ GPU.

dense | 12B | 131,072 ctx | vLLM nightly+ | multimodal | text
Guide