google/gemma-4-12B-it
Google's encoder-free, unified Gemma 4 dense model (12B) with native text, image, and audio support, plus a thinking mode and a tool-use protocol.
Encoder-free unified multimodal model with audio, structured thinking, and function calling — runs on a single 40 GB+ GPU
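
As a rough sanity check on the single-GPU claim, the sketch below estimates memory for the weights alone. It assumes "12B" means exactly 12 billion parameters and tries a few common storage precisions; the source does not state which precision is used. The helper `weight_memory_gb` and the precision table are illustrative names, not part of any library.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumption: "12B" is taken as exactly 12 billion parameters.
NOMINAL_PARAMS = 12e9

# Assumed storage precisions (bytes per parameter); not stated in the source.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}

estimates = {
    name: weight_memory_gb(NOMINAL_PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}

for name, gb in estimates.items():
    # Weights only: activations and the KV cache need additional memory.
    print(f"{name}: ~{gb:.0f} GB for weights only")
```

Under these assumptions, 16-bit weights come to about 24 GB, which would leave headroom on a 40 GB card for activations and the KV cache, while 32-bit weights (about 48 GB) would not fit. Actual usage depends on context length, batch size, and the serving stack.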