Model Recipes
Google

google/gemma-4-E2B-it

Google's compact Gemma 4 multimodal model (effective 2B) with native text, image, and audio, plus thinking mode and tool-use protocol.

Compact unified multimodal model with audio, thinking, and function calling — runs on a single 24 GB+ GPU

dense5B131,072 ctxvLLM 0.19.1+multimodaltext
Guide