google/gemma-4-E4B-it

Google's compact Gemma 4 multimodal model (effective 4B) with native text, image, and audio support, plus a thinking mode and a tool-use protocol.

Effective-4B unified multimodal model with audio, thinking, and function calling — runs on a single 24 GB+ GPU

dense | 8B | 131,072 ctx | vLLM 0.19.1+ | multimodal | text
Guide