Model Recipes
Mistral AI

mistralai/Voxtral-Mini-4B-Realtime-2602

Multilingual real-time speech transcription (13 languages) built on a natively streaming causal audio encoder. The transcription delay is configurable from 80 ms to 2.4 s, and the model is served through vLLM's Realtime API.

Matches the accuracy of offline open-source ASR models at a 480 ms delay; runs at more than 12.5 tok/s on a single 16 GB GPU
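
The delay and throughput figures above fit together if the model emits one token per 80 ms of audio. That framing is an assumption inferred from 12.5 tok/s = 1 / 0.080 s; the card does not state it. Under that assumption, the following minimal sketch converts a delay setting into a number of 80 ms steps and checks the throughput needed to keep pace with live audio. The names `FRAME_MS`, `delay_to_frames`, and `realtime_tokens_per_second` are hypothetical and are not part of vLLM or any Mistral API.

```python
# Assumption (not stated in the card): one output token per 80 ms of audio.
FRAME_MS = 80


def delay_to_frames(delay_ms: int) -> int:
    """Return how many 80 ms steps a delay setting spans."""
    if delay_ms <= 0 or delay_ms % FRAME_MS != 0:
        raise ValueError(f"delay must be a positive multiple of {FRAME_MS} ms")
    return delay_ms // FRAME_MS


def realtime_tokens_per_second(frame_ms: int = FRAME_MS) -> float:
    """Tokens per second needed to keep pace with incoming audio."""
    return 1000 / frame_ms


if __name__ == "__main__":
    # The minimum, the highlighted, and the maximum delay from the card.
    for delay_ms in (80, 480, 2400):
        print(f"{delay_ms} ms delay spans {delay_to_frames(delay_ms)} step(s)")
    print(f"real-time threshold: {realtime_tokens_per_second()} tok/s")
```

Read this way, the 480 ms setting spans six steps and the 2.4 s maximum spans thirty, and any throughput above 12.5 tok/s leaves headroom over the incoming audio rate.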

dense · 4.4B · 131,072 ctx · vLLM 0.20.0+ · multimodal
Guide