mistralai/Voxtral-Mini-4B-Realtime-2602
Multilingual realtime speech transcription (13 languages) with a natively streaming causal audio encoder; transcription delay configurable from 80 ms to 2.4 s; served via vLLM's Realtime API
Matches the accuracy of offline open-source ASR models at 480 ms delay; throughput above 12.5 tok/s on a single 16 GB GPU
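
The delay and throughput figures above are related by simple arithmetic: 12.5 tok/s is one token every 80 ms, which is also the smallest delay listed. The sketch below works through that relation. It assumes, without confirmation from this entry, that each output token covers 80 ms of audio and that delays are set in whole 80 ms steps; `TOKEN_MS`, `delay_to_steps`, and `keeps_up` are hypothetical names for illustration and are not part of vLLM or the model's API.

```python
# Assumption (not stated in the entry): one output token covers 80 ms of audio,
# inferred from 12.5 tok/s = 1 token / 0.080 s.
TOKEN_MS = 80

# Delay range quoted in the entry: 80 ms to 2.4 s.
MIN_DELAY_MS = 80
MAX_DELAY_MS = 2400


def delay_to_steps(delay_ms: int) -> int:
    """Convert a transcription delay in ms to a whole number of token steps.

    Requiring a multiple of TOKEN_MS is an assumption made for this sketch.
    """
    if not MIN_DELAY_MS <= delay_ms <= MAX_DELAY_MS:
        raise ValueError(f"delay must be within {MIN_DELAY_MS}-{MAX_DELAY_MS} ms")
    if delay_ms % TOKEN_MS != 0:
        raise ValueError(f"delay must be a multiple of {TOKEN_MS} ms")
    return delay_ms // TOKEN_MS


def keeps_up(tokens_per_second: float) -> bool:
    """Return True if decoding is at least as fast as audio arrives."""
    required = 1000 / TOKEN_MS  # tokens needed per second of audio
    return tokens_per_second >= required


if __name__ == "__main__":
    for ms in (80, 480, 2400):
        print(f"{ms} ms -> {delay_to_steps(ms)} token step(s)")
    print("12.5 tok/s keeps up with live audio:", keeps_up(12.5))
```

Under these assumptions the 480 ms setting corresponds to 6 token steps and the 2.4 s maximum to 30, and any decoder slower than 12.5 tok/s would fall behind the incoming audio.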