Model Recipes
Meta

meta-llama/Llama-4-Scout-17B-16E-Instruct

Llama 4 Scout 17B-16E is a mixture-of-experts (MoE) model with NVIDIA FP8 and FP4 quantized variants. With quantization, it fits on a single GPU.

MoE | 109B total / 17B active parameters | 10,485,760-token context | vLLM 0.12.0+ | text
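
The single-GPU claim can be sanity-checked with a rough weight-memory estimate. The sketch below is only a back-of-the-envelope calculation, not part of the recipe. It assumes the 109B figure is the total parameter count, that all experts must be resident in GPU memory, and that each format costs 2, 1, or 0.5 bytes per parameter. It ignores the KV cache, activations, and quantization scale overhead, and the 80 GB budget in the usage line is a hypothetical example.

```python
# Rough weight-only memory estimate for an MoE checkpoint.
# Assumption: 109B is the total parameter count from the card above.
TOTAL_PARAMS = 109e9

# Assumed storage cost per parameter for each format (scales/metadata ignored).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Return the weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bytes_per_param / 1e9


def fits_on_gpu(total_params: float, bytes_per_param: float, gpu_gb: float) -> bool:
    """True if the weights alone fit in gpu_gb; real serving needs extra headroom."""
    return weight_memory_gb(total_params, bytes_per_param) <= gpu_gb


if __name__ == "__main__":
    gpu_gb = 80.0  # hypothetical per-GPU memory budget
    for fmt, nbytes in BYTES_PER_PARAM.items():
        size = weight_memory_gb(TOTAL_PARAMS, nbytes)
        verdict = "fits" if fits_on_gpu(TOTAL_PARAMS, nbytes, gpu_gb) else "does not fit"
        print(f"{fmt}: {size:.1f} GB of weights, {verdict} in {gpu_gb:.0f} GB")
```

Because an MoE model routes each token through only some experts, compute scales with the 17B active parameters, but memory scales with the full 109B. That is why the precision of the stored weights decides whether a single GPU is enough.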
Guide