Model Recipes
Qwen

Qwen/Qwen3.5-397B-A17B

Multimodal mixture-of-experts (MoE) model built on a Gated Delta Networks architecture: 397B total parameters, 17B active per token, and a context window of up to 262K tokens.

Verified on 8x H200, 8x MI300X/MI355X, and GB200 nodes

moe · 397B / 17B · 262,144 ctx · vLLM 0.17.0+ · multimodal · text
Guide
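
As a rough sanity check on the figures above, the sketch below works out the active-parameter fraction and a back-of-the-envelope weight footprint. The parameter counts and context length come from the card. The BF16 weight format (2 bytes per parameter) and the 141 GB per-H200 memory figure are assumptions added for illustration, and the estimate ignores KV cache, activations, and runtime overhead.

```python
# Back-of-the-envelope sizing from the figures on the model card.
TOTAL_PARAMS = 397e9      # total parameters (from the card)
ACTIVE_PARAMS = 17e9      # active parameters per token (from the card)
CONTEXT_TOKENS = 262_144  # maximum context length (from the card)

# Assumptions for illustration only, not stated on the card:
BYTES_PER_PARAM = 2       # BF16 weights
GPU_COUNT = 8             # one 8x H200 node
GPU_MEMORY_GB = 141       # nominal H200 memory capacity in GB


def active_fraction(total: float, active: float) -> float:
    """Share of the parameters used for each token."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in decimal GB."""
    return params * bytes_per_param / 1e9


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
weights_gb = weight_memory_gb(TOTAL_PARAMS, BYTES_PER_PARAM)
aggregate_gb = GPU_COUNT * GPU_MEMORY_GB
headroom_gb = aggregate_gb - weights_gb

print(f"Active fraction: {fraction:.1%}")
print(f"Context length:  {CONTEXT_TOKENS} tokens (2**18)")
print(f"Weights (BF16):  {weights_gb:.0f} GB")
print(f"Node memory:     {aggregate_gb} GB, headroom {headroom_gb:.0f} GB")
```

Under these assumptions the weights alone take up most of an 8-GPU node, so whatever is left has to cover the KV cache for long contexts. Actual requirements depend on the quantization and serving configuration used.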