
stepfun-ai/Step-3.5-Flash

Production-grade reasoning MoE (~196B total / 11B active parameters) with hybrid attention schedules, sliding-window attention (SWA) compensation, and multi-token prediction (MTP) for low-latency long-context inference.

Sparse MoE reasoning model with hybrid attention and step3p5 MTP speculative decoding.

moe · 196B / 11B · 262,144 ctx · vLLM 0.11.0+ · text
Guide