stepfun-ai/Step-3.5-Flash
Production-grade reasoning MoE (~196B total / 11B active parameters) with hybrid attention schedules, sliding-window attention (SWA) compensation, and multi-token prediction (MTP) for low-latency long-context inference
Sparse MoE reasoning model with hybrid attention and step3p5 MTP speculative decoding