deepseek-ai/DeepSeek-V4-Pro

DeepSeek V4 flagship MoE (1.6T total / 49B active parameters) with hybrid CSA+HCA attention, manifold-constrained hyper-connections, and three-tier reasoning, trained with the Muon optimizer on 32T+ tokens.

Frontier 1.6T/49B reasoning MoE with native FP4+FP8 weights, MTP speculative decoding, and 1M-token context

MoE | 1600B / 49B | 1,048,576 ctx | vLLM 0.20.0+ | text
Guide