Model Recipes
DeepSeek

deepseek-ai/DeepSeek-V3.2-Exp

Experimental DeepSeek-V3.2 preview with sparse attention (MQA-like logits) and FP8 KV cache; architecture matches DeepSeek-V3.1 except for the sparse attention mechanism.

Sparse-attention MoE model with an FP8 KV cache; GSM8K score of about 0.96.

MoE | 671B / 37B | 163,840 ctx | vLLM 0.12.0+ | text
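
To put the spec line in perspective, here is a minimal arithmetic sketch. It assumes the usual MoE reading of "671B / 37B" (total parameters versus parameters active per token), which the card itself does not spell out. The constant and function names are illustrative only and are not part of any vLLM API. The KV cache ratio covers raw element storage only and ignores any scaling factors or metadata an FP8 cache may carry.

```python
# Figures copied from the spec line; their interpretation is an assumption.
TOTAL_PARAMS_B = 671          # assumed: total parameters, in billions
ACTIVE_PARAMS_B = 37          # assumed: parameters active per token, in billions
MAX_CONTEXT_TOKENS = 163_840  # context length from the spec line

# Bytes needed to store one cached element in each dtype.
BYTES_PER_ELEM = {"bf16": 2, "fp8": 1}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active_b / total_b


def kv_cache_ratio(dtype: str, baseline: str = "bf16") -> float:
    """Raw KV cache storage for `dtype` relative to `baseline`."""
    return BYTES_PER_ELEM[dtype] / BYTES_PER_ELEM[baseline]


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    print(f"active parameters per token: {frac:.1%}")
    print(f"context length: {MAX_CONTEXT_TOKENS // 1024}K tokens")
    print(f"FP8 vs BF16 KV cache storage: {kv_cache_ratio('fp8'):.0%}")
```

Under these assumptions, roughly 5.5% of the parameters are active for any one token, the context window is 160K tokens, and an FP8 KV cache needs about half the raw storage of a BF16 one.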
Guide