
Qwen/Qwen3-4B

Qwen3-4B is a 4B-parameter dense model with hybrid thinking and non-thinking modes. It fits on a single TPU v6e chip or a single GPU.

Verified on a single TPU v6e (Trillium) chip in BF16.

Tags: dense · 4B parameters · 40,960-token context · vLLM 0.8.5+ · text
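The single-chip claim can be sanity-checked with a back-of-envelope memory estimate for BF16 weights plus the KV cache of one full-length sequence. This is a minimal sketch, not part of the recipe: the architecture numbers (about 4.02B parameters, 36 layers, 8 KV heads, head dimension 128) are assumptions taken from the model's published config and should be checked against the checkpoint you deploy; only the 40,960-token context and BF16 precision come from this page.

```python
# Back-of-envelope memory estimate for serving Qwen3-4B in BF16.
# ASSUMPTION: the architecture numbers below are taken from the model's
# published config.json; verify them against the checkpoint you deploy.

BYTES_PER_BF16 = 2

NUM_PARAMS = 4.02e9    # assumed total parameter count
NUM_LAYERS = 36        # assumed num_hidden_layers
NUM_KV_HEADS = 8       # assumed num_key_value_heads (grouped-query attention)
HEAD_DIM = 128         # assumed head_dim
MAX_CONTEXT = 40_960   # context length listed for this recipe

GIB = 1024 ** 3


def weight_bytes(num_params: float, bytes_per_param: int = BYTES_PER_BF16) -> float:
    """Memory needed just to hold the model weights."""
    return num_params * bytes_per_param


def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_elem: int = BYTES_PER_BF16) -> int:
    """K and V tensors (factor 2) across every layer and KV head, for one token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def kv_cache_bytes(tokens: int) -> int:
    """KV cache size for a single sequence of `tokens` tokens."""
    return tokens * kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)


if __name__ == "__main__":
    per_token = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)
    print(f"weights:            {weight_bytes(NUM_PARAMS) / GIB:.2f} GiB")
    print(f"KV cache per token: {per_token / 1024:.0f} KiB")
    print(f"KV cache, one {MAX_CONTEXT}-token sequence: {kv_cache_bytes(MAX_CONTEXT) / GIB:.2f} GiB")
```

Under these assumptions the weights take about 7.5 GiB and one full-context sequence adds about 5.6 GiB of KV cache (144 KiB per token). The estimate ignores activations, batching, and framework overhead, so treat it as a lower bound when comparing against the accelerator's memory.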
Guide