slime Documentation

slime is an LLM post-training framework for RL scaling, providing two core capabilities:

  • High-Performance Training: Supports efficient training in various modes by connecting Megatron with SGLang;

  • Flexible Data Generation: Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines.

slime’s design goal is to make these two capabilities reinforce each other without turning the system into a heavy stack of disconnected trainers, rollout services, and agent frameworks. Megatron training, SGLang rollout, custom data generation, reward computation, verifier feedback, and environment interaction all flow through the same training / rollout / Data Buffer path.

This makes slime one of the most battle-tested open RL post-training frameworks: small enough to understand and extend, but validated through complete training loops behind SOTA-level model releases.
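The toy Python sketch below shows what it means for every workflow to share one path. It assumes nothing about slime's real interfaces: `Sample`, `DataBuffer`, `single_turn`, `tool_loop`, `rollout`, and `train_step` are all hypothetical names. Two different generation workflows, a single-turn call and a multi-step tool loop, write into one shared buffer, and a single training step consumes from it. The string manipulation stands in for SGLang calls, and the mean reward stands in for a Megatron update.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Sample:
    """Hypothetical rollout record: prompt, generated text, scalar reward."""
    prompt: str
    response: str
    reward: float


@dataclass
class DataBuffer:
    """Hypothetical shared buffer: every generation workflow writes here."""
    samples: List[Sample] = field(default_factory=list)

    def add(self, batch: List[Sample]) -> None:
        self.samples.extend(batch)

    def pop_batch(self, size: int) -> List[Sample]:
        # FIFO: hand the oldest `size` samples to the trainer.
        batch, self.samples = self.samples[:size], self.samples[size:]
        return batch


GenerateFn = Callable[[str], Sample]


def single_turn(prompt: str) -> Sample:
    # Stand-in for one inference call followed by a rule-based reward.
    response = prompt.upper()
    return Sample(prompt, response, reward=float(response.isupper()))


def tool_loop(prompt: str) -> Sample:
    # Stand-in for a multi-turn agentic workflow: two fake tool calls append
    # observations, then a verifier-style check scores the final transcript.
    transcript = prompt
    for step in range(2):
        transcript += f" [tool_result_{step}]"
    return Sample(prompt, transcript,
                  reward=float(transcript.endswith("[tool_result_1]")))


def rollout(prompts: List[str], generate: GenerateFn, buffer: DataBuffer) -> None:
    # The workflow is pluggable; the destination buffer is not.
    buffer.add([generate(p) for p in prompts])


def train_step(batch: List[Sample]) -> float:
    # Placeholder for a Megatron update: just report the batch's mean reward.
    return sum(s.reward for s in batch) / len(batch)


buffer = DataBuffer()
rollout(["what is rl?", "define ppo"], single_turn, buffer)
rollout(["search the docs"], tool_loop, buffer)
print(train_step(buffer.pop_batch(3)))
```

Swapping `single_turn` for `tool_loop` changes only the generation function. The buffer and the training step stay the same, which is the property the design paragraph describes.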

Why This Design Matters

  • Battle-tested by frontier model training: slime is the RL framework behind GLM-5.1, GLM-5, GLM-4.7, GLM-4.6, and GLM-4.5.

  • Native by design: slime passes Megatron arguments through directly and exposes installed SGLang arguments with a --sglang- prefix, so upstream training and serving optimizations remain available without adding another wrapper layer.

  • SGLang-focused rollout: slime deliberately supports a single rollout backend. This avoids flattening multiple inference engines into a lowest-common-denominator abstraction and lets RL workloads use SGLang-specific serving, routing, caching, disaggregation, and weight-sync behavior directly.

  • Agentic workflows as data generation: tool use, sandbox interaction, verifier rewards, environment feedback, multi-agent loops, and long-horizon agentic workflows plug into the same training / rollout / Data Buffer path instead of forking the training kernel.

  • BF16 training with FP8 rollout: large MoE recipes use Megatron BF16 training state with SGLang FP8 rollout/inference; long-context rollout can also use --sglang-kv-cache-dtype fp8_e4m3 to increase effective KV cache capacity.

  • Tested as RL infrastructure: CPU correctness tests run automatically, while GPU e2e tests cover real Megatron + SGLang training/rollout paths, including dense/MoE recipes, async rollout, SGLang config, checkpointing, precision, and debug replay. See CI (Continuous Integration).
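A small router can illustrate the `--sglang-` prefix convention. This is a hypothetical sketch, not slime's parser. `split_args` is an invented name, and the source states only that installed SGLang arguments are exposed under the prefix, so slime's actual handling may differ. The sketch strips the prefix from SGLang flags and passes every other flag through to Megatron unchanged. `--tensor-model-parallel-size` is a Megatron flag, and `--kv-cache-dtype` and `--mem-fraction-static` are SGLang server flags.

```python
from typing import List, Tuple

SGLANG_PREFIX = "--sglang-"


def split_args(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical router: send '--sglang-*' flags (prefix stripped) and their
    values to SGLang; pass everything else through unchanged to Megatron."""
    megatron: List[str] = []
    sglang: List[str] = []
    target = megatron
    for token in argv:
        if token.startswith(SGLANG_PREFIX):
            # '--sglang-kv-cache-dtype' -> '--kv-cache-dtype' (also works for '--flag=value').
            sglang.append("--" + token[len(SGLANG_PREFIX):])
            target = sglang
        elif token.startswith("--"):
            megatron.append(token)
            target = megatron
        else:
            # A bare value belongs to whichever flag preceded it.
            target.append(token)
    return megatron, sglang
```

For example, `split_args(["--tensor-model-parallel-size", "2", "--sglang-kv-cache-dtype", "fp8_e4m3"])` returns `(["--tensor-model-parallel-size", "2"], ["--kv-cache-dtype", "fp8_e4m3"])`. Because the prefix is only stripped and never remapped, any flag the installed SGLang version accepts stays reachable without a hand-maintained wrapper.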
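The capacity gain from `--sglang-kv-cache-dtype fp8_e4m3` comes down to byte arithmetic. An FP8 element takes 1 byte and a BF16 element takes 2, so the same memory budget holds roughly twice as many cached tokens. The sketch below uses a simplified per-token formula for standard multi-head or grouped-query attention (one key and one value vector per KV head per layer). It assumes a hypothetical model shape of 48 layers, 8 KV heads, and head dimension 128. It ignores any scaling-factor overhead and does not model attention variants that compress the KV cache.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_elem: int) -> int:
    # Factor 2: each layer caches one key and one value vector per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def tokens_that_fit(budget_gib: int, per_token_bytes: int) -> int:
    # Whole tokens that fit in the KV budget (GiB = 1024**3 bytes).
    return (budget_gib * 1024**3) // per_token_bytes


# Hypothetical model shape, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 48, 8, 128

bf16 = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, bytes_per_elem=2)
fp8 = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, bytes_per_elem=1)

print(f"BF16: {bf16 // 1024} KiB/token, {tokens_that_fit(40, bf16):,} tokens in 40 GiB")
print(f"FP8:  {fp8 // 1024} KiB/token, {tokens_that_fit(40, fp8):,} tokens in 40 GiB")
```

With this shape, a token costs 192 KiB of KV cache in BF16 and 96 KiB in FP8. A 40 GiB budget therefore holds 218,453 tokens in BF16 and 436,906 in FP8, which matters most for long-context rollouts.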

Production Validation

Beyond the GLM family, slime also supports:

  • Qwen series (Qwen3.6, Qwen3.5, Qwen3Next, Qwen3MoE, Qwen3, Qwen2.5);

  • DeepSeek V3 series (DeepSeek V3, V3.1, DeepSeek R1);

  • Llama 3.

Start by Use Case

Hardware Platforms