slime Documentation
slime is an LLM post-training framework for RL scaling, providing two core capabilities:
High-Performance Training: Supports efficient training in various modes by connecting Megatron with SGLang;
Flexible Data Generation: Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines.
slime’s design goal is to make these two capabilities reinforce each other without turning the system into a heavy stack of disconnected trainers, rollout services, and agent frameworks. Megatron training, SGLang rollout, custom data generation, reward computation, verifier feedback, and environment interaction all flow through the same training / rollout / Data Buffer path.
This makes slime one of the most battle-tested open RL post-training frameworks: small enough to understand and extend, but validated through complete training loops behind SOTA-level model releases.
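The shared training / rollout / Data Buffer path described above can be pictured with a small sketch. This is not slime's implementation: `Sample`, `DataBuffer`, `run_loop`, and the `generate`, `reward_fn`, and `train_step` callables are hypothetical names chosen for illustration. It only shows the general idea that custom generation and reward logic feed one buffer that the trainer consumes.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Iterable, List


@dataclass
class Sample:
    """One rollout result (hypothetical structure, not slime's own type)."""
    prompt: str
    response: str
    reward: float


class DataBuffer:
    """Hypothetical FIFO buffer sitting between rollout and training."""

    def __init__(self) -> None:
        self._samples: Deque[Sample] = deque()

    def put(self, samples: List[Sample]) -> None:
        self._samples.extend(samples)

    def get(self, n: int) -> List[Sample]:
        # Pop up to n samples in arrival order.
        count = min(n, len(self._samples))
        return [self._samples.popleft() for _ in range(count)]


def run_loop(
    prompts: Iterable[str],
    generate: Callable[[str], str],
    reward_fn: Callable[[str, str], float],
    train_step: Callable[[List[Sample]], None],
    batch_size: int,
) -> int:
    """Fill the buffer from rollouts, then drain it in training batches."""
    buffer = DataBuffer()

    # Rollout side: any custom generation or reward logic plugs in here.
    for prompt in prompts:
        response = generate(prompt)
        buffer.put([Sample(prompt, response, reward_fn(prompt, response))])

    # Training side: consume batches until the buffer is empty.
    steps = 0
    while True:
        batch = buffer.get(batch_size)
        if not batch:
            break
        train_step(batch)
        steps += 1
    return steps
```

In this sketch, swapping in a tool-using or multi-turn `generate` function changes only the rollout side; the buffer and the training side stay the same.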
Why This Design Matters
Battle-tested by frontier model training: slime is the RL framework behind GLM-5.1, GLM-5, GLM-4.7, GLM-4.6, and GLM-4.5.
Native by design: slime passes Megatron arguments through directly and exposes installed SGLang arguments with a --sglang- prefix, so upstream training and serving optimizations remain available without adding another wrapper layer.
SGLang-focused rollout: slime chooses one rollout backend intentionally. This avoids flattening multiple inference engines into a lowest-common-denominator abstraction and lets RL workloads use SGLang-specific serving, routing, caching, disaggregation, and weight-sync behavior directly.
Agentic workflows as data generation: tool use, sandbox interaction, verifier rewards, environment feedback, multi-agent loops, and long-horizon agentic workflows plug into the same training / rollout / Data Buffer path instead of forking the training kernel.
BF16 training with FP8 rollout: large MoE recipes use Megatron BF16 training state with SGLang FP8 rollout/inference; long-context rollout can also use --sglang-kv-cache-dtype fp8_e4m3 to increase effective KV cache capacity.
Tested as RL infrastructure: CPU correctness tests run automatically, while GPU e2e tests cover real Megatron + SGLang training/rollout paths, including dense/MoE recipes, async rollout, SGLang config, checkpointing, precision, and debug replay. See CI (Continuous Integration).
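The prefixed pass-through of engine arguments mentioned in the list above can be illustrated with a minimal `argparse` sketch. It is an assumption about the general pattern, not slime's actual argument code: the helpers `add_prefixed_arguments` and `extract_prefixed`, the `--train-flag` option, and the placeholder default are all hypothetical. Only the `--sglang-kv-cache-dtype fp8_e4m3` flag itself comes from the text.

```python
import argparse
from typing import Dict


def add_prefixed_arguments(
    parser: argparse.ArgumentParser, prefix: str, options: Dict[str, str]
) -> None:
    """Register each engine option as --<prefix>-<name> (hypothetical helper)."""
    for name, default in options.items():
        parser.add_argument(f"--{prefix}-{name}", default=default)


def extract_prefixed(args: argparse.Namespace, prefix: str) -> Dict[str, str]:
    """Collect the prefixed options and strip the prefix again."""
    # argparse stores --sglang-kv-cache-dtype under the key sglang_kv_cache_dtype.
    key_prefix = prefix.replace("-", "_") + "_"
    return {
        key[len(key_prefix):]: value
        for key, value in vars(args).items()
        if key.startswith(key_prefix)
    }


parser = argparse.ArgumentParser()
# Hypothetical stand-in for a training-side argument.
parser.add_argument("--train-flag", default="x")
# "auto" is a placeholder default used only for this sketch.
add_prefixed_arguments(parser, "sglang", {"kv-cache-dtype": "auto"})

args = parser.parse_args(["--sglang-kv-cache-dtype", "fp8_e4m3"])
engine_kwargs = extract_prefixed(args, "sglang")
print(engine_kwargs)
```

The point of the pattern is that the prefix keeps the two argument namespaces apart on one command line, while the values after stripping the prefix can be handed to the serving engine unchanged.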
Production Validation
Beyond the GLM family, slime also supports:
Qwen series (Qwen3.6, Qwen3.5, Qwen3Next, Qwen3MoE, Qwen3, Qwen2.5);
DeepSeek V3 series (DeepSeek V3, V3.1, DeepSeek R1);
Llama 3.
Start by Use Case
New to slime: Quick Start
Configure training and rollout arguments: Usage Guide
Add custom generation, reward, or rollout functions: Customization Guide
Build agentic RL workflows: Agentic RL Training Roadmap
Configure production SGLang rollout topology: SGLang Config: Advanced Engine Deployment
Use PD disaggregation: PD Disaggregation
Use BF16 training with FP8 rollout or FP8 KV cache: Low Precision Training and Rollout
Use delta weight sync: Delta Weight Sync
Understand CI and reliability coverage: CI (Continuous Integration)
Debug, trace, and profile long-running jobs: Debugging, Trace Viewer, Profiling
MoE
Advanced Features
Other Usage
Developer Guide
Hardware Platforms