slime Documentation

slime is an LLM post-training framework for RL scaling, providing two core capabilities:

High-Performance Training: Supports efficient training in various modes by connecting Megatron with SGLang;
Flexible Data Generation: Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines.

slime’s design goal is to make these two capabilities reinforce each other without turning the system into a heavy stack of disconnected trainers, rollout services, and agent frameworks. Megatron training, SGLang rollout, custom data generation, reward computation, verifier feedback, and environment interaction all flow through the same training / rollout / Data Buffer path.
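The "same path" idea can be shown with a toy, single-process sketch. A rollout step fills a data buffer and a training step drains it. Everything in this sketch is hypothetical and illustrative: `Sample`, `rollout_step`, `train_step`, and the toy generate and reward functions are not slime APIs. The "training step" only averages rewards where a real system would run a Megatron update. The point is that custom generation and reward logic are plain functions passed into one loop, not a separate trainer.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Iterable


@dataclass
class Sample:
    prompt: str
    response: str = ""
    reward: float = 0.0


def rollout_step(
    prompts: Iterable[str],
    generate_fn: Callable[[Sample], Sample],
    reward_fn: Callable[[Sample], float],
    buffer: Deque[Sample],
) -> None:
    """Generate one sample per prompt, score it, and push it into the buffer."""
    for prompt in prompts:
        sample = generate_fn(Sample(prompt=prompt))
        sample.reward = reward_fn(sample)
        buffer.append(sample)


def train_step(buffer: Deque[Sample], batch_size: int) -> float:
    """Drain up to batch_size samples; a real trainer would update the policy here."""
    batch = [buffer.popleft() for _ in range(min(batch_size, len(buffer)))]
    return sum(s.reward for s in batch) / len(batch) if batch else 0.0


# Toy plug-ins: any generator (single-turn, tool loop, environment interaction)
# fits the same slot without changing rollout_step or train_step.
def echo_generate(sample: Sample) -> Sample:
    sample.response = sample.prompt.upper()
    return sample


def exact_match_reward(sample: Sample) -> float:
    return 1.0 if sample.response == "HI" else 0.0


buffer: Deque[Sample] = deque()
rollout_step(["hi", "yo"], echo_generate, exact_match_reward, buffer)
mean_reward = train_step(buffer, batch_size=2)
print(mean_reward)  # 0.5
```

You could replace `echo_generate` with a multi-turn tool-calling loop, or replace `exact_match_reward` with a verifier call. Nothing downstream of the buffer would change. slime's actual extension points are described in the Customization Guide.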

This makes slime one of the most battle-tested open RL post-training frameworks: small enough to understand and extend, but validated through complete training loops behind SOTA-level model releases.

Why This Design Matters

Battle-tested by frontier model training: slime is the RL framework behind GLM-5.2, GLM-5.1, GLM-5, GLM-4.7, GLM-4.6, and GLM-4.5.
Native by design: slime passes Megatron arguments through directly and exposes installed SGLang arguments with a --sglang- prefix, so upstream training and serving optimizations remain available without adding another wrapper layer.
SGLang-focused rollout: slime chooses one rollout backend intentionally. This avoids flattening multiple inference engines into a lowest-common-denominator abstraction and lets RL workloads use SGLang-specific serving, routing, caching, disaggregation, and weight-sync behavior directly.
Agentic workflows as data generation: tool use, sandbox interaction, verifier rewards, environment feedback, multi-agent loops, and long-horizon agentic workflows plug into the same training / rollout / Data Buffer path instead of forking the training kernel.
BF16 training with FP8 rollout: large MoE recipes use Megatron BF16 training state with SGLang FP8 rollout/inference; long-context rollout can also use --sglang-kv-cache-dtype fp8_e4m3 to increase effective KV cache capacity.
Tested as RL infrastructure: CPU correctness tests run automatically, while GPU e2e tests cover real Megatron + SGLang training/rollout paths, including dense/MoE recipes, async rollout, SGLang config, checkpointing, precision, and debug replay. See CI (Continuous Integration).
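The --sglang- prefix convention is illustrated by the sketch below. It registers a couple of engine options under the prefix with argparse and then strips the prefix to recover the option names the engine expects. This is a simplified assumption, not slime's argument parser. slime exposes the arguments of the installed SGLang version rather than hard-coding them as done here, and the helpers `build_parser` and `sglang_kwargs` are hypothetical. The --micro-batch-size flag stands in for a Megatron argument that is passed through unchanged.

```python
import argparse

PREFIX = "sglang_"  # argparse turns "--sglang-foo-bar" into attribute "sglang_foo_bar"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Training-side flag: kept as-is (stands in for a Megatron argument).
    parser.add_argument("--micro-batch-size", type=int, default=1)
    # Engine-side flags registered under the --sglang- prefix. They are hard-coded
    # here for illustration; a real integration would enumerate them from SGLang.
    parser.add_argument("--sglang-kv-cache-dtype", default="auto")
    parser.add_argument("--sglang-mem-fraction-static", type=float, default=None)
    return parser


def sglang_kwargs(args: argparse.Namespace) -> dict:
    """Collect prefixed options, strip the prefix, and drop unset (None) values."""
    return {
        name[len(PREFIX):]: value
        for name, value in vars(args).items()
        if name.startswith(PREFIX) and value is not None
    }


args = build_parser().parse_args(
    ["--micro-batch-size", "4", "--sglang-kv-cache-dtype", "fp8_e4m3"]
)
engine_kwargs = sglang_kwargs(args)
print(args.micro_batch_size, engine_kwargs)  # 4 {'kv_cache_dtype': 'fp8_e4m3'}
```

Only the prefix is removed. As a result, an option documented by SGLang, such as its KV cache dtype, maps one-to-one onto the prefixed flag, and no separate translation layer needs to be maintained.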

Production Validation

Beyond the GLM family, slime also supports:

Qwen series (Qwen3.6, Qwen3.5, Qwen3Next, Qwen3MoE, Qwen3, Qwen2.5);
DeepSeek V3 series (DeepSeek V3, V3.1, DeepSeek R1);
Llama 3.

Start by Use Case

New to slime: Quick Start
Configure training and rollout arguments: Usage Guide
Add custom generation, reward, or rollout functions: Customization Guide
Build agentic RL workflows: Agentic RL Training Roadmap
Configure production SGLang rollout topology: SGLang Config: Advanced Engine Deployment
Connect external rollout engines: External Rollout Engines Roadmap
Sync weights as byte-level deltas: Delta Weight Sync
Use PD disaggregation: PD Disaggregation
Use BF16 training with FP8 rollout or FP8 KV cache: Low Precision Training and Rollout
Understand CI and reliability coverage: CI (Continuous Integration)
Debug, trace, and profile long-running jobs: Debugging, Trace Viewer, Profiling
