Get Started

Quick Start
Usage Guide
Customization Guide
Agentic RL Training Roadmap
FAQ

Dense

Qwen3-4B with 8xH100
Gemma4 Dense and MoE with GSM8K
GLM4-9B with 8xH100

MoE

GLM-4.7-Flash with 8×H100
Qwen3-30B-A3B with 8xH100
GLM-5.2 744B-A40B with 256xH100
GLM-4.7 with 64xH100
DeepSeek R1 with 128xH100

Advanced Features

On-Policy Distillation
Speculative Decoding
Low Precision Training and Rollout
Reproducibility
Fault Tolerance
Observability
PD Disaggregation
External Rollout Engines Roadmap
Delta Weight Sync
SGLang Config: Advanced Engine Deployment
Megatron Config: Role-Based Training Overrides
Supporting Model Architectures Beyond Megatron-LM

Other Usage

SFT Qwen3-4B-Base
Search-R1 lite
Fully-Async Rollout Example
Retool: from SFT to RL
Multi-Agent RL
Coding-Agent RL

Developer Guide

CI (Continuous Integration)
Debugging
Trace Viewer
Profiling

Hardware Platforms

AMD

Blogs

v0.1.0: Redefining High-Performance RL Training Frameworks
slime: An SGLang-Native Post-Training Framework for RL Scaling

By slime Team

© Copyright 2025-2026, slime.

Last updated on Jul 07, 2026.