CI (Continuous Integration)
slime CI has two layers:

- Always-on CPU correctness tests that run on every PR, every push to `main`, and every manual `workflow_dispatch`.
- Label-gated GPU end-to-end tests that validate real Megatron + SGLang training and rollout paths on self-hosted GPU runners.
This split is intentional. Most invariants should be checked quickly without waiting for the GPU fleet, while full training/rollout behavior is still covered by GPU e2e jobs.
How It Works
The workflow is defined in .github/workflows/pr-test.yml, which is auto-generated from .github/workflows/pr-test.yml.j2.
CPU Jobs
CPU jobs run on GitHub-hosted ubuntu-latest runners:
- `cpu-unittest` installs CPU PyTorch and lightweight dependencies, then runs the registered unit and contract tests with `python tests/<test_file>.py`.
- `agent-adapter-test` does the same for agent adapter tests, with extra dependencies such as `openai`, `openai-agents`, and `anthropic`.
CPU jobs do not use Docker, do not acquire GPUs, and do not call tests/ci/gpu_lock_exec.py.
GPU E2E Jobs
GPU jobs run on self-hosted GPU runners. Each job:
1. Starts a Docker container, usually `slimerl/slime:latest`; image validation uses `slimerl/slime-test:latest`.
2. Installs slime with `pip install -e . --no-deps`.
3. Acquires the requested GPUs with `tests/ci/gpu_lock_exec.py --count <num_gpus>`.
4. Executes the registered test file with `python tests/<test_file>.py`.
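The GPU-acquisition step can be pictured with a common pattern: one lock file per device, taken with a non-blocking exclusive `flock`, so concurrent jobs on the same runner never share a GPU. The sketch below is a hypothetical illustration of that pattern, not the contents of `tests/ci/gpu_lock_exec.py`; the function name `acquire_gpus`, the lock-file layout, and the fail-fast behavior are all assumptions. It is POSIX-only because it relies on `fcntl`.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def acquire_gpus(count, total_gpus, lock_dir):
    """Hypothetical sketch: exclusively lock `count` of `total_gpus` devices.

    Yields a CUDA_VISIBLE_DEVICES-style string such as "0,1".
    """
    os.makedirs(lock_dir, exist_ok=True)
    held = []  # (gpu_index, open lock file) pairs owned by this job
    try:
        for idx in range(total_gpus):
            if len(held) == count:
                break
            lock_file = open(os.path.join(lock_dir, f"gpu{idx}.lock"), "w")
            try:
                # Non-blocking exclusive lock: fails at once if another job holds it.
                fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except BlockingIOError:
                lock_file.close()  # GPU busy; try the next index
                continue
            held.append((idx, lock_file))
        if len(held) < count:
            # Fail fast for simplicity; a real helper would more likely wait and retry.
            raise RuntimeError(f"only {len(held)} of {count} requested GPUs are free")
        yield ",".join(str(idx) for idx, _ in held)
    finally:
        # Release every lock this job took, whether the test passed or failed.
        for _, lock_file in held:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
            lock_file.close()
```

For example, a wrapper could run the test inside `with acquire_gpus(4, total_gpus=8, lock_dir="/tmp/gpu-locks") as devices:` and pass `devices` through `CUDA_VISIBLE_DEVICES` (the lock directory here is a placeholder). With a count of 0 the sketch yields an empty string without taking any lock, which mirrors how a `NUM_GPUS = 0` test skips GPU acquisition.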
GPU tests usually follow the e2e pattern: prepare() downloads models/datasets, and execute() builds CLI arguments and calls U.execute_train(...).
Changed-Test Job
run-ci-changed dynamically detects added or modified files under tests/test_*.py and tests/plugin_contracts/test_*.py relative to origin/main.
For each changed test file, it extracts a top-level NUM_GPUS = <N> constant and builds a matrix. If NUM_GPUS is missing, CI defaults to 8, so CPU-only tests should declare:
```python
NUM_GPUS = 0
```
The changed-test job itself runs through the self-hosted Docker path. When NUM_GPUS = 0, it runs the test without acquiring GPUs.
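The file selection and matrix construction described above can be sketched in a few lines of Python. This is a hypothetical reimplementation for illustration, not the workflow's actual script: the function names are invented, the `changed` mapping stands in for the file list and contents the job would obtain from something like `git diff --name-only origin/main`, and `NUM_GPUS` is read with `ast` so that only a top-level integer assignment counts.

```python
import ast
import fnmatch
from pathlib import PurePosixPath

DEFAULT_NUM_GPUS = 8  # CI default when a file declares no NUM_GPUS
TEST_DIRS = ("tests", "tests/plugin_contracts")


def is_ci_changed_candidate(path: str) -> bool:
    """True for tests/test_*.py and tests/plugin_contracts/test_*.py only."""
    p = PurePosixPath(path)
    return str(p.parent) in TEST_DIRS and fnmatch.fnmatch(p.name, "test_*.py")


def read_num_gpus(source: str) -> int:
    """Return the top-level integer NUM_GPUS, or the default if absent."""
    for node in ast.parse(source).body:  # module-level statements only
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and node.targets[0].id == "NUM_GPUS"
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, int)
        ):
            return node.value.value
    return DEFAULT_NUM_GPUS


def build_matrix(changed: dict[str, str]) -> list[dict]:
    """Map {path: file contents} of added/modified files to matrix entries."""
    return [
        {"test_file": path, "num_gpus": read_num_gpus(source)}
        for path, source in sorted(changed.items())
        if is_ci_changed_candidate(path)
    ]
```

Note that a file under `tests/utils/` is not picked up by this selection, and a `NUM_GPUS` assigned inside a function is ignored, so such a file would fall back to the default of 8.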
CI Jobs and Triggers
| Trigger | Job | Type | Description |
|---|---|---|---|
| Automatic | | CPU | Always-on unit and contract tests for argument validation, schedules, rewards, samples, rollout validation, checkpoint utilities, and plugin contracts. |
| Automatic | | CPU | Always-on agent adapter tests with optional provider SDK dependencies. |
| | | GPU | Lightweight smoke tests with small Qwen models. Fast GPU feedback loop. |
| | | GPU | SGLang config tests for advanced rollout engine deployment and mixed/offload scenarios. |
| | | GPU | Core Megatron training tests covering dense, MoE, PPO, MTP, OPD, async rollout, PD/Mooncake, and debug replay paths. |
| | | GPU | Numerical precision validation and parallel consistency checks. |
| | | GPU | Checkpoint save/load correctness, including CPU/GPU optimizer states and async save. |
| | | GPU | Broad image validation suite on … |
| | | Mixed | Runs only changed tests, using each file’s … |
workflow_dispatch can be used from the Actions page for manual validation. It runs the registered jobs according to the workflow conditions.
CPU Unit Tests
The CPU suite is the first line of defense for correctness. It is designed to catch silent RL infrastructure bugs before a change reaches expensive GPU runs.
The registered CPU suite currently covers:
- Megatron argument and HF config validation;
- DP/CP scheduling utilities and CP loss invariance;
- metric reporting and distributed metric aggregation;
- reward-model grading utilities for math, GPQA, F1, DeepScaler, and DAPO-style math;
- `Sample` behavior, rollout validation, and agent trajectory merging;
- HF checkpoint saver behavior;
- customization hook contracts for rollout functions, generate functions, runtime hooks, and path loading.
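As an illustration of the kind of silent bug an invariance check targets, consider a token-level mean loss whose per-token losses are split across context-parallel chunks. The sketch below is hypothetical and is not slime's CP test; it assumes a plain token-level mean and only shows why averaging per-chunk means is wrong. That version matches the full-sequence loss when chunks are equal-sized, so a naive check passes, and drifts once the split is uneven.

```python
import math


def full_mean_loss(token_losses):
    """Reference: token-level mean over the whole sequence."""
    return sum(token_losses) / len(token_losses)


def cp_mean_loss_correct(chunks):
    """Reduce loss sums and token counts across chunks, then divide once."""
    total = sum(sum(chunk) for chunk in chunks)
    num_tokens = sum(len(chunk) for chunk in chunks)
    return total / num_tokens


def cp_mean_loss_buggy(chunks):
    """Average of per-chunk means: only correct when all chunks have equal length."""
    return sum(sum(chunk) / len(chunk) for chunk in chunks) / len(chunks)


losses = [1.0, 2.0, 3.0, 6.0]
even = [[1.0, 2.0], [3.0, 6.0]]
uneven = [[1.0], [2.0, 3.0, 6.0]]

# Both reductions agree on an even split, so a naive check would pass...
assert math.isclose(cp_mean_loss_buggy(even), full_mean_loss(losses))
# ...but only the correct reduction is invariant to how tokens are split.
assert math.isclose(cp_mean_loss_correct(uneven), full_mean_loss(losses))
assert not math.isclose(cp_mean_loss_buggy(uneven), full_mean_loss(losses))
```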
Agent adapter tests are kept in a separate CPU job because they need extra SDK dependencies.
Useful local commands:
```bash
python tests/test_agent_trajectory.py
python -m pytest tests/test_megatron_argument_validation.py tests/plugin_contracts/test_plugin_generate_contracts.py
```
GPU E2E Tests
GPU e2e tests validate the integrated training/rollout behavior that CPU tests cannot cover:
- `run-ci-short`: small-model smoke coverage for quick GPU feedback.
- `run-ci-sglang-config`: advanced SGLang deployment paths, including config-based engine layouts.
- `run-ci-megatron`: main Megatron backend coverage for dense/MoE recipes, async rollout, OPD, PPO-style paths, PD/Mooncake, and debug rollout-then-train replay.
- `run-ci-precision`: numerical consistency across parallel settings.
- `run-ci-ckpt`: checkpoint save/load combinations and async save.
- `run-ci-image`: broad validation of the release/test image.
Use targeted labels for routine PRs. Use run-ci-image sparingly because it consumes significantly more GPU time.
Writing a New Test
CPU Tests
For CPU-only tests:
1. Add the test under `tests/test_*.py`, `tests/utils/test_*.py`, or `tests/plugin_contracts/test_*.py`, following nearby patterns.
2. Add a top-level `NUM_GPUS = 0` if the file may be run by `run-ci-changed`.
3. Make the file executable directly:

   ```python
   import pytest

   if __name__ == "__main__":
       raise SystemExit(pytest.main([__file__]))
   ```

4. If the test should run permanently, register it in the `cpu-unittest` or `agent-adapter-test` job in `.github/workflows/pr-test.yml.j2`, then regenerate the workflow.
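Putting these steps together, the body of a minimal CPU-only test file might look like the sketch below. The file name, `clip_reward`, and the test names are hypothetical placeholders; a real test would import the slime code under test instead of defining a stand-in. The `__main__` runner from step 3 goes at the bottom of the file and is omitted here.

```python
# tests/test_reward_clipping.py  (hypothetical file name)
NUM_GPUS = 0  # CPU-only: run-ci-changed runs it without acquiring GPUs


def clip_reward(reward, low=-1.0, high=1.0):
    # Hypothetical stand-in; a real test would import the slime function under test.
    return max(low, min(high, reward))


def test_clip_reward_clamps_out_of_range_values():
    assert clip_reward(5.0) == 1.0
    assert clip_reward(-5.0) == -1.0


def test_clip_reward_keeps_in_range_values():
    assert clip_reward(0.25) == 0.25


# Append the `if __name__ == "__main__":` runner from step 3 here.
```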
GPU E2E Tests
For GPU e2e tests:
1. Create `tests/test_<your_test_name>.py` following the existing `prepare()`/`execute()` pattern.
2. Declare the required GPU count with `NUM_GPUS = <N>`.
3. Download required models/datasets in `prepare()`.
4. Build arguments and call `U.execute_train(...)` in `execute()`.
5. Register the test in the appropriate GPU job in `.github/workflows/pr-test.yml.j2`, then regenerate the workflow.
Example skeleton:
```python
import os

import slime.utils.external_utils.command_utils as U

MODEL_NAME = "Qwen2.5-0.5B-Instruct"
MODEL_TYPE = "qwen2.5-0.5B"
NUM_GPUS = 4  # read by run-ci-changed to size the GPU request


def prepare():
    # Download the model (and any datasets) before training starts.
    U.exec_command("mkdir -p /root/models /root/datasets")
    U.exec_command(f"hf download Qwen/{MODEL_NAME} --local-dir /root/models/{MODEL_NAME}")


def execute():
    # Build argument strings and call U.execute_train(...)
    ...


if __name__ == "__main__":
    prepare()
    # Unset proxy variables after downloading, before launching training.
    for proxy_var in ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"):
        os.environ.pop(proxy_var, None)
    execute()
```
Workflow Generation
The workflow file pr-test.yml is auto-generated from the Jinja2 template pr-test.yml.j2. Do not edit pr-test.yml directly.
To change the permanent CI matrix:
1. Edit `.github/workflows/pr-test.yml.j2`.
2. Run:

   ```bash
   python .github/workflows/generate_github_workflows.py
   ```

3. Commit both `.github/workflows/pr-test.yml.j2` and the generated `.github/workflows/pr-test.yml`.
Choosing Checks for a PR
- Pure argument parsing, reward, schedule, sample, trajectory, or hook-contract changes: rely on CPU tests first.
- SGLang topology or rollout engine deployment changes: use `run-ci-sglang-config`.
- Megatron training, loss, checkpoint conversion, or model recipe changes: use `run-ci-megatron`; add `run-ci-precision` or `run-ci-ckpt` when relevant.
- Docker image or dependency changes: use `run-ci-image`.
- New or modified tests: use `run-ci-changed` for quick targeted validation.
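These rules of thumb reduce to a path-prefix lookup if you want to automate them, for example in a local pre-push script. The sketch below is hypothetical: the prefixes in `LABEL_RULES` are illustrative assumptions about the repository layout and should be checked against the real tree. It does not try to decide when `run-ci-precision` or `run-ci-ckpt` are relevant, and a file matching no rule gets no label, which corresponds to relying on the CPU tests.

```python
# Hypothetical prefix -> label rules; adjust the prefixes to the real repository layout.
LABEL_RULES = (
    ("docker/", "run-ci-image"),
    ("requirements", "run-ci-image"),
    ("slime/backends/sglang_utils/", "run-ci-sglang-config"),
    ("slime/backends/megatron_utils/", "run-ci-megatron"),
    ("tests/test_", "run-ci-changed"),
    ("tests/plugin_contracts/test_", "run-ci-changed"),
)


def suggest_labels(changed_paths):
    """Return the sorted GPU labels suggested for a list of changed file paths.

    An empty result means the always-on CPU tests are the only suggested check.
    """
    labels = set()
    for path in changed_paths:
        for prefix, label in LABEL_RULES:
            if path.startswith(prefix):
                labels.add(label)
    return sorted(labels)
```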