Fully-Async Rollout Example#
End-to-end demo of slime’s fully-async rollout path. A background asyncio
worker keeps a fixed pool of in-flight generations across rollout boundaries,
so the next training step doesn’t wait for the slowest in-flight sample.
The worker itself lives in slime.rollout.fully_async_rollout; this
directory is just the launch script + CI test.
Files#
run-qwen2.5-0.5B-fully_async.sh: single-node, 4-GPU, three-rollout demo with Qwen2.5-0.5B-Instruct on dapo-math-17k. Fast enough to serve as the CI smoke test for the fully-async path.
The same script doubles as tests/test_qwen2.5_0.5B_fully_async_short.py in
CI.
Prerequisites#
/root/models/Qwen2.5-0.5B-Instruct/ # HF checkpoint
/root/models/Qwen2.5-0.5B-Instruct_torch_dist/ # converted with tools/convert_hf_to_torch_dist.py
/root/datasets/dapo-math-17k/dapo-math-17k.jsonl
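Before launching, it can help to confirm that these three paths exist. The following is a minimal pre-flight sketch, not part of slime; the helper name `missing_paths` is hypothetical, and the paths are copied from the list above.

```python
from pathlib import Path

# Paths copied from the prerequisites list above.
REQUIRED_PATHS = [
    "/root/models/Qwen2.5-0.5B-Instruct",
    "/root/models/Qwen2.5-0.5B-Instruct_torch_dist",
    "/root/datasets/dapo-math-17k/dapo-math-17k.jsonl",
]


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk (hypothetical helper)."""
    return [p for p in paths if not Path(p).exists()]


# Report anything that still needs to be downloaded or converted.
for path in missing_paths(REQUIRED_PATHS):
    print(f"missing prerequisite: {path}")
```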
Run#
cd slime
bash examples/fully_async/run-qwen2.5-0.5B-fully_async.sh
You should see:
fully-async rollout 0: target=8 queue_warm=0
fully-async rollout 0: done in ...s, queue_left=...
How To Plug Your Own Generate Into This#
Two pieces flip the standard pipeline into fully-async:
Use the async training driver:
python3 train_async.py (not train.py).
Set the rollout function path:
--rollout-function-path slime.rollout.fully_async_rollout.generate_rollout_fully_async
For custom per-sample logic, use slime’s standard plug-in points — they work unchanged under fully-async:
--custom-generate-function-path your.module.generate # (args, sample, sampling_params) -> Sample | list[Sample]
--custom-rm-path your.module.reward # (args, sample | list[Sample]) -> float | list[float]
See examples/swe_codex/ for a non-trivial example that plugs in a
multi-turn agent (Claude Code in a Docker-Proxy sandbox) this way.
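As a rough illustration of the two plug-in signatures above, here is a toy module. It is a sketch under assumptions, not slime's API: the `Sample` dataclass is a stand-in with only the fields this example needs, both functions are written as coroutines (the signatures above do not say whether they must be async), and the "generation" is a placeholder that reverses the prompt instead of calling an inference server.

```python
import asyncio
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class Sample:
    """Stand-in for slime's Sample type (hypothetical, minimal fields only)."""
    prompt: str
    response: str = ""
    label: str = ""


async def generate(args, sample, sampling_params):
    """Custom generate: (args, sample, sampling_params) -> Sample."""
    # Placeholder for a real inference call: reverse the prompt and
    # treat the token budget as a character budget (toy behaviour).
    budget = sampling_params.get("max_new_tokens", 16)
    sample.response = sample.prompt[::-1][:budget]
    return sample


def _score(sample):
    # Exact-match reward against the label.
    return 1.0 if sample.response.strip() == sample.label.strip() else 0.0


async def reward(args, sample):
    """Custom reward: (args, sample | list[Sample]) -> float | list[float]."""
    if isinstance(sample, list):
        return [_score(s) for s in sample]
    return _score(sample)


async def _demo():
    args = SimpleNamespace()  # stands in for the parsed CLI args
    sample = Sample(prompt="abc", label="cba")
    sample = await generate(args, sample, {"max_new_tokens": 8})
    return await reward(args, sample)


demo_reward = asyncio.run(_demo())
```

If this lived in `your/module.py`, the two flags above would point at `your.module.generate` and `your.module.reward`.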
Worker Internals (Very Short)#
First call: create a process-wide AsyncRolloutWorker (thread + asyncio loop). The worker is shared across all subsequent generate_rollout calls so its queue stays warm.
Loop keeps up to args.sglang_server_concurrency tasks in flight using generate_and_rm_group.
Completed groups land on an output queue; each generate_rollout call drains until it has rollout_batch_size groups and returns them sorted by sample.index.
Groups containing an ABORTED sample are pushed back into data_buffer.add_samples instead of being shipped to training.
Worker is stopped automatically at process exit via atexit.
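The pattern described above can be sketched with the standard library alone. This is a toy model, not slime's implementation: `ToyAsyncWorker`, `get_worker`, `fake_generate_group`, and the dict-shaped groups are all hypothetical, and the ABORTED re-queue step is left out for brevity.

```python
import asyncio
import atexit
import queue
import random
import threading


class ToyAsyncWorker:
    """Toy stand-in for the worker described above (not slime's class)."""

    def __init__(self, generate_group, concurrency):
        self._generate_group = generate_group
        self._concurrency = concurrency  # assumed >= 1
        self.output = queue.Queue()  # completed groups, thread-safe
        self._stop = threading.Event()
        # Daemon thread so a forgotten stop() cannot block interpreter exit.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        asyncio.run(self._loop())

    async def _loop(self):
        in_flight = set()
        next_index = 0
        while not self._stop.is_set():
            # Top the pool back up to the concurrency limit.
            while len(in_flight) < self._concurrency:
                coro = self._generate_group(next_index)
                in_flight.add(asyncio.create_task(coro))
                next_index += 1
            # Wake on the first completion, or periodically to check _stop.
            done, in_flight = await asyncio.wait(
                in_flight, timeout=0.05, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                self.output.put(task.result())
        # Shutdown: cancel whatever is still running.
        for task in in_flight:
            task.cancel()
        await asyncio.gather(*in_flight, return_exceptions=True)

    def stop(self):
        self._stop.set()
        self._thread.join()


_worker = None


def get_worker(generate_group, concurrency):
    """Create the process-wide worker on first call, then reuse it."""
    global _worker
    if _worker is None:
        _worker = ToyAsyncWorker(generate_group, concurrency)
        atexit.register(_worker.stop)
    return _worker


def generate_rollout(worker, batch_size):
    """Drain until batch_size groups are available, then sort by index."""
    groups = [worker.output.get() for _ in range(batch_size)]
    return sorted(groups, key=lambda g: g["index"])


async def fake_generate_group(index):
    # Random latency so groups complete out of order.
    await asyncio.sleep(random.uniform(0.001, 0.01))
    return {"index": index, "samples": [f"response-{index}"]}


worker = get_worker(fake_generate_group, concurrency=4)
first = generate_rollout(worker, batch_size=3)
second = generate_rollout(worker, batch_size=3)  # queue is already warm
worker.stop()
```

Because the worker keeps generating between the two `generate_rollout` calls, the second call usually finds groups already waiting on the queue, which is the point of keeping the pool alive across rollout boundaries.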
Limitations#
No evaluation mode (it would conflict with the continuously running model).
Ordering across rollouts is best-effort; only within a rollout are groups sorted by index before being handed to training.
TODO: partial-rollout-style resume for ABORTED trajectories is not yet wired; for now the trajectory is re-queued and starts over.
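To make the restart-from-scratch behaviour concrete, here is a small sketch of how completed groups might be split from groups that need re-queuing. Everything in it is hypothetical (`ToySample`, `Status`, `split_groups`), and clearing the partial response is only an assumption about what "starts over" could look like.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    COMPLETED = "completed"
    ABORTED = "aborted"


@dataclass
class ToySample:
    """Hypothetical sample type, not slime's Sample."""
    index: int
    prompt: str
    response: str = ""
    status: Status = Status.PENDING


def split_groups(groups):
    """Partition groups into (ready for training, to be re-queued)."""
    ready, requeue = [], []
    for group in groups:
        if any(s.status is Status.ABORTED for s in group):
            # No partial resume: discard generated text so the whole
            # group restarts from its prompts on the next attempt.
            for s in group:
                s.response = ""
                s.status = Status.PENDING
            requeue.append(group)
        else:
            ready.append(group)
    return ready, requeue


good = [ToySample(0, "p0", "r0", Status.COMPLETED)]
bad = [
    ToySample(1, "p1", "r1", Status.COMPLETED),
    ToySample(1, "p1", "partial", Status.ABORTED),
]
ready, requeue = split_groups([good, bad])
```

A partial-rollout-style resume would keep the partial response and continue from it rather than clearing it.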