Reproducibility
Reproducibility is a bedrock of scientific progress. By combining SGLang's deterministic inference with Megatron-LM's deterministic mode, slime can reproduce experiments bitwise.
To enable deterministic training, first uninstall FlashAttention 3 (pip uninstall flash_attn_3 -y), then set the following flags:
# sglang config
--sglang-enable-deterministic-inference
--sglang-attention-backend flashinfer
# megatron config
--deterministic-mode
Also set the following environment variables:
"env_vars": {
...,
"NCCL_ALGO": "Ring",
"NVTE_ALLOW_NONDETERMINISTIC_ALGO": "0",
"CUBLAS_WORKSPACE_CONFIG": ":4096:8"
}
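As a minimal sketch of how these settings might be assembled and checked before launch, the Python snippet below merges them into an existing env_vars mapping and serializes the result to JSON. It assumes the env_vars block is handed to the launcher as a JSON string (for example a Ray runtime environment); the helper names build_runtime_env and missing_or_wrong are hypothetical and are not part of slime.

```python
import json

# The three settings listed above; names and values are taken from this page.
DETERMINISTIC_ENV_VARS = {
    "NCCL_ALGO": "Ring",
    "NVTE_ALLOW_NONDETERMINISTIC_ALGO": "0",
    "CUBLAS_WORKSPACE_CONFIG": ":4096:8",
}


def build_runtime_env(existing_env_vars=None):
    """Hypothetical helper: merge the deterministic settings into env_vars."""
    env_vars = dict(existing_env_vars or {})
    env_vars.update(DETERMINISTIC_ENV_VARS)
    return {"env_vars": env_vars}


def missing_or_wrong(env_vars):
    """Hypothetical helper: return the settings that are absent or differ."""
    return {
        name: expected
        for name, expected in DETERMINISTIC_ENV_VARS.items()
        if env_vars.get(name) != expected
    }


# Illustrative only: PYTHONPATH stands in for whatever "..." holds in your config.
runtime_env = build_runtime_env({"PYTHONPATH": "/root/Megatron-LM/"})
runtime_env_json = json.dumps(runtime_env)
print(runtime_env_json)
```

Calling missing_or_wrong on the env_vars of an existing configuration gives a quick way to spot a forgotten or mistyped setting before starting a long run.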
We provide a fully deterministic GSM8K training script for Qwen2.5-0.5B.
Use the following commands to initialize the training data and checkpoint:
# download
hf download --repo-type dataset zhuzilin/gsm8k --local-dir /root/gsm8k
hf download Qwen/Qwen2.5-0.5B-Instruct --local-dir /root/Qwen2.5-0.5B-Instruct
# convert ckpt
cd slime/
source scripts/models/qwen2.5-0.5B.sh
PYTHONPATH=/root/Megatron-LM/ python \
    tools/convert_hf_to_torch_dist.py \
    "${MODEL_ARGS[@]}" \
    --hf-checkpoint /root/Qwen2.5-0.5B-Instruct \
    --save /root/Qwen2.5-0.5B-Instruct_torch_dist/
Run training with:
bash scripts/run-qwen2.5-0.5B-reproducibility.sh
The wandb screenshots are recorded in pull request #370.
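Bitwise reproduction is a stricter condition than two curves looking the same on a plot. The sketch below shows one way to check it, assuming you have exported a per-step metric (for example the loss) from two runs as lists of Python floats. The function names are hypothetical, and the export step itself is not shown.

```python
import struct


def float_bits(x: float) -> bytes:
    """Return the IEEE 754 double-precision byte pattern of x."""
    return struct.pack("<d", x)


def first_bitwise_mismatch(run_a, run_b):
    """Return the first step index where the runs differ bitwise, else None.

    A length difference counts as a mismatch at the first missing step.
    """
    for step, (a, b) in enumerate(zip(run_a, run_b)):
        if float_bits(a) != float_bits(b):
            return step
    if len(run_a) != len(run_b):
        return min(len(run_a), len(run_b))
    return None


# Made-up values for illustration: the last entries differ only in the final bits.
losses_a = [0.75, 0.5, 0.1 + 0.2]
losses_b = [0.75, 0.5, 0.3]
print(first_bitwise_mismatch(losses_a, losses_a))  # None
print(first_bitwise_mismatch(losses_a, losses_b))  # 2
```

Comparing byte patterns rather than using a tolerance is deliberate: any difference, however small, means the two runs were not bitwise identical.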