Multi-Agent RL
This directory provides an example of running multi-agent reinforcement learning (RL) with slime.
Environment Setup
The environment setup is identical to the standard RL setup used in slime.
Running the Script
You can either define your own multi-agent system or use the provided default configuration.
MULTI_AGENT_CONFIGS = {
    "custom_multi_agent_function_path": "examples.multi_agent.agent_system.run_agent_system",
    "num_parallel": 5,
    "incorrect_reward_weight": 0.8,
    "correct_reward_weight": 1.2,
}
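The `custom_multi_agent_function_path` value is a dotted Python path of the form `package.module.function`. As a minimal sketch of how such a path can be resolved to a callable, the helper below uses the standard-library `importlib`. The name `load_function` is hypothetical and is not taken from slime; slime's own loader may work differently. The demonstration resolves a standard-library path so that the snippet runs without slime installed.

```python
import importlib
from typing import Any, Callable


def load_function(dotted_path: str) -> Callable[..., Any]:
    """Resolve a 'package.module.function' string to the function object.

    Hypothetical helper for illustration only; not slime's actual loader.
    """
    # Split on the last dot: everything before it is the module path.
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.function', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Demonstrated with a standard-library path so the sketch runs anywhere.
sqrt = load_function("math.sqrt")
root = sqrt(16.0)
```

With slime on the import path, the same helper would presumably accept `"examples.multi_agent.agent_system.run_agent_system"`, so a custom agent system only needs to be importable under the path given in the configuration.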
To start a run, execute:
cd slime/
bash examples/multi_agent/run-qwen3-30B-A3B-multi-agent.sh
New Arguments
Specify the agent rollout function with the --custom-generate-function-path argument.
Set the --rollout-max-context-len argument according to your model’s context window.
ROLLOUT_ARGS=(
   --custom-generate-function-path examples.multi_agent.rollout_with_multi_agents.generate_with_multi_agents
   --prompt-data /root/dapo-math-17k/dapo-math-17k.jsonl
   --input-key prompt
   --label-key label
   --apply-chat-template
   --rollout-shuffle
   --rm-type deepscaler
   --num-rollout 3000
   --rollout-batch-size 32
   --n-samples-per-prompt 8
   --rollout-max-context-len 16384
   --rollout-max-response-len 8192
   --rollout-temperature 0.8
   --global-batch-size 256
   --balance-data
)
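The numeric arguments above can be sanity-checked with a little arithmetic. The sketch below copies the values from the listing and checks two relationships. Both are assumptions about how the flags interact, not statements confirmed by this page: that each rollout produces `rollout-batch-size` times `n-samples-per-prompt` samples, which are then consumed in chunks of `global-batch-size`, and that the response limit should not exceed the context limit.

```python
# Values copied from ROLLOUT_ARGS above.
rollout_batch_size = 32         # prompts drawn per rollout
n_samples_per_prompt = 8        # responses sampled per prompt
global_batch_size = 256         # samples per training step
rollout_max_context_len = 16384
rollout_max_response_len = 8192

# Assumption: every prompt yields n_samples_per_prompt training samples.
samples_per_rollout = rollout_batch_size * n_samples_per_prompt  # 32 * 8 = 256

# Assumption: samples are consumed in chunks of global_batch_size, so the
# rollout size should divide evenly into training steps.
steps_per_rollout, remainder = divmod(samples_per_rollout, global_batch_size)

# Assumption: a single response must fit inside the overall context limit.
response_fits = rollout_max_response_len <= rollout_max_context_len

print(f"samples per rollout: {samples_per_rollout}")
print(f"training steps per rollout: {steps_per_rollout} (remainder {remainder})")
print(f"response limit fits in context limit: {response_fits}")
```

Under these assumptions the listed values give exactly one training step per rollout with no leftover samples. If you change one of the three batch-related flags, re-running the check is a quick way to confirm the others still line up.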