Multi-Agent RL

This directory provides an example of running multi-agent reinforcement learning (RL) with slime.

Environment Setup

The environment setup is identical to the standard RL setup used in slime.

Running the Script

You can define your own multi-agent system or use the provided default configuration:

MULTI_AGENT_CONFIGS = {
    "custom_multi_agent_function_path": "examples.multi_agent.agent_system.run_agent_system",
    "num_parallel": 5,
    "incorrect_reward_weight": 0.8,
    "correct_reward_weight": 1.2,
}
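
The custom_multi_agent_function_path value is a dotted Python path. As a minimal sketch of how such a path can be resolved to a callable, assuming ordinary importlib semantics (load_function is a hypothetical helper written for illustration, not part of slime, and slime's own loader may differ):

```python
import importlib
from typing import Any, Callable


def load_function(dotted_path: str) -> Callable[..., Any]:
    """Resolve 'package.module.function' to the function object.

    Hypothetical helper for illustration only; not slime's actual loader.
    """
    # Split on the last dot: everything before it is the module path,
    # everything after it is the attribute name inside that module.
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Demonstrated with a standard-library function so the sketch runs anywhere.
sqrt = load_function("math.sqrt")
print(sqrt(9.0))
```

Under this assumption, a custom agent system only needs to be importable from the working directory (for example, from slime/) for a path such as examples.multi_agent.agent_system.run_agent_system to resolve.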

To start a run, execute:

cd slime/
bash examples/multi_agent/run-qwen3-30B-A3B-multi-agent.sh

New Arguments

Specify the agent rollout function with the --custom-generate-function-path argument.
Set the --rollout-max-context-len argument according to your model’s context window.

ROLLOUT_ARGS=(
   --custom-generate-function-path examples.multi_agent.rollout_with_multi_agents.generate_with_multi_agents
   --prompt-data /root/dapo-math-17k/dapo-math-17k.jsonl
   --input-key prompt
   --label-key label
   --apply-chat-template
   --rollout-shuffle
   --rm-type deepscaler
   --num-rollout 3000
   --rollout-batch-size 32
   --n-samples-per-prompt 8
   --rollout-max-context-len 16384
   --rollout-max-response-len 8192
   --rollout-temperature 0.8

   --global-batch-size 256
   --balance-data
)
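
The batch-related values above can be sanity-checked with a little arithmetic. The sketch below assumes that each rollout step produces rollout-batch-size × n-samples-per-prompt samples and that the trainer consumes them in chunks of global-batch-size. That relationship is an assumption stated here for illustration, and rollout_budget is a hypothetical helper, not a slime function:

```python
def rollout_budget(
    rollout_batch_size: int,
    n_samples_per_prompt: int,
    global_batch_size: int,
    num_rollout: int,
) -> dict:
    """Summarize sample counts implied by the rollout arguments.

    Hypothetical helper; assumes samples from one rollout step are split
    into training batches of global_batch_size.
    """
    # Prompts per rollout step times responses sampled for each prompt.
    samples_per_rollout = rollout_batch_size * n_samples_per_prompt
    # Full training batches per rollout step, plus any samples left over.
    steps_per_rollout, leftover = divmod(samples_per_rollout, global_batch_size)
    return {
        "samples_per_rollout": samples_per_rollout,
        "steps_per_rollout": steps_per_rollout,
        "leftover_samples": leftover,
        "total_samples": samples_per_rollout * num_rollout,
    }


# Values taken from ROLLOUT_ARGS above.
budget = rollout_budget(
    rollout_batch_size=32,
    n_samples_per_prompt=8,
    global_batch_size=256,
    num_rollout=3000,
)
print(budget)
```

With the values shown, 32 prompts × 8 samples gives 256 samples per rollout step, which matches a global batch size of 256 with nothing left over. If you change one of these arguments, a nonzero leftover_samples in this sketch is a hint that the others may need adjusting too.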