Megatron Config: Role-Based Training Overrides#

--megatron-config-path is a YAML-based configuration system for applying role-specific overrides on top of the shared Megatron CLI arguments. Today it is mainly intended for PPO actor / critic configuration.

Unlike --sglang-config, --megatron-config-path does not manage deployment, routing, or GPU orchestration. Its only job is to decide which training arguments each role should finally use.

Design Overview#

By default, when --megatron-config-path is not used, both actor and critic inherit the Megatron / slime CLI arguments directly.

With --megatron-config-path, the configuration is split into two layers:

Shared CLI arguments define the common Megatron topology, resource allocation, and default training parameters.
Role-level YAML overrides only specify the fields that should differ between actor and critic.

Key design principles:

CLI remains the shared baseline. slime first parses the normal CLI arguments, then applies the YAML role overrides.
Missing roles inherit automatically. If a role is absent from the YAML file, it simply keeps the CLI arguments unchanged.
Resource allocation is still controlled by CLI. num_nodes and num_gpus_per_node in YAML are ignored; placement is still controlled by --actor-num-* / --critic-num-*.

Config Format#

The config file is a YAML document whose top-level megatron key contains a list of role entries:

megatron:
  - name: default
    role: actor
    overrides:
      lr: 1e-6
      save: /path/to/actor_ckpt
  - name: default
    role: critic
    overrides:
      lr: 1e-5
      save: /path/to/critic_ckpt

Field Reference#

Field	Type	Default	Description
`name`	`str`	Optional	Label for this entry. The runtime does not depend on it today, but keeping `default` is recommended for forward compatibility.
`role`	`str`	Required	Role name. Currently supported values are `actor` and `critic`.
`overrides`	`dict`	`{}`	Role-specific argument overrides applied on top of the shared CLI arguments.
`args`	`dict`	`{}`	Backward-compatible alias for `overrides`. New configs should prefer `overrides`.

Note: Keys inside overrides use argparse attribute names, not CLI flag names. For example, use tensor_model_parallel_size rather than tensor-model-parallel-size.

Usage Pattern#

A typical PPO setup looks like this:

# megatron_ppo.yaml
megatron:
  - name: default
    role: actor
    overrides:
      lr: 1e-6
  - name: default
    role: critic
    overrides:
      lr: 1e-5

python train.py \
  --advantage-estimator ppo \
  --use-critic \
  --megatron-config-path megatron_ppo.yaml \
  --tensor-model-parallel-size 2 \
  --sequence-parallel \
  --pipeline-model-parallel-size 1 \
  --context-parallel-size 1 \
  --expert-model-parallel-size 1 \
  --expert-tensor-parallel-size 1 \
  --actor-num-nodes 1 \
  --actor-num-gpus-per-node 8 \
  --critic-num-nodes 1 \
  --critic-num-gpus-per-node 8 \
  ...

In this setup:

CLI defines the shared topology and resource layout.
YAML defines the role-specific differences, such as lr, load, save, or optimizer / scheduler parameters.

Overriding Only One Role#

You can also override only one role and let the other inherit the shared CLI configuration. For example, changing only the critic learning rate:

megatron:
  - name: default
    role: critic
    overrides:
      lr: 1e-5

In this case the actor keeps the shared CLI arguments unchanged.

Current Limitations#

PPO only for now. --megatron-config-path is currently intended for PPO actor / critic role configuration. It is not the recommended interface for GRPO, REINFORCE++, and other critic-free workflows.
Actor and critic must use the same Megatron parallel topology in current PPO. In particular, topology-related settings such as tensor_model_parallel_size, pipeline_model_parallel_size, context_parallel_size, expert_model_parallel_size, expert_tensor_parallel_size, and sequence_parallel should not differ between actor and critic.
Keep topology-related settings on CLI. The safest current pattern is to keep parallelism and resource arguments in the shared CLI configuration, and only put role-specific differences in YAML, such as lr, load, save, warmup, and optimizer / scheduler settings.

If you configure different parallel topologies for actor and critic, the behavior is currently unsupported and may fail during initialization or training.

FAQ#

Q: Can I provide only an actor entry or only a critic entry?#

Yes. Missing roles automatically inherit the shared CLI arguments, so you do not need to duplicate everything in YAML.

Q: Can I move `--actor-num-nodes` or `--critic-num-gpus-per-node` into YAML?#

No. Resource allocation and placement groups are still controlled by CLI arguments, and the corresponding YAML fields are ignored.

Megatron Config: Role-Based Training Overrides

Contents