Search-R1 lite#

This is a minimal reproduction of Search-R1 and an example of how to use multi-turn conversation and tool calling in slime.

Environment Setup#

Use the slimerl/slime:latest image and initialize the environment required for Search-R1:

cd /root/
git clone https://github.com/THUDM/slime.git
cd slime/
pip install -e .
# for Search R1
pip install chardet

Please refer to the script provided in Search-R1 to download the data:

git clone https://github.com/PeterGriffinJin/Search-R1.git
cd Search-R1/
python scripts/data_process/nq_search.py --local_dir /root/nq_search/
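
To verify the download, you can inspect the processed data with pandas. This is an optional sanity check; it assumes the Search-R1 preprocessing script writes parquet splits (e.g. train.parquet) into the directory passed via --local_dir:

import pandas as pd

# Optional check: the file name is assumed from the Search-R1 script's defaults;
# adjust the path if your copy writes different split names.
df = pd.read_parquet("/root/nq_search/train.parquet")
print(df.columns.tolist())
print(df.iloc[0])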

Download the Qwen2.5-3B model and convert it into a Megatron (mcore) torch_dist checkpoint:

# hf checkpoint
huggingface-cli download Qwen/Qwen2.5-3B --local-dir /root/Qwen2.5-3B

# mcore checkpoint
cd /root/slime
source scripts/models/qwen2.5-3B.sh
PYTHONPATH=/root/Megatron-LM python tools/convert_hf_to_torch_dist.py \
    ${MODEL_ARGS[@]} \
    --hf-checkpoint /root/Qwen2.5-3B \
    --save /root/Qwen2.5-3B_torch_dist
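
Optionally, before running the conversion you can confirm that the downloaded Hugging Face checkpoint is readable:

from transformers import AutoConfig, AutoTokenizer

# Quick sanity check that the HF checkpoint downloaded correctly.
config = AutoConfig.from_pretrained("/root/Qwen2.5-3B")
tokenizer = AutoTokenizer.from_pretrained("/root/Qwen2.5-3B")
print(config.model_type, config.num_hidden_layers, tokenizer.vocab_size)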

Running the Script#

You need to configure your serper.dev API key in SEARCH_R1_CONFIGS inside generate_with_search.py:

SEARCH_R1_CONFIGS = {
    "max_turns": 3,
    "topk": 3,
    "google_api_key": "YOUR_API_KEY",  # Replace with your actual API key
    "snippet_only": True,  # Set to True to only return snippets
    "proxy": None,  # Set to your proxy if needed
    "search_concurrency": 256,
    # rm
    "format_score": 0.2,
}
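
The google_api_key is used by the example's search tool to query serper.dev. As a rough illustration (independent of slime; the actual tool call lives in generate_with_search.py, and the response fields below are assumed from serper.dev's JSON API), a topk-snippet search looks something like this:

import requests

def search(query: str, api_key: str, topk: int = 3):
    # Sketch of a serper.dev request; with snippet_only=True the tool would
    # keep only title/snippet pairs like the ones returned here.
    resp = requests.post(
        "https://google.serper.dev/search",
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        json={"q": query, "num": topk},
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json().get("organic", [])[:topk]
    return [(r.get("title", ""), r.get("snippet", "")) for r in results]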

And run:

cd slime/
bash examples/search-r1/run_qwen2.5_3B.sh

Code Structure#

To implement multi-turn conversation + tool calling in slime, you only need to implement a custom data generation function and a custom reward function for the task. These correspond to the following two configuration items in the startup script:

CUSTOM_ARGS=(
   --custom-generate-function-path generate_with_search.generate
   --custom-rm-path generate_with_search.reward_func
)

These paths resolve to the generate and reward_func functions in generate_with_search.py.
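
For orientation, the rough shape of these two pieces can be sketched as follows. This is a simplified, hypothetical sketch using Search-R1-style <search>/<information>/<answer> tags; the real implementation, including slime's sample types, async handling, and answer normalization, is in generate_with_search.py:

# Hypothetical sketch of the control flow only; not slime's actual API.
def generate(prompt, llm, search_tool, max_turns=3):
    # Alternate model turns and search-tool calls until the model emits a
    # final answer or the turn budget runs out.
    history = prompt
    for _ in range(max_turns):
        response = llm(history)
        history += response
        if "<answer>" in response or "<search>" not in response:
            break
        query = response.split("<search>")[-1].split("</search>")[0]
        history += "<information>" + search_tool(query) + "</information>"
    return history

def reward_func(response, ground_truth, format_score=0.2):
    # Full credit for a correct <answer>, partial credit (format_score)
    # for a well-formed but incorrect one, zero otherwise.
    if "<answer>" not in response or "</answer>" not in response:
        return 0.0
    answer = response.split("<answer>")[-1].split("</answer>")[0].strip()
    return 1.0 if answer == ground_truth else format_score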