First, set up the local search environment by following Search-R1:
```bash
conda create -n retriever python=3.10
conda activate retriever
# We recommend installing torch with conda for faiss-gpu compatibility
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers datasets pyserini
# Install the GPU version of faiss to ensure efficient RL rollouts
conda install -c pytorch -c nvidia faiss-gpu=1.8.0
# API dependencies for the retrieval server
pip install uvicorn fastapi
```
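After installation, it can help to confirm that torch sees your GPU and that the GPU build of faiss loaded correctly. A minimal sanity check (nothing here is specific to this repo):

```python
# Sanity check: confirm CUDA is visible to torch and that faiss-gpu imported.
import torch
import faiss

print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
print(f"faiss {faiss.__version__}, GPUs visible to faiss: {faiss.get_num_gpus()}")
```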
Next, download the ASearcher local retrieval server and retriever:
```bash
hf download inclusionAI/ASearcher-Local-Knowledge --repo-type dataset
hf download intfloat/e5-base-v2
```

Finally, build the index:

```bash
bash agent/search/retrieval/build_index.sh
```
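Once the index is built and the retrieval server is running, you can smoke-test it over HTTP. The host, port, route, and payload below are assumptions (a Search-R1-style /retrieve endpoint); check the server script under agent/search/retrieval/ for the actual interface:

```python
# Hypothetical smoke test for the local retrieval server.
# Host, port, route, and payload schema are assumptions -- adjust to match
# the actual FastAPI app started by the retrieval server script.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/retrieve",  # assumed address/route
    json={"queries": ["who wrote the odyssey"], "topk": 3},
)
resp.raise_for_status()
print(resp.json())  # expect the top-k passages per query
```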
Set up the environment for RL training:
```bash
conda create -n rllm python=3.10
conda activate rllm
cd ./BranPO/
pip install -e .
```
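A quick way to verify the editable install succeeded is to import the package; the top-level module name here is an assumption based on the rLLM foundation:

```python
# Assumed module name -- adjust if the package exposes a different
# top-level import (check setup.py / pyproject.toml in ./BranPO/).
import rllm

print("installed at:", rllm.__file__)
```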
Download the ASearcher training and test datasets:
```bash
hf download inclusionAI/ASearcher-train-data --repo-type dataset
hf download inclusionAI/ASearcher-test-data --repo-type dataset
```
After downloading, update the dataset file paths in agent/search/prepare_asearcher_data.py to match your local directories, then run the script to preprocess the data.
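Before editing the paths, it can be useful to peek at the raw data to confirm its schema; the split name and loading style below are assumptions:

```python
# Hypothetical inspection of the downloaded training data; the split name
# is an assumption -- point load_dataset at whatever `hf download` placed
# on disk if loading by repo id does not work for you.
from datasets import load_dataset

ds = load_dataset("inclusionAI/ASearcher-train-data", split="train")
print(ds)     # columns and row count
print(ds[0])  # first example, to confirm the schema before preprocessing
```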
The 10k SFT cold start dataset is available on Hugging Face:
```bash
hf download ThornZ/Search-R1-SFT --repo-type dataset
```

We recommend using LLaMA-Factory for SFT training. You can find the provided training scripts in the sft/ directory.
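If you want a sense of what such a run looks like before opening the scripts in sft/, here is a rough sketch of a LLaMA-Factory launch; every key and value is an illustrative assumption, and the scripts in sft/ remain the authoritative reference:

```python
# Illustrative only: all config keys/values are assumptions -- see the
# scripts in sft/ and the LLaMA-Factory docs for the real configuration.
import subprocess
import yaml  # PyYAML, installed as a LLaMA-Factory dependency

cfg = {
    "stage": "sft",
    "do_train": True,
    "model_name_or_path": "/path/to/base-model",  # your local base model
    "dataset": "search_r1_sft",  # assumed name registered in dataset_info.json
    "template": "qwen",          # match your base model family
    "finetuning_type": "full",
    "output_dir": "saves/sft-cold-start",
    "per_device_train_batch_size": 1,
    "num_train_epochs": 2,
    "bf16": True,
}
with open("sft_cold_start.yaml", "w") as f:
    yaml.safe_dump(cfg, f)

subprocess.run(["llamafactory-cli", "train", "sft_cold_start.yaml"], check=True)
```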
We provide scripts for both GRPO and BranPO in ./train_grpo.sh and ./train_branpo.sh, respectively.
Make sure you have updated the model paths and retrieval knowledge base paths to your local directories before starting.
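As a small pre-flight check before launching a long RL run, you can verify that the paths you edited actually exist; the path names below are placeholders, not the scripts' actual variables:

```python
# Placeholder paths -- substitute the model and knowledge-base paths you
# wrote into train_grpo.sh / train_branpo.sh.
from pathlib import Path

paths = {
    "model": "/path/to/sft-model",
    "index": "/path/to/retrieval/index",
    "corpus": "/path/to/knowledge/corpus",
}
for name, p in paths.items():
    assert Path(p).exists(), f"missing {name} path: {p}"
print("all paths found; ready to launch ./train_branpo.sh")
```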
To evaluate your model, run run_eval.sh against the local retrieval server, then run run_llm_as_a_judge.sh to perform the LLM-as-a-Judge evaluation.
This codebase is built upon rLLM and veRL. The search workflow and training data are based on Search-R1 and ASearcher. We are sincerely grateful to these projects for their foundational contributions to the field!