This tutorial walks through setting up a multi-node reward model training environment on AWS using Elastic Fabric Adapter (EFA) and FSx for Lustre storage. It is split into chapters, each covering one part of the setup. As a practical walkthrough, we fine-tune an already post-trained language model on a pointwise reward modeling task with the Bradley-Terry loss objective. The same setup process applies to a wide range of other training tasks and models.
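For readers unfamiliar with the Bradley-Terry objective mentioned above, the following is a minimal sketch of the loss for a single preference pair, assuming the reward model outputs one scalar score per response. The function names `bradley_terry_loss` and `mean_pair_loss` are illustrative and are not taken from Axolotl; the framework's own implementation may differ in batching and numerical details.

```python
import math


def bradley_terry_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Loss for one preference pair: -log(sigmoid(r_chosen - r_rejected)).

    Computed as softplus(-margin) in a form that avoids overflow
    for large positive or negative margins.
    """
    margin = chosen_reward - rejected_reward
    if margin >= 0:
        # exp(-margin) <= 1 here, so this cannot overflow.
        return math.log1p(math.exp(-margin))
    # For negative margins, rewrite log(1 + exp(-m)) as -m + log(1 + exp(m)).
    return -margin + math.log1p(math.exp(margin))


def mean_pair_loss(pairs: list[tuple[float, float]]) -> float:
    """Average the pairwise loss over (chosen_reward, rejected_reward) tuples."""
    return sum(bradley_terry_loss(c, r) for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical scalar scores for three (chosen, rejected) response pairs.
    example_pairs = [(1.2, 0.3), (0.1, 0.4), (2.0, -1.0)]
    print(f"mean Bradley-Terry loss: {mean_pair_loss(example_pairs):.4f}")
```

The loss shrinks as the chosen response is scored further above the rejected one, and equals log 2 when both receive the same score.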
This tutorial uses a modified version of the Axolotl framework v0.8.1, with changes to support Qwen3 chat models. The infrastructure is built up over the following chapters:
- Chapter 0: Set up test EC2, create final AMI (optional), and compile Docker images
- Chapter 1: Create security groups and cluster placement groups
- Chapter 2: Create shared FSx for Lustre storage
- Chapter 3: Create launch template, maximize network bandwidth, and configure swap and FSx mounting
- Chapter 4: Launch EC2 instances, assign public IPs, and verify EFA and FSx connectivity
- Chapter 5: Run distributed training of the reward model and evaluate it on RewardBench
To follow along, you will need:
- An AWS account with permissions to create and manage EC2 instances, security groups, EFA, and FSx storage. The account must also have sufficient service quotas for the resources used (e.g., EC2 instances, EFA interfaces, FSx storage).
- Basic knowledge of AWS services, particularly EC2, security groups, and EFA.
- Familiarity with deep learning frameworks and distributed training concepts is beneficial but not required.
This tutorial demonstrates how to set up a multi-node training environment on AWS with EFA and FSx storage through a practical example: fine-tuning a language model on a reward modeling task. The setup relies mainly on the AWS console GUI, however. For large, repeatable experiments, infrastructure-as-code tools such as Terraform or AWS CloudFormation are recommended to automate the setup. It may also be worth running a Slurm cluster on top of the EC2 instances to simplify management of distributed training jobs.
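To give a flavor of the infrastructure-as-code approach, here is a minimal sketch that generates a CloudFormation template for one piece of the setup, the cluster placement group from Chapter 1. The logical ID `TrainingPlacementGroup` and the helper name `placement_group_template` are hypothetical; a real template would also need the security groups, FSx file system, and launch template, which are omitted here.

```python
import json


def placement_group_template(strategy: str = "cluster") -> dict:
    """Build a CloudFormation template containing a single EC2 placement group.

    "cluster" packs instances close together inside one Availability Zone,
    which is the strategy used for low-latency multi-node training.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Placement group for multi-node training (illustrative sketch)",
        "Resources": {
            # Hypothetical logical ID; choose any name unique within the template.
            "TrainingPlacementGroup": {
                "Type": "AWS::EC2::PlacementGroup",
                "Properties": {"Strategy": strategy},
            }
        },
    }


if __name__ == "__main__":
    # Print the template as JSON so it can be saved to a file and deployed.
    print(json.dumps(placement_group_template(), indent=2))
```

Writing the output to a file and deploying it with CloudFormation keeps the setup under version control, which makes it easier to recreate across experiments than console clicks.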
