Multi-node reward model training on AWS with EFA and FSx storage

This tutorial walks through setting up a multi-node reward model training environment on AWS using EFA and FSx storage. It is organized into chapters, each covering one part of the setup; together they produce a robust and efficient training environment for deep learning models. As a practical walkthrough, we outline how to fine-tune an already post-trained language model on the pointwise reward modeling task using the Bradley-Terry loss objective. The same setup can be applied to a wide range of other training tasks and models.
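For readers new to the objective mentioned above: in the standard Bradley-Terry formulation, the reward model assigns a scalar score to each response, and the loss for a preference pair is the negative log-sigmoid of the difference between the score of the preferred response and the score of the rejected one. The following is a minimal plain-Python sketch of that formula only. The function names are hypothetical, and the training framework's actual implementation (batched, on tensors) may differ.

```python
import math


def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Loss for one preference pair: -log(sigmoid(reward_chosen - reward_rejected))."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of m
    # so that exp() is never called with a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_bradley_terry_loss(pairs):
    """Average the pairwise loss over (reward_chosen, reward_rejected) tuples."""
    return sum(bradley_terry_loss(c, r) for c, r in pairs) / len(pairs)
```

The loss is log(2) when both responses receive the same score, and it shrinks toward zero as the preferred response is scored increasingly higher than the rejected one.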

This tutorial uses a modified version of the Axolotl framework v0.8.1, with changes to support Qwen3 chat models. At the end of the tutorial, the infrastructure setup will look as follows:

[Figure: diagram of the final infrastructure setup]

Contents

Chapter 0: Set up test EC2, create final AMI (optional), and compile Docker images
Chapter 1: Create security groups and cluster placement groups
Chapter 2: Create shared FSx for Lustre storage
Chapter 3: Create launch template, maximize network bandwidth, and configure swap and FSx mounting
Chapter 4: Launch EC2 instances, assign public IPs, and verify EFA and FSx connectivity
Chapter 5: Run distributed training of the reward model and evaluate it on RewardBench
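As a small illustration of the kind of check Chapter 4 involves, the sketch below parses the text of /proc/mounts and returns the mount points whose filesystem type is lustre, which is how an FSx for Lustre mount appears on a Linux client. This is an assumption-laden sketch, not the tutorial's own verification procedure: the function name, the sample device string, and the /fsx mount point are all hypothetical.

```python
def lustre_mount_points(mounts_text: str) -> list:
    """Return the mount points in /proc/mounts-style text whose type is 'lustre'."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "lustre":
            points.append(fields[1])
    return points


# Hypothetical sample content; real device names and mount points will differ.
SAMPLE_MOUNTS = (
    "/dev/nvme0n1p1 / ext4 rw,relatime 0 0\n"
    "10.0.1.5@tcp:/abcdefgh /fsx lustre rw,flock,lazystatfs 0 0\n"
)
```

On an instance, the same function could be applied to the contents of /proc/mounts; an empty result would suggest that the shared file system is not mounted on that node.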

Prerequisites

An AWS account with permissions to create and manage EC2 instances, security groups, EFA, and FSx storage. The account must also have sufficient service quotas for the resources being used (e.g., EC2 instances, EFA interfaces, FSx storage).
Basic knowledge of AWS services, particularly EC2, security groups, and EFA.
Familiarity with deep learning frameworks and distributed training concepts is beneficial but not required.

Where to go from here

This tutorial demonstrates how to set up a multi-node training environment on AWS using EFA and FSx storage through a practical example of fine-tuning a language model on a reward modeling task. However, the setup relies mainly on the AWS console GUI. For large, repeatable experiments, infrastructure-as-code tools such as Terraform or AWS CloudFormation are recommended to automate the setup and make it reproducible. It may also be beneficial to set up a Slurm cluster on top of the EC2 instances for easier management of distributed training jobs.
