FedScaleLLM

Training large language models (LLMs) typically requires massive volumes of high-quality data, which are often distributed across multiple organizations and cannot be centrally aggregated due to privacy regulations. Federated learning offers a promising paradigm for collaboratively training LLMs without sharing raw data from each participant. However, existing federated LLM training systems face three challenges: (1) limited and heterogeneous computational resources that prevent many participants from training large-scale models, (2) inefficient training pipelines caused by the tight coupling of computation and communication, and (3) potential privacy leakage through gradient exchanges during model aggregation.

To address these challenges, we present FedScaleLLM, the first system that jointly addresses resource heterogeneity, pipeline inefficiency, and privacy leakage in federated LLM training. First, a Resource-Aware Model Management mechanism partitions model states across clients and dynamically loads layers on demand, significantly reducing the GPU memory footprint at each participant. Second, a Pipelined Parallel Training Engine overlaps computation and communication through asynchronous pipelined execution and clustered parallel training, substantially improving system throughput. Third, an Anonymous Routing mechanism forwards gradient updates through dynamically constructed multi-hop paths, breaking the cross-round linkage between client identities and transmitted updates to mitigate privacy leakage risks. Extensive experiments on three benchmarks under different heterogeneous environments show that FedScaleLLM reduces GPU memory usage by up to 6x, lowers end-to-end training time by 17x, and achieves 18x higher throughput compared with state-of-the-art methods, while demonstrating privacy protection capability.
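The Anonymous Routing idea described above can be illustrated with a small sketch. This is not the FedScaleLLM implementation: the function names `build_route` and `forward_update`, the uniform random choice of relays, and the fixed hop count are all assumptions made for illustration. The sketch only shows the general principle that the aggregator observes the last hop of a freshly sampled path rather than the originating client.

```python
import random


def build_route(sender, clients, hops, rng):
    """Pick `hops` distinct relay clients, excluding the sender (hypothetical helper)."""
    candidates = [c for c in clients if c != sender]
    if hops > len(candidates):
        raise ValueError("not enough relay candidates for the requested hop count")
    return rng.sample(candidates, hops)


def forward_update(sender, update, clients, hops, rng):
    """Relay an update along a fresh multi-hop path.

    Returns (last_hop, update, route). The aggregator would only see
    `last_hop`, which changes from round to round because the route is
    re-sampled on every call.
    """
    route = build_route(sender, clients, hops, rng)
    # Each relay passes the payload along unchanged in this sketch.
    return route[-1], update, route


rng = random.Random(0)
clients = ["c1", "c2", "c3", "c4", "c5"]
last_hop, payload, route = forward_update("c1", {"layer0": [0.1, -0.2]}, clients, 2, rng)
```

Because the route is sampled anew for each round, the identity observed by the aggregator is not tied to the sender across rounds, which is the linkage the mechanism aims to break.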

⚙️ Environment

Python version: 3.9.21

Other dependencies are listed in requirements.txt.

Experiments are conducted in a heterogeneous federated environment consisting of 10 physical machines interconnected via 10 Gbps links, including 7 servers equipped with dual NVIDIA GeForce RTX 3090 GPUs and 3 servers with dual RTX 2080 Ti GPUs. Different tasks instantiate different numbers of logical clients to reflect realistic deployments: 9 clients for Code Generation, 8 clients for Question Answering, and 3 clients for Math Problem Solving, prioritizing RTX 3090 servers when available. For scalability evaluation, we scale to 16 logical clients by partitioning the 3090 servers and incorporating additional 2080 Ti-based clients.
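The placement policy above ("prioritizing RTX 3090 servers when available") can be sketched as follows. The sketch assumes one logical client per GPU, which the text does not state, and `place_clients` is a hypothetical helper rather than part of the repository. The GPU counts follow from the hardware description: 7 servers with two RTX 3090 GPUs each (14 GPUs) and 3 servers with two RTX 2080 Ti GPUs each (6 GPUs).

```python
def place_clients(num_clients, gpus_3090=14, gpus_2080ti=6):
    """Assign one logical client per GPU, filling RTX 3090 GPUs first.

    Assumption: one client per GPU. The real deployment may partition
    servers differently.
    """
    total = gpus_3090 + gpus_2080ti
    if num_clients > total:
        raise ValueError(f"only {total} GPUs available for {num_clients} clients")
    on_3090 = min(num_clients, gpus_3090)
    return {"RTX 3090": on_3090, "RTX 2080 Ti": num_clients - on_3090}


code_generation = place_clients(9)
scalability = place_clients(16)
```

Under this assumption, the three task configurations (9, 8, and 3 clients) fit entirely on RTX 3090 GPUs, while the 16-client scalability setting spills over onto RTX 2080 Ti GPUs.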

🌍 Datasets and Tasks

We adopt the benchmark datasets released in FederatedScope-LLM. As summarized in the following table, the tasks span three representative domains (code generation, question answering, and mathematical reasoning), each exhibiting distinct data heterogeneity patterns.

Task	Training Dataset	# training samples	Partition	# clients	Test Dataset	# test samples
Code Generation	Fed-CodeAlpaca	7954	Non-IID	9	HumanEval	656
Question Answering	Fed-Dolly	15015	Non-IID	8	HELM	1600
Math Problem Solving	Fed-GSM8K-3	7473	IID	3	GSM8K	1319

🧠 Model Bases

In this paper, we use five LLMs: DeepSeek-Qwen-1.5B, DeepSeek-Qwen-7B, GPT3, DeepSeek-Llama-8B, and DeepSeek-Qwen-14B.

📊 Baselines

We evaluate FedScaleLLM against a suite of federated LLM training baselines covering the three dominant paradigms in the field: gradient approximation, model compression, and split learning. For each paradigm, we choose the most competitive state-of-the-art method as the representative baseline. In addition, we compare our framework with two industrial state-of-the-art federated LLM training systems. The baselines are listed below:

Baseline	Year	Conference	Paper
SPRY	2025	NeurIPS	Thinking Forward: Memory-Efficient Federated Finetuning of Language Models
FedBiOT	2024	KDD	FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model
M2FedSA	2024	ICML	Enhancing Storage and Computational Efficiency in Federated Multimodal Learning for Large-Scale Models
FederatedScope-LLM	2024	KDD	FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning
FATE-LLM	2023	arXiv	FATE-LLM: A Industrial Grade Federated Learning Framework for Large Language Models

🛠️ Running

The following command launches one FedScaleLLM client on GPU 0 and writes its output to a log file:

HF_HUB_OFFLINE=1 CUDA_VISIBLE_DEVICES=0 python federatedscope/main.py --cfg federatedscope/llm/baseline/client1.yaml 2>&1 | tee logs/client1.log
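To start several clients, the same command can be generated per client. The sketch below is only a convenience illustration: `client_command` and `as_shell_line` are hypothetical helpers, and the assumption that other clients use configs named `client2.yaml`, `client3.yaml`, and so on in the same directory is not confirmed by this README. Only the environment variables, script path, and `--cfg` flag are taken from the command above.

```python
import shlex


def client_command(client_id, gpu_id,
                   cfg_dir="federatedscope/llm/baseline", log_dir="logs"):
    """Build the environment, argv, and log path for one client.

    Assumes configs follow the client<N>.yaml naming pattern (hypothetical).
    """
    env = {"HF_HUB_OFFLINE": "1", "CUDA_VISIBLE_DEVICES": str(gpu_id)}
    argv = ["python", "federatedscope/main.py",
            "--cfg", f"{cfg_dir}/client{client_id}.yaml"]
    log_path = f"{log_dir}/client{client_id}.log"
    return env, argv, log_path


def as_shell_line(env, argv, log_path):
    """Render the pieces as a single shell line that tees output to the log."""
    prefix = " ".join(f"{key}={value}" for key, value in env.items())
    return f"{prefix} {shlex.join(argv)} 2>&1 | tee {log_path}"


# One line per client; here client i is pinned to GPU (i - 1) % 2 on a dual-GPU server.
lines = [as_shell_line(*client_command(i, (i - 1) % 2)) for i in (1, 2)]
```

For client 1 on GPU 0, the generated line is identical to the command shown above.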
