[core] Large/full refactor of from_pretrained#36033

Merged
ArthurZucker merged 40 commits into main from refactor-from-pretrained
Mar 12, 2025
Conversation

@Cyrilvallez Cyrilvallez (Member) commented Feb 4, 2025

What does this PR do?

Preview

Large refactor of from_pretrained. The most important updates are the following:

  • Much more readable and maintainable in the future (hopefully, at least IMO). Simplified a looooot of weird/dead/useless code that accumulated over the years
  • Faster model downloads (concurrent file downloads)
  • New keyword argument key_mapping, allowing direct remapping of weight names when loading a model from the Hub that is compatible with a given architecture but whose checkpoint was not converted accordingly. For example, the following snippet works nicely:
from transformers import LlamaForCausalLM, AutoTokenizer, LlamaConfig

key_mapping = {
    r"^transformer.wte": r"model.embed_tokens",
    r"^transformer.rotary": r"model.rotary_emb", 
    r"^transformer.ln_f": r"model.norm", 

    r"^transformer.h.(\d+).ln_1": r"model.layers.\1.input_layernorm",
    r"^transformer.h.(\d+).ln_2": r"model.layers.\1.post_attention_layernorm",

    r"^transformer.h.(\d+).mlp.c_fc_0": r"model.layers.\1.mlp.gate_proj",
    r"^transformer.h.(\d+).mlp.c_fc_1": r"model.layers.\1.mlp.up_proj",
    r"^transformer.h.(\d+).mlp.c_proj": r"model.layers.\1.mlp.down_proj",

    r"^transformer.h.(\d+).attn.attention.k_proj": r"model.layers.\1.self_attn.k_proj",
    r"^transformer.h.(\d+).attn.attention.v_proj": r"model.layers.\1.self_attn.v_proj",
    r"^transformer.h.(\d+).attn.attention.q_proj": r"model.layers.\1.self_attn.q_proj",
    r"^transformer.h.(\d+).attn.attention.out_proj": r"model.layers.\1.self_attn.o_proj",
}

model_id = "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct"

config = LlamaConfig(
    num_hidden_layers=32,
    num_key_value_heads=8,
    intermediate_size=14336,
    hidden_size=4096,
    bos_token_id=1,
    eos_token_id=361,
    pad_token_id=0,
    rope_theta=500000.0,
    vocab_size=102400,
)
model = LlamaForCausalLM.from_pretrained(model_id, config=config, key_mapping=key_mapping, torch_dtype="float16", device_map=0)

This will allow us to simplify model conversions. It will also help when teams want to add their models to the library but they are 1:1 compatible with existing architectures.
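
For intuition, here is a minimal sketch of how such a regex-based key_mapping could be applied to a raw state dict. This is purely illustrative (the helper name is made up); the actual renaming happens inside from_pretrained and may differ:

import re

def remap_state_dict_keys(state_dict, key_mapping):
    # Rename each key with the first pattern that matches; keep unmatched keys unchanged.
    remapped = {}
    for key, tensor in state_dict.items():
        new_key = key
        for pattern, replacement in key_mapping.items():
            candidate, n_subs = re.subn(pattern, replacement, key)
            if n_subs > 0:
                new_key = candidate
                break
        remapped[new_key] = tensor
    return remapped

With the mapping above, a checkpoint key such as transformer.h.3.attn.attention.q_proj.weight becomes model.layers.3.self_attn.q_proj.weight before being matched against the LlamaForCausalLM parameters.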

Some pointers

Files modified apart from modeling_utils.py and hub.py are due to the removal of legacy functions get_file_from_repo and _load_pretrained_model_low_mem during the cleanup.

Note that I was as careful as I could be during the refactor since this is extremely critical code. All tests are passing on the CI, and I tested the most common scenarios as well as the newly added features. Everything seems to work.

However, I am deeply sorry for the person reviewing @ArthurZucker 😆🥲🤗

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Cyrilvallez Cyrilvallez force-pushed the refactor-from-pretrained branch from 1e626dd to bbab9b2 on February 5, 2025 09:34
@Cyrilvallez Cyrilvallez marked this pull request as ready for review February 10, 2025 14:19
@Cyrilvallez Cyrilvallez changed the title from [core] Refactor from_pretrained to [core] Large/full refactor of from_pretrained Feb 10, 2025
@ArthurZucker ArthurZucker self-requested a review February 10, 2025 18:21
@Cyrilvallez Cyrilvallez force-pushed the refactor-from-pretrained branch from c727a4f to 6640a27 on February 11, 2025 12:31
@ArthurZucker ArthurZucker mentioned this pull request Feb 11, 2025
@Cyrilvallez Cyrilvallez (Member, Author) commented Feb 12, 2025

Our whole test suite passes. See here for the new failures compared to the last full CI run; I personally checked that they are all unrelated and due to the diff between the tip of this branch and main.
Now rebasing.

@SunMarc SunMarc (Member) left a comment

Thanks for this huge work @Cyrilvallez! Left a few comments.

@poedator (Contributor)

Hi, @Cyrilvallez! This refactor was long needed. Thank you for streamlining modeling_utils.py!

Is it possible to avoid creating the state_dict in RAM altogether and load model params straight to the GPU(s)? Sometimes a server has too little RAM, or it may be taken up by tmpfs.
Perhaps individual param/buffer items could be sent to the GPU right after being loaded from disk, without waiting for the whole state_dict to load?

Also, please ensure that quantized model loading/saving is tested in this PR.

@ydshieh ydshieh (Collaborator) commented Feb 19, 2025

Also, please ensure that quantized model loading/saving is tested in this PR.

Hmm, indeed, I need to trigger quantization CI too!

@Cyrilvallez Cyrilvallez (Member, Author)

Oh I thought we did already! No worries, I'll ping you on Slack @ydshieh; I first have to make a few changes.

@poedator thanks! Concerning the loading, note that state dicts on the Hub are usually quite small (never bigger than 10 GB), so loading everything in RAM is never an issue. It would otherwise probably be very slow!

@ydshieh ydshieh (Collaborator) commented Feb 19, 2025

Oh I thought we did already!

My bad. I am (almost always) focused on the CI jobs regarding tests/models/xxx, but this time it is indeed important to check the CI jobs regarding quantization/xxx.

@poedator (Contributor)

note that state dicts on the Hub are usually quite small (never bigger than 10 GB), so loading everything in RAM is never an issue. It would otherwise probably be very slow!

But what about cases like loading Llama 70B, which is 140 GB in 16-bit, or loading a 34B model onto 8 GPUs for data parallelism, which takes 8 x 34 x 2 = 544 GB of RAM?

@Cyrilvallez Cyrilvallez (Member, Author) commented Feb 19, 2025

State dicts are loaded one after the other, so no need to worry! See e.g. here for Llama 70B: each state dict shard is at most 5 GB.
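
For readers worried about peak RAM, a rough sketch of what shard-by-shard loading with immediate device placement could look like is shown below. The helper is hypothetical (not the actual transformers internals) and assumes a locally downloaded, safetensors-sharded checkpoint:

import json
from safetensors import safe_open

def load_shards_to_device(checkpoint_dir, model, device="cuda:0"):
    # The index file maps every weight name to the shard file that contains it.
    with open(f"{checkpoint_dir}/model.safetensors.index.json") as f:
        weight_map = json.load(f)["weight_map"]
    params = dict(model.named_parameters())
    # Load one shard at a time, so peak RAM stays around the size of a single shard.
    for shard_file in sorted(set(weight_map.values())):
        with safe_open(f"{checkpoint_dir}/{shard_file}", framework="pt", device="cpu") as shard:
            for name in shard.keys():
                if name in params:
                    # Move each tensor to its device right away instead of first
                    # materializing the full state dict in RAM.
                    params[name].data = shard.get_tensor(name).to(device)
        # The CPU copies from this shard can be freed before the next one is read.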

@ArthurZucker ArthurZucker (Collaborator) left a comment

Big big PR!

@Cyrilvallez Cyrilvallez force-pushed the refactor-from-pretrained branch from 420e87d to dbbe05f on March 12, 2025 08:32
@Cyrilvallez Cyrilvallez force-pushed the refactor-from-pretrained branch from 49aef66 to c0f3057 on March 12, 2025 10:31
@Cyrilvallez Cyrilvallez force-pushed the refactor-from-pretrained branch from ea183ff to 9723be2 on March 12, 2025 11:42
@ArthurZucker ArthurZucker (Collaborator)

Merging!

@ArthurZucker ArthurZucker merged commit 071a161 into main Mar 12, 2025
20 of 24 checks passed
@ArthurZucker ArthurZucker deleted the refactor-from-pretrained branch March 12, 2025 12:39
manueldeprada added a commit that referenced this pull request Apr 15, 2025
Prior to PR #36033, unexpected exceptions (e.g., ModuleNotFoundError) during hub model loading were not swallowed silently. They either matched specific except blocks or were raised.

After #36033, a catch-all except Exception block was introduced without a fallback else, causing unknown errors to be silently ignored and leading to misleading downstream behavior.

This commit adds an `else: raise e` to ensure only explicitly handled exceptions are suppressed. All others are surfaced, restoring pre-4.50 behavior and aiding in debugging and dependency visibility.
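
Schematically, the fix turns the catch-all handler into one that only swallows known failure modes and re-raises everything else. A self-contained illustration of the pattern (not the exact transformers code; the function and error cases are made up):

def resolve_file_safely(resolve_fn):
    # Call a hub-resolution function, swallowing only failure modes we explicitly handle.
    try:
        return resolve_fn()
    except Exception as e:
        if isinstance(e, FileNotFoundError):
            # Known case: a missing optional file is simply treated as "no file".
            return None
        else:
            # Unknown errors (e.g. ModuleNotFoundError) must surface to the caller.
            raise e

def bad_loader():
    import nonexistent_dependency  # raises ModuleNotFoundError

try:
    resolve_file_safely(bad_loader)
except ModuleNotFoundError:
    print("unexpected errors now surface instead of being silently ignored")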
manueldeprada added a commit that referenced this pull request Apr 15, 2025
…37525)

* fix: Restore explicit error surfacing for unexpected hub exceptions

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
cyr0930 pushed a commit to cyr0930/transformers that referenced this pull request Apr 18, 2025
…uggingface#37525)

* fix: Restore explicit error surfacing for unexpected hub exceptions

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
…uggingface#37525)

* fix: Restore explicit error surfacing for unexpected hub exceptions

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
pbielak pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jul 28, 2025
Since Transformers 4.51, the `.from_pretrained` method includes model
prefixes when reporting missing parameters. Following PR [1], this
commit aligns the `test_model_weights_reload_no_missing_tied_weights`
test to account for the new prefix behaviour.

[1] huggingface/transformers#36033
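
For illustration only, the reporting difference described above looks roughly like this (hypothetical key names, not actual transformers output):

# Before (Transformers < 4.51): missing keys reported without the base-model prefix.
missing_keys = ["embed_tokens.weight"]
# After #36033 (Transformers >= 4.51): the prefix is included.
missing_keys = ["model.embed_tokens.weight"]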