[core] Large/full refactor of from_pretrained#36033

Merged
ArthurZucker merged 40 commits into main from refactor-from-pretrained
Mar 12, 2025
Conversation

@Cyrilvallez Cyrilvallez (Member) commented Feb 4, 2025

What does this PR do?

Preview

Large refactor of from_pretrained. The most important updates are the following:

  • Much more readable and maintainable in the future (hopefully, at least IMO). Simplified a looooot of weird/dead/useless code that accumulated over the years
  • Faster model downloads (concurrent file downloads)
  • New keyword argument key_mapping, allowing direct remapping of weight names when loading a model from the Hub that is compatible with a given architecture but whose checkpoint was not converted accordingly. For example, the following snippet works nicely:
from transformers import LlamaForCausalLM, AutoTokenizer, LlamaConfig

key_mapping = {
    r"^transformer.wte": r"model.embed_tokens",
    r"^transformer.rotary": r"model.rotary_emb", 
    r"^transformer.ln_f": r"model.norm", 

    r"^transformer.h.(\d+).ln_1": r"model.layers.\1.input_layernorm",
    r"^transformer.h.(\d+).ln_2": r"model.layers.\1.post_attention_layernorm",

    r"^transformer.h.(\d+).mlp.c_fc_0": r"model.layers.\1.mlp.gate_proj",
    r"^transformer.h.(\d+).mlp.c_fc_1": r"model.layers.\1.mlp.up_proj",
    r"^transformer.h.(\d+).mlp.c_proj": r"model.layers.\1.mlp.down_proj",

    r"^transformer.h.(\d+).attn.attention.k_proj": r"model.layers.\1.self_attn.k_proj",
    r"^transformer.h.(\d+).attn.attention.v_proj": r"model.layers.\1.self_attn.v_proj",
    r"^transformer.h.(\d+).attn.attention.q_proj": r"model.layers.\1.self_attn.q_proj",
    r"^transformer.h.(\d+).attn.attention.out_proj": r"model.layers.\1.self_attn.o_proj",
}

model_id = "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct"

config = LlamaConfig(
    num_hidden_layers=32,
    num_key_value_heads=8,
    intermediate_size=14336,
    hidden_size=4096,
    bos_token_id=1,
    eos_token_id=361,
    pad_token_id=0,
    rope_theta=500000.0,
    vocab_size=102400,
)
model = LlamaForCausalLM.from_pretrained(model_id, config=config, key_mapping=key_mapping, torch_dtype="float16", device_map=0)

This will allow us to simplify model conversions. It will also help when teams want to add their models to the library but they are 1:1 compatible with existing architectures.
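
For intuition, here is a minimal sketch of how such a regex-based key_mapping could be applied to a raw state dict. This is purely illustrative (the helper name is made up); the actual renaming happens inside from_pretrained and may differ:

import re

def remap_state_dict_keys(state_dict, key_mapping):
    # Rename each key with the first pattern that matches; keep unmatched keys unchanged.
    remapped = {}
    for key, tensor in state_dict.items():
        new_key = key
        for pattern, replacement in key_mapping.items():
            candidate, n_subs = re.subn(pattern, replacement, key)
            if n_subs > 0:
                new_key = candidate
                break
        remapped[new_key] = tensor
    return remapped

With the mapping above, a checkpoint key such as transformer.h.3.attn.attention.q_proj.weight becomes model.layers.3.self_attn.q_proj.weight before being matched against the LlamaForCausalLM parameters.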

Some pointers

Files modified apart from modeling_utils.py and hub.py are due to the removal of legacy functions get_file_from_repo and _load_pretrained_model_low_mem during the cleanup.

Note that I was as careful as I could be during the refactor since this is extremely critical code. All tests are passing on the CI, and I tested the most common scenarios as well as the newly added features. Everything seems to work.

However, I am deeply sorry for the person reviewing @ArthurZucker 😆🥲🤗

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Cyrilvallez Cyrilvallez force-pushed the refactor-from-pretrained branch from 1e626dd to bbab9b2 on February 5, 2025 09:34
@Cyrilvallez Cyrilvallez marked this pull request as ready for review February 10, 2025 14:19
@Cyrilvallez Cyrilvallez changed the title from [core] Refactor from_pretrained to [core] Large/full refactor of from_pretrained Feb 10, 2025
@ArthurZucker ArthurZucker self-requested a review February 10, 2025 18:21
@Cyrilvallez Cyrilvallez force-pushed the refactor-from-pretrained branch from c727a4f to 6640a27 on February 11, 2025 12:31
@ArthurZucker ArthurZucker mentioned this pull request Feb 11, 2025
@Cyrilvallez Cyrilvallez (Member, Author) commented Feb 12, 2025

Our whole test suite passes. See here for the new failures compared to the last full CI run; I personally checked that they are all unrelated and due to the diff between the tip of this branch and main.
Now rebasing.

@SunMarc SunMarc (Member) left a comment

Thanks for this huge work @Cyrilvallez! Left a few comments.

@poedator (Contributor)

Hi, @Cyrilvallez! This refactor was long needed. Thank you for streamlining modeling_utils.py!

Is it possible to avoid creating the state_dict in RAM altogether and load model params straight to the GPU(s)? Sometimes a server has too little RAM, or it may be taken up by tmpfs.
Perhaps individual param/buffer items could be sent to the GPU right after being loaded from disk, without waiting for the whole state_dict to load?

Also, please ensure that quantized model loading/saving is tested in this PR.

@ydshieh ydshieh (Collaborator) commented Feb 19, 2025

Also, please ensure that quantized model loading/saving is tested in this PR.

Hmm, indeed, I need to trigger quantization CI too!

@Cyrilvallez Cyrilvallez (Member, Author)

Oh I thought we did already! No worries, I'll ping you on Slack @ydshieh; I first have to make a few changes.

@poedator thanks! Concerning the loading, note that state dicts on the Hub are usually quite small (never bigger than 10 GB), so loading everything in RAM is never an issue. It would otherwise probably be very slow!

@ydshieh ydshieh (Collaborator) commented Feb 19, 2025

Oh I thought we did already!

My bad. I am (almost always) focused on the CI jobs regarding tests/models/xxx, but this time it is indeed important to check the CI jobs regarding quantization/xxx.

@poedator (Contributor)

note that state dicts on the Hub are usually quite small (never bigger than 10 GB), so loading everything in RAM is never an issue. It would otherwise probably be very slow!

But what about cases like loading Llama 70B, which is 140 GB in 16-bit, or loading a 34B model onto 8 GPUs for data parallelism, which takes 8 x 34 x 2 = 544 GB of RAM?

@Cyrilvallez Cyrilvallez (Member, Author) commented Feb 19, 2025

State dicts are loaded one after the other, so no need to worry! See e.g. here for Llama 70B: each state dict shard is at most 5 GB.
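
For readers worried about peak RAM, a rough sketch of what shard-by-shard loading with immediate device placement could look like is shown below. The helper is hypothetical (not the actual transformers internals) and assumes a locally downloaded, safetensors-sharded checkpoint:

import json
from safetensors import safe_open

def load_shards_to_device(checkpoint_dir, model, device="cuda:0"):
    # The index file maps every weight name to the shard file that contains it.
    with open(f"{checkpoint_dir}/model.safetensors.index.json") as f:
        weight_map = json.load(f)["weight_map"]
    params = dict(model.named_parameters())
    # Load one shard at a time, so peak RAM stays around the size of a single shard.
    for shard_file in sorted(set(weight_map.values())):
        with safe_open(f"{checkpoint_dir}/{shard_file}", framework="pt", device="cpu") as shard:
            for name in shard.keys():
                if name in params:
                    # Move each tensor to its device right away instead of first
                    # materializing the full state dict in RAM.
                    params[name].data = shard.get_tensor(name).to(device)
        # The CPU copies from this shard can be freed before the next one is read.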

@ArthurZucker ArthurZucker (Collaborator) left a comment

Big big PR!

@Cyrilvallez Cyrilvallez force-pushed the refactor-from-pretrained branch from 420e87d to dbbe05f on March 12, 2025 08:32
@Cyrilvallez Cyrilvallez force-pushed the refactor-from-pretrained branch from 49aef66 to c0f3057 on March 12, 2025 10:31
@Cyrilvallez Cyrilvallez force-pushed the refactor-from-pretrained branch from ea183ff to 9723be2 on March 12, 2025 11:42
@ArthurZucker ArthurZucker (Collaborator)

Merging!

@ArthurZucker ArthurZucker merged commit 071a161 into main Mar 12, 2025
20 of 24 checks passed
@ArthurZucker ArthurZucker deleted the refactor-from-pretrained branch March 12, 2025 12:39
manueldeprada added a commit that referenced this pull request Apr 15, 2025
Prior to PR #36033, unexpected exceptions (e.g., ModuleNotFoundError) during hub model loading were not swallowed silently. They either matched specific except blocks or were raised.

After #36033, a catch-all except Exception block was introduced without a fallback else, causing unknown errors to be silently ignored and leading to misleading downstream behavior.

This commit adds an `else: raise e` to ensure only explicitly handled exceptions are suppressed. All others are surfaced, restoring pre-4.50 behavior and aiding in debugging and dependency visibility.
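
Schematically, the fix turns the catch-all handler into one that only swallows known failure modes and re-raises everything else. A self-contained illustration of the pattern (not the exact transformers code; the function and error cases are made up):

def resolve_file_safely(resolve_fn):
    # Call a hub-resolution function, swallowing only failure modes we explicitly handle.
    try:
        return resolve_fn()
    except Exception as e:
        if isinstance(e, FileNotFoundError):
            # Known case: a missing optional file is simply treated as "no file".
            return None
        else:
            # Unknown errors (e.g. ModuleNotFoundError) must surface to the caller.
            raise e

def bad_loader():
    import nonexistent_dependency  # raises ModuleNotFoundError

try:
    resolve_file_safely(bad_loader)
except ModuleNotFoundError:
    print("unexpected errors now surface instead of being silently ignored")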
manueldeprada added a commit that referenced this pull request Apr 15, 2025
…37525)

* fix: Restore explicit error surfacing for unexpected hub exceptions

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
cyr0930 pushed a commit to cyr0930/transformers that referenced this pull request Apr 18, 2025
…uggingface#37525)

* fix: Restore explicit error surfacing for unexpected hub exceptions

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
…uggingface#37525)

* fix: Restore explicit error surfacing for unexpected hub exceptions

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
pbielak pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jul 28, 2025
Since Transformers 4.51, the `.from_pretrained` method includes model
prefixes when reporting missing parameters. Following PR [1], this
commit aligns the `test_model_weights_reload_no_missing_tied_weights`
test to account for the new prefix behaviour.

[1] huggingface/transformers#36033
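
For illustration only, the reporting difference described above looks roughly like this (hypothetical key names, not actual transformers output):

# Before (Transformers < 4.51): missing keys reported without the base-model prefix.
missing_keys = ["embed_tokens.weight"]
# After #36033 (Transformers >= 4.51): the prefix is included.
missing_keys = ["model.embed_tokens.weight"]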