Whisper: fall back to canonical openai/whisper-* processor when mlx-community repos lack one#712

Draft
contrapuntal wants to merge 1 commit into Blaizzy:main from contrapuntal:fix/whisper-processor-fallback-645

Conversation

Contributor

@contrapuntal contrapuntal commented May 6, 2026

Loading any mlx-community whisper repo (whisper-large-v3-mlx, whisper-base-mlx-4bit, whisper-base.en-mlx, etc.) crashes on first transcription with ValueError: Processor not found. These repos ship weights and config.json but no processor files, so WhisperProcessor.from_pretrained raises during load, leaving _processor = None after only a warning.

This PR adds a fallback: read the architecture signature from config.json and retry the processor load against the canonical openai/whisper-* repo that produced this architecture. Processor files are architecture-independent (~4 MB), so a one-time download recovers transcription with no user intervention.

Architecture-keyed lookup

The 5-tuple (n_audio_state, n_mels, n_audio_layer, n_text_layer, n_vocab) from config.json uniquely identifies each canonical openai/whisper variant: n_vocab = 51864 flags the .en English-only models, n_mels = 128 flags the large-v3 family, and n_text_layer = 4 flags large-v3-turbo. large-v1 and large-v2 share dims (their processor files are interchangeable), so the lookup maps that signature to large-v2.
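
A minimal sketch of what such a lookup could look like (the table name is hypothetical and only a few variants are shown; the dimensions come from the published openai/whisper configs):

```python
# (n_audio_state, n_mels, n_audio_layer, n_text_layer, n_vocab) -> canonical repo.
# Hypothetical name; abbreviated to a few representative variants.
CANONICAL_WHISPER_REPOS = {
    (384, 80, 4, 4, 51865): "openai/whisper-tiny",
    (384, 80, 4, 4, 51864): "openai/whisper-tiny.en",            # n_vocab 51864 -> English-only
    (512, 80, 6, 6, 51865): "openai/whisper-base",
    (1280, 80, 32, 32, 51865): "openai/whisper-large-v2",        # large-v1 shares these dims
    (1280, 128, 32, 32, 51866): "openai/whisper-large-v3",       # n_mels 128 -> large-v3 family
    (1280, 128, 32, 4, 51866): "openai/whisper-large-v3-turbo",  # 4 decoder layers
    # ... remaining variants (base.en, small, small.en, medium, medium.en)
}
```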

Identifying by dims rather than directory name handles the real mlx-community landscape uniformly — ~50+ repos with arbitrary suffixes like -4bit, -8bit, -q4, -fp32, -asr-*, plus user-renamed local directories. Both openai/mlx config keys (n_audio_state, n_mels, …) and HF Transformers keys (d_model, num_mel_bins, …) are read.
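
For illustration, reading that signature while accepting either key convention might look roughly like this (the helper name and structure are assumptions, not the PR's actual code):

```python
import json
from pathlib import Path

def _read_architecture_signature(model_path):
    """Hypothetical helper: return the 5-tuple from config.json, accepting either
    the openai/mlx key names or the HF Transformers key names."""
    config_path = Path(model_path) / "config.json"
    if not config_path.exists():
        return None  # no readable config: caller keeps the old warn-and-skip behavior
    config = json.loads(config_path.read_text())  # corrupt JSON propagates to the caller

    def pick(*keys):
        # First matching key wins; missing keys simply won't match any known signature.
        return next((config[k] for k in keys if k in config), None)

    return (
        pick("n_audio_state", "d_model"),
        pick("n_mels", "num_mel_bins"),
        pick("n_audio_layer", "encoder_layers"),
        pick("n_text_layer", "decoder_layers"),
        pick("n_vocab", "vocab_size"),
    )
```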

Error handling

Catches OSError specifically on the local load (transformers' signal for missing files) instead of bare Exception. Other failures — corrupt JSON, permission errors — propagate so a fine-tuned local checkpoint can't be silently masked by the canonical OpenAI processor; a vocab mismatch would generate garbage transcription with no error signal otherwise. The transformers import catches ImportError specifically.
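
Putting it together, the load flow described above could be sketched as follows, reusing the table and helper from the earlier sketches (names and log wording are illustrative, not the PR's exact code):

```python
import logging

from transformers import WhisperProcessor  # the PR guards this import and catches ImportError

def _load_whisper_processor(model_path):
    """Hypothetical helper mirroring the flow described above."""
    try:
        return WhisperProcessor.from_pretrained(model_path)
    except OSError as exc:  # transformers' signal for missing processor files
        # Anything raised below that is *not* handled here (corrupt JSON, permission
        # errors, ValueError from transformers) propagates to the caller.
        canonical = CANONICAL_WHISPER_REPOS.get(_read_architecture_signature(model_path))
        if canonical is None:
            # Unknown architecture or no config.json: keep the old warn-and-skip behavior.
            logging.warning("Could not load WhisperProcessor: %s", exc)
            return None
        try:
            processor = WhisperProcessor.from_pretrained(canonical)
        except OSError:
            return None  # canonical fallback failed too; leave the processor unset
        logging.warning(
            "Loaded WhisperProcessor from %s as fallback because %s is missing "
            "processor files: %s", canonical, model_path, exc,
        )
        return processor
```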

Network behavior

When the fallback fires, the canonical processor is fetched from HF Hub (~4 MB). Set HF_HUB_OFFLINE=1 or TRANSFORMERS_OFFLINE=1 in air-gapped environments — transformers raises a clear offline-mode error rather than waiting on a network timeout. Documented in the load-helper docstring.
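
A usage sketch for air-gapped setups (these are the standard Hugging Face environment variables; they are read when transformers is imported, so set them first):

```python
import os

# Disable Hub access so the fallback raises transformers' offline-mode error
# instead of waiting on a network timeout.
os.environ["HF_HUB_OFFLINE"] = "1"   # or: os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import WhisperProcessor  # import after the env var is set
```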

Behavior

Before, on mlx-community/whisper-base-mlx-4bit (or any non-canonical name):

```
warning: Could not load WhisperProcessor: Can't load feature extractor for '<path>'...
model._processor = None
# later, on first generate():
ValueError: Processor not found. Make sure the model was loaded with a HuggingFace processor.
```

After:

```
warning: Loaded WhisperProcessor from openai/whisper-base as fallback because <path> is missing processor files: ...
model._processor = <WhisperProcessor>
# transcription succeeds
```

Repos whose architecture isn't a recognized canonical variant, or that lack a readable config.json, preserve the existing "warn and set _processor = None" behavior.

Tests: 11 unittest cases covering dim-based resolution (tiny / quantized / large-v3 / large-v3-turbo / .en / HF Transformers config format), behavior preservation (missing config / unknown dims / local success), and error propagation (ValueError propagates, canonical fallback failure leaves _processor = None).
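
As an illustration of the dim-based cases, a test might look roughly like this (test and helper names follow the sketches above and are assumptions, not the PR's actual test file):

```python
import unittest

class TestProcessorFallback(unittest.TestCase):
    def test_large_v3_turbo_resolves_by_dims(self):
        # A quantized repo with an arbitrary name still resolves via its dims.
        signature = (1280, 128, 32, 4, 51866)  # 4 decoder layers -> large-v3-turbo
        self.assertEqual(
            CANONICAL_WHISPER_REPOS.get(signature),
            "openai/whisper-large-v3-turbo",
        )

    def test_unknown_dims_preserve_old_behavior(self):
        # Unrecognized architecture: no canonical repo, so the loader warns
        # and leaves the processor unset.
        self.assertIsNone(CANONICAL_WHISPER_REPOS.get((999, 80, 1, 1, 51865)))
```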

Fixes #645

@contrapuntal contrapuntal force-pushed the fix/whisper-processor-fallback-645 branch 2 times, most recently from 80ff9ac to ad33fd5 Compare May 6, 2026 17:32
…izzy#645)

mlx-community whisper conversions ship weights only — no
preprocessor_config.json or tokenizer files — so
WhisperProcessor.from_pretrained silently fails on load and the model
crashes with `ValueError: Processor not found.` on first generate().

Recover by reading the architecture signature from config.json — the
5-tuple (n_audio_state, n_mels, n_audio_layer, n_text_layer, n_vocab)
uniquely identifies each canonical openai/whisper variant, including all
.en English-only models and large-v3-turbo (distinguished by 4 decoder
layers vs 32). Map the signature to the corresponding openai/whisper-*
repo and retry the processor load there.

Identifying by dims rather than directory name handles the real
mlx-community landscape — ~50+ repos with arbitrary suffixes
(whisper-large-v3-mlx-4bit, whisper-base-mlx-q4, whisper-base.en-mlx-fp32,
whisper-large-v3-asr-4bit, etc.) and user-renamed local directories.

Also tightens error handling:

  * Catch OSError specifically on the local load (transformers' signal
    for missing files) rather than bare Exception. Other failures —
    corrupt JSON, permission errors — propagate so a fine-tuned local
    checkpoint can't be silently masked by the canonical OpenAI processor
    (a vocab mismatch would generate garbage transcription with no error
    signal).

  * Catch ImportError specifically on the transformers import.

Documents HF_HUB_OFFLINE / TRANSFORMERS_OFFLINE in the load-helper
docstring so users in air-gapped environments know how to suppress the
fallback's network round-trip.

Handles both openai/mlx config keys (n_audio_state, n_mels, …) and HF
Transformers keys (d_model, num_mel_bins, …).

Fixes Blaizzy#645
@contrapuntal contrapuntal force-pushed the fix/whisper-processor-fallback-645 branch from ad33fd5 to 060b813 Compare May 6, 2026 17:36