Whisper: fall back to canonical openai/whisper-* processor when mlx-community repos lack one #712
Draft
contrapuntal wants to merge 1 commit
Conversation
Force-pushed from 80ff9ac to ad33fd5
…izzy#645)

mlx-community whisper conversions ship weights only — no preprocessor_config.json or tokenizer files — so WhisperProcessor.from_pretrained silently fails on load and the model crashes with `ValueError: Processor not found.` on first generate().

Recover by reading the architecture signature from config.json — the 5-tuple (n_audio_state, n_mels, n_audio_layer, n_text_layer, n_vocab) uniquely identifies each canonical openai/whisper variant, including all .en English-only models and large-v3-turbo (distinguished by 4 decoder layers vs 32). Map the signature to the corresponding openai/whisper-* repo and retry the processor load there. Identifying by dims rather than directory name handles the real mlx-community landscape — ~50+ repos with arbitrary suffixes (whisper-large-v3-mlx-4bit, whisper-base-mlx-q4, whisper-base.en-mlx-fp32, whisper-large-v3-asr-4bit, etc.) and user-renamed local directories.

Also tightens error handling:

* Catch OSError specifically on the local load (transformers' signal for missing files) rather than bare Exception. Other failures — corrupt JSON, permission errors — propagate so a fine-tuned local checkpoint can't be silently masked by the canonical OpenAI processor (a vocab mismatch would generate garbage transcription with no error signal).
* Catch ImportError specifically on the transformers import.

Documents HF_HUB_OFFLINE / TRANSFORMERS_OFFLINE in the load-helper docstring so users in air-gapped environments know how to suppress the fallback's network round-trip.

Handles both openai/mlx config keys (n_audio_state, n_mels, …) and HF Transformers keys (d_model, num_mel_bins, …).

Fixes Blaizzy#645
Force-pushed from ad33fd5 to 060b813
Loading any mlx-community whisper repo (`whisper-large-v3-mlx`, `whisper-base-mlx-4bit`, `whisper-base.en-mlx`, etc.) crashes on first transcription with `ValueError: Processor not found.` These repos ship weights only, so `WhisperProcessor.from_pretrained` raises during load — leaving `_processor = None` after only a warning.

This PR adds a fallback: read the architecture signature from `config.json` and retry the processor load against the canonical `openai/whisper-*` repo that produced this architecture. Processor files are architecture-independent (~4 MB), so a one-time download recovers transcription with no user intervention.

### Architecture-keyed lookup
The 5-tuple `(n_audio_state, n_mels, n_audio_layer, n_text_layer, n_vocab)` from `config.json` uniquely identifies each canonical openai/whisper variant. `vocab_size = 51864` flags `.en` English-only models; `n_mels = 128` flags the large-v3 family; `n_text_layer = 4` flags large-v3-turbo. large-v1 and large-v2 share dims (their processor files are interchangeable), so the lookup maps that signature to large-v2.

Identifying by dims rather than directory name handles the real mlx-community landscape uniformly — ~50+ repos with arbitrary suffixes like `-4bit`, `-8bit`, `-q4`, `-fp32`, `-asr-*`, plus user-renamed local directories. Both openai/mlx config keys (`n_audio_state`, `n_mels`, …) and HF Transformers keys (`d_model`, `num_mel_bins`, …) are read.
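A rough sketch of what the lookup amounts to (the helper name `resolve_canonical_repo` and the subset of table entries shown are illustrative assumptions, not the PR's actual code; the dims are the commonly cited OpenAI values, not verified against the patch):

```python
import json
from pathlib import Path

# Illustrative subset of the signature table:
# (n_audio_state, n_mels, n_audio_layer, n_text_layer, n_vocab) -> canonical repo
_CANONICAL_BY_DIMS = {
    (384, 80, 4, 4, 51865): "openai/whisper-tiny",
    (384, 80, 4, 4, 51864): "openai/whisper-tiny.en",
    (512, 80, 6, 6, 51865): "openai/whisper-base",
    (1280, 128, 32, 32, 51866): "openai/whisper-large-v3",
    (1280, 128, 32, 4, 51866): "openai/whisper-large-v3-turbo",
}

def resolve_canonical_repo(model_dir: str) -> str | None:
    """Map a checkpoint's architecture signature to a canonical openai/whisper-* repo."""
    cfg = json.loads((Path(model_dir) / "config.json").read_text())
    # Accept both openai/mlx keys and HF Transformers keys.
    signature = (
        cfg.get("n_audio_state", cfg.get("d_model")),
        cfg.get("n_mels", cfg.get("num_mel_bins")),
        cfg.get("n_audio_layer", cfg.get("encoder_layers")),
        cfg.get("n_text_layer", cfg.get("decoder_layers")),
        cfg.get("n_vocab", cfg.get("vocab_size")),
    )
    return _CANONICAL_BY_DIMS.get(signature)
```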
### Error handling

Catches `OSError` specifically on the local load (transformers' signal for missing files) instead of bare `Exception`. Other failures — corrupt JSON, permission errors — propagate so a fine-tuned local checkpoint can't be silently masked by the canonical OpenAI processor; a vocab mismatch would generate garbage transcription with no error signal otherwise. The `transformers` import catches `ImportError` specifically.
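A minimal sketch of that control flow, reusing the illustrative `resolve_canonical_repo` helper above (the name `load_processor` is also an assumption, not the PR's real helper):

```python
def load_processor(model_dir: str):
    """Illustrative load-with-fallback flow; not the PR's actual implementation."""
    try:
        from transformers import WhisperProcessor
    except ImportError:
        return None  # transformers missing: keep the existing warn-and-skip path

    try:
        # OSError is transformers' signal for missing processor/tokenizer files.
        return WhisperProcessor.from_pretrained(model_dir)
    except OSError:
        canonical = resolve_canonical_repo(model_dir)
        if canonical is None:
            return None  # unknown architecture: preserve the old warn + None behavior
        # One-time ~4 MB fetch of the architecture-matched canonical processor.
        return WhisperProcessor.from_pretrained(canonical)
    # Anything else (corrupt JSON, permission errors, ValueError) propagates, so a
    # fine-tuned local checkpoint can't be silently masked by the canonical files.
```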
### Network behavior

When the fallback fires, the canonical processor is fetched from HF Hub (~4 MB). Set `HF_HUB_OFFLINE=1` or `TRANSFORMERS_OFFLINE=1` in air-gapped environments — transformers raises a clear offline-mode error rather than waiting on a network timeout. Documented in the load-helper docstring.
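For example, an air-gapped run can flip the offline switches before anything imports transformers (setting them in the shell works equally well):

```python
# Keep the fallback from reaching out to HF Hub in offline environments.
# Must be set before transformers / huggingface_hub are first imported.
import os
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"
```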
### Behavior

Before, on `mlx-community/whisper-base-mlx-4bit` (or any non-canonical name):

After:
Repos whose architecture isn't a recognized canonical variant, or that lack a readable `config.json`, preserve the existing "warn and set `_processor = None`" behavior.

Tests: 11 unittest cases covering dim-based resolution (tiny / quantized / large-v3 / large-v3-turbo / `.en` / HF Transformers config format), behavior preservation (missing config / unknown dims / local success), and error propagation (`ValueError` propagates, canonical fallback failure leaves `_processor = None`).

Fixes #645
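Purely for illustration (the PR's real test names and helpers differ; this assumes the `resolve_canonical_repo` sketch from earlier is importable), one dim-based resolution case might look like:

```python
import json
import tempfile
import unittest
from pathlib import Path

# Assumes the illustrative resolve_canonical_repo helper sketched above is in scope.

class TestCanonicalResolution(unittest.TestCase):
    def test_large_v3_turbo_resolved_by_decoder_layer_count(self):
        with tempfile.TemporaryDirectory() as d:
            cfg = {"n_audio_state": 1280, "n_mels": 128, "n_audio_layer": 32,
                   "n_text_layer": 4, "n_vocab": 51866}
            (Path(d) / "config.json").write_text(json.dumps(cfg))
            self.assertEqual(resolve_canonical_repo(d),
                             "openai/whisper-large-v3-turbo")

if __name__ == "__main__":
    unittest.main()
```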