@christian-lms
Contributor

@christian-lms christian-lms commented Aug 5, 2025

Contribution by LM Studio Team. Initial implementation, more details to follow. Please note numerical precision considerations vs. the reference implementation.

Co-authored-by: Neil Mehta <neil@lmstudio.ai>
Co-authored-by: Matt Clayton <matt@lmstudio.ai>
Comment on lines +388 to +389
x = x * mx.expand_dims(expert_weights, axis=2)
return x.sum(axis=1)
Member

Sometimes it's important to do the sum in fp32. Maybe check another implementation to see?
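
To illustrate the concern, here is a minimal pure-Python sketch, not the PR's MLX code. It emulates half precision with the standard library's `struct` "e" format and compares accumulating the weighted expert outputs in fp16 against accumulating in a wider type and rounding once at the end. The helper names (`to_fp16`, `weighted_sum_fp16`, `weighted_sum_wide`) and the toy numbers are assumptions made for illustration only.

```python
import struct


def to_fp16(v: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", v))[0]


def weighted_sum_fp16(values, weights):
    """Accumulate in half precision: each product and partial sum is rounded."""
    acc = 0.0
    for v, w in zip(values, weights):
        acc = to_fp16(acc + to_fp16(v * w))
    return acc


def weighted_sum_wide(values, weights):
    """Accumulate in a wider type and round to half precision once at the end."""
    acc = 0.0
    for v, w in zip(values, weights):
        acc += v * w
    return to_fp16(acc)


# Toy expert outputs and routing weights (hypothetical values, weights sum to 1).
values = [4096.0, 8.0, 8.0, 8.0, 8.0]
weights = [0.5, 0.125, 0.125, 0.125, 0.125]

print(weighted_sum_fp16(values, weights))  # 2048.0: the small terms are rounded away
print(weighted_sum_wide(values, weights))  # 2052.0: the small terms survive
```

In an array library the equivalent idea would be to cast to float32 before the reduction and cast the result back afterwards. Whether that matters for this model depends on the magnitudes involved, which the thread does not state.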

@psm-2

psm-2 commented Aug 5, 2025

@awni Is fine-tuning included in this update?

@niryuu

niryuu commented Aug 5, 2025

ValueError: Unsupported quantization method mxfp4
Do we need to update any additional libraries?

@christian-lms
Contributor Author

@niryuu where are you getting this error?

@awni
Member

awni commented Aug 5, 2025

Presumably from trying to load the original model in mlx-lm (which is expected not to work). You can reproduce it with e.g. mlx_lm.convert --hf-path openai/gpt-oss-20b. I'll try to add a dequantization path for that so we can load them directly.

@awni
Member

awni commented Aug 5, 2025

OK, you should be able to load the original models now. I added a dequantization step that runs if needed:

mlx_lm.convert --hf-path openai/gpt-oss-20b -q --q-bits 8

And:

mlx_lm.generate --model mlx_model --prompt "Write a story about Einstein" -m 512

On an M2 Ultra:

<|channel|>analysis<|message|>The user: "Write a story about Einstein". They want a story about Einstein. We need to produce a story. The user didn't specify length or style. We can choose a creative narrative. Maybe a fictional story where Einstein is a character, or a story about his life, or a whimsical story. We can incorporate his personality, his relativity, his curiosity, his interactions. We can make it a short story, maybe with a twist. The user didn't specify constraints. We can produce a story that is engaging, maybe with a moral. Let's think: We can write a story about Einstein as a child, or as a professor, or as a time traveler. Or we can write a story about a young student who meets Einstein. Or a story about Einstein's relativity in a metaphorical sense. Or a story about Einstein's love for music. Or a story about Einstein's relationship with his wife. Or a story about Einstein's involvement in the Manhattan Project. Or a story about Einstein's philosophical musings. Or a story about Einstein's time in America. Or a story about Einstein's interactions with other scientists. Or a story about Einstein's childhood in Ulm. Or a story about Einstein's later years. Or a story about Einstein's "thought experiment" of riding a beam of light. Or a story about Einstein's "relativity of simultaneity" in a narrative. Or a story about Einstein's "E=mc^2" as a metaphor for love. Or a story about Einstein's "relativity" in a social context. Or a story about Einstein's "relativity" in a comedic way. Or a story about Einstein's "relativity" in a magical realism style. Or a story about Einstein's "relativity" in a sci-fi setting. Or a story about Einstein's "relativity" in a children's story. The user didn't specify. We can choose a creative approach. Let's write a story about a young boy who meets Einstein in a dream, and learns about relativity and the importance of curiosity. Or a story about Einstein's time in Princeton, where he meets a young student. 
Or a story about Einstein's "thought experiment" of riding a light beam, but in a narrative form. Or a story about Einstein's "relativity" as a metaphor for empathy. Or a story about Einstein's "relativity" in a comedic way. Let's choose a story that is accessible, maybe a short story with a moral. Let's write a story about a young
==========
Prompt: 72 tokens, 224.537 tokens-per-sec
Generation: 512 tokens, 84.333 tokens-per-sec
Peak memory: 22.345 GB
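
For readers unfamiliar with the format behind the dequant step mentioned above: MXFP4 stores values as 4-bit E2M1 codes that share one power-of-two (E8M0) scale per block. The following is a minimal pure-Python sketch of decoding such a block, not mlx-lm's actual implementation. The function names are hypothetical, and the packing of two codes per byte with the low nibble first is an assumption about the storage layout.

```python
# E2M1 magnitudes indexed by the low 3 bits of a 4-bit code; bit 3 is the sign.
FP4_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def decode_fp4(code: int) -> float:
    """Decode one 4-bit E2M1 code into a float."""
    magnitude = FP4_MAGNITUDES[code & 0b0111]
    return -magnitude if code & 0b1000 else magnitude


def decode_scale(e8m0: int) -> float:
    """Decode the shared block scale: a power of two with exponent bias 127.

    The NaN encoding (255) is not handled in this sketch.
    """
    return 2.0 ** (e8m0 - 127)


def dequantize_block(packed: bytes, e8m0: int) -> list[float]:
    """Dequantize one block of packed codes.

    Assumes two codes per byte, low nibble first (a layout assumption).
    """
    scale = decode_scale(e8m0)
    out = []
    for byte in packed:
        out.append(decode_fp4(byte & 0x0F) * scale)
        out.append(decode_fp4(byte >> 4) * scale)
    return out


# Two packed bytes (four codes) with a shared scale of 2**(128 - 127) = 2.
print(dequantize_block(bytes([0x21, 0xF7]), 128))  # [1.0, 2.0, 12.0, -12.0]
```

A real MXFP4 block holds 32 elements (16 packed bytes); the example uses two bytes only to keep the output short.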

@christian-lms
Contributor Author

Yep, it works! Uploading quants now. Are we good to merge?

@stakodiak

Yep. LGTM

Member

@awni awni left a comment

Great work @christian-lms and the rest of the LM Studio team!!

@awni awni merged commit 667a711 into ml-explore:main Aug 5, 2025
4 checks passed
@psm-2

psm-2 commented Aug 6, 2025

@awni Would it be possible to support fine-tuning?
ValueError: Lora does not support gpt_oss_moe

@altaic

altaic commented Aug 6, 2025

@psm-2 looks like all the other support will land soon in #357
