@jvmncs (Contributor) commented Jul 16, 2025

This example runs Kimi-K2-Instruct at native precision with the following configuration (sketched in code after the list):

  • 4 nodes with 8x H100 GPUs each (32 H100s total)
  • Tensor parallel size: 16, Pipeline parallel size: 2
  • RDMA networking for high-performance inter-node communication
  • Ray for distributed orchestration
  • vLLM nightly build for Kimi-K2-Instruct pipeline parallelism support

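For concreteness, here is a minimal sketch of that parallelism layout using vLLM's offline Python API. It assumes the Hugging Face model id `moonshotai/Kimi-K2-Instruct` and a Ray cluster already spanning all 4 nodes (brought up separately, e.g. with `ray start`); the actual example drives all of this through Modal rather than by hand.

```python
# Sketch only: assumes a 4-node Ray cluster (8x H100 each) is already
# running and that a vLLM nightly with Kimi-K2 pipeline-parallel
# support is installed on every node.
from vllm import LLM, SamplingParams

llm = LLM(
    model="moonshotai/Kimi-K2-Instruct",  # assumed HF model id
    tensor_parallel_size=16,              # 16-way TP spans two 8-GPU nodes
    pipeline_parallel_size=2,             # 2 pipeline stages -> 32 GPUs total
    distributed_executor_backend="ray",   # Ray schedules workers across nodes
    trust_remote_code=True,               # Kimi-K2 ships custom model code
)

out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```
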
Checklist

  • Example is documented with comments throughout, in a Literate Programming style.
  • Example does not require third-party dependencies to be installed locally
  • Example follows the style guide
  • Example pins its dependencies (see the pinning sketch after this checklist)
    • Example pins container images to a stable tag, not a dynamic tag like latest
    • Example specifies a python_version for the base image, if it is used
    • Example pins all dependencies to at least minor version, ~=x.y.z or ==x.y
    • Example dependencies with version < 1 are pinned to patch version, ==0.y.z

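To illustrate the pinning rules above, here is a hypothetical Modal image definition; the package versions are placeholders, not the ones this PR actually pins.

```python
# Hypothetical illustration of the pinning checklist; versions are
# placeholders, not the ones this example uses.
import modal

image = (
    modal.Image.debian_slim(python_version="3.12")  # explicit python_version
    .pip_install(
        "transformers~=4.48.0",     # >=1.0 dependency: at-least-minor pin
        "huggingface_hub==0.27.1",  # <1.0 dependency: exact patch pin
        # the vLLM nightly should be pinned to a specific wheel or commit,
        # never to a dynamic tag like `latest`
    )
)
```
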
(Modal's internal guide page for this repo is the Multi-node examples guidance.)

@jvmncs force-pushed the kimi-k2-inference branch 2 times, most recently from 65e39de to 0938de2 (July 18, 2025 12:48)
@jvmncs force-pushed the kimi-k2-inference branch from 0938de2 to e6da57e (July 18, 2025 15:56)