Pull requests: vllm-project/tpu-inference
- [WIP] [TPU host offload] Setup precompile functions to TPU host offload — #1019, opened Nov 5, 2025 by saikat-royc
- [GPT-OSS] Load MXFP4 weights directly and dequantize online — #992, opened Oct 31, 2025 by amishacorns
- async scheduler fix _substitute_placeholder_token_fn bug — #991, opened Oct 31, 2025 by cychiuak
- [FIX] Add dummy get_input_embeddings to fix vLLM model type check — #971, opened Oct 29, 2025 by kuafou
- [TPU host offload][FIX] Keep KV cache using NamedSharding on CPU — #970, opened Oct 29, 2025 by juncgu-google
- [Llama4/JAX] Llama4 FP8 Quantized Weight Loading and Sharding — #962, opened Oct 28, 2025 by sierraisland
- dtype in ModelConfig will be implicitly casted to torch.dtype so in tpu_jax, we need to check for torch dtype as well — #945, opened Oct 27, 2025 by lc5211