Pull requests: vllm-project/tpu-inference
- [WIP] [TPU host offload] Setup precompile functions to TPU host offload — #1019, opened Nov 5, 2025 by saikat-royc
- [GPT-OSS] Load MXFP4 weights directly and dequantize online — #992, opened Oct 31, 2025 by amishacorns
- async scheduler fix _substitute_placeholder_token_fn bug — #991, opened Oct 31, 2025 by cychiuak
- [FIX] Add dummy get_input_embeddings to fix vLLM model type check — #971, opened Oct 29, 2025 by kuafou
- [TPU host offload][FIX] Keep KV cache using NamedSharding on CPU — #970, opened Oct 29, 2025 by juncgu-google
- [Llama4/JAX] Llama4 FP8 Quantized Weight Loading and Sharding — #962, opened Oct 28, 2025 by sierraisland
- dtype in ModelConfig will be implicitly casted to torch.dtype so in tpu_jax, we need to check for torch dtype as well — #945, opened Oct 27, 2025 by lc5211