TokenSpeed is a speed-of-light LLM inference engine.
Updated May 17, 2026 - Python
An engine-agnostic LLM gateway written in Rust, with full OpenAI and Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini, and more. Features include a gRPC pipeline, KV-cache-aware routing, chat history, tokenization caching, the Responses API, embeddings, WASM plugins, MCP support, and multi-tenant auth.
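Because the gateway exposes an OpenAI-compatible API, any standard `/v1/chat/completions` request should work against it regardless of which backend engine serves the model. A minimal sketch of such a request, where the base URL, model name, and API key are placeholders (not taken from the source):

```python
import json
import urllib.request

# Hypothetical gateway address; any OpenAI-compatible server accepts this shape.
BASE_URL = "http://localhost:8000/v1"

# Standard OpenAI-style chat completion payload. The gateway would route it
# to whichever backend engine (SGLang, vLLM, TRT-LLM, ...) hosts the model.
payload = {
    "model": "my-model",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Hello"},
    ],
    "max_tokens": 32,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        # Placeholder bearer token; the gateway's multi-tenant auth would
        # validate this per tenant.
        "Authorization": "Bearer sk-placeholder",
    },
)

# urllib.request.urlopen(req) would send the request if a gateway were running.
```

Since the request format is the stock OpenAI schema, existing OpenAI SDKs can also be pointed at the gateway by overriding their base URL.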