TensorZero Gateway Ignores timeouts Setting for Ollama Models
#3884
hutho started this conversation in Bug Reports
Replies: 1 comment
Hi @hutho, we currently have a global server-level timeout of 5 minutes for all requests to prevent resource leaks, which is why you're seeing this behavior. To be honest, a 20-minute request is probably not the best use case for HTTP. We're thinking about how to handle longer-lived requests for models like OpenAI's GPT-5 Pro and others with long-running requests, and will likely build a general solution (polling or equivalent).
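Until the gateway supports longer-lived requests, one workaround for the slow-model-load case is to pre-warm the model in Ollama directly, bypassing the gateway entirely, so that by the time gateway traffic arrives the model is already resident. This is a hedged sketch: it relies on Ollama's documented behavior that a `/api/generate` request with no prompt loads the model into memory, and assumes Ollama's default port; the model name and `keep_alive` value are placeholders.

```python
import json
import urllib.request

# Assumption: Ollama is reachable on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_warmup_payload(model: str, keep_alive: str = "30m") -> dict:
    """Build an Ollama /api/generate payload with no prompt.

    Per the Ollama API docs, a generate request without a prompt just
    loads the model into memory; keep_alive controls how long the model
    stays resident afterwards.
    """
    return {"model": model, "keep_alive": keep_alive}


def warm_up(model: str, keep_alive: str = "30m") -> None:
    """Ask Ollama to load the model before any gateway traffic arrives."""
    body = json.dumps(build_warmup_payload(model, keep_alive)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    # This call blocks until the model is loaded; no gateway timeout applies.
    urllib.request.urlopen(req, timeout=1800).read()


# Usage (requires a running Ollama instance):
#   warm_up("gpt-oss:20b")
```

With the model pre-loaded, the subsequent gateway request should complete well inside the 5-minute cutoff.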
Bug Report: TensorZero Gateway Ignores `timeouts` Setting for Ollama Models

Summary:
The TensorZero Gateway appears to ignore the `timeouts` setting in `tensorzero.toml` for Ollama models. This causes requests to large models that take longer than the default 5 minutes to load to fail with a timeout error.

System Details:
Configuration:
docker-compose.yml:

tensorzero.toml:

Steps to Reproduce:
1. Configure Ollama to listen on all interfaces (0.0.0.0).
2. Start the stack with `docker-compose up -d`.
3. Use a `curl` command to make a request to a large Ollama model.
4. After about 5 minutes, the request fails with a `null` or empty response.

Gateway Log Snippet:
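The reproduction can also be sketched as a small script that times how long the gateway holds the connection open, making the ~300-second cutoff measurable. This is a hypothetical sketch, not the reporter's actual command: it assumes the gateway's default port and the request-body shape shown in the TensorZero quickstart (`model_name` plus an `input.messages` list), both of which should be verified against the docs.

```python
import json
import time
import urllib.request

# Assumption: TensorZero Gateway on its default port, /inference endpoint.
GATEWAY_URL = "http://localhost:3000/inference"


def is_default_timeout(elapsed_s: float, default_s: float = 300.0,
                       tol_s: float = 15.0) -> bool:
    """True if the observed duration sits at the gateway's 5-minute cutoff."""
    return abs(elapsed_s - default_s) <= tol_s


def time_inference(model_name: str, prompt: str) -> float:
    """Send one non-streaming inference request; return elapsed seconds.

    The body shape follows the TensorZero quickstart; treat the exact
    fields as an assumption to check against the documentation.
    """
    body = json.dumps(
        {"model_name": model_name,
         "input": {"messages": [{"role": "user", "content": prompt}]}}
    ).encode()
    req = urllib.request.Request(
        GATEWAY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.monotonic()
    try:
        urllib.request.urlopen(req, timeout=3600).read()
    except Exception:
        pass  # we only care how long the gateway held the connection open
    return time.monotonic() - start


# Usage (requires a running gateway):
#   elapsed = time_inference("ollama-gpt-oss-20b", "Hello")
#   print(is_default_timeout(elapsed))
```

If `is_default_timeout` keeps returning true regardless of the configured `timeouts`, the configured value is being ignored.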
Troubleshooting Steps Taken:

- Verified that the `tensorzero.toml` file is correctly mounted into the gateway container.
- Set `timeouts` at both the model level (`[models.ollama-gpt-oss-20b].timeouts`) and the provider level (`[models.ollama-gpt-oss-20b.providers.ollama].timeouts`).
- Tried a `timeout_ms` setting, which resulted in a configuration parsing error.

Expected Behavior:
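For reference, a minimal sketch of the two `timeouts` placements described above. This is an assumption-laden illustration, not the reporter's actual file: it assumes TensorZero's OpenAI-compatible provider is used to reach Ollama, and that `non_streaming.total_ms` is the relevant timeout field; verify both against the TensorZero configuration reference.

```toml
[models.ollama-gpt-oss-20b]
routing = ["ollama"]
# Model-level placement (assumed field shape):
timeouts = { non_streaming = { total_ms = 1200000 } }  # 20 minutes

[models.ollama-gpt-oss-20b.providers.ollama]
type = "openai"                                    # assumption: OpenAI-compatible provider
api_base = "http://host.docker.internal:11434/v1"  # assumption: Ollama reachable from the container
model_name = "gpt-oss:20b"
# Provider-level placement (assumed field shape):
timeouts = { non_streaming = { total_ms = 1200000 } }
```

Either placement parses, but per this report neither appears to override the gateway's global 5-minute cutoff.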
The TensorZero Gateway should respect the `timeouts` setting in `tensorzero.toml` and wait for the specified duration before timing out.

Actual Behavior:
The gateway consistently times out at the default 5 minutes, ignoring the configured `timeouts` value.