TensorZero Gateway Ignores timeouts Setting for Ollama Models
#3884
hutho started this conversation in Bug Reports
Replies: 1 comment
Hi @hutho, we currently have a global server-level timeout of 5 minutes for all requests to prevent resource leaks, which is why you're seeing this behavior. To be honest, a 20-minute request is probably not the best use case for HTTP. We're thinking about how to handle longer-lived requests for models like OpenAI's GPT-5 Pro and others with long-running requests, and will likely build a general solution (polling or equivalent).
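Until the gateway supports longer-lived requests, one workaround for the slow-model-load case is to pre-warm the model in Ollama directly, bypassing the gateway entirely, so that by the time gateway traffic arrives the model is already resident. This is a hedged sketch: it relies on Ollama's documented behavior that a `/api/generate` request with no prompt loads the model into memory, and assumes Ollama's default port; the model name and `keep_alive` value are placeholders.

```python
import json
import urllib.request

# Assumption: Ollama is reachable on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_warmup_payload(model: str, keep_alive: str = "30m") -> dict:
    """Build an Ollama /api/generate payload with no prompt.

    Per the Ollama API docs, a generate request without a prompt just
    loads the model into memory; keep_alive controls how long the model
    stays resident afterwards.
    """
    return {"model": model, "keep_alive": keep_alive}


def warm_up(model: str, keep_alive: str = "30m") -> None:
    """Ask Ollama to load the model before any gateway traffic arrives."""
    body = json.dumps(build_warmup_payload(model, keep_alive)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    # This call blocks until the model is loaded; no gateway timeout applies.
    urllib.request.urlopen(req, timeout=1800).read()


# Usage (requires a running Ollama instance):
#   warm_up("gpt-oss:20b")
```

With the model pre-loaded, the subsequent gateway request should complete well inside the 5-minute cutoff.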
Bug Report: TensorZero Gateway Ignores `timeouts` Setting for Ollama Models

Summary:
The TensorZero Gateway appears to ignore the `timeouts` setting in `tensorzero.toml` for Ollama models. This causes requests to large models that take longer than the default 5 minutes to load to fail with a timeout error.

System Details:
Configuration:
docker-compose.yml:

tensorzero.toml:

Steps to Reproduce:
1. Configure Ollama to listen on all interfaces (0.0.0.0).
2. Start the stack with `docker-compose up -d`.
3. Use a `curl` command to make a request to a large Ollama model.
4. After about 5 minutes, the request fails with a `null` or empty response.

Gateway Log Snippet:
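The reproduction can also be sketched as a small script that times how long the gateway holds the connection open, making the ~300-second cutoff measurable. This is a hypothetical sketch, not the reporter's actual command: it assumes the gateway's default port and the request-body shape shown in the TensorZero quickstart (`model_name` plus an `input.messages` list), both of which should be verified against the docs.

```python
import json
import time
import urllib.request

# Assumption: TensorZero Gateway on its default port, /inference endpoint.
GATEWAY_URL = "http://localhost:3000/inference"


def is_default_timeout(elapsed_s: float, default_s: float = 300.0,
                       tol_s: float = 15.0) -> bool:
    """True if the observed duration sits at the gateway's 5-minute cutoff."""
    return abs(elapsed_s - default_s) <= tol_s


def time_inference(model_name: str, prompt: str) -> float:
    """Send one non-streaming inference request; return elapsed seconds.

    The body shape follows the TensorZero quickstart; treat the exact
    fields as an assumption to check against the documentation.
    """
    body = json.dumps(
        {"model_name": model_name,
         "input": {"messages": [{"role": "user", "content": prompt}]}}
    ).encode()
    req = urllib.request.Request(
        GATEWAY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.monotonic()
    try:
        urllib.request.urlopen(req, timeout=3600).read()
    except Exception:
        pass  # we only care how long the gateway held the connection open
    return time.monotonic() - start


# Usage (requires a running gateway):
#   elapsed = time_inference("ollama-gpt-oss-20b", "Hello")
#   print(is_default_timeout(elapsed))
```

If `is_default_timeout` keeps returning true regardless of the configured `timeouts`, the configured value is being ignored.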
Troubleshooting Steps Taken:

- Verified that the `tensorzero.toml` file is correctly mounted into the gateway container.
- Set `timeouts` at both the model level (`[models.ollama-gpt-oss-20b].timeouts`) and the provider level (`[models.ollama-gpt-oss-20b.providers.ollama].timeouts`).
- Tried a `timeout_ms` setting, which resulted in a configuration parsing error.

Expected Behavior:
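For reference, a minimal sketch of the two `timeouts` placements described above. This is an assumption-laden illustration, not the reporter's actual file: it assumes TensorZero's OpenAI-compatible provider is used to reach Ollama, and that `non_streaming.total_ms` is the relevant timeout field; verify both against the TensorZero configuration reference.

```toml
[models.ollama-gpt-oss-20b]
routing = ["ollama"]
# Model-level placement (assumed field shape):
timeouts = { non_streaming = { total_ms = 1200000 } }  # 20 minutes

[models.ollama-gpt-oss-20b.providers.ollama]
type = "openai"                                    # assumption: OpenAI-compatible provider
api_base = "http://host.docker.internal:11434/v1"  # assumption: Ollama reachable from the container
model_name = "gpt-oss:20b"
# Provider-level placement (assumed field shape):
timeouts = { non_streaming = { total_ms = 1200000 } }
```

Either placement parses, but per this report neither appears to override the gateway's global 5-minute cutoff.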
The TensorZero Gateway should respect the `timeouts` setting in `tensorzero.toml` and wait for the specified duration before timing out.

Actual Behavior:
The gateway consistently times out at the default 5 minutes, ignoring the configured `timeouts` value.