Hi team, I appreciate the work that has been put into TensorZero. I'm posting here since I'm not sure if this is a bug.
Using TensorZero Gateway 2025.9.3.

Below is my code:
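Roughly, my request looks like the following minimal sketch (the gateway URL and document text are illustrative placeholders; it assumes the HTTP gateway on its default port and calls the `segment_sections` function from the config below with per-request `cache_options`):

```python
import requests

# Minimal sketch of the request (gateway URL and document text are
# placeholders). Assumes the TensorZero HTTP gateway is running on its
# default port and exposes the segment_sections function configured below.
GATEWAY_URL = "http://localhost:3000/inference"

payload = {
    "function_name": "segment_sections",
    "input": {
        "messages": [
            {"role": "user", "content": "Document text to segment..."}
        ]
    },
    # Opt in to cache reads for this request.
    "cache_options": {"enabled": "on"},
}

# Send the identical request twice: the second call is expected to be
# served from the TensorZero cache rather than forwarded to OpenAI.
for attempt in (1, 2):
    response = requests.post(GATEWAY_URL, json=payload)
    response.raise_for_status()
    print(f"attempt {attempt}:", response.json())
```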
My config:
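```toml
[tools.extract_sections]
description = "Extracts meaningful sections each with title and starting line index from a document."
parameters = "tools/extract_sections.json"
strict = true

# FUNCTIONS
[functions.segment_sections]
type = "chat"
tools = ["extract_sections"]

[functions.segment_sections.variants.gpt_4o_mini]
type = "chat_completion"
model = "openai::gpt-4o-mini"
```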
With caching enabled, sending the same request twice still results in calls to OpenAI. I checked the usage counter on the OpenAI dashboard, and it is still incrementing. Shouldn't TensorZero cache the inference and return the stored inference response? Attached is a screenshot of the second request. I would appreciate it if you could help me out.

The screenshot above shows a cached response from OpenAI, not TensorZero.
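One way to tell the two caches apart (a minimal sketch, assuming the raw OpenAI-format response from the screenshot is available as parsed JSON; the function and variable names are placeholders): OpenAI's own prompt caching reports reused tokens under `usage.prompt_tokens_details.cached_tokens`, whereas a TensorZero cache hit never reaches OpenAI at all, so the dashboard counter would not increment.

```python
def openai_prompt_cache_tokens(response_json: dict) -> int:
    """Count prompt tokens served by OpenAI's prompt cache.

    `response_json` is a placeholder for a parsed OpenAI-format chat
    completion response. A nonzero value means the request still reached
    OpenAI (its prompt cache answered part of it); a TensorZero cache hit
    would not show up here, because no provider call is made at all.
    """
    usage = response_json.get("usage", {})
    return usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
```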