feat: support 'same-as-agent' model option for legacy evaluators #1048
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Force-pushed from 48a0c5e to 9e71de8
IMO it's better to do something like this:

    from typing import Protocol, runtime_checkable

    @runtime_checkable
    class LLMAgentProtocol(Protocol):
        def get_agent_model(self) -> str:
            ...

And switch the implementation of _get_agent_model to:

    def _get_agent_model(self, runtime: UiPathRuntimeProtocol) -> str | None:
        if isinstance(runtime, LLMAgentProtocol):
            return runtime.get_agent_model()
        else:
            return None

That way, the react agent can implement that method and it should work seamlessly.
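For illustration, a minimal sketch of a runtime that would structurally satisfy such a protocol; ReactAgentRuntime and its model attribute are hypothetical stand-ins, not classes from the uipath codebase:

```python
# Hypothetical example only: ReactAgentRuntime is an illustrative stand-in,
# not the real react-agent runtime class.
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLMAgentProtocol(Protocol):
    def get_agent_model(self) -> str: ...


class ReactAgentRuntime:
    """A runtime that structurally satisfies LLMAgentProtocol."""

    def __init__(self, model: str) -> None:
        self._model = model

    def get_agent_model(self) -> str:
        return self._model


# runtime_checkable protocols support isinstance() checks based on method
# presence, so no explicit inheritance from LLMAgentProtocol is required.
assert isinstance(ReactAgentRuntime(model="gpt-4o"), LLMAgentProtocol)
```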
Good catch, implemented your Protocol-based pattern.

I don't think the factory should contain the model. It's a runtime concept, i.e. the factory can create different runtimes with different LLMs based on the entrypoint, so I'd rather keep it in runtime. Also, please create a unit test to test the change :)
mathurk
left a comment
lgtm
Add support for the 'same-as-agent' model configuration in legacy LLM-based evaluators. When an evaluator specifies 'same-as-agent' as its model, it now resolves to the actual model from agent.json settings instead of throwing an error.

Changes:
- Updated EvaluatorFactory to accept and pass agent_model parameter
- Added _get_agent_model() method to runtime to load model from agent.json
- Added logging for model resolution and evaluator creation
- Fixed error message in trajectory evaluator (was incorrectly saying "LLM evaluator")

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
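As a rough sketch of the resolution step this commit describes; the helper name and signature below are illustrative, not the actual EvaluatorFactory API:

```python
# Illustrative sketch only; resolve_evaluator_model is a hypothetical helper,
# not the actual EvaluatorFactory API.
SAME_AS_AGENT = "same-as-agent"


def resolve_evaluator_model(configured_model: str, agent_model: str | None) -> str:
    """Replace the 'same-as-agent' placeholder with the concrete agent model."""
    if configured_model != SAME_AS_AGENT:
        return configured_model
    if agent_model is None:
        # Only fails when no agent model can be resolved at all.
        raise ValueError("Evaluator requested 'same-as-agent' but no agent model was found")
    return agent_model


print(resolve_evaluator_model("same-as-agent", "gpt-4o"))  # -> gpt-4o
print(resolve_evaluator_model("gpt-4o-mini", None))        # -> gpt-4o-mini
```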
Implements the Protocol-based approach for getting agent model:
- Adds LLMAgentFactoryProtocol with get_agent_model() method
- Updates _get_agent_model() to check if factory implements protocol
- Falls back to file-based approach if protocol not implemented

This allows runtime factories to provide agent model information directly, enabling cleaner 'same-as-agent' resolution for evaluators.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Addresses PR review feedback:
- Rename LLMAgentFactoryProtocol to LLMAgentRuntimeProtocol (it's a runtime concept)
- Remove agent.json fallback logic (runtime handles this)
- Make _get_agent_model() async - creates temp runtime to query model
- Add _find_agent_model_in_runtime() for recursive delegate traversal
- Make _load_evaluators() async to support async model query

The runtime (AgentsLangGraphRuntime) now implements get_agent_model(), and wrapper runtimes (TelemetryRuntimeWrapper) delegate appropriately. This follows the principle that model info is a runtime property, not a factory property.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
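A rough sketch of the recursive traversal this commit describes; the delegate attribute name and the protocol come from the PR text, while everything else is illustrative rather than the actual implementation:

```python
# Sketch of _find_agent_model_in_runtime-style traversal; not the actual
# implementation from this PR.
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class LLMAgentRuntimeProtocol(Protocol):
    def get_agent_model(self) -> str: ...


def find_agent_model_in_runtime(runtime: Any) -> Optional[str]:
    """Walk the runtime wrapper chain until a runtime can report its model."""
    if isinstance(runtime, LLMAgentRuntimeProtocol):
        return runtime.get_agent_model()
    # Wrapper runtimes expose the wrapped runtime via a 'delegate' attribute.
    delegate = getattr(runtime, "delegate", None)
    if delegate is not None:
        return find_agent_model_in_runtime(delegate)
    return None
```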
Force-pushed from 6beb0e0 to a3fde44
Combines schema and agent model fetching into a single _ensure_metadata_loaded() method that creates one temporary runtime instead of potentially two separate ones. Results are cached for subsequent access, improving performance when both schema and agent model are needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
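A minimal sketch of that caching idea, assuming an async create_runtime() factory and a runtime exposing an async get_schema() plus a sync get_agent_model(); all names here are placeholders rather than the real API:

```python
# Placeholder names throughout; assumes an async create_runtime() factory and a
# runtime with async get_schema() plus sync get_agent_model().
from typing import Any, Awaitable, Callable, Optional


class EvalMetadataCache:
    def __init__(self, create_runtime: Callable[[], Awaitable[Any]]) -> None:
        self._create_runtime = create_runtime
        self._schema: Optional[dict] = None
        self._agent_model: Optional[str] = None
        self._loaded = False

    async def _ensure_metadata_loaded(self) -> None:
        if self._loaded:
            return
        # One temporary runtime answers both queries instead of two.
        runtime = await self._create_runtime()
        self._schema = await runtime.get_schema()
        self._agent_model = runtime.get_agent_model()
        self._loaded = True

    async def get_schema(self) -> Optional[dict]:
        await self._ensure_metadata_loaded()
        return self._schema

    async def get_agent_model(self) -> Optional[str]:
        await self._ensure_metadata_loaded()
        return self._agent_model
```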
    if isinstance(runtime, LLMAgentRuntimeProtocol):
        return runtime.get_agent_model()

    # Check for delegate property (used by UiPathResumableRuntime, TelemetryRuntimeWrapper)
I'd prefer adding this method in UiPathResumableRuntime and TelemetryRuntimeWrapper, but we can do this later.
akshaylive
left a comment
We need to refactor the eval runtime to ensure one runtime instance per evaluation, but that can happen in a separate PR.
Before merging, please add a unit test to ensure that the get_agent_model logic doesn't break during the future refactor.
Add comprehensive tests for:
- LLMAgentRuntimeProtocol detection
- _find_agent_model_in_runtime recursive delegate traversal
- _ensure_metadata_loaded caching behavior
- _get_agent_model cached retrieval
- get_schema cached retrieval
- Realistic wrapper chain model resolution

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
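As a sketch of what such a test might look like; the stand-in classes and the local traversal helper are hypothetical, and a real test would import the production _find_agent_model_in_runtime and runtime wrappers instead:

```python
# Test sketch with local stand-ins; the production test should import the real
# _find_agent_model_in_runtime and runtime classes instead.
from typing import Any, Optional


def _find_agent_model_in_runtime(runtime: Any) -> Optional[str]:
    # Stand-in mirroring the traversal behaviour described in this PR.
    get_model = getattr(runtime, "get_agent_model", None)
    if callable(get_model):
        return get_model()
    delegate = getattr(runtime, "delegate", None)
    return _find_agent_model_in_runtime(delegate) if delegate is not None else None


class FakeAgentRuntime:
    def get_agent_model(self) -> str:
        return "gpt-4o"


class FakeWrapperRuntime:
    def __init__(self, delegate: Any) -> None:
        self.delegate = delegate


def test_model_resolved_through_wrapper_chain() -> None:
    chained = FakeWrapperRuntime(FakeWrapperRuntime(FakeAgentRuntime()))
    assert _find_agent_model_in_runtime(chained) == "gpt-4o"


def test_model_absent_returns_none() -> None:
    class BareRuntime:
        pass

    assert _find_agent_model_in_runtime(BareRuntime()) is None
```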
Force-pushed from d7fa304 to 3b74169
akshaylive
left a comment
🚢
Summary
- Adds support for the 'same-as-agent' model configuration in legacy LLM-based evaluators
- When an evaluator specifies 'same-as-agent' as its model, it now resolves to the actual model from the agent runtime
- Adds LLMAgentRuntimeProtocol for runtimes to provide model information

Changes
- Added LLMAgentRuntimeProtocol - protocol for runtimes that can provide agent model info
- Changed _get_agent_model() to be async and query the model from the runtime (not the factory)
- Added _find_agent_model_in_runtime() for recursive delegate traversal through runtime wrappers
- Changed _load_evaluators() to be async to support the async model query

Architecture
The model resolution follows the runtime wrapper chain:
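To illustrate the chain, here is a small sketch; the class names follow the PR text, but the constructors and the hard-coded model value are placeholders:

```python
# Placeholder constructors and model value; only the class names and the
# delegation pattern follow this PR's description.
class AgentsLangGraphRuntime:
    def get_agent_model(self) -> str:
        # In the companion PR this value is loaded from agent.json.
        return "gpt-4o"


class TelemetryRuntimeWrapper:
    def __init__(self, delegate) -> None:
        self.delegate = delegate

    def get_agent_model(self) -> str:
        # Wrapper runtimes simply forward the query to the runtime they wrap.
        return self.delegate.get_agent_model()


chain = TelemetryRuntimeWrapper(AgentsLangGraphRuntime())
print(chain.get_agent_model())  # -> gpt-4o
```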
Companion PR
This PR works with a uipath-agents-python PR that implements:
- AgentsLangGraphRuntime.get_agent_model() - loads the model from agent.json (see the sketch after this list)
- TelemetryRuntimeWrapper.get_agent_model() - delegates to the underlying runtime
- AgentsRuntimeFactory - removed get_agent_model (it's a runtime concept, not a factory concept)
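A hedged sketch of what loading the model from agent.json could look like; the file layout (a model under a 'settings' key) is an assumption based on the commit messages above, not the documented schema:

```python
# Assumed agent.json layout; adjust the keys to the real schema.
import json
from pathlib import Path
from typing import Optional


def load_agent_model(agent_json_path: str = "agent.json") -> Optional[str]:
    path = Path(agent_json_path)
    if not path.exists():
        return None
    data = json.loads(path.read_text())
    # Look under 'settings' first (the commits mention "agent.json settings"),
    # then fall back to a top-level 'model' key.
    settings = data.get("settings") or {}
    return settings.get("model") or data.get("model")
```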
Test plan

- Tests for LLMAgentRuntimeProtocol integration
- calculator_same_as_agent example containing evaluators with "model": "same-as-agent"

🤖 Generated with Claude Code