
Conversation


@ketkarameya ketkarameya commented Feb 10, 2026

Code Review Bench PR langgenius#27749 - refactor(models): Refine MessageAgentThought SQLAlchemy typing

Benchmark PR for Gitar evaluation

Original PR: agentic-review-benchmarks#11

refactor(models): Refine MessageAgentThought SQLAlchemy typing


Summary by Gitar

  • Critical bug fixes: Fixed unsafe string concatenation causing crashes when thought is None, and corrected a wrong variable reference (tool_responses → tool_inputs) in tool response handling
  • SQLAlchemy 2.0 migration: Migrated MessageAgentThought from legacy Base to modern TypeBase (MappedAsDataclass) with complete Mapped[T] type annotations across 20+ columns
  • Financial precision: Converted all price fields from float/int to Decimal type to prevent rounding errors in monetary calculations
  • Type safety enhancements: Added CreatorUserRole enum usage, Pydantic validation model (AgentThoughtValidation), and explicit null checks throughout agent runner logic
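The financial-precision bullet comes down to a standard float-vs-Decimal pitfall. A quick generic Python illustration (not code from this PR):

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly,
# so per-token price arithmetic drifts:
print(0.1 + 0.2)  # 0.30000000000000004

# Decimal keeps exact base-10 values, which is why the price
# columns move to Decimal:
print(Decimal("0.1") + Decimal("0.2"))  # 0.3
```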

```diff
 tool_call_response.append(
     ToolPromptMessage(
-        content=tool_responses.get(tool, agent_thought.observation),
+        content=str(tool_inputs.get(tool, agent_thought.observation)),
```


🚨 Bug: ToolPromptMessage uses tool_inputs instead of tool_responses

The ToolPromptMessage content was changed from tool_responses.get(tool, agent_thought.observation) to str(tool_inputs.get(tool, agent_thought.observation)). This means that when reconstructing conversation history, tool call responses will now contain tool inputs (the arguments sent to the tool) instead of the actual tool outputs/observations.

This corrupts the agent's conversation history, causing the LLM to see tool inputs where it should see tool results. This will lead to incorrect agent behavior in multi-turn conversations that reference prior tool usage.

The original code using tool_responses was correct. The variable tool_responses is parsed from agent_thought.observation (the tool's output), which is the right data to put in a ToolPromptMessage.


Suggested change

```diff
-content=str(tool_inputs.get(tool, agent_thought.observation)),
+content=str(tool_responses.get(tool, agent_thought.observation)),
```
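As a rough sketch of why tool_responses is the right map here: it is parsed from the tool's recorded output, with the raw observation as fallback. The observation value and parsing below are hypothetical illustrations, not code from the PR:

```python
import json

# Hypothetical agent_thought.observation: tool outputs keyed by tool name
observation = '{"web_search": "Paris is the capital of France."}'

try:
    tool_responses = json.loads(observation)  # tool name -> tool output
except json.JSONDecodeError:
    tool_responses = {}

# The review's point: message content must come from tool_responses
# (what the tool returned), not tool_inputs (what was sent to it).
content = str(tool_responses.get("web_search", observation))
print(content)  # Paris is the capital of France.
```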

```diff
 answer_token=0,
-answer_unit_price=0,
-answer_price_unit=0,
+answer_unit_price=Decimal("0.001"),
+answer_price_unit=Decimal(0),
```


⚠️ Bug: Swapped answer_unit_price and answer_price_unit defaults

The initial values for answer_unit_price and answer_price_unit are swapped compared to their message_* counterparts:

```python
message_unit_price=Decimal(0),        # ✅ unit price starts at 0
message_price_unit=Decimal("0.001"),  # ✅ price unit = per 1000 tokens
answer_unit_price=Decimal("0.001"),   # ❌ should be Decimal(0)
answer_price_unit=Decimal(0),         # ❌ should be Decimal("0.001")
```

The semantics of *_unit_price (the cost per token) and *_price_unit (the denomination, e.g., per 1000 tokens) should be consistent between message and answer. The model's server_default for answer_price_unit is sa.text("0.001"), confirming that price_unit should be 0.001, not 0.

These values are later overwritten by actual LLM usage data (llm_usage.completion_unit_price / llm_usage.completion_price_unit), so the impact is limited to the brief window between creation and the update. However, if the update fails or is skipped, the swapped defaults would result in incorrect cost calculations.


Suggested change

```diff
-answer_unit_price=Decimal("0.001"),
+answer_unit_price=Decimal(0),
+answer_price_unit=Decimal("0.001"),
```
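To make the *_unit_price / *_price_unit semantics concrete, here is a hypothetical cost calculation. The formula and helper name are assumptions inferred from the description above, not the project's actual pricing code:

```python
from decimal import Decimal

def estimate_cost(tokens: int, unit_price: Decimal, price_unit: Decimal) -> Decimal:
    # unit_price: the listed price for one block of tokens
    # price_unit: block size as a fraction, e.g. 0.001 == "per 1000 tokens"
    return Decimal(tokens) * unit_price * price_unit

# 2000 tokens at 0.5 per 1000 tokens -> a total cost of 1
cost = estimate_cost(2000, unit_price=Decimal("0.5"), price_unit=Decimal("0.001"))
print(cost)
```

Under this reading, swapping the two fields changes the meaning of every stored row, which is why the defaults should mirror the message_* pair.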


@gitar-bot-staging gitar-bot-staging bot left a comment


⚠️ This PR is blocked due to unresolved code review findings.

Configure merge blocking · Maintainers can dismiss this review. Gitar never approves changes.


gitar-bot-staging bot commented Feb 10, 2026

Code Review 🚫 Blocked 0 resolved / 5 findings

Critical bug: ToolPromptMessage content uses tool_inputs instead of tool_responses, corrupting agent conversation history. Also, answer price defaults are swapped and the new AgentThoughtValidation class is unused dead code.

🚨 Bug: ToolPromptMessage uses tool_inputs instead of tool_responses

📄 api/core/agent/base_agent_runner.py:502

The ToolPromptMessage content was changed from tool_responses.get(tool, agent_thought.observation) to str(tool_inputs.get(tool, agent_thought.observation)). This means that when reconstructing conversation history, tool call responses will now contain tool inputs (the arguments sent to the tool) instead of the actual tool outputs/observations.

This corrupts the agent's conversation history, causing the LLM to see tool inputs where it should see tool results. This will lead to incorrect agent behavior in multi-turn conversations that reference prior tool usage.

The original code using tool_responses was correct. The variable tool_responses is parsed from agent_thought.observation (the tool's output), which is the right data to put in a ToolPromptMessage.

Suggested fix

```python
content=str(tool_responses.get(tool, agent_thought.observation)),
```

⚠️ Bug: Swapped answer_unit_price and answer_price_unit defaults

📄 api/core/agent/base_agent_runner.py:325 📄 api/core/agent/base_agent_runner.py:326

The initial values for answer_unit_price and answer_price_unit are swapped compared to their message_* counterparts:

```python
message_unit_price=Decimal(0),        # ✅ unit price starts at 0
message_price_unit=Decimal("0.001"),  # ✅ price unit = per 1000 tokens
answer_unit_price=Decimal("0.001"),   # ❌ should be Decimal(0)
answer_price_unit=Decimal(0),         # ❌ should be Decimal("0.001")
```

The semantics of *_unit_price (the cost per token) and *_price_unit (the denomination, e.g., per 1000 tokens) should be consistent between message and answer. The model's server_default for answer_price_unit is sa.text("0.001"), confirming that price_unit should be 0.001, not 0.

These values are later overwritten by actual LLM usage data (llm_usage.completion_unit_price / llm_usage.completion_price_unit), so the impact is limited to the brief window between creation and the update. However, if the update fails or is skipped, the swapped defaults would result in incorrect cost calculations.

Suggested fix

```python
answer_unit_price=Decimal(0),
answer_price_unit=Decimal("0.001"),
```

💡 Bug: total_price=0 passes int instead of Decimal

📄 api/core/agent/base_agent_runner.py:328

The total_price field is typed as Mapped[Decimal | None] in the model, but total_price=0 passes a plain int. While SQLAlchemy can coerce this, it's inconsistent with the other price fields which explicitly use Decimal(0). This was previously Decimal(0) and was regressed in this PR.

For consistency and to avoid any potential type-checking issues with the new TypeBase (MappedAsDataclass) approach, this should use Decimal(0).

Suggested fix

```python
total_price=Decimal(0),
```
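A small illustration of why keeping the field uniformly Decimal matters once values are combined (generic Python behavior, not code from the PR):

```python
from decimal import Decimal

unit_price = Decimal("0.001")

# Decimal interoperates with int, so total_price=0 happens to work...
print(unit_price + 0)  # 0.001

# ...but mixing in a float raises, so uniform Decimal fields avoid surprises:
try:
    unit_price + 0.5
except TypeError:
    print("Decimal + float raises TypeError")
```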

💡 Quality: AgentThoughtValidation defined but never used

📄 api/core/agent/base_agent_runner.py:52

The AgentThoughtValidation Pydantic model is defined (lines 52-65) but never referenced anywhere in this file or elsewhere in the codebase. It also carries a self-contradicting comment: the code sets extra = "allow" while the adjacent comment says # Pydantic v1 syntax - should use ConfigDict(extra='forbid'). In other words, the comment suggests "forbid" but the code uses "allow", and the syntax is Pydantic v1 while the project likely uses Pydantic v2.

If this is intended for future use, consider adding it in the PR that actually uses it. Dead code adds maintenance burden and confusion.

💡 Edge Case: Null observation_payload used as dict values on else branch

📄 api/core/agent/base_agent_runner.py:485

On line 485, when observation_payload is falsy (None or empty string), the code does:

```python
tool_responses = dict.fromkeys(tool_names, observation_payload)
```

This creates a dict where all values are None (since observation_payload is falsy). Later, tool_responses is used in the ToolPromptMessage content (assuming the tool_inputs → tool_responses fix is applied). Passing None as content could cause issues downstream.

The original code had the same behavior on the exception branch (dict.fromkeys(tools, agent_thought.observation) where observation could be None), so this is a pre-existing edge case, but the explicit null-check structure makes it a good opportunity to provide a safer fallback like "" instead of None.

Suggested fix

```python
else:
    tool_responses = dict.fromkeys(tool_names, "")
```
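The dict.fromkeys behavior driving this finding is easy to reproduce in isolation (plain Python, hypothetical tool names):

```python
tool_names = ["search", "calculator"]
observation_payload = None  # falsy observation

# dict.fromkeys assigns the same value to every key, so a falsy
# payload puts None everywhere:
print(dict.fromkeys(tool_names, observation_payload))
# {'search': None, 'calculator': None}

# The suggested fallback keeps every value a str:
print(dict.fromkeys(tool_names, ""))
# {'search': '', 'calculator': ''}
```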

Rules 🎸 1 action taken

Gitar Rules

🎸 Summary Enhancement: PR description was minimal (≤99 chars) and lacked technical depth. Appended comprehensive 'Summary by Gitar' section with 4 bullet points covering bug fixes, SQLAlchemy 2.0 migration, financial precision, and type safety enhancements

5 rules not applicable. Show all rules by commenting gitar display:verbose.

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

gitar auto-apply:on      → turn auto-apply on
gitar display:verbose    → switch to verbose display

