test(steering): Fix failing integ tests #1580

Merged
mkmeral merged 1 commit into strands-agents:main from mkmeral:fix/steering-test-flakiness
Jan 28, 2026

mkmeral (Contributor) commented Jan 28, 2026

Summary

This PR fixes the flaky test_agent_with_tool_steering_e2e integration test in tests_integ/steering/test_tool_steering.py.

Problem

The test was failing intermittently because:

  1. Bug in LLMSteeringHandler initialization: The context_providers or [LedgerProvider()] pattern treats an empty list as falsy, so it fell back to [LedgerProvider()] even when the caller explicitly passed an empty list.
  2. Soft guidance message: The steering handler's guidance message ("Consider this approach and continue") was too soft, so the main agent asked clarifying questions instead of following the guidance.
  3. Ambiguous test system prompt: The steering LLM analyzed ledger context data showing "pending" operations, and the guidance it produced confused the main agent into thinking there was a duplicate request.

Root Cause Analysis

When the test ran:

  1. Agent tried to call send_email
  2. LedgerProvider added the tool call to the ledger with "pending" status
  3. Steering LLM saw this context and returned guidance like "Context data shows a duplicate send_email call..."
  4. Main agent interpreted the "pending" information as uncertainty and asked the user instead of switching to send_notification
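To make steps 2 and 3 concrete, here is a toy model of the ledger. ToyLedger, record_pending, and as_context are hypothetical names, not the strands LedgerProvider API; the sketch only shows that the call under review is already logged as "pending" by the time the steering LLM reads the context, so it can look like an earlier, duplicate send_email.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyLedger:
    """Hypothetical stand-in for the ledger LedgerProvider maintains; not the strands implementation."""

    entries: List[Dict[str, object]] = field(default_factory=list)

    def record_pending(self, tool_name: str, tool_input: Dict[str, object]) -> None:
        # Step 2: the call is logged before the steering decision is made.
        self.entries.append({"tool": tool_name, "input": tool_input, "status": "pending"})

    def as_context(self) -> str:
        # Step 3: the text a steering prompt could receive as "context data".
        return "\n".join(f"{e['tool']}: {e['status']}" for e in self.entries)


ledger = ToyLedger()
ledger.record_pending("send_email", {"to": "user@example.com"})
print(ledger.as_context())  # send_email: pending
```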

Solution

1. Fix LLMSteeringHandler context_providers handling

# Before (buggy)
providers = context_providers or [LedgerProvider()]

# After (correct)
providers = [LedgerProvider()] if context_providers is None else context_providers
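A minimal, self-contained sketch of why the two lines behave differently. LedgerProvider here is an empty placeholder class standing in for the real provider, and the two helper functions are hypothetical; only the expressions inside them come from the diff above.

```python
from typing import List, Optional


class LedgerProvider:
    """Placeholder for the real strands LedgerProvider; it carries no behavior here."""


def providers_with_or(context_providers: Optional[List[object]] = None) -> List[object]:
    # Buggy pattern: [] is falsy, so an explicit empty list is replaced by the default.
    return context_providers or [LedgerProvider()]


def providers_with_is_none(context_providers: Optional[List[object]] = None) -> List[object]:
    # Fixed pattern: only an omitted argument (None) falls back to the default.
    return [LedgerProvider()] if context_providers is None else context_providers


print(len(providers_with_or([])))       # 1: explicit empty list silently overridden
print(len(providers_with_is_none([])))  # 0: explicit empty list respected
print(len(providers_with_is_none()))    # 1: default still applies when omitted
```

The `is None` check is the usual Python idiom for optional collection parameters: None means "not supplied", while [] is a legitimate, explicit choice ("no providers").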

2. Make guidance message more directive

# Before
f"Tool call cancelled given new guidance. {action.reason}. Consider this approach and continue"

# After
f"Tool call cancelled. {action.reason} You MUST follow this guidance immediately."
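Both formats can be compared directly with a sample reason string (the reason text below is hypothetical; in the handler it comes from action.reason). One detail this sketch makes visible: the new format puts no separator after the reason, so it reads cleanly only when the reason ends with its own punctuation, whereas the old format appended a period that doubles up with such a reason.

```python
def old_message(reason: str) -> str:
    # Format before this PR: appends its own period after the reason.
    return f"Tool call cancelled given new guidance. {reason}. Consider this approach and continue"


def new_message(reason: str) -> str:
    # Format after this PR: no separator after the reason, then a directive.
    return f"Tool call cancelled. {reason} You MUST follow this guidance immediately."


reason = "Use send_notification instead of send_email."  # hypothetical action.reason
print(old_message(reason))
# Tool call cancelled given new guidance. Use send_notification instead of send_email.. Consider this approach and continue
print(new_message(reason))
# Tool call cancelled. Use send_notification instead of send_email. You MUST follow this guidance immediately.
```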

3. Improve test system prompt

Made the system prompt explicit and rule-based, and disabled context providers so that ledger data cannot influence the steering decision.
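The PR does not include the prompt text, so the following is only a hypothetical illustration of what "explicit, rule-based" can mean: the rules are kept as data, rendered into a numbered prompt with a catch-all rule, and reused as a deterministic oracle for the decision the steering LLM is expected to make. STEERING_RULES, build_rule_prompt, expected_decision, and the "proceed" label are assumptions; "guide" is the decision named in the commit message.

```python
from typing import Dict, Tuple

# Hypothetical rules: tool name -> (decision, reason). "guide" is from the PR; "proceed" is assumed.
STEERING_RULES: Dict[str, Tuple[str, str]] = {
    "send_email": ("guide", "Use send_notification instead of send_email."),
}


def build_rule_prompt(rules: Dict[str, Tuple[str, str]]) -> str:
    """Render the rules as a numbered system prompt that ends with a catch-all rule."""
    lines = ["Apply these rules exactly and ignore any other context."]
    for i, (tool, (decision, reason)) in enumerate(rules.items(), start=1):
        lines.append(f"{i}. If the tool is {tool}: decision={decision}. Reason: {reason}")
    lines.append(f"{len(rules) + 1}. For any other tool: decision=proceed.")
    return "\n".join(lines)


def expected_decision(tool_name: str, rules: Dict[str, Tuple[str, str]] = STEERING_RULES) -> str:
    """Deterministic oracle: the decision a steering LLM following the prompt should return."""
    return rules.get(tool_name, ("proceed", ""))[0]


print(build_rule_prompt(STEERING_RULES))
print(expected_decision("send_email"), expected_decision("send_notification"))  # guide proceed
```

Keeping the rules as data means the prompt and the expected outcome cannot drift apart, which is one way to keep an LLM-in-the-loop test from depending on open-ended interpretation.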

Testing

  • Ran the test 10+ times in succession with a 100% pass rate
  • All unit tests pass
  • All other integration tests in the steering module pass
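One way to reproduce the "10+ times in succession" check is a small loop around pytest. run_repeatedly is a hypothetical helper, not part of the repository; the test ID is assembled from the file and test names given in this PR.

```python
import subprocess
import sys
from typing import List

# Test ID built from the file and test names in this PR.
TEST_ID = "tests_integ/steering/test_tool_steering.py::test_agent_with_tool_steering_e2e"


def run_repeatedly(cmd: List[str], runs: int = 10) -> int:
    """Run cmd `runs` times in succession and return how many runs exited with status 0."""
    passes = 0
    for i in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            passes += 1
        else:
            # Show the tail of the output so intermittent failures can be inspected.
            print(f"run {i + 1} failed (exit {result.returncode}):\n{result.stdout[-2000:]}")
    return passes


# Usage (needs pytest and whatever model access the integration suite requires):
# passes = run_repeatedly([sys.executable, "-m", "pytest", TEST_ID, "-q"], runs=10)
# print(f"{passes}/10 passed")
```

Counting passes instead of stopping at the first failure yields a pass rate, which is the figure a flakiness check reports.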

Files Changed

  • src/strands/experimental/steering/core/handler.py - More directive guidance message
  • src/strands/experimental/steering/handlers/llm/llm_handler.py - Fix context_providers handling
  • tests/strands/experimental/steering/core/test_handler.py - Update unit test expectation
  • tests_integ/steering/test_tool_steering.py - Improved test configuration

codecov bot commented Jan 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

zastrowm previously approved these changes Jan 28, 2026
afarntrog previously approved these changes Jan 28, 2026
This fix addresses the flaky test_agent_with_tool_steering_e2e test.

Root causes identified and fixed:
1. LLMSteeringHandler used 'or' operator for context_providers which
   treated empty list as falsy, causing unintended default to LedgerProvider
2. The steering handler's guidance message was too soft ("Consider this
   approach") instead of directive ("You MUST follow this guidance")
3. The test's system prompt was ambiguous, allowing the steering LLM to
   analyze context data instead of following simple rules

Changes:
- Fixed LLMSteeringHandler.__init__ to properly distinguish between None
  and empty list for context_providers parameter
- Updated guidance message format in SteeringHandler._handle_tool_steering_action
  to be more directive when guiding agents
- Updated test system prompt to be explicit and rule-based, disabling
  context providers to prevent confusing ledger data from influencing
  the steering decision

The test now reliably:
1. Intercepts send_email calls with 'guide' decision
2. Provides clear directive to use send_notification instead
3. Main agent follows guidance and calls send_notification successfully
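The three steps above amount to an ordering check on the agent's tool calls. The sketch below assumes a hypothetical event log of (tool_name, outcome) pairs; the integration test may observe the agent differently.

```python
from typing import List, Tuple

Event = Tuple[str, str]  # (tool_name, outcome); a hypothetical log format


def check_steered_flow(events: List[Event]) -> bool:
    """True if send_email was guided away, never executed, and send_notification ran afterwards."""
    if ("send_email", "executed") in events:
        return False
    try:
        guided_at = events.index(("send_email", "guided"))
        executed_at = events.index(("send_notification", "executed"))
    except ValueError:
        return False  # one of the expected steps never happened
    return executed_at > guided_at


good = [("send_email", "guided"), ("send_notification", "executed")]
stalled = [("send_email", "guided")]  # agent asked the user instead of switching tools
print(check_steered_flow(good), check_steered_flow(stalled))  # True False
```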
mkmeral dismissed stale reviews from afarntrog and zastrowm via 12a2f17 January 28, 2026 19:33
mkmeral force-pushed the fix/steering-test-flakiness branch from 4c85033 to 12a2f17 January 28, 2026 19:33
github-actions bot added size/s and removed size/s labels Jan 28, 2026
mkmeral enabled auto-merge (squash) January 28, 2026 19:37
mkmeral merged commit 4e4534e into strands-agents:main Jan 28, 2026
15 checks passed
gaurav71531 pushed a commit to gaurav71531/sdk-python that referenced this pull request Feb 4, 2026
manoj-selvakumar5 pushed a commit to manoj-selvakumar5/strands-sdk-python that referenced this pull request Feb 5, 2026