UN-2793 [FEAT] Add retry logic with exponential backoff to SDK1#1564
UN-2793 [FEAT] Add retry logic with exponential backoff to SDK1#1564chandrasekharan-zipstack merged 14 commits intomainfrom
Conversation
Implemented automatic retry logic for platform and prompt service calls with configurable exponential backoff, comprehensive test coverage, and CI integration. Features: - Exponential backoff with jitter for transient failures - Configurable via environment variables (MAX_RETRIES, MAX_TIME, BASE_DELAY, etc.) - Retries ConnectionError, Timeout, HTTPError (502/503/504), OSError - 67 tests with 100% pass rate - CI integration with test reporting 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Summary by CodeRabbit
WalkthroughAdds environment-configurable exponential-backoff retry utilities and applies them to SDK1 platform and prompt service calls; introduces SDK1 tests, fixtures, and tox/CI report steps; updates SDK1 docs and bumps multiple unstract-sdk dependency versions and tool image tags. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Tool as PlatformHelper
participant Retry as Retry Decorator
participant HTTP as Platform Service
note right of Retry #d1e7dd: Config via PLATFORM_SERVICE_* env vars\n(max_retries, max_time, base_delay, multiplier, jitter)
Tool->>Retry: _get_adapter_configuration(adapter_id)
alt Success first attempt
Retry->>HTTP: GET /adapters/{id}/config
HTTP-->>Retry: 200 OK
Retry-->>Tool: config
else Transient failures then success
loop Attempts (exponential backoff)
Retry->>HTTP: GET /adapters/{id}/config
HTTP--x Retry: ConnectionError/Timeout/HTTP 502-504
Retry-->>Retry: wait(backoff)
end
Retry->>HTTP: GET /adapters/{id}/config
HTTP-->>Retry: 200 OK
Retry-->>Tool: config
else Exhausted
Retry-->>Tool: raise exception
Tool->>Tool: get_adapter_config wraps into SdkError
end
sequenceDiagram
autonumber
participant Prompt as PromptTool
participant Retry as Retry Decorator
participant HTTP as Prompt Service
note right of Retry #f0edf6: Config via PROMPT_SERVICE_* env vars
Prompt->>Retry: _call_service("answer-prompt", payload)
alt First attempt OK
Retry->>HTTP: POST /answer-prompt
HTTP-->>Retry: 200 OK
Retry-->>Prompt: result
else Retryable failures
loop Attempts (exponential backoff)
Retry->>HTTP: POST /answer-prompt
HTTP--x Retry: ConnectionError/Timeout/HTTP 502-504
Retry-->>Retry: wait(backoff)
end
alt Eventually succeeds
Retry->>HTTP: POST /answer-prompt
HTTP-->>Retry: 200 OK
Retry-->>Prompt: result
else Exhausted
Retry-->>Prompt: raise error
Prompt->>Prompt: caller may handle (None / error handler)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
unstract/sdk1/src/unstract/sdk1/platform.py (1)
258-279: _call_service suppresses retryable HTTPError; split handling to allow retries.Catching RequestException broadly prevents retries on 502/503/504. Handle HTTPError first and re-raise retryable statuses.
try: if method.upper() == "POST": response = requests.post( url=url, json=payload, params=params, headers=req_headers ) elif method.upper() == "GET": response = requests.get(url=url, params=params, headers=req_headers) else: raise ValueError(f"Unsupported HTTP method: {method}") response.raise_for_status() except ConnectionError as connect_err: logger.exception("Connection error to platform service") msg = ( "Unable to connect to platform service. Retrying with backoff, " "please contact admin if retries ultimately fail." ) self.tool.stream_log(msg, level=LogLevel.ERROR) raise ConnectionError(msg) from connect_err - except RequestException as e: + except HTTPError as e: + # Allow decorator to retry transient HTTP failures + status = getattr(getattr(e, "response", None), "status_code", None) + if status in {502, 503, 504}: + raise + # Non-retryable HTTP errors: extract and report + resp = getattr(e, "response", None) + error_message = str(e) + content_type = (getattr(resp, "headers", {}) or {}).get("Content-Type", "").lower() if resp else "" + if resp and MimeType.JSON in content_type: + try: + response_json = resp.json() + if "error" in response_json: + error_message = response_json["error"] + except Exception: + pass + elif getattr(resp, "text", ""): + error_message = resp.text + self.tool.stream_error_and_exit(f"Error from platform service. {error_message}") + except RequestException as e: # Extract error information from the response if available error_message = str(e) content_type = response.headers.get("Content-Type", "").lower() if MimeType.JSON in content_type: response_json = response.json() if "error" in response_json: error_message = response_json["error"] elif response.text: error_message = response.text self.tool.stream_error_and_exit( f"Error from platform service. {error_message}" )
🧹 Nitpick comments (8)
unstract/sdk1/tests/conftest.py (1)
37-51: Add type annotations to the helper function.The
_set_envhelper function is missing return type and parameter type annotations, as flagged by static analysis.As per coding guidelines.
Apply this diff to add the missing type annotations:
- def _set_env(prefix: str, **kwargs): + def _set_env(prefix: str, **kwargs: str | int | float | bool) -> None: """Set environment variables with given prefix. Args: prefix: Environment variable prefix (e.g., 'PLATFORM_SERVICE') **kwargs: Key-value pairs to set (e.g., max_retries=5) """unstract/sdk1/tests/test_platform.py (2)
57-66: Use the computedresultto validate response (fix F841).Assert the returned JSON matches
expected_datato both verify behavior and remove the unused variable warning.Based on static analysis hints
else: result = getattr(platform_helper, method_name)(*method_args) + assert result == expected_data assert mock_request.call_count == 1
96-113: Speed up “slow” test by patching sleep in retry_utils.Avoid real delays by stubbing time.sleep used by the decorator. Keeps semantics while cutting runtime.
- def test_max_retries_exceeded(self, mock_tool, clean_env): + def test_max_retries_exceeded(self, mock_tool, clean_env): """Test service call fails after exceeding max retries.""" platform_helper = PlatformHelper( tool=mock_tool, platform_host="http://localhost", platform_port="3001", ) - with patch("requests.get") as mock_get: + with patch("requests.get") as mock_get, patch("unstract.sdk1.utils.retry_utils.time.sleep", return_value=None): mock_get.side_effect = ConnectionError("Persistent failure") with pytest.raises(ConnectionError): platform_helper._call_service("test-endpoint") # Default: 3 retries + 1 initial = 4 attempts assert mock_get.call_count == 4unstract/sdk1/src/unstract/sdk1/platform.py (3)
65-67: Avoid IndexError for empty host; trim trailing slash robustly.Use rstrip('/') instead of indexing last char.
- if platform_host[-1] == "/": - return f"{platform_host[:-1]}:{platform_port}" + if platform_host.endswith("/"): + return f"{platform_host.rstrip('/')}:{platform_port}" return f"{platform_host}:{platform_port}"
199-205: Don’t send None request_id header.Only include request-id when available.
- request_headers = { - RequestHeader.REQUEST_ID: self.request_id, - RequestHeader.AUTHORIZATION: f"Bearer {self.bearer_token}", - } + request_headers = { + RequestHeader.AUTHORIZATION: f"Bearer {self.bearer_token}", + } + if self.request_id: + request_headers[RequestHeader.REQUEST_ID] = self.request_id
250-255: Add request timeouts to enable Timeout retries and avoid hangs.Without timeouts, requests may block indefinitely and never trigger retry logic.
- response = requests.post( - url=url, json=payload, params=params, headers=req_headers - ) + response = requests.post( + url=url, json=payload, params=params, headers=req_headers, timeout=10 + ) elif method.upper() == "GET": - response = requests.get(url=url, params=params, headers=req_headers) + response = requests.get(url=url, params=params, headers=req_headers, timeout=10)Consider making timeout configurable (e.g., PLATFORM_SERVICE_TIMEOUT seconds).
unstract/sdk1/tests/utils/test_retry_utils.py (1)
244-247: Annotate custom_predicate return type.Add explicit -> bool to quiet linters and improve clarity.
Based on static analysis hints
- def custom_predicate(e): + def custom_predicate(e) -> bool: # Only retry if message contains "retry" return "retry" in str(e)- def custom_predicate(e): + def custom_predicate(e) -> bool: return False- def custom_predicate(e): + def custom_predicate(e) -> bool: return isinstance(e, ValueError)- def custom_predicate(e): + def custom_predicate(e) -> bool: return "retry" in str(e)- def custom_predicate(e): + def custom_predicate(e) -> bool: return "retry" in str(e)Also applies to: 273-275, 442-444, 464-466, 488-490
unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py (1)
88-118: Reduce decorator/wrapper complexity or silence lint warning.Ruff flags C901 for nested logic. Consider small extractions (delay calc + give-up checks) or add a targeted noqa if you prefer current structure.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (12)
.github/workflows/ci-test.yaml(2 hunks)tox.ini(2 hunks)unstract/sdk1/README.md(1 hunks)unstract/sdk1/src/unstract/sdk1/platform.py(8 hunks)unstract/sdk1/src/unstract/sdk1/prompt.py(3 hunks)unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py(1 hunks)unstract/sdk1/tests/__init__.py(1 hunks)unstract/sdk1/tests/conftest.py(1 hunks)unstract/sdk1/tests/test_platform.py(1 hunks)unstract/sdk1/tests/test_prompt.py(1 hunks)unstract/sdk1/tests/utils/__init__.py(1 hunks)unstract/sdk1/tests/utils/test_retry_utils.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.13.3)
unstract/sdk1/tests/test_prompt.py
95-95: Do not assert blind exception: Exception
(B017)
unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py
88-88: retry_with_exponential_backoff is too complex (13 > 10)
(C901)
116-116: decorator is too complex (12 > 10)
(C901)
118-118: wrapper is too complex (11 > 10)
(C901)
118-118: Dynamically typed expressions (typing.Any) are disallowed in *args
(ANN401)
118-118: Dynamically typed expressions (typing.Any) are disallowed in **kwargs
(ANN401)
118-118: Dynamically typed expressions (typing.Any) are disallowed in wrapper
(ANN401)
unstract/sdk1/tests/test_platform.py
62-62: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
unstract/sdk1/tests/utils/test_retry_utils.py
244-244: Missing return type annotation for private function custom_predicate
(ANN202)
273-273: Missing return type annotation for private function custom_predicate
Add return type annotation: bool
(ANN202)
442-442: Missing return type annotation for private function custom_predicate
(ANN202)
464-464: Missing return type annotation for private function custom_predicate
(ANN202)
488-488: Missing return type annotation for private function custom_predicate
(ANN202)
unstract/sdk1/tests/conftest.py
41-41: Missing return type annotation for private function _set_env
Add return type annotation: None
(ANN202)
41-41: Missing type annotation for **kwargs
(ANN003)
🔇 Additional comments (15)
tox.ini (1)
2-2: LGTM!The SDK1 test environment configuration is well-structured and follows the same pattern as the existing
runnertestenv. The configuration correctly:
- Skips slow tests in CI with
-m "not slow"for faster feedback- Generates coverage reports (term and HTML)
- Outputs markdown report to the repository root
- Uses
uvfor efficient dependency managementAlso applies to: 34-45
.github/workflows/ci-test.yaml (2)
51-65: LGTM!The dual report publishing strategy is well-implemented. Both the Runner and SDK1 reports:
- Use
hashFiles()guards to conditionally render only when reports exist- Employ separate headers to avoid conflicts
- Follow the same pattern for consistency
77-83: LGTM!The job summary correctly includes the SDK1 report with proper file existence checks and formatting.
unstract/sdk1/tests/__init__.py (1)
1-1: LGTM!Standard package initializer with appropriate docstring.
unstract/sdk1/tests/utils/__init__.py (1)
1-1: LGTM!Standard package initializer with appropriate docstring.
unstract/sdk1/README.md (2)
10-24: LGTM!The retry configuration documentation is clear, comprehensive, and user-friendly. The table format makes it easy to understand the available options and their defaults.
26-41: LGTM!The test running instructions are practical and cover the essential workflows (basic tests, coverage). Using
uv runis consistent with the project's tooling choices.unstract/sdk1/src/unstract/sdk1/prompt.py (2)
12-12: LGTM!Importing the retry decorator from the dedicated utilities module.
191-235: LGTM!The retry decorator is correctly applied to
_call_service, which is the appropriate level for retry logic. The decorator sits below the error handling decorator (@handle_service_exceptions) on the public methods, ensuring that:
- Retries happen first on transient failures
- If all retries are exhausted, the error handler catches and logs the final exception
The enhanced docstring clearly explains the retry behavior and configuration options.
unstract/sdk1/tests/test_prompt.py (4)
35-49: LGTM!This test correctly verifies that successful calls complete on the first attempt without unnecessary retries.
51-76: LGTM!Good use of parametrization to test multiple error types (ConnectionError and Timeout) with a single test implementation. The test correctly verifies that the service recovers after a single retry.
101-131: LGTM!Excellent parametrized test verifying that all wrapper methods (answer_prompt, index, extract, summarize) inherit the retry behavior from the decorated
_call_servicemethod.
133-154: LGTM!This test correctly verifies the interaction between the retry decorator and the error handling decorator. After all retries are exhausted, the error handler catches the exception and calls
stream_error_and_exit, returningNoneinstead of propagating the exception.unstract/sdk1/tests/conftest.py (2)
9-13: LGTM!The
mock_loggerfixture correctly creates a MagicMock with thelogging.Loggerspec, ensuring that only valid logger methods can be called during tests.
16-34: LGTM!The
clean_envfixture comprehensively removes all retry-related environment variables before each test, ensuring test isolation. Usingraising=Falseis appropriate to handle cases where variables don't exist.
Replace tag reference with full commit SHA for better security: - marocchino/sticky-pull-request-comment@v2 → @7737449 (v2.9.4) This prevents potential supply chain attacks where tags could be moved to point to malicious code. Commit SHAs are immutable. Fixes SonarQube security hotspot for external GitHub action. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
9da67d3 to
c8846c2
Compare
Fixed HTTPError handling in _get_adapter_configuration to check status codes and re-raise retryable errors (502, 503, 504) so the retry decorator can handle them. Non-retryable errors are still converted to SdkError as before. Changes: - Check HTTPError status code before converting to SdkError - Re-raise HTTPError for 502/503/504 to allow retry decorator to retry - Added parametrized test for all retryable status codes (502, 503, 504) - All 12 platform tests pass This fixes a bug where 502/503/504 errors were not being retried because they were converted to SdkError before the retry decorator could see them. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
unstract/sdk1/tests/test_platform.py (2)
37-64: Remove unused variable assignment.Line 62 assigns the result but never uses it. Since the test only verifies call count, either use the result in an assertion or remove the assignment.
Apply this diff to remove the unused assignment:
if method_name == "_get_adapter_configuration": - result = PlatformHelper._get_adapter_configuration( + PlatformHelper._get_adapter_configuration( mock_tool, *method_args ) else: - result = getattr(platform_helper, method_name)(*method_args) + getattr(platform_helper, method_name)(*method_args) assert mock_request.call_count == 1
190-213: Consider more specific logging assertions.The test verifies that
stream_logwas called and checks for "retry" in log messages. While this provides basic coverage, it could be strengthened by asserting specific log message patterns or verifying that the retry attempt number is logged.Example enhancement:
# Verify specific retry log message mock_tool.stream_log.assert_any_call( unittest.mock.ANY, # Match any string containing retry info level=LogLevel.ERROR ) # Or check for specific patterns log_messages = [call.args[0] for call in mock_tool.stream_log.call_args_list] assert any("Retrying" in msg or "retry" in msg for msg in log_messages)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (2)
unstract/sdk1/src/unstract/sdk1/platform.py(8 hunks)unstract/sdk1/tests/test_platform.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.13.3)
unstract/sdk1/tests/test_platform.py
62-62: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
🔇 Additional comments (15)
unstract/sdk1/tests/test_platform.py (8)
14-25: LGTM!The
mock_toolfixture is well-structured and provides all necessary environment variables and mock methods for testing.
27-35: LGTM!The
platform_helperfixture correctly instantiates PlatformHelper with the mock tool and test configuration.
66-94: LGTM!The test correctly verifies retry behavior on ConnectionError, asserting 2 attempts (initial + 1 retry) and successful eventual response.
96-112: LGTM!The test correctly verifies max retries behavior, expecting 4 attempts (1 initial + 3 retries) before raising ConnectionError. The
@pytest.mark.slowmarker is appropriate.
114-129: LGTM!The test correctly verifies that non-retryable HTTP errors (404) don't trigger retry logic, with only 1 attempt expected.
131-157: LGTM!The parametrized test correctly verifies that retryable HTTP errors (502, 503, 504) trigger retry logic, with 2 attempts expected and eventual success.
159-166: LGTM!The test correctly verifies that the public
get_adapter_configmethod wraps ConnectionError in SdkError with appropriate message and cause chain.
168-188: LGTM!The test correctly verifies POST request retry behavior, asserting 2 attempts (initial + retry) and correct response handling.
unstract/sdk1/src/unstract/sdk1/platform.py (7)
6-8: LGTM!Import changes correctly split ConnectionError and HTTPError into separate imports from requests.exceptions for clarity.
21-21: LGTM!The retry decorator import is correctly added from the new retry_utils module.
96-113: LGTM!The decorator application and expanded docstring correctly describe the retry behavior and environment-configurable parameters.
145-160: LGTM!The exception handling correctly addresses the past review comment by re-raising retryable HTTP errors (502, 503, 504) to allow the retry decorator to handle them, while converting non-retryable errors to SdkError with context.
193-198: LGTM!The try/except block correctly wraps ConnectionError from the internal method into SdkError for the public API, preserving the original exception as the cause with
from e.
214-233: LGTM!The decorator application and expanded docstring correctly describe the retry behavior and environment-configurable parameters for the
_call_servicemethod.
266-273: LGTM!The ConnectionError handling is correct: it logs the error with appropriate context, streams an error message to the tool, and raises a descriptive ConnectionError to trigger the retry decorator. This replaces the previous exit path with retry-friendly error propagation.
Replaced direct equality comparisons (==) with pytest.approx() for floating point values to avoid precision issues and satisfy SonarQube code quality check (python:S1244). Changes in test_retry_utils.py: - test_exponential_backoff_without_jitter: Use pytest.approx() for 1.0, 2.0, 4.0, 8.0 - test_max_delay_cap: Use pytest.approx() for 5.0 This is the proper way to compare floating point values in tests, accounting for floating point precision limitations. All 4 TestCalculateDelay tests pass. Fixes SonarQube: python:S1244 - Do not perform equality checks with floating point values. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
unstract/sdk1/tests/utils/test_retry_utils.py (1)
252-252: Add return type annotations to custom_predicate test helpers.Static analysis flagged missing return type annotations for the
custom_predicatehelper functions used in tests. While these are test-local helpers, adding type hints improves consistency and catches potential issues.Apply this pattern to all custom_predicate functions:
- def custom_predicate(e): + def custom_predicate(e) -> bool: return "retry" in str(e)Also applies to: 281-281, 450-450, 472-472, 496-496
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (3)
unstract/sdk1/src/unstract/sdk1/platform.py(14 hunks)unstract/sdk1/tests/test_prompt.py(1 hunks)unstract/sdk1/tests/utils/test_retry_utils.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.13.3)
unstract/sdk1/tests/utils/test_retry_utils.py
252-252: Missing return type annotation for private function custom_predicate
(ANN202)
281-281: Missing return type annotation for private function custom_predicate
Add return type annotation: bool
(ANN202)
450-450: Missing return type annotation for private function custom_predicate
(ANN202)
472-472: Missing return type annotation for private function custom_predicate
(ANN202)
496-496: Missing return type annotation for private function custom_predicate
(ANN202)
🔇 Additional comments (5)
unstract/sdk1/src/unstract/sdk1/platform.py (3)
99-163: LGTM! Retry decorator and error handling properly implemented.The retry decorator is correctly applied, and the HTTPError handling at lines 148-153 properly re-raises retryable errors (502/503/504) to allow the decorator to retry, while converting non-retryable errors to SdkError. The comprehensive docstring documents the retry configuration via environment variables.
217-290: LGTM! ConnectionError handling enables proper retry behavior.The retry decorator is correctly applied, and the ConnectionError handling at lines 269-276 appropriately logs the failure and re-raises a descriptive ConnectionError, allowing the decorator to retry the operation. The docstring clearly documents the retry configuration.
196-201: No downstream callers catch ConnectionError onget_adapter_config;SdkErroris consistently propagated or wrapped by existing handlers.unstract/sdk1/tests/test_prompt.py (1)
1-174: LGTM! Comprehensive integration tests for prompt tool retry logic.The test suite thoroughly validates:
- Successful calls without retries
- Retry behavior on ConnectionError and Timeout
- Max retry limits (past review comment addressed - now uses specific
ConnectionError)- Wrapper method retry inheritance
- Error handling interaction with retry logic
Test structure is clear, uses appropriate fixtures, and leverages parametrization effectively.
unstract/sdk1/tests/utils/test_retry_utils.py (1)
1-635: Excellent comprehensive test coverage!The test suite thoroughly exercises all retry utilities including:
- Error retryability detection (ConnectionError, Timeout, HTTPError, OSError)
- Exponential backoff calculation with/without jitter
- Retry decorator behavior (success, retries, max limits, predicates)
- Environment-driven configuration
- Preconfigured decorators
- Logging behavior
The tests are well-organized, use appropriate fixtures, and validate both success and failure paths.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
unstract/sdk1/pyproject.toml (1)
65-66: Prefer pinning test-only dependencies for repeatable CI runsThe rest of this test group pins exact versions (
pytest==8.3.3,pytest-mock==3.14.0, etc.), but these two entries are open-ended. Allowing futurepytest-cov/pytest-md-reportreleases to auto-upgrade can silently break the new reporting pipeline. Please pin them to the versions you validated (e.g.pytest-cov==6.0.0,pytest-md-report==0.6.2) to keep the environment reproducible.unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py (1)
170-172: Clarify the dual role of max_time as delay cap.The
calculate_delayfunction expectsmax_delay(the cap for individual retry delays), butmax_time(the total time budget) is passed here. This meansmax_timeserves a dual purpose:
- Maximum total time across all retry attempts
- Maximum delay for any single retry
This is conservative (no single delay can exceed the total budget) and works correctly, but it's semantically unusual. Typically,
max_delaywould be independent (e.g., 60s per retry) whilemax_timewould be larger (e.g., 300s total). Consider adding a comment explaining this design choice, or introducing a separatemax_delayparameter for clarity.Example clarifying comment:
# Calculate delay for next retry + # Note: max_time is used as both total time budget and per-delay cap delay = calculate_delay( attempt, base_delay, multiplier, max_time, jitter )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
⛔ Files ignored due to path filters (1)
unstract/sdk1/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
unstract/sdk1/pyproject.toml(1 hunks)unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py(1 hunks)unstract/sdk1/tests/conftest.py(1 hunks)unstract/sdk1/tests/test_platform.py(1 hunks)unstract/sdk1/tests/utils/test_retry_utils.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- unstract/sdk1/tests/conftest.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (3)
unstract/sdk1/tests/test_platform.py (1)
1-233: LGTM! Comprehensive integration test coverage.The test suite is well-structured with:
- Clear fixtures for mock_tool and platform_helper setup
- Parametrized tests to reduce duplication
- Coverage of success paths, retry scenarios, error handling, and edge cases
- Proper use of pytest.mark.slow for time-consuming tests
- Verification of logging behavior
The tests effectively validate the retry logic integration with PlatformHelper methods.
unstract/sdk1/tests/utils/test_retry_utils.py (1)
1-667: LGTM! Excellent comprehensive test coverage.The test suite demonstrates thorough coverage:
- TestIsRetryableError: Validates all retryable error types and errno codes
- TestCalculateDelay: Tests exponential backoff with/without jitter and max_delay capping
- TestRetryWithExponentialBackoff: Covers retry logic, predicates, time/retry limits, and logging
- TestCreateRetryDecorator: Validates environment-driven configuration and validation
- TestPreconfiguredDecorators: Ensures platform and prompt service decorators work correctly
- TestRetryLogging: Verifies logging at different stages of retry attempts
The tests are well-organized, use appropriate fixtures, and leverage parametrization effectively.
unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py (1)
1-305: Well-implemented retry utilities with good design.The implementation demonstrates solid engineering:
- Clear separation of concerns (error detection, delay calculation, decorator logic)
- Robust error handling with both exception tuples and optional predicates
- Environment-driven configuration with validation
- Comprehensive logging at appropriate levels
- Proper use of type hints and documentation
- Preconfigured decorators for platform and prompt services
The retry logic correctly handles time limits, retry counts, and exponential backoff with optional jitter.
Resolved merge conflicts in uv.lock files by accepting incoming changes and regenerating lock files to ensure dependencies are up to date. Changes include: - Updated lock files for backend, platform-service, prompt-service, and sdk1 - Merged latest main branch changes including celery config fixes - Added notification and scheduler worker enhancements
#1567) [FEAT] Update unstract-sdk to v0.78.0 across all services and tools - Updated unstract-sdk dependency from v0.77.3 to v0.78.0 in all pyproject.toml files - Main repository, backend, workers, platform-service, prompt-service - filesystem and tool-registry modules - Updated tool requirements.txt files (structure, classifier, text_extractor) - Bumped tool versions in properties.json: - Structure tool: 0.0.88 → 0.0.89 - Classifier tool: 0.0.68 → 0.0.69 - Text extractor tool: 0.0.64 → 0.0.65 - Updated tool versions in backend/sample.env and public_tools.json - Regenerated all uv.lock files with new SDK version This update brings in the retry logic with exponential backoff from unstract-sdk v0.78.0 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <noreply@anthropic.com>
|
|
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
unstract/tool-registry/tool_registry_config/public_tools.json (2)
8-111: Classifier Docker image tag “0.0.69” not found on Docker Hub
The updatedimage_tag/image_urlreferences a non-existent tag. Publishunstract/tool-classifier:0.0.69or revert to a valid version before merging.
119-196: Ensure Docker image unstract/tool-text-extractor:0.0.65 is publishedDocker Hub query returned “not found” for unstract/tool-text-extractor:0.0.65. Publish the 0.0.65 tag (or correct the tag to an existing version) before deploying.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
⛔ Files ignored due to path filters (7)
backend/uv.lockis excluded by!**/*.lockplatform-service/uv.lockis excluded by!**/*.lockprompt-service/uv.lockis excluded by!**/*.lockunstract/filesystem/uv.lockis excluded by!**/*.lockunstract/tool-registry/uv.lockis excluded by!**/*.lockuv.lockis excluded by!**/*.lockworkers/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (15)
backend/pyproject.toml(1 hunks)backend/sample.env(1 hunks)platform-service/pyproject.toml(1 hunks)prompt-service/pyproject.toml(1 hunks)pyproject.toml(1 hunks)tools/classifier/requirements.txt(1 hunks)tools/classifier/src/config/properties.json(1 hunks)tools/structure/requirements.txt(1 hunks)tools/structure/src/config/properties.json(1 hunks)tools/text_extractor/requirements.txt(1 hunks)tools/text_extractor/src/config/properties.json(1 hunks)unstract/filesystem/pyproject.toml(1 hunks)unstract/tool-registry/pyproject.toml(1 hunks)unstract/tool-registry/tool_registry_config/public_tools.json(3 hunks)workers/pyproject.toml(1 hunks)
✅ Files skipped from review due to trivial changes (3)
- tools/structure/src/config/properties.json
- tools/classifier/src/config/properties.json
- tools/text_extractor/src/config/properties.json
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (10)
workers/pyproject.toml (1)
25-25: Dependency bump aligns with SDK 0.78.0 rolloutKeeping the azure extra while moving to
0.78.0matches the broader upgrade plan; LGTM.platform-service/pyproject.toml (1)
18-18: Platform service dependency bump looks goodVersion
0.78.0with the same extras keeps parity with other services and supports the new retry utilities.tools/text_extractor/requirements.txt (1)
7-7: Text extractor upgrade confirmedThe move to
unstract-sdk[aws]~=0.78.0keeps the expected extra and aligns with the SDK bump elsewhere.unstract/tool-registry/pyproject.toml (1)
14-14: Registry dependency bump approvedUpdating to
unstract-sdk~=0.78.0keeps the registry in sync with the rest of the stack.tools/structure/requirements.txt (1)
6-6: Structure tool dependency bump checks out
unstract-sdk[aws]~=0.78.0matches the project-wide upgrade and retains the needed extra.unstract/filesystem/pyproject.toml (1)
9-9: Filesystem SDK bump looks goodRaising to
unstract-sdk~=0.78.0keeps the filesystem package aligned with the new SDK release.backend/pyproject.toml (1)
38-38: Backend dependency update verifiedThe backend now targets
unstract-sdk[aws,gcs,azure]~=0.78.0, matching the multicloud profile we ship—looks solid.pyproject.toml (1)
45-45: Hook environment stays in syncUpdating the hook-check dependency set to
unstract-sdk~=0.78.0maintains consistency with the rest of the repo.prompt-service/pyproject.toml (2)
22-28: LGTM! SDK1 integration properly configured.The addition of
unstract-sdk1as an editable dependency with AWS, GCS, and Azure extras correctly supports the retry logic integration described in the PR.
18-18: Approve dependency update
unstract-sdk[aws,gcs,azure]~=0.78.0is available on PyPI with the specified extras.



What
Why
How
Core Retry Logic (
retry_utils.py)Service Integration
@retry_platform_service_calldecorator to_get_adapter_configuration()and_call_service()@retry_prompt_service_calldecorator to_call_service()Test Suite (67 tests)
@pytest.mark.slow(skipped in CI for faster runs)CI Integration
sdk1environment to tox.iniCan this PR break any existing features. If yes, please list possible items. If no, please explain why.
No, this PR cannot break existing features because:
Database Migrations
Env Config
Optional environment variables for retry configuration (per service prefix:
PLATFORM_SERVICE_orPROMPT_SERVICE_):MAX_RETRIESMAX_TIMEBASE_DELAYMULTIPLIERJITTERExamples:
PLATFORM_SERVICE_MAX_RETRIES=5PROMPT_SERVICE_BASE_DELAY=2.0PLATFORM_SERVICE_JITTER=falseNote: These are optional. If not set, sensible defaults are used.
Relevant Docs
Related Issues or PRs
Dependencies Versions
requests,pytest(for tests)Notes on Testing
Local Testing
CI Testing
@pytest.mark.slow) for faster feedbackManual Testing
Screenshots
N/A - Backend library changes only
Checklist
I have read and understood the Contribution Guidelines.