UN-2725 [FEAT] - Support for OpenAI's GPT-5 models #1516
chandrasekharan-zipstack merged 5 commits into main
Walkthrough
Dependency and tool-version bumps across services and tools: unstract-sdk to ~=0.77.1, python-dotenv to 1.0.1, llama-index to 0.13.2; tool config versions and container image tags updated for Structure, Classifier, and Text Extractor. No public API changes.
Estimated code review effort: 2 (Simple), ~10 minutes
Actionable comments posted: 3
🧹 Nitpick comments (4)
backend/pyproject.toml (1)
33-33: Align python-dotenv version spec across the monorepo.
Detected mixed pins in:
- tool-sidecar/pyproject.toml (“python-dotenv>=1.0.0”)
- runner/pyproject.toml (“python-dotenv>=1.0.0”)
- x2text-service/pyproject.toml (“python-dotenv~=1.0.0”)
- platform-service/pyproject.toml (“python-dotenv~=1.0”)
- tox.ini (python-dotenv>=1.0.0)
Choose and apply one uniform spec everywhere:
• Strict pin (==1.0.1) for exact reproducibility
• Or minor-range (>=1.0.1,<2.0.0) for flexible updates
tools/classifier/requirements.txt (1)
7-7: Classifier on unstract-sdk[aws] ~=0.77.0: looks good; add a smoke path.
Add or confirm a simple E2E or contract test that runs a minimal classification using a GPT‑5 model to catch SDK/model ID mismatches early.
I can draft a minimal smoke test (pytest) that exercises the tool with a mock provider—want me to add it?
unstract/tool-registry/tool_registry_config/public_tools.json (1)
109-111: Avoid drift: derive image_url from image_name + image_tag or pin by digest.
Reduce duplication by constructing image_url at runtime, or pin images by sha256 to improve supply-chain reproducibility.
Apply pattern:
- "image_url": "docker:unstract/tool-classifier:0.0.68",
+ "image_url": "docker:unstract/tool-classifier:@TAG@", // server-side replace from image_tag
Or add a sibling field:
+ "image_digest": "sha256:<digest>"
Also applies to: 194-196
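The "derive image_url from image_name + image_tag" suggestion can be sketched roughly as follows. This is a minimal illustration, not the registry's actual code: the flat entry shape, the helper names `build_image_url` and `find_drift`, and the sample entries are assumptions; only the three field names and the `docker:<name>:<tag>` URL shape come from the comment above.

```python
from typing import Dict, Iterable, List


def build_image_url(image_name: str, image_tag: str, scheme: str = "docker") -> str:
    """Compose the URL from name and tag so the tag is stored in one place only."""
    return f"{scheme}:{image_name}:{image_tag}"


def find_drift(entries: Iterable[Dict[str, str]]) -> List[str]:
    """Return the image names whose stored image_url disagrees with name + tag."""
    drifted = []
    for entry in entries:
        expected = build_image_url(entry["image_name"], entry["image_tag"])
        if entry.get("image_url") != expected:
            drifted.append(entry["image_name"])
    return drifted


# Illustrative entries (hypothetical, not copied from public_tools.json).
# The second one carries a deliberately stale URL to show what drift looks like.
sample_entries = [
    {
        "image_name": "unstract/tool-classifier",
        "image_tag": "0.0.68",
        "image_url": "docker:unstract/tool-classifier:0.0.68",
    },
    {
        "image_name": "unstract/tool-text-extractor",
        "image_tag": "0.0.64",
        "image_url": "docker:unstract/tool-text-extractor:0.0.63",
    },
]

if __name__ == "__main__":
    print(find_drift(sample_entries))
```

A check like `find_drift` could run in CI as a stopgap while the stored `image_url` field still exists; once the URL is built at runtime, the field and the check can both go.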
backend/sample.env (1)
81-83: Optional: single source of truth for image tag.
Keep only TAG in env and build URL in code to prevent drift between URL/NAME/TAG.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting
⛔ Files ignored due to path filters (6)
- backend/uv.lock is excluded by !**/*.lock
- platform-service/uv.lock is excluded by !**/*.lock
- prompt-service/uv.lock is excluded by !**/*.lock
- unstract/filesystem/uv.lock is excluded by !**/*.lock
- unstract/tool-registry/uv.lock is excluded by !**/*.lock
- uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
- backend/pyproject.toml (1 hunks)
- backend/sample.env (1 hunks)
- platform-service/pyproject.toml (1 hunks)
- prompt-service/pyproject.toml (2 hunks)
- pyproject.toml (1 hunks)
- tools/classifier/requirements.txt (1 hunks)
- tools/classifier/src/config/properties.json (1 hunks)
- tools/structure/src/config/properties.json (1 hunks)
- tools/text_extractor/requirements.txt (1 hunks)
- tools/text_extractor/src/config/properties.json (1 hunks)
- unstract/filesystem/pyproject.toml (1 hunks)
- unstract/tool-registry/pyproject.toml (1 hunks)
- unstract/tool-registry/tool_registry_config/public_tools.json (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (12)
backend/pyproject.toml (1)
40-40: Sync unstract-sdk versions & verify GPT-5 support
- tools/structure/requirements.txt:6 still pins unstract-sdk[aws]~=0.74.0; update it to ~=0.77.0[azure] to match other services.
- Manually confirm that unstract-sdk v0.77.x includes GPT-5 support (correct model IDs like gpt-5*) and that Azure OpenAI in your target regions exposes those models.
- Verify OpenAI/Azure SDK client minimum versions are pulled in (or add explicit pins) and that any relevant environment variable names haven’t changed.
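For context on why a `~=0.74.0` pin cannot pick up a 0.77.x release: under PEP 440, the compatible-release operator `~=X.Y.Z` means `>=X.Y.Z, ==X.Y.*`. The sketch below expands that rule for plain numeric versions only (no pre-releases, post-releases, or epochs). The function names are illustrative, and a real tool would more likely rely on the `packaging` library.

```python
from typing import Tuple


def compatible_release_bounds(version: str) -> Tuple[str, str]:
    """Expand the version in a `~=` specifier into (inclusive lower, exclusive upper)."""
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError("~= needs at least two release segments")
    prefix = [int(p) for p in parts[:-1]]  # drop the last segment
    prefix[-1] += 1                        # bump the segment that is now last
    return version, ".".join(str(p) for p in prefix)


def _key(version: str, width: int = 4) -> Tuple[int, ...]:
    """Zero-pad so that 1.0 and 1.0.0 compare as equal."""
    nums = [int(p) for p in version.split(".")]
    return tuple(nums + [0] * (width - len(nums)))


def satisfies(candidate: str, pinned: str) -> bool:
    """True if `candidate` falls inside the range implied by `~=pinned`."""
    lower, upper = compatible_release_bounds(pinned)
    return _key(lower) <= _key(candidate) < _key(upper)


if __name__ == "__main__":
    print(compatible_release_bounds("0.77.0"))  # ('0.77.0', '0.78')
    print(satisfies("0.77.1", "0.74.0"))        # False
    print(satisfies("1.1.0", "1.0.0"))          # False
    print(satisfies("1.1.0", "1.0"))            # True
```

The last two lines also show why the mixed python-dotenv pins are not equivalent: `~=1.0.0` stays within 1.0.x, while `~=1.0` allows any 1.x release.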
unstract/tool-registry/pyproject.toml (1)
14-14: Registry metadata in sync: toolVersion, image_tag, and image_url values align for classifier (0.0.68) and text_extractor (0.0.64).
unstract/filesystem/pyproject.toml (1)
9-9: Core SDK only: verify provider-specific imports
Run rg -nP 'import\s+(?:boto3|aioboto3|azure\.identity)' unstract/filesystem
If AWS (boto3/aioboto3) or Azure (azure-identity) are imported, update unstract/filesystem/pyproject.toml to include those alongside unstract-sdk~=0.77.0; otherwise leave as-is.
tools/text_extractor/src/config/properties.json (1)
5-5: Version references updated to 0.0.64: public_tools.json (toolVersion, image_url, image_tag) and config/properties.json now reflect 0.0.64; no env samples contain versioned entries.
tools/classifier/src/config/properties.json (1)
5-5: Approve version bump; registry parity confirmed. Ensure Docker image unstract/tool-classifier:0.0.68 is published before merging to avoid deploy-time pull failures.
prompt-service/pyproject.toml (3)
36-36: Test group dotenv bump: LGTM.
Keep test/runtime dotenv versions aligned (they are).
19-19: Confirm GPT-5 model discovery across config → SDK → UI/API
No static allowlists for gpt-5 were detected; manually verify end-to-end support in your config, the unstract-sdk, and the UI/API.
14-15: Verify LlamaIndex 0.13.2 and python-dotenv 1.0.1 upgrade
- Search for deprecated LlamaIndex APIs in the actual code folder (e.g. prompt-service/) with: rg -nP '\b(ServiceContext|LLMPredictor|PromptHelper|GPTVectorStoreIndex)\b' -C2 --type=py prompt-service/
- Ensure both import llama_index and import dotenv succeed in your runtime after installing the new versions.
platform-service/pyproject.toml (1)
18-18: SDK bump confirmed—lock file updated
All unstract-sdk references now align on ~=0.77.0, and platform-service/uv.lock reflects the bump. Proceed to smoke-test platform endpoints with SDK v0.77.0.
tools/text_extractor/requirements.txt (1)
7-7: Rebuild/publish tool-text-extractor:0.0.64 before rollout. All toolVersion entries in properties.json and public_tools.json are correctly updated to 0.0.64.
unstract/tool-registry/tool_registry_config/public_tools.json (1)
119-119: Text Extractor bump to 0.0.64 is consistent; no 0.0.63 stragglers found.
backend/sample.env (1)
81-83: Verified all 0.0.85 references removed; updated to 0.0.86
No remaining instances of version 0.0.85 found in the repository.
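The straggler checks reported in these comments (0.0.85 here, 0.0.63 for Text Extractor) can be reproduced with a small scan. A minimal sketch, assuming plain-text config files; the function name `find_stale_version` and the suffix list are illustrative and not part of the repository.

```python
import re
from pathlib import Path
from typing import List, Tuple


def find_stale_version(
    root: str,
    old_version: str,
    suffixes: Tuple[str, ...] = (".json", ".env", ".toml", ".txt"),
) -> List[Tuple[str, int, str]]:
    """Return (path, line number, line) for every exact mention of old_version."""
    # The lookarounds reject longer versions such as 0.0.850 or 10.0.85,
    # so only the exact version string is reported.
    pattern = re.compile(r"(?<![\d.])" + re.escape(old_version) + r"(?!\.?\d)")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        lines = path.read_text(errors="ignore").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # An empty result means no stragglers were found under the current directory.
    for hit in find_stale_version(".", "0.0.85"):
        print(hit)
```

Lock files are left out of the default suffix list on purpose, since they are regenerated rather than edited by hand.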
gaya3-zipstack left a comment
Looks good. @hari-kuriakose's script can save time here next time, I think.
Actionable comments posted: 2
♻️ Duplicate comments (1)
pyproject.toml (1)
38-38: Standardize python-dotenv across all modules.
Pin to ==1.0.1 here is fine; ensure every subproject and test group uses the same constraint to prevent env parsing regressions during CI.
#!/bin/bash
set -euo pipefail
# Locate non-==1.0.1 dotenv specs
rg -nI 'python-dotenv' -C1 | grep -Ev '==\s*1\.0\.1' || true
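A Python variant of the same check groups files by the spec they declare, which makes mixed pins easier to read than filtered grep output. This is a rough sketch: the function name `collect_specs`, the list of file names scanned, and the regex are assumptions, and the pattern only understands simple comparison clauses such as `==1.0.1` or `>=1.0.1,<2.0.0`.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# File names assumed to carry dependency specs in this kind of monorepo.
SPEC_FILES = {"pyproject.toml", "requirements.txt", "tox.ini"}


def collect_specs(root: str, name: str = "python-dotenv") -> Dict[str, List[str]]:
    """Map each version spec found for `name` to the files that declare it."""
    spec_re = re.compile(
        re.escape(name) + r"\s*((?:[<>=!~]=?\s*[\w.*]+\s*,?\s*)*)"
    )
    found = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.name not in SPEC_FILES:
            continue
        for line in path.read_text(errors="ignore").splitlines():
            match = spec_re.search(line)
            if match:
                # Normalise whitespace so ">= 1.0.1" and ">=1.0.1" group together.
                spec = re.sub(r"\s+", "", match.group(1)).strip(",")
                found[spec or "(unpinned)"].append(str(path))
    return dict(found)


if __name__ == "__main__":
    specs = collect_specs(".")
    for spec, files in sorted(specs.items()):
        print(spec, files)
    if len(specs) > 1:
        print("mixed python-dotenv specs detected")
```

More than one key in the result means the constraint is not yet uniform across subprojects.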
📜 Review details
📒 Files selected for processing (8)
- backend/pyproject.toml (1 hunks)
- platform-service/pyproject.toml (1 hunks)
- prompt-service/pyproject.toml (2 hunks)
- pyproject.toml (1 hunks)
- tools/classifier/requirements.txt (1 hunks)
- tools/text_extractor/requirements.txt (1 hunks)
- unstract/filesystem/pyproject.toml (1 hunks)
- unstract/tool-registry/pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- unstract/tool-registry/pyproject.toml
- platform-service/pyproject.toml
- backend/pyproject.toml
- prompt-service/pyproject.toml
- unstract/filesystem/pyproject.toml
- tools/text_extractor/requirements.txt
What
Why
How
Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
Notes on Testing
Screenshots
Checklist
I have read and understood the Contribution Guidelines.