UN-2725 [FEAT] - Support for OpenAI's GPT-5 models by pk-zipstack · Pull Request #1516 · Zipstack/unstract

pk-zipstack · 2025-09-01T10:47:25Z

What

Unstract-sdk version update for OpenAI's GPT-5 models.

Why

How

Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

No. Tested locally with the updated versions.

Database Migrations

Env Config

Relevant Docs

Related Issues or PRs

UN-2725 Updated Llama-index version and related dependencies to Support GPT-5 models unstract-sdk#196

Dependencies Versions

Notes on Testing

Screenshots

Checklist

I have read and understood the Contribution Guidelines.

coderabbitai · 2025-09-01T10:47:33Z


📥 Commits

Reviewing files that changed from the base of the PR and between a9f0858 and 099e400.

⛔ Files ignored due to path filters (1)

prompt-service/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (1)

prompt-service/pyproject.toml (2 hunks)

Summary by CodeRabbit

Chores
- Updated dependencies across services: unstract-sdk to ~=0.77.1, python-dotenv to 1.0.1, llama-index to 0.13.2.
Tools
- Upgraded tool versions and images: Structure Tool 0.0.86, Classifier 0.0.68, Text Extractor 0.0.64.
Configuration
- Refreshed sample environment and tool registry to reference new tool versions and image tags.

Walkthrough

Dependency and tool-version bumps across services and tools: unstract-sdk to ~=0.77.1, python-dotenv to 1.0.1, llama-index to 0.13.2; tool config versions and container image tags updated for Structure, Classifier, and Text Extractor. No public API changes.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Unstract SDK version bump: `backend/pyproject.toml`, `platform-service/pyproject.toml`, `prompt-service/pyproject.toml`, `unstract/filesystem/pyproject.toml`, `unstract/tool-registry/pyproject.toml`, `pyproject.toml` (hook), `tools/classifier/requirements.txt`, `tools/text_extractor/requirements.txt` | Update `unstract-sdk` dependency from `~=0.76.1` → `~=0.77.1` (including extras: `[azure]`, `[gcs, azure, aws]`, `[aws]`, and hook entry). |
| python-dotenv version bump: `backend/pyproject.toml`, `prompt-service/pyproject.toml` (deps & test), `pyproject.toml` (hook) | Update `python-dotenv` from `1.0.0` → `1.0.1`. |
| llama-index version bump: `prompt-service/pyproject.toml` | Update `llama-index` from `0.12.39` → `0.13.2`. |
| Tool config version bumps: `tools/classifier/src/config/properties.json`, `tools/structure/src/config/properties.json`, `tools/text_extractor/src/config/properties.json` | Bump `toolVersion`: Classifier `0.0.67`→`0.0.68`; Structure `0.0.85`→`0.0.86`; Text Extractor `0.0.63`→`0.0.64`. |
| Tool registry updates: `unstract/tool-registry/tool_registry_config/public_tools.json` | Sync `classify` and `text_extractor` entries: `toolVersion`, `image_tag`, and `image_url` updated to the new versions (`0.0.68` and `0.0.64`). |
| Structure tool image in env: `backend/sample.env` | Update `STRUCTURE_TOOL_IMAGE_URL` and `STRUCTURE_TOOL_IMAGE_TAG` from `0.0.85` → `0.0.86`. |
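
The `~=` specifiers in this summary are PEP 440 "compatible release" clauses: `~=0.77.1` accepts any `0.77.x` at or above `0.77.1` but not `0.78.0`, which is why moving off the `0.76` line needed an explicit bump. The sketch below illustrates that rule for plain numeric versions only. `compatible_release` is a hypothetical helper written for this example, not part of any packaging tool, and it ignores pre-releases, post-releases, and epochs.

```python
def parse(version: str) -> tuple[int, ...]:
    """Split a plain numeric version such as '0.77.1' into integers."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(candidate: str, spec: str) -> bool:
    """Return True if `candidate` satisfies `~=spec` (numeric versions only).

    `~=X.Y.Z` means: candidate >= X.Y.Z and candidate matches X.Y.*.
    """
    cand, base = parse(candidate), parse(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two release segments")
    # Pad with zeros so that '1.0' and '1.0.0' compare as equal.
    width = max(len(cand), len(base))
    cand_p = cand + (0,) * (width - len(cand))
    base_p = base + (0,) * (width - len(base))
    prefix = base[:-1]  # every segment except the last must match exactly
    return cand_p[: len(prefix)] == prefix and cand_p >= base_p


print(compatible_release("0.77.3", "0.77.1"))  # True
print(compatible_release("0.78.0", "0.77.1"))  # False
print(compatible_release("0.77.0", "0.77.1"))  # False
```

For real dependency resolution, a full PEP 440 implementation should be preferred over a hand-rolled check like this one.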

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (4)

backend/pyproject.toml (1)

33-33: Align python-dotenv version spec across the monorepo.

Detected mixed pins in:

tool-sidecar/pyproject.toml (“python-dotenv>=1.0.0”)

runner/pyproject.toml (“python-dotenv>=1.0.0”)

x2text-service/pyproject.toml (“python-dotenv~=1.0.0”)

platform-service/pyproject.toml (“python-dotenv~=1.0”)

tox.ini (python-dotenv>=1.0.0)

Choose and apply one uniform spec everywhere:
• Strict pin (==1.0.1) for exact reproducibility
• Or minor-range (>=1.0.1,<2.0.0) for flexible updates
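
One way to check whether the specs have been unified is to collect every `python-dotenv` requirement string and group the files by specifier. The sketch below is a minimal, stdlib-only illustration: `find_specs` and its regular expression are assumptions made for this example, and the sample file contents are invented to mirror the specs quoted in this thread rather than read from the repository.

```python
import re
from collections import defaultdict

# One version operator, e.g. ==, ~=, >=
_OP = r"(?:==|~=|>=|<=|!=|<|>)"
# Matches e.g. python-dotenv>=1.0.0 or python-dotenv >= 1.0.1, < 2.0.0
SPEC_RE = re.compile(rf"python-dotenv\s*({_OP}\s*[\w.*]+(?:\s*,\s*{_OP}\s*[\w.*]+)*)")


def find_specs(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each distinct python-dotenv specifier to the files that use it."""
    by_spec: dict[str, list[str]] = defaultdict(list)
    for path, text in files.items():
        for match in SPEC_RE.finditer(text):
            spec = re.sub(r"\s+", "", match.group(1))  # normalise whitespace
            by_spec[spec].append(path)
    return dict(by_spec)


# Illustrative contents only, not the real files.
sample = {
    "runner/pyproject.toml": 'dependencies = ["python-dotenv>=1.0.0"]',
    "x2text-service/pyproject.toml": 'dependencies = ["python-dotenv~=1.0.0"]',
    "pyproject.toml": 'dependencies = ["python-dotenv==1.0.1"]',
}
specs = find_specs(sample)
for spec, paths in sorted(specs.items()):
    print(spec, paths)
if len(specs) > 1:
    print("mixed python-dotenv specs detected")
```

In practice the `files` mapping could be filled by reading each `pyproject.toml`, `requirements.txt`, and `tox.ini` in the monorepo.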

tools/classifier/requirements.txt (1)

7-7: Classifier on unstract-sdk[aws] ~=0.77.0: looks good; add a smoke path.

Add/confirm a simple E2E or contract test that runs a minimal classification using a GPT‑5 model to catch SDK/model ID mismatches early.

I can draft a minimal smoke test (pytest) that exercises the tool with a mock provider—want me to add it?
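
A smoke test along these lines might look like the following sketch. Everything in it is hypothetical: `classify`, the `complete` method, and the mocked LLM are invented for illustration and do not reflect the classifier tool's or unstract-sdk's actual interfaces, and `gpt-5` is used only as a placeholder model ID string.

```python
from unittest.mock import Mock


def classify(llm, text: str, bins: list[str]) -> str:
    """Hypothetical classifier: ask the LLM to pick one bin for the text."""
    prompt = f"Classify into one of {bins}: {text}"
    answer = llm.complete(prompt).strip().lower()
    if answer not in bins:
        raise ValueError(f"unexpected label from model: {answer!r}")
    return answer


def test_classify_smoke():
    # Mock provider: no network call, just a canned completion.
    llm = Mock()
    llm.model = "gpt-5"  # placeholder model ID, an assumption
    llm.complete.return_value = " Invoice \n"

    assert classify(llm, "Total due: $120", ["invoice", "receipt"]) == "invoice"
    llm.complete.assert_called_once()


test_classify_smoke()
print("smoke test passed")
```

A mock like this only guards the tool's own parsing and error handling. Catching SDK or model ID mismatches would still need a run against a real provider.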
unstract/tool-registry/tool_registry_config/public_tools.json (1)
109-111: Avoid drift: derive image_url from image_name + image_tag or pin by digest.
Reduce duplication by constructing image_url at runtime, or pin images by sha256 to improve supply-chain reproducibility.

Apply pattern:
-        "image_url": "docker:unstract/tool-classifier:0.0.68",
+        "image_url": "docker:unstract/tool-classifier:@TAG@",  // server-side replace from image_tag
Or add a sibling field:
+        "image_digest": "sha256:<digest>"
Also applies to: 194-196
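
As a rough illustration of the first option, the URL can be built from the name and tag so the fields cannot disagree. This is a sketch only: `build_image_url` and `check_entry` are hypothetical helpers, the `docker:<name>:<tag>` shape is inferred from the `image_url` value quoted above, and the sample entry is made up to match it.

```python
def build_image_url(image_name: str, image_tag: str) -> str:
    """Compose the registry URL from its parts (format inferred, see lead-in)."""
    return f"docker:{image_name}:{image_tag}"


def check_entry(entry: dict) -> list[str]:
    """Return a list of drift problems for one registry entry."""
    problems = []
    expected = build_image_url(entry["image_name"], entry["image_tag"])
    if entry.get("image_url") != expected:
        problems.append(f"image_url {entry.get('image_url')!r} != {expected!r}")
    return problems


# Illustrative entry, not copied from public_tools.json.
entry = {
    "image_name": "unstract/tool-classifier",
    "image_tag": "0.0.68",
    "image_url": "docker:unstract/tool-classifier:0.0.68",
}
print(check_entry(entry))  # []

# Bumping the tag without touching the URL is reported as drift.
stale = dict(entry, image_tag="0.0.69")
print(check_entry(stale))
```

A check like this could run in CI over every entry, which keeps the stored `image_url` field while still catching a tag bump that missed it.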
backend/sample.env (1)

81-83: Optional: single source of truth for image tag.
Keep only TAG in env and build URL in code to prevent drift between URL/NAME/TAG.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between d34f6b8 and 91c5266.

⛔ Files ignored due to path filters (6)

backend/uv.lock is excluded by !**/*.lock
platform-service/uv.lock is excluded by !**/*.lock
prompt-service/uv.lock is excluded by !**/*.lock
unstract/filesystem/uv.lock is excluded by !**/*.lock
unstract/tool-registry/uv.lock is excluded by !**/*.lock
uv.lock is excluded by !**/*.lock

📒 Files selected for processing (13)

backend/pyproject.toml (1 hunks)
backend/sample.env (1 hunks)
platform-service/pyproject.toml (1 hunks)
prompt-service/pyproject.toml (2 hunks)
pyproject.toml (1 hunks)
tools/classifier/requirements.txt (1 hunks)
tools/classifier/src/config/properties.json (1 hunks)
tools/structure/src/config/properties.json (1 hunks)
tools/text_extractor/requirements.txt (1 hunks)
tools/text_extractor/src/config/properties.json (1 hunks)
unstract/filesystem/pyproject.toml (1 hunks)
unstract/tool-registry/pyproject.toml (1 hunks)
unstract/tool-registry/tool_registry_config/public_tools.json (3 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build

🔇 Additional comments (12)

backend/pyproject.toml (1)

40-40: Sync unstract-sdk versions & verify GPT-5 support

tools/structure/requirements.txt:6 still pins unstract-sdk[aws]~=0.74.0; update it to unstract-sdk[azure]~=0.77.0 to match other services.

Manually confirm that unstract-sdk v0.77.x includes GPT-5 support (correct model IDs like gpt-5*) and that Azure OpenAI in your target regions exposes those models.

Verify OpenAI/Azure SDK client minimum versions are pulled in (or add explicit pins) and that any relevant environment variable names haven’t changed.

unstract/tool-registry/pyproject.toml (1)

14-14: Registry metadata in sync: toolVersion, image_tag, and image_url values align for classifier (0.0.68) and text_extractor (0.0.64).
unstract/filesystem/pyproject.toml (1)
9-9: Core SDK only: verify provider-specific imports
Run
rg -nP 'import\s+(?:boto3|aioboto3|azure\.identity)' unstract/filesystem
If AWS (boto3/aioboto3) or Azure (azure-identity) are imported, update unstract/filesystem/pyproject.toml to include those alongside unstract-sdk~=0.77.0; otherwise leave as-is.
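
The same check can be done without `rg` by parsing the Python sources. The sketch below uses the standard `ast` module to report imports of `boto3`, `aioboto3`, or `azure.identity`. `provider_imports` is a hypothetical helper, and the sample source string is invented for the demonstration.

```python
import ast

PROVIDER_MODULES = ("boto3", "aioboto3", "azure.identity")


def provider_imports(source: str) -> set[str]:
    """Return the provider modules imported by the given Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from X import Y" only; relative imports are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            for provider in PROVIDER_MODULES:
                # Match the module itself or any submodule of it.
                if name == provider or name.startswith(provider + "."):
                    found.add(provider)
    return found


sample = "import os\nfrom azure.identity import DefaultAzureCredential\n"
print(provider_imports(sample))  # {'azure.identity'}
```

Unlike the regular expression above, which only matches statements of the form `import <module>`, this also catches `from azure.identity import ...`. To scan the package, the function could be applied to each file found with `pathlib.Path("unstract/filesystem").rglob("*.py")`.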
tools/text_extractor/src/config/properties.json (1)

5-5: Version references updated to 0.0.64 – public_tools.json (toolVersion, image_url, image_tag) and config/properties.json now reflect 0.0.64; no env samples contain versioned entries.

tools/classifier/src/config/properties.json (1)

5-5: Approve version bump; registry parity confirmed. Ensure Docker image unstract/tool-classifier:0.0.68 is published before merging to avoid deploy-time pull failures.
prompt-service/pyproject.toml (3)
36-36: Test group dotenv bump—LGTM.

Keep test/runtime dotenv versions aligned (they are).

19-19: Confirm GPT-5 model discovery across config → SDK → UI/API
No static allowlists for gpt-5 were detected—manually verify end-to-end support in your config, the unstract-SDK, and the UI/API.

14-15: Verify LlamaIndex 0.13.2 and python-dotenv 1.0.1 upgrade
Search for deprecated LlamaIndex APIs in the actual code folder (e.g. prompt-service/) with:
rg -nP '\b(ServiceContext|LLMPredictor|PromptHelper|GPTVectorStoreIndex)\b' -C2 --type=py prompt-service/
Ensure both import llama_index and import dotenv succeed in your runtime after installing the new versions.
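
A quick way to confirm the imports and installed versions is a small script run inside the service's environment. This is a minimal sketch using only the standard library. `check_imports` is a hypothetical helper, and the distribution names passed to it (`llama-index`, `python-dotenv`) are taken from the dependency bumps in this PR.

```python
import importlib
from importlib import metadata


def check_imports(modules: dict[str, str]) -> dict[str, str]:
    """Try importing each module and report the installed distribution version.

    `modules` maps an importable module name to its distribution name.
    """
    report = {}
    for module_name, dist_name in modules.items():
        try:
            importlib.import_module(module_name)
        except ImportError as exc:
            report[module_name] = f"import failed: {exc}"
            continue
        try:
            report[module_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            report[module_name] = "imported, version unknown"
    return report


if __name__ == "__main__":
    result = check_imports({"llama_index": "llama-index", "dotenv": "python-dotenv"})
    for name, status in result.items():
        print(f"{name}: {status}")
```

Comparing the reported versions against `0.13.2` and `1.0.1` would confirm that the lock file resolved as intended.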
platform-service/pyproject.toml (1)

18-18: SDK bump confirmed—lock file updated
All unstract-sdk references now align on ~=0.77.0, and platform-service/uv.lock reflects the bump. Proceed to smoke-test platform endpoints with SDK v0.77.0.

tools/text_extractor/requirements.txt (1)

7-7: Rebuild/publish tool-text-extractor:0.0.64 before rollout. All toolVersion entries in properties.json and public_tools.json are correctly updated to 0.0.64.

unstract/tool-registry/tool_registry_config/public_tools.json (1)

119-119: Text Extractor bump to 0.0.64 is consistent—no 0.0.63 stragglers found.

backend/sample.env (1)

81-83: Verified all 0.0.85 references removed; updated to 0.0.86
No remaining instances of version 0.0.85 found in the repository.

pyproject.toml

tools/structure/src/config/properties.json

unstract/tool-registry/tool_registry_config/public_tools.json

gaya3-zipstack

Looks good. I think @hari-kuriakose's script could save time here next time.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

pyproject.toml (1)
38-38: Standardize python-dotenv across all modules.

Pin to ==1.0.1 here is fine; ensure every subproject and test group uses the same constraint to prevent env parsing regressions during CI.
#!/bin/bash
set -euo pipefail
# Locate python-dotenv specs that are not pinned to ==1.0.1.
# File names and line numbers are kept so each hit can be traced.
rg -n 'python-dotenv' | grep -Ev '==[[:space:]]*1\.0\.1' || true

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 91c5266 and a9f0858.

⛔ Files ignored due to path filters (6)

backend/uv.lock is excluded by !**/*.lock
platform-service/uv.lock is excluded by !**/*.lock
prompt-service/uv.lock is excluded by !**/*.lock
unstract/filesystem/uv.lock is excluded by !**/*.lock
unstract/tool-registry/uv.lock is excluded by !**/*.lock
uv.lock is excluded by !**/*.lock

📒 Files selected for processing (8)

backend/pyproject.toml (1 hunks)
platform-service/pyproject.toml (1 hunks)
prompt-service/pyproject.toml (2 hunks)
pyproject.toml (1 hunks)
tools/classifier/requirements.txt (1 hunks)
tools/text_extractor/requirements.txt (1 hunks)
unstract/filesystem/pyproject.toml (1 hunks)
unstract/tool-registry/pyproject.toml (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (6)

unstract/tool-registry/pyproject.toml
platform-service/pyproject.toml
backend/pyproject.toml
prompt-service/pyproject.toml
unstract/filesystem/pyproject.toml
tools/text_extractor/requirements.txt

pyproject.toml

tools/classifier/requirements.txt

github-actions · 2025-09-04T12:07:07Z

| filepath | function | passed | SUBTOTAL |
| --- | --- | --- | --- |
| runner/src/unstract/runner/clients/test_docker.py | test_logs | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_cleanup | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_cleanup_skip | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_client_init | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image_exists | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_container_run_config | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_container_run_config_without_mount | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_run_container | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image_for_sidecar | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_sidecar_container | 1 | 1 |
| TOTAL | | 11 | 11 |

sonarqubecloud · 2025-09-04T12:07:16Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Updated unstract-sdk version to v0.77.0 and also updated tool versions

91c5266

pk-zipstack requested review from Deepak-Kesavan, chandrasekharan-zipstack and gaya3-zipstack September 1, 2025 10:47

pk-zipstack self-assigned this Sep 1, 2025

coderabbitai bot reviewed Sep 1, 2025

chandrasekharan-zipstack approved these changes Sep 1, 2025

gaya3-zipstack approved these changes Sep 1, 2025

Updated unstract-sdk to version 0.77.1

a9f0858

coderabbitai bot reviewed Sep 2, 2025

pk-zipstack added 3 commits September 4, 2025 17:29

Merge branch 'main' into deps/unstract-sdk-v0.77.0

79a6cb2

Merge branch 'main' into deps/unstract-sdk-v0.77.0

9d0607f

Added uv.lock for prompt-studio

099e400

chandrasekharan-zipstack merged commit 91e78db into main Sep 4, 2025
8 checks passed

chandrasekharan-zipstack deleted the deps/unstract-sdk-v0.77.0 branch September 4, 2025 12:16

chandrasekharan-zipstack changed the title ~~UN-2725 [DEPS] - Support for OpenAI's GPT-5 models~~ UN-2725 [FEAT] - Support for OpenAI's GPT-5 models Sep 4, 2025

chandrasekharan-zipstack merged 5 commits into main from deps/unstract-sdk-v0.77.0

3 participants
