UN-2725 [FEAT] - Support for OpenAI's GPT-5 models #1516

Merged
chandrasekharan-zipstack merged 5 commits into main from deps/unstract-sdk-v0.77.0
Sep 4, 2025

@pk-zipstack
Contributor

What

  • Update the unstract-sdk version to support OpenAI's GPT-5 models.

Why

How

Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

  • No. Tested locally with the updated versions.

Database Migrations

Env Config

Relevant Docs

Related Issues or PRs

Dependencies Versions

Notes on Testing

Screenshots

Checklist

I have read and understood the Contribution Guidelines.

@coderabbitai
Contributor

coderabbitai bot commented Sep 1, 2025

Warning

Rate limit exceeded

@pk-zipstack has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 15 minutes and 58 seconds before requesting another review.

📥 Commits

Reviewing files that changed from the base of the PR and between a9f0858 and 099e400.

⛔ Files ignored due to path filters (1)
  • prompt-service/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • prompt-service/pyproject.toml (2 hunks)

Summary by CodeRabbit

  • Chores
    • Updated dependencies across services: unstract-sdk to ~=0.77.1, python-dotenv to 1.0.1, llama-index to 0.13.2.
  • Tools
    • Upgraded tool versions and images: Structure Tool 0.0.86, Classifier 0.0.68, Text Extractor 0.0.64.
  • Configuration
    • Refreshed sample environment and tool registry to reference new tool versions and image tags.

Walkthrough

Dependency and tool-version bumps across services and tools: unstract-sdk to ~=0.77.1, python-dotenv to 1.0.1, llama-index to 0.13.2; tool config versions and container image tags updated for Structure, Classifier, and Text Extractor. No public API changes.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Unstract SDK version bump: backend/pyproject.toml, platform-service/pyproject.toml, prompt-service/pyproject.toml, unstract/filesystem/pyproject.toml, unstract/tool-registry/pyproject.toml, pyproject.toml (hook), tools/classifier/requirements.txt, tools/text_extractor/requirements.txt | Update unstract-sdk dependency from ~=0.76.1 → ~=0.77.1 (including extras: [azure], [gcs, azure, aws], [aws], and hook entry). |
| python-dotenv version bump: backend/pyproject.toml, prompt-service/pyproject.toml (deps & test), pyproject.toml (hook) | Update python-dotenv from 1.0.0 → 1.0.1. |
| llama-index version bump: prompt-service/pyproject.toml | Update llama-index from 0.12.39 → 0.13.2. |
| Tool config version bumps: tools/classifier/src/config/properties.json, tools/structure/src/config/properties.json, tools/text_extractor/src/config/properties.json | Bump toolVersion: Classifier 0.0.67 → 0.0.68; Structure 0.0.85 → 0.0.86; Text Extractor 0.0.63 → 0.0.64. |
| Tool registry updates: unstract/tool-registry/tool_registry_config/public_tools.json | Sync classify and text_extractor entries: toolVersion, image_tag, and image_url updated to the new versions (0.0.68 and 0.0.64). |
| Structure tool image in env: backend/sample.env | Update STRUCTURE_TOOL_IMAGE_URL and STRUCTURE_TOOL_IMAGE_TAG from 0.0.85 → 0.0.86. |
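
For readers unfamiliar with the `~=` specifiers used in these bumps, the sketch below illustrates the PEP 440 "compatible release" rule for plain numeric versions. It is a simplified illustration only: the helper names are made up for this example, and it ignores pre-releases, post-releases, and epochs, which a real resolver handles.

```python
def _release(version: str) -> tuple:
    """Split a plain numeric version such as '0.77.1' into (0, 77, 1)."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(spec: str, candidate: str) -> bool:
    """Return True if `candidate` satisfies `~=spec` (numeric segments only).

    `~=X.Y.Z` means `>= X.Y.Z` and `== X.Y.*`: every segment except the
    last one of the spec must match exactly.
    """
    base = _release(spec)
    if len(base) < 2:
        raise ValueError("~= requires at least two release segments")
    cand = _release(candidate)

    # Pad with zeros so that '1.0' and '1.0.0' compare as equal.
    width = max(len(base), len(cand))
    base_padded = base + (0,) * (width - len(base))
    cand_padded = cand + (0,) * (width - len(cand))

    prefix = base[:-1]
    return cand_padded >= base_padded and cand_padded[: len(prefix)] == prefix


if __name__ == "__main__":
    print(compatible_release("0.77.1", "0.77.5"))  # True
    print(compatible_release("0.77.1", "0.78.0"))  # False
    print(compatible_release("1.0", "1.1.0"))      # True
    print(compatible_release("1.0.0", "1.1.0"))    # False
```

The last two calls show why `~=1.0` and `~=1.0.0` are not interchangeable: the shorter spec admits any 1.x release, while the longer one admits only 1.0.x.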

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (4)
backend/pyproject.toml (1)

33-33: Align python-dotenv version spec across the monorepo.

Detected mixed pins in:

  • tool-sidecar/pyproject.toml (“python-dotenv>=1.0.0”)
  • runner/pyproject.toml (“python-dotenv>=1.0.0”)
  • x2text-service/pyproject.toml (“python-dotenv~=1.0.0”)
  • platform-service/pyproject.toml (“python-dotenv~=1.0”)
  • tox.ini (python-dotenv>=1.0.0)

Choose and apply one uniform spec everywhere:
• Strict pin (==1.0.1) for exact reproducibility
• Or minor-range (>=1.0.1,<2.0.0) for flexible updates
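
One way to spot mixed pins like those listed above is a small scan over the manifests. The following is a rough sketch, not part of this PR: the set of manifest file names and the helper name `find_dotenv_specs` are assumptions made for illustration, and the regex only covers simple specifiers.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches specs such as python-dotenv==1.0.1, python-dotenv>=1.0.1,<2.0.0
SPEC_RE = re.compile(
    r"python-dotenv\s*("
    r"(?:===|[=~<>!]=?)\s*[\w.*]+"
    r"(?:\s*,\s*(?:===|[=~<>!]=?)\s*[\w.*]+)*"
    r")"
)

# Assumption: these are the manifest names worth scanning in the monorepo.
MANIFEST_NAMES = {"pyproject.toml", "requirements.txt", "tox.ini"}


def find_dotenv_specs(root: Path) -> dict:
    """Map each distinct python-dotenv spec to the files that declare it."""
    found = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.name not in MANIFEST_NAMES or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        for match in SPEC_RE.finditer(text):
            spec = re.sub(r"\s+", "", match.group(1))  # normalize whitespace
            found[spec].append(path.relative_to(root).as_posix())
    return dict(found)


if __name__ == "__main__":
    specs = find_dotenv_specs(Path("."))
    for spec, files in specs.items():
        print(spec, "->", ", ".join(files))
    if len(specs) > 1:
        print("Mixed python-dotenv specs detected")
```

More than one key in the result means the specs have diverged and can be aligned on whichever uniform spec is chosen.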

tools/classifier/requirements.txt (1)

7-7: Classifier on unstract-sdk[aws] ~=0.77.0: looks good; add a smoke path.

Add/confirm a simple E2E or contract test that runs a minimal classification using a GPT‑5 model to catch SDK/model ID mismatches early.

I can draft a minimal smoke test (pytest) that exercises the tool with a mock provider—want me to add it?
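
As a rough idea of what such a smoke test could look like, here is a self-contained pytest-style sketch. Everything in it is hypothetical: `FakeProvider`, `classify`, and the `gpt-5` ID pattern are stand-ins invented for illustration and do not reflect the classifier tool's or unstract-sdk's real interfaces.

```python
import re
from dataclasses import dataclass

# Assumption: GPT-5 model IDs look like "gpt-5" or "gpt-5-<suffix>".
GPT5_PATTERN = re.compile(r"^gpt-5([.-].+)?$")


@dataclass
class FakeProvider:
    """Hypothetical mock LLM provider that returns a canned label."""

    model: str
    canned_label: str = "invoice"

    def complete(self, prompt: str) -> str:
        # Reject model IDs the (hypothetical) SDK would not recognize.
        if not GPT5_PATTERN.match(self.model):
            raise ValueError(f"unsupported model id: {self.model}")
        return self.canned_label


def classify(text: str, bins: list, provider: FakeProvider) -> str:
    """Hypothetical classifier: ask the provider and accept only known bins."""
    answer = provider.complete(f"Classify into {bins}: {text}").strip().lower()
    if answer not in bins:
        raise ValueError(f"unexpected label: {answer}")
    return answer


def test_classifier_smoke_with_gpt5_model():
    provider = FakeProvider(model="gpt-5-mini")
    result = classify("Total due: $42", ["invoice", "receipt"], provider)
    assert result == "invoice"


def test_classifier_rejects_unknown_model_id():
    provider = FakeProvider(model="gpt-4o")
    try:
        classify("Total due: $42", ["invoice", "receipt"], provider)
    except ValueError:
        return
    raise AssertionError("expected a ValueError for a non-GPT-5 model id")
```

A real version would replace the fake provider with the tool's actual entry point and a mocked SDK adapter, so that a model ID mismatch fails in CI rather than at runtime.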

unstract/tool-registry/tool_registry_config/public_tools.json (1)

109-111: Avoid drift: derive image_url from image_name + image_tag or pin by digest.
Reduce duplication by constructing image_url at runtime, or pin images by sha256 to improve supply-chain reproducibility.

Apply pattern:

-        "image_url": "docker:unstract/tool-classifier:0.0.68",
+        "image_url": "docker:unstract/tool-classifier:@TAG@",  // server-side replace from image_tag

Or add a sibling field:

+        "image_digest": "sha256:<digest>"

Also applies to: 194-196
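
The runtime derivation suggested here could be sketched as follows. This assumes registry entries carry `image_name`, `image_tag`, and `image_url` keys and that URLs follow the `docker:<name>:<tag>` shape seen in the diff above; the digest form and the helper names are assumptions for illustration.

```python
from typing import Optional


def derive_image_url(
    image_name: str, image_tag: str, image_digest: Optional[str] = None
) -> str:
    """Build the image URL from its parts instead of storing it separately.

    If a digest such as 'sha256:<digest>' is given, pin by digest
    (assumed form: docker:<name>@<digest>); otherwise use the tag.
    """
    if image_digest:
        return f"docker:{image_name}@{image_digest}"
    return f"docker:{image_name}:{image_tag}"


def has_drifted(entry: dict) -> bool:
    """Return True if the stored image_url disagrees with name + tag."""
    expected = derive_image_url(entry["image_name"], entry["image_tag"])
    return entry.get("image_url") != expected


if __name__ == "__main__":
    entry = {
        "image_name": "unstract/tool-classifier",
        "image_tag": "0.0.68",
        "image_url": "docker:unstract/tool-classifier:0.0.67",  # stale on purpose
    }
    print(derive_image_url(entry["image_name"], entry["image_tag"]))
    print("drifted:", has_drifted(entry))
```

A check like `has_drifted` could also run in CI against public_tools.json as a cheaper alternative to changing how the registry stores URLs.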

backend/sample.env (1)

81-83: Optional: single source of truth for image tag.
Keep only TAG in env and build URL in code to prevent drift between URL/NAME/TAG.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between d34f6b8 and 91c5266.

⛔ Files ignored due to path filters (6)
  • backend/uv.lock is excluded by !**/*.lock
  • platform-service/uv.lock is excluded by !**/*.lock
  • prompt-service/uv.lock is excluded by !**/*.lock
  • unstract/filesystem/uv.lock is excluded by !**/*.lock
  • unstract/tool-registry/uv.lock is excluded by !**/*.lock
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • backend/pyproject.toml (1 hunks)
  • backend/sample.env (1 hunks)
  • platform-service/pyproject.toml (1 hunks)
  • prompt-service/pyproject.toml (2 hunks)
  • pyproject.toml (1 hunks)
  • tools/classifier/requirements.txt (1 hunks)
  • tools/classifier/src/config/properties.json (1 hunks)
  • tools/structure/src/config/properties.json (1 hunks)
  • tools/text_extractor/requirements.txt (1 hunks)
  • tools/text_extractor/src/config/properties.json (1 hunks)
  • unstract/filesystem/pyproject.toml (1 hunks)
  • unstract/tool-registry/pyproject.toml (1 hunks)
  • unstract/tool-registry/tool_registry_config/public_tools.json (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (12)
backend/pyproject.toml (1)

40-40: Sync unstract-sdk versions & verify GPT-5 support

  • tools/structure/requirements.txt:6 still pins unstract-sdk[aws]~=0.74.0; update it to ~=0.77.0[azure] to match other services.
  • Manually confirm that unstract-sdk v0.77.x includes GPT-5 support (correct model IDs like gpt-5*) and that Azure OpenAI in your target regions exposes those models.
  • Verify OpenAI/Azure SDK client minimum versions are pulled in (or add explicit pins) and that any relevant environment variable names haven’t changed.
unstract/tool-registry/pyproject.toml (1)

14-14: Registry metadata in sync: toolVersion, image_tag, and image_url values align for classifier (0.0.68) and text_extractor (0.0.64).

unstract/filesystem/pyproject.toml (1)

9-9: Core SDK only: verify provider-specific imports
Run

rg -nP 'import\s+(?:boto3|aioboto3|azure\.identity)' unstract/filesystem

If AWS (boto3/aioboto3) or Azure (azure-identity) are imported, update unstract/filesystem/pyproject.toml to include those alongside unstract-sdk~=0.77.0; otherwise leave as-is.

tools/text_extractor/src/config/properties.json (1)

5-5: Version references updated to 0.0.64 – public_tools.json (toolVersion, image_url, image_tag) and config/properties.json now reflect 0.0.64; no env samples contain versioned entries.

tools/classifier/src/config/properties.json (1)

5-5: Approve version bump; registry parity confirmed. Ensure Docker image unstract/tool-classifier:0.0.68 is published before merging to avoid deploy-time pull failures.

prompt-service/pyproject.toml (3)

36-36: Test group dotenv bump—LGTM.

Keep test/runtime dotenv versions aligned (they are).


19-19: Confirm GPT-5 model discovery across config → SDK → UI/API
No static allowlists for gpt-5 were detected—manually verify end-to-end support in your config, the unstract-SDK, and the UI/API.


14-15: Verify LlamaIndex 0.13.2 and python-dotenv 1.0.1 upgrade

  • Search for deprecated LlamaIndex APIs in the actual code folder (e.g. prompt-service/) with:
    rg -nP '\b(ServiceContext|LLMPredictor|PromptHelper|GPTVectorStoreIndex)\b' -C2 --type=py prompt-service/
  • Ensure both import llama_index and import dotenv succeed in your runtime after installing the new versions.
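
The import check in the second bullet can be scripted with the standard library alone. This is a minimal sketch; the helper name `check_imports` is made up, and the mapping from import names to distribution names is supplied by the caller.

```python
import importlib
from importlib import metadata


def check_imports(modules: dict) -> dict:
    """Try to import each module and report its installed version.

    `modules` maps an import name (e.g. 'dotenv') to its distribution
    name (e.g. 'python-dotenv'). Failures are reported, not raised.
    """
    report = {}
    for module_name, dist_name in modules.items():
        try:
            importlib.import_module(module_name)
            report[module_name] = metadata.version(dist_name)
        except (ImportError, metadata.PackageNotFoundError) as exc:
            report[module_name] = f"FAILED: {exc!r}"
    return report


if __name__ == "__main__":
    result = check_imports(
        {"llama_index": "llama-index", "dotenv": "python-dotenv"}
    )
    for name, status in result.items():
        print(f"{name}: {status}")
```

Running this inside the prompt-service environment after installing the new lock file gives a quick signal before any deeper API-compatibility checks.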
platform-service/pyproject.toml (1)

18-18: SDK bump confirmed—lock file updated
All unstract-sdk references now align on ~=0.77.0, and platform-service/uv.lock reflects the bump. Proceed to smoke-test platform endpoints with SDK v0.77.0.

tools/text_extractor/requirements.txt (1)

7-7: Rebuild/publish tool-text-extractor:0.0.64 before rollout. All toolVersion entries in properties.json and public_tools.json are correctly updated to 0.0.64.

unstract/tool-registry/tool_registry_config/public_tools.json (1)

119-119: Text Extractor bump to 0.0.64 is consistent—no 0.0.63 stragglers found.

backend/sample.env (1)

81-83: Verified all 0.0.85 references removed; updated to 0.0.86
No remaining instances of version 0.0.85 found in the repository.

Contributor

@gaya3-zipstack gaya3-zipstack left a comment

Looks good. I think @hari-kuriakose's script could save time here next time.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
pyproject.toml (1)

38-38: Standardize python-dotenv across all modules.

Pin to ==1.0.1 here is fine; ensure every subproject and test group uses the same constraint to prevent env parsing regressions during CI.

#!/bin/bash
set -euo pipefail
# Locate non-==1.0.1 dotenv specs
rg -nI 'python-dotenv' -C1 | grep -Ev '==\s*1\.0\.1' || true
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 91c5266 and a9f0858.

⛔ Files ignored due to path filters (6)
  • backend/uv.lock is excluded by !**/*.lock
  • platform-service/uv.lock is excluded by !**/*.lock
  • prompt-service/uv.lock is excluded by !**/*.lock
  • unstract/filesystem/uv.lock is excluded by !**/*.lock
  • unstract/tool-registry/uv.lock is excluded by !**/*.lock
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • backend/pyproject.toml (1 hunks)
  • platform-service/pyproject.toml (1 hunks)
  • prompt-service/pyproject.toml (2 hunks)
  • pyproject.toml (1 hunks)
  • tools/classifier/requirements.txt (1 hunks)
  • tools/text_extractor/requirements.txt (1 hunks)
  • unstract/filesystem/pyproject.toml (1 hunks)
  • unstract/tool-registry/pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • unstract/tool-registry/pyproject.toml
  • platform-service/pyproject.toml
  • backend/pyproject.toml
  • prompt-service/pyproject.toml
  • unstract/filesystem/pyproject.toml
  • tools/text_extractor/requirements.txt

@github-actions
Contributor

github-actions bot commented Sep 4, 2025

| filepath | function | passed | SUBTOTAL |
| --- | --- | --- | --- |
| runner/src/unstract/runner/clients/test_docker.py | test_logs | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_cleanup | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_cleanup_skip | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_client_init | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image_exists | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_container_run_config | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_container_run_config_without_mount | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_run_container | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image_for_sidecar | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_sidecar_container | 1 | 1 |
| TOTAL | | 11 | 11 |

@sonarqubecloud

sonarqubecloud bot commented Sep 4, 2025

@chandrasekharan-zipstack chandrasekharan-zipstack merged commit 91e78db into main Sep 4, 2025
8 checks passed
@chandrasekharan-zipstack chandrasekharan-zipstack deleted the deps/unstract-sdk-v0.77.0 branch September 4, 2025 12:16
@chandrasekharan-zipstack chandrasekharan-zipstack changed the title UN-2725 [DEPS] - Support for OpenAI's GPT-5 models UN-2725 [FEAT] - Support for OpenAI's GPT-5 models Sep 4, 2025

3 participants