[pull] main from Unstructured-IO:main#3
Open
pull[bot] wants to merge 156 commits into em3ndez:main from
We're now using asyncio for page split concurrency, but because the client itself is not async, we need to manage our own event loop. This complains if your environment already has a running event loop. For instance, setting `split_pdf_page=True` in a jupyter cell will give you `RuntimeError: This event loop is already running`. Turns out there's a simple library to allow for nested event loops. We just apply the monkeypatch in split_pdf_hook.py and the error goes away. To verify, you'll need to run `pip install -e .` to install the local version of the client. Run `make run-jupyter` and open up the sample notebook in `_jupyter/`. Try making a request with page splitting enabled and you'll see the above error. Then, check out this branch, install locally again, restart your jupyter kernel, and the error is fixed.
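A minimal, stdlib-only sketch of why the error appears: a synchronous helper that drives the event loop itself fails when it is called from code that is already running inside a loop, which is the situation in a Jupyter cell. The names `split_request`, `sync_call`, and `notebook_cell` are hypothetical and are not taken from the client.

```python
import asyncio


async def split_request(page: int) -> int:
    # Stand-in for one per-page request made by the split logic.
    await asyncio.sleep(0)
    return page


def sync_call() -> str:
    # A synchronous API that manages the event loop on its own.
    loop = asyncio.get_event_loop()
    coro = split_request(1)
    try:
        loop.run_until_complete(coro)
        return "ok"
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)


async def notebook_cell() -> str:
    # Jupyter runs cells inside an already running loop, like this coroutine.
    return sync_call()


message = asyncio.run(notebook_cell())
print(message)
```

Calling `nest_asyncio.apply()` before the nested `run_until_complete` is the workaround this PR describes; it is left out of the sketch so the example has no third-party dependency.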
Changes:
* Bring the publish step back to the speakeasy workflow, regenerate the github action with `speakeasy configure publishing`
* Remove incorrect readme note about parent_id being disabled
* Knock the package version back down to 0.23.0 for continuity with PyPI versions
I added it to setup.py which of course is autogenerated.
> [!IMPORTANT]
> Linting report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/linting-report/816dffe1d4e68668beb03d2fbf94c6b2>
> OpenAPI Change report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/changes-report/d6609910d3434cf4bff19c21160c8da9>
# SDK update
Based on:
- OpenAPI Doc
- Speakeasy CLI 1.299.7 (2.338.12) https://github.com/speakeasy-api/speakeasy
## PYTHON CHANGELOG
Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
We're seeing an issue where the nest_asyncio.apply() workaround for nested loops is breaking when we're dealing with a `uvloop`. We need to investigate further, or remove the bandaid solution. In the meantime, we can unblock a simple import of the client in these environments by doing the apply only when the hook is run.
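A rough sketch of the "apply only when the hook is run" idea, assuming a hook object with a `before_request` method. `LazyPatchHook` and its callable argument are hypothetical; in the client, the callable would stand in for `nest_asyncio.apply`.

```python
from typing import Callable, List


class LazyPatchHook:
    """Hypothetical hook that defers a global patch until first use."""

    def __init__(self, apply_patch: Callable[[], None]) -> None:
        self._apply_patch = apply_patch
        self._applied = False

    def before_request(self, request: dict) -> dict:
        # Importing or constructing the hook has no side effects;
        # the patch runs once, on the first request that needs it.
        if not self._applied:
            self._apply_patch()
            self._applied = True
        return request


calls: List[str] = []
hook = LazyPatchHook(lambda: calls.append("patched"))
calls_after_init = list(calls)

hook.before_request({"split_pdf_page": True})
hook.before_request({"split_pdf_page": True})
print(calls_after_init, calls)
```

With this shape, simply importing the client in a `uvloop` environment no longer triggers the patch.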
> [!IMPORTANT]
> Linting report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/linting-report/482056910a6dbf60d3f53a1dc5af27a5>
> OpenAPI Change report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/changes-report/37c93e98e1b0c8ec5531af6551eab9b0>
# SDK update
Based on:
- OpenAPI Doc
- Speakeasy CLI 1.300.0 (2.338.14) https://github.com/speakeasy-api/speakeasy
## PYTHON CHANGELOG
Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
Our docs page will be the source of truth, and the readme can have some general information from the autogenerated sections.
Changes:
* Link to our docs at the top
* Remove the manually created usage snippet and swap out for the autogenerated one. This way, the readme stays up to date while we focus on the docs page.
* Bring back autogenerated error handling - this will populate the next time the client regens
> [!IMPORTANT]
> Linting report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/linting-report/01db84f5f80c422ff8c0ad8355dd0e2d>
> OpenAPI Change report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/changes-report/09afe056dfd31b90a4ab59effebe7169>
# SDK update
Based on:
- OpenAPI Doc
- Speakeasy CLI 1.300.1 (2.339.1) https://github.com/speakeasy-api/speakeasy
## PYTHON CHANGELOG
Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
The PR that introduced more logging left two gaps:
- there was a false success log in the split-page after-error hook
- the retry logs covered only 5XX responses, although speakeasy also retries on ConnectionError exceptions
This PR removes the faulty success log, moves success/failure logs to the `LoggerHook`, and introduces logging for `ConnectionError` retries.
---------
Co-authored-by: Filip Knefel <filip@unstructured.io>
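To illustrate the two retry paths mentioned above, here is a small stdlib-only sketch that logs a retry both for 5XX statuses and for `ConnectionError`. It is not the `LoggerHook` implementation; `call_with_retries`, `fake_send`, and `ListHandler` are hypothetical names used only for this example.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("retry_sketch")


def call_with_retries(send: Callable[[], int], max_attempts: int = 3) -> int:
    """Call send() until it returns a non-5XX status or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            status = send()
        except ConnectionError as exc:
            logger.info("Attempt %d: connection error (%s)", attempt, exc)
            if attempt == max_attempts:
                raise
            continue
        if status >= 500 and attempt < max_attempts:
            logger.info("Attempt %d: server returned %d, retrying", attempt, status)
            continue
        return status
    raise ValueError("max_attempts must be at least 1")


class ListHandler(logging.Handler):
    """Collects formatted log messages so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


# Simulated outcomes: a dropped connection, then a 503, then success.
outcomes = iter([ConnectionError("reset"), 503, 200])


def fake_send() -> int:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)
final_status = call_with_retries(fake_send)
print(final_status, handler.messages)
```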
> [!IMPORTANT]
> Linting report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/linting-report/465e078af6f8af458707bb4481cdc70e>
> OpenAPI Change report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/changes-report/5e2e1bb90eaabe5859394ad003a71875>
# SDK update
Based on:
- OpenAPI Doc
- Speakeasy CLI 1.300.1 (2.339.1) https://github.com/speakeasy-api/speakeasy
## PYTHON CHANGELOG
Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
> [!IMPORTANT] > Linting report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/linting-report/c1fec31921ab085211de39f108a9ca23> > OpenAPI Change report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/changes-report/a757f2f00046b6a6e109d6fcfb1abc22> # SDK update Based on: - OpenAPI Doc - Speakeasy CLI 1.308.1 (2.342.6) https://github.com/speakeasy-api/speakeasy ## OpenAPI Change Summary ``` ├─┬Info │ └──[🔀] version (1:80) ├─┬Paths │ └─┬/general/v0/general │ └─┬POST │ └─┬Responses │ ├──[➕] codes (1:1207) │ └─┬200 │ └─┬application/json │ └─┬Schema │ └──[➕] description (1:957) └─┬Components ├──[➕] schemas (1:10240) ├─┬partition_parameters │ ├─┬ocr_languages │ │ └──[🔀] description (1:4062) │ ├─┬xml_keep_tags │ │ └──[🔀] description (1:6020) │ ├─┬unique_element_ids │ │ └──[🔀] description (1:5728) │ ├─┬gz_uncompressed_content_type │ │ └──[🔀] description (1:3238) │ ├─┬include_page_breaks │ │ └──[🔀] description (1:3589) │ ├─┬coordinates │ │ └──[🔀] description (1:2526) │ ├─┬chunking_strategy │ │ └──[🔀] description (1:6380) │ ├─┬languages │ │ └──[🔀] description (1:3801) │ └─┬extract_image_block_types │ └──[🔀] description (1:2948) └─┬HTTPValidationError ├──[➖] title (1:1374) ├──[➕] example (1:1594) └─┬detail ├──[➖] items (1:1259)❌ ├──[➖] type (1:1317)❌ ├──[➖] title (1:1335) ├──[➕] oneOf (1:1482) └──[➕] oneOf (1:1560) ``` | Document Element | Total Changes | Breaking Changes | |------------------|---------------|------------------| | info | 1 | 0 | | paths | 2 | 0 | | components | 17 | 2 | ## PYTHON CHANGELOG ## unions: 2.82.8 - 2024-06-10 ### 🐛 Bug Fixes - ensure union type definitions define types in a way compatible with multiple python versions *(commit by [@TristanSpeakEasy](https://github.com/tristanspeakeasy))* ## core: 4.6.11 - 2024-06-14 ### 👷 Build System - fixed indentation as tabs in python makefile *(commit by [@TristanSpeakEasy](https://github.com/tristanspeakeasy))* ## core: 4.6.10 - 2024-06-13 ### 👷 Build 
System - move to new method of publishing for python *(commit by [@TristanSpeakEasy](https://github.com/tristanspeakeasy))* --------- Co-authored-by: speakeasybot <bot@speakeasyapi.dev> Co-authored-by: Austin Walker <austin@unstructured.io>
Co-authored-by: Austin Walker <austin@unstructured.io>
> [!IMPORTANT]
> Linting report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/linting-report/3aec785f32070a79daac48f3e257f88d>
> OpenAPI Change report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/changes-report/1ed855e6e969ec81ba5d8815b2cad456>
# SDK update
Based on:
- OpenAPI Doc
- Speakeasy CLI 1.308.1 (2.342.6) https://github.com/speakeasy-api/speakeasy
## PYTHON CHANGELOG
Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
* Set the split_pdf_page default to true and run `make client-generate`
locally.
* Update the readme, add another reference back to our docs
* Change some warning logs to info. The user should not be warned about
default behavior for non pdf files
# Testing
Use the client locally and verify that split mode is the default, and
that the client behavior is consistent with older versions.
* Set up (or activate) your pyenv for the client: `pyenv virtualenv 3.12
unstructured-client; pyenv activate unstructured-client`
* Check out this branch and install: `pip install -e .`
* Run this sample script in the top level of the client repo. Try
different files in `_sample_docs` and verify that the logging and
results look acceptable.
```
from unstructured_client import UnstructuredClient
from unstructured_client.models import shared, operations
import json
api_key = "free-api-key"
filename = "_sample_docs/layout-parser-paper.pdf"
s = UnstructuredClient(
    api_key_auth=api_key,
)
# Read the file into memory so it can be sent with the request
with open(filename, "rb") as f:
    files = shared.Files(
        content=f.read(),
        file_name=filename,
    )
req = operations.PartitionRequest(
    shared.PartitionParameters(
        files=files,
        strategy=shared.Strategy.AUTO,
    ),
)
try:
    resp = s.general.partition(req)
    print(json.dumps(resp.elements, indent=4))
except Exception as e:
    print(e)
```
> [!IMPORTANT]
> Linting report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/linting-report/807de4722a65543cde9dbb2bcba4e7cf>
> OpenAPI Change report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/changes-report/497ec57d30decc20c1a8ad59bc711aa3>
# SDK update
Based on:
- OpenAPI Doc
- Speakeasy CLI 1.308.1 (2.342.6) https://github.com/speakeasy-api/speakeasy
## OpenAPI Change Summary
```
└─┬Components
  └─┬partition_parameters
    └─┬split_pdf_page
      └──[🔀] default (1:9405)❌
```
| Document Element | Total Changes | Breaking Changes |
|------------------|---------------|------------------|
| components | 1 | 1 |
## PYTHON CHANGELOG
Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
Verified that this shows a speedup by doing a local pip install and
running the following snippet before and after the change:
```
import time
from unstructured_client import UnstructuredClient
from unstructured_client.models import shared
# SERVER_URL and API_KEY are placeholders; define them for your environment
s = UnstructuredClient(
    server_url=SERVER_URL,
    api_key_auth=API_KEY,
)
filename = "../_sample_docs/layout-parser-paper.pdf"
with open(filename, "rb") as f:
    # Note that this currently only supports a single file
    files = shared.Files(
        content=f.read(),
        file_name=filename,
    )
req = shared.PartitionParameters(
    files=files,
    strategy="hi_res",
)
# Time only the partition call itself
start_time = time.time()
resp = s.general.partition(req)
end_time = time.time()
print(f"Elapsed time: {end_time - start_time} seconds")
```
> [!IMPORTANT] > Linting report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/linting-report/5969e484e45b07f1c14c78f621ee718a> > OpenAPI Change report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/changes-report/f72de0f38cd597a806a71502c5287fb1> # SDK update Based on: - OpenAPI Doc - Speakeasy CLI 1.322.1 (2.354.2) https://github.com/speakeasy-api/speakeasy ## OpenAPI Change Summary ``` ├─┬Info │ └──[🔀] version (264:12) ├─┬Paths │ └─┬/general/v0/general │ └──POST ├──Servers ├──Servers ├─┬Components │ ├─┬partition_parameters │ │ ├──files │ │ ├─┬output_format │ │ │ ├──[➕] enum (177:15) │ │ │ ├──[➕] enum (178:15) │ │ │ ├──[➖] enum (1:4212)❌ │ │ │ └──[➖] enum (1:4232)❌ │ │ ├─┬chunking_strategy │ │ │ └─┬ANYOF │ │ │ ├──[➕] enum (68:19) │ │ │ ├──[➕] enum (69:19) │ │ │ ├──[➕] enum (70:19) │ │ │ ├──[➕] enum (71:19) │ │ │ ├──[➖] enum (1:6248)❌ │ │ │ ├──[➖] enum (1:6257)❌ │ │ │ ├──[➖] enum (1:6268)❌ │ │ │ └──[➖] enum (1:6285)❌ │ │ └─┬strategy │ │ ├──[➕] enum (232:15) │ │ ├──[➕] enum (233:15) │ │ ├──[➕] enum (234:15) │ │ ├──[➕] enum (235:15) │ │ ├──[➖] enum (1:5403)❌ │ │ ├──[➖] enum (1:5377)❌ │ │ ├──[➖] enum (1:5385)❌ │ │ └──[➖] enum (1:5395)❌ │ ├─┬Element │ │ └──[🔀] example (6:9) │ ├─┬ServerError │ │ └──[🔀] example (36:9) │ ├─┬HTTPValidationError │ │ └──[🔀] example (19:9) │ └──ApiKeyAuth └─┬Extensions └──[🔀] x-speakeasy-retries (325:3) ``` | Document Element | Total Changes | Breaking Changes | |------------------|---------------|------------------| | paths | 0 | 0 | | servers | 0 | 0 | | components | 23 | 10 | | info | 1 | 0 | ## PYTHON CHANGELOG ## core: 4.6.13 - 2024-06-21 ### 🔧 Chores - update contribution section wording *(commit by [@disintegrator](https://github.com/disintegrator))* ## core: 0.2.4 - 2024-06-21 ### 🔧 Chores - update contribution section wording *(commit by [@disintegrator](https://github.com/disintegrator))* ## core: 4.6.12 - 2024-06-20 ### 🐛 Bug Fixes - test response status codes in sdk 
methods in order of specificity *(commit by [@disintegrator](https://github.com/disintegrator))* ## core: 0.2.3 - 2024-06-20 ### 🐛 Bug Fixes - test response status codes in sdk methods in order of specificity *(commit by [@disintegrator](https://github.com/disintegrator))* ## core: 0.2.2 - 2024-06-19 ### 🐛 Bug Fixes - various fixes for field naming, typedict serialization and tests *(commit by [@TristanSpeakEasy](https://github.com/tristanspeakeasy))* ## core: 0.2.1 - 2024-06-19 ### 🐛 Bug Fixes - generation of reserved field names *(commit by [@TristanSpeakEasy](https://github.com/tristanspeakeasy))* ## core: 0.2.0 - 2024-06-18 ### 🐝 New Features - added support for structural typing *(commit by [@TristanSpeakEasy](https://github.com/tristanspeakeasy))* Co-authored-by: speakeasybot <bot@speakeasyapi.dev> Co-authored-by: Austin Walker <awalk89@gmail.com>
> [!IMPORTANT] > Linting report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/linting-report/4f0ce155b8184add31556e4e9df60a84> > OpenAPI Change report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/changes-report/25efa7d832228704d26082e3827858f9> # SDK update Based on: - OpenAPI Doc - Speakeasy CLI 1.327.0 (2.359.6) https://github.com/speakeasy-api/speakeasy ## OpenAPI Change Summary ``` ├─┬Paths │ └─┬/general/v0/general │ └──POST ├──Servers ├──Servers ├─┬Components │ ├─┬Element │ │ └──[🔀] example (283:22) │ ├─┬HTTPValidationError │ │ └──[🔀] example (65:22) │ ├─┬ServerError │ │ └──[🔀] example (289:22) │ ├─┬partition_parameters │ │ ├─┬output_format │ │ │ ├──[➕] enum (150:27) │ │ │ ├──[➕] enum (151:27) │ │ │ ├──[➖] enum (177:27)❌ │ │ │ └──[➖] enum (178:27)❌ │ │ ├──files │ │ ├─┬chunking_strategy │ │ │ └─┬ANYOF │ │ │ ├──[➕] enum (203:31) │ │ │ ├──[➕] enum (204:31) │ │ │ ├──[➕] enum (205:31) │ │ │ ├──[➕] enum (206:31) │ │ │ ├──[➖] enum (68:31)❌ │ │ │ ├──[➖] enum (69:31)❌ │ │ │ ├──[➖] enum (70:31)❌ │ │ │ └──[➖] enum (71:31)❌ │ │ └─┬strategy │ │ ├──[➕] enum (179:27) │ │ ├──[➕] enum (180:27) │ │ ├──[➕] enum (181:27) │ │ ├──[➕] enum (178:27) │ │ ├──[➖] enum (232:27)❌ │ │ ├──[➖] enum (233:27)❌ │ │ ├──[➖] enum (234:27)❌ │ │ └──[➖] enum (235:27)❌ │ └──ApiKeyAuth └─┬Extensions └──[🔀] x-speakeasy-retries (300:22) ``` | Document Element | Total Changes | Breaking Changes | |------------------|---------------|------------------| | paths | 0 | 0 | | servers | 0 | 0 | | components | 23 | 10 | ## PYTHON CHANGELOG Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
> [!IMPORTANT] > Linting report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/linting-report/86d52c444980b204248909ddaee6938f> > OpenAPI Change report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/changes-report/507dd83771f282dbf344b92b10df023f> # SDK update Based on: - OpenAPI Doc - Speakeasy CLI 1.330.0 (2.361.10) https://github.com/speakeasy-api/speakeasy ## PYTHON CHANGELOG ## core: 4.8.0 - 2024-07-05 ### 🐝 New Features - add timeout config to pythonv2 operations and sdk *(commit by [@ryan-timothy-albert](https://github.com/ryan-timothy-albert))* ## core: 0.2.9 - 2024-07-04 ### 🔧 Chores - reduce response matching boilerplate *(commit by [@disintegrator](https://github.com/disintegrator))* ## core: 0.2.8 - 2024-07-02 ### 🐛 Bug Fixes - use None as arg default instead of UNSET *(commit by [@disintegrator](https://github.com/disintegrator))* ## core: 4.7.0 - 2024-06-27 ### 🐝 New Features - add env variable global security support *(commit by [@ryan-timothy-albert](https://github.com/ryan-timothy-albert))* ## core: 4.6.14 - 2024-06-27 ### 🐛 Bug Fixes - remove unnecessary accept_header_override documentation elements *(commit by [@ThomasRooney](https://github.com/ThomasRooney))* ## core: 0.2.6 - 2024-06-27 ### 🐛 Bug Fixes - add "input" to reserved keywords in pythonv2 *(commit by [@disintegrator](https://github.com/disintegrator))* ## core: 0.2.5 - 2024-06-26 ### 🐛 Bug Fixes - disallow positional arguments in python v2 SDKs *(commit by [@disintegrator](https://github.com/disintegrator))* Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
# New parameter
Add a client side param called `split_pdf_page_range` which takes a list
of two integers, `[start_page, end_page]`. If `split_pdf_page` is `True`
and a range is set, slice the doc from `start_page` up to and including
`end_page`. Only this page range will be sent to the API. The subset of
pages is still split up as needed.
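The range semantics described here can be sketched as follows, assuming pages are 1-indexed and that an invalid range raises `ValueError`, as the testing notes for this change state. `slice_page_range` is a hypothetical helper operating on a plain list, not the client's own function.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def slice_page_range(pages: Sequence[T], page_range: Sequence[int]) -> List[T]:
    """Return pages start_page..end_page, both inclusive and 1-indexed."""
    start_page, end_page = page_range
    if start_page < 1 or end_page > len(pages):
        raise ValueError(f"Page range {list(page_range)} is out of bounds")
    if end_page < start_page:
        raise ValueError("end_page must not be smaller than start_page")
    # Convert the inclusive 1-indexed range to a Python slice.
    return list(pages[start_page - 1:end_page])


pages = [f"page-{n}" for n in range(1, 17)]  # a 16-page document
selected = slice_page_range(pages, [2, 4])
print(selected)
```

Only `selected` would then go through the usual page splitting before being sent to the API.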
# Other changes
Allow our custom hooks to properly access list parameters, so we're able
to intercept `split_pdf_page_range`. We need extra handling to get list
params out of the request in `parse_form_data`, and to rebuild the
payload in `create_request_body`.
# Testing
Check out this branch and set up a request to your local API:
```
from unstructured_client import UnstructuredClient
from unstructured_client.models import shared
client = UnstructuredClient(api_key_auth="", server_url="localhost:8000")
filename = "_sample_docs/layout-parser-paper.pdf"
with open(filename, "rb") as f:
    files = shared.Files(
        content=f.read(),
        file_name=filename,
    )
req = shared.PartitionParameters(
    files=files,
    strategy="fast",
    split_pdf_page=True,
    # Only this page range is sent to the API
    split_pdf_page_range=[1, 16],
)
resp = client.general.partition(req)
```
Test out various page ranges and confirm that the returned elements are
within the range. Invalid ranges should throw a ValueError (pages are
out of bounds, or end_page < start_page).
# SDK update Based on: - OpenAPI Doc - Speakeasy CLI 1.335.0 (2.370.2) https://github.com/speakeasy-api/speakeasy ## OpenAPI Change Summary ``` └─┬Components └─┬partition_parameters └──[➕] properties (270:17) ``` | Document Element | Total Changes | Breaking Changes | |------------------|---------------|------------------| | components | 1 | 0 | ## PYTHON CHANGELOG ## globalSecurity: 2.83.5 - 2024-03-15 ### 🐛 Bug Fixes - fixed hoisting of operation security *(commit by [@TristanSpeakEasy](https://github.com/tristanspeakeasy))* ## openEnums: 0.1.0 - 2024-05-14 ### 🐝 New Features - add support for "open" enums *(commit by [@disintegrator](https://github.com/disintegrator))* ## responseFormat: 0.1.0 - 2024-03-02 ### 🐝 New Features - add support for response formats and flat responses *(commit by [@TristanSpeakEasy](https://github.com/TristanSpeakeasy))* ## examples: 2.81.3 - 2023-10-17 ### 🔧 Chores - remove multi word generated examples *(commit by [@ThomasRooney](https://github.com/ThomasRooney))* ## nameOverrides: 2.81.2 - 2024-03-25 ### 🐛 Bug Fixes - x-speakeasy-name-overrides being missed when used under an allOf *(commit by [@ThomasRooney](https://github.com/ThomasRooney))* ## retries: 2.82.2 - 2024-04-10 ### 🐛 Bug Fixes - add method to correctly case retryConnectionErrors *bool for Python generation *(commit by [@AshGodfrey](https://github.com/AshGodfrey))* ## constsAndDefaults: 0.1.3 - 2024-03-01 ### 🐛 Bug Fixes - null enums are coerced into null consts *(commit by [@disintegrator](https://github.com/disintegrator))* ## core: 4.8.1 - 2024-07-09 ### 🐛 Bug Fixes - Use 0666 file mode for writing configuration and lock files *(commit by [@bflad](https://github.com/bflad))* ## unions: 2.82.8 - 2024-06-10 ### 🐛 Bug Fixes - ensure union type definitions define types in a way compatible with multiple python versions *(commit by [@TristanSpeakEasy](https://github.com/tristanspeakeasy))* ## globalServerURLs: 2.82.2 - 2024-03-06 ### 🔧 Chores - expand server selection test 
coverage *(commit by [@2ynn](https://github.com/2ynn))* Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
The autogenerated example is inconsistent with our hosted docs. Remove this section and copy the usage snippet from the docs. Also, add a note for the new page range feature.
For easier testing of local API changes. Run the server at port 5000, and then use `make client-generate-local` to see how the SDK changes.
The default server url is changing to serverless. Therefore, if you get a 401, the client should suggest that you may have meant to use the free api. Also, bump the minor version in anticipation of the url change.
> [!IMPORTANT]
> Linting report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/linting-report/4ef3c768cbc0fbb7b563d54428c392bd>
> OpenAPI Change report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/changes-report/16a148a70bdf953ba9878e409aaf5b40>
# SDK update
Based on:
- OpenAPI Doc
- Speakeasy CLI 1.348.1 (2.380.1) https://github.com/speakeasy-api/speakeasy
## OpenAPI Change Summary
```
├─┬Info
│ └──[🔀] version (4:14)
└─┬Servers
  └──[➕] servers (6:7)
```
| Document Element | Total Changes | Breaking Changes |
|------------------|---------------|------------------|
| info | 1 | 0 |
| servers | 1 | 0 |
Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
This PR: - adds `split_pdf_page_allow_failed` parameter
> [!IMPORTANT] > Linting report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/linting-report/71b44e40cb066dc55aa5e45b3e48d59d> > OpenAPI Change report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/changes-report/e55599ebf9e42a00dff8584ddcb9e210> # SDK update Based on: - OpenAPI Doc - Speakeasy CLI 1.349.0 (2.382.0) https://github.com/speakeasy-api/speakeasy ## OpenAPI Change Summary ``` └─┬Components └─┬partition_parameters ├──[➕] properties (287:17) ├─┬hi_res_model_name │ ├─┬ANYOF │ │ ├──[🔀] type (128:33)❌ │ │ └──[➖] default (127:36)❌ │ └─┬ANYOF │ └──[🔀] type (126:33)❌ ├─┬similarity_threshold │ └─┬ANYOF │ └──[➖] default (264:36)❌ ├─┬starting_page_number │ ├─┬ANYOF │ │ ├──[🔀] type (175:33)❌ │ │ └──[➖] default (174:36)❌ │ └─┬ANYOF │ └──[🔀] type (173:33)❌ ├─┬combine_under_n_chars │ ├─┬ANYOF │ │ ├──[🔀] type (222:33)❌ │ │ └──[➖] default (221:36)❌ │ └─┬ANYOF │ └──[🔀] type (220:33)❌ ├─┬gz_uncompressed_content_type │ ├─┬ANYOF │ │ ├──[🔀] type (121:33)❌ │ │ └──[➖] default (120:36)❌ │ └─┬ANYOF │ └──[🔀] type (119:33)❌ ├─┬include_orig_elements │ ├─┬ANYOF │ │ └──[🔀] type (227:33)❌ │ └─┬ANYOF │ ├──[🔀] type (229:33)❌ │ └──[➖] default (228:36)❌ ├─┬max_characters │ ├─┬ANYOF │ │ ├──[🔀] type (236:33)❌ │ │ └──[➖] default (235:36)❌ │ └─┬ANYOF │ └──[🔀] type (234:33)❌ ├─┬new_after_n_chars │ ├─┬ANYOF │ │ ├──[🔀] type (248:33)❌ │ │ └──[➖] default (247:36)❌ │ └─┬ANYOF │ └──[🔀] type (246:33)❌ ├─┬chunking_strategy │ ├─┬ANYOF │ │ ├──[➕] enum (206:31) │ │ ├──[➕] enum (207:31) │ │ ├──[➕] enum (208:31) │ │ ├──[➕] enum (209:31) │ │ └──[🔀] type (204:33)❌ │ └─┬ANYOF │ ├──[➖] enum (206:31)❌ │ ├──[➖] enum (207:31)❌ │ ├──[➖] enum (208:31)❌ │ ├──[➖] enum (209:31)❌ │ ├──[🔀] type (211:33)❌ │ └──[➖] default (210:36)❌ └─┬encoding ├─┬ANYOF │ ├──[🔀] type (107:33)❌ │ └──[➖] default (106:36)❌ └─┬ANYOF └──[🔀] type (105:33)❌ ``` | Document Element | Total Changes | Breaking Changes | |------------------|---------------|------------------| | components | 
37 | 32 | Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
PDF page splitting uses asyncio but the SDK is not async. Therefore, we had to manage our own event loop, which can lead to issues in other event loop contexts. Uvloop is one context that does not allow us to use nested event loops. When we find ourselves in a uvloop.Loop, we have to fall back to non-splitting mode. #135 will make the whole SDK async so we don't have to hack this. Closes #133
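One way to sketch the fallback check without importing `uvloop` is to inspect the module that defines the running loop's class. This is an assumption about how such a check could look, not the client's code; `is_uvloop` and `choose_mode` are hypothetical names.

```python
import asyncio


def is_uvloop(loop: object) -> bool:
    # uvloop's loop class lives in the "uvloop" package, so the top-level
    # module name of the loop's type is enough to recognize it.
    return type(loop).__module__.split(".")[0] == "uvloop"


def choose_mode(loop: object) -> str:
    # Nested loops are not possible under uvloop, so skip page splitting there.
    return "no_split" if is_uvloop(loop) else "split"


async def main() -> str:
    return choose_mode(asyncio.get_running_loop())


class FakeUvloopLoop:
    """Stand-in used only to exercise the uvloop branch in this sketch."""


FakeUvloopLoop.__module__ = "uvloop.loop"

default_mode = asyncio.run(main())
uvloop_mode = choose_mode(FakeUvloopLoop())
print(default_mode, uvloop_mode)
```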
> [!IMPORTANT]
> Linting report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/linting-report/38839b02490ec839f218de5cb3d8322b>
> OpenAPI Change report available at: <https://app.speakeasyapi.dev/org/unstructured/unstructured5xr/changes-report/ce3f859bd05781d66a86db427a72746f>
# SDK update
Based on:
- OpenAPI Doc
- Speakeasy CLI 1.352.1 (2.385.1) https://github.com/speakeasy-api/speakeasy
## OpenAPI Change Summary
```
└─┬Info
  └──[🔀] version (4:14)
```
| Document Element | Total Changes | Breaking Changes |
|------------------|---------------|------------------|
| info | 1 | 0 |
Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
Co-authored-by: Austin Walker <awalk89@gmail.com>
Needed to do this manually to resolve some conflicts in our autogen ignore files.
Steps:
- Remove all endpoint python files from `.genignore`
- Run speakeasy generate
- Stage and commit the `HookContext` diffs in these files
- Revert the other autogen changes that we aren't ready for
This will trigger our publish workflow. The timestamps, etc., don't matter; we just need to add a block with the new release version.
# The Issue
We discovered a behavior change in the Python SDK after we merged the platform/serverless api specs. All of a sudden, the SDK level server_url param silently stopped working, and we were forced to set custom urls per function.
```
# The passed url is ignored and we go to our default Serverless URL.
# You suddenly get an invalid api key error if you expected to talk to, e.g, freemium
s = UnstructuredClient(server_url="my_own_url")
s.general.partition()
# This does the right thing
s = UnstructuredClient()
s.general.partition(server_url="my_own_url")
```
We had to patch some generated code in order to keep backwards compatibility, and set the SDK level `server_url` the way we used to. This works! However, any file in `.genignore` will not get updated and eventually the SDK fails to generate because of drift. The better solution is to figure out why the generated code changed on us, and fix it "upstream".
# The Fix
Our SDK points to two services - the workflow API at `platform.unstructuredapp.io` and the older partition endpoint at `api.unstructuredapp.io`. We merged these two openapi specs in order to generate a combined SDK, but this meant that urls could only be resolved per operation. There is no longer a global default, so a statement like `UnstructuredClient(server_url="my_own_url")` is ambiguous. The solution to all this is to go back to one default server - the platform url. The partition url is just one endpoint so it's much easier to handle as a one off. This restores the `server_url` behavior we had, without us having to fight with the autogenerated code.
# The Diff
This pr is huge because I regenerated the relevant files. There are only a few changes that drive all of it:
## `overlay_client.yaml`
After merging the two `openapi.yaml` specs, remove all child `servers` blocks and just keep one global config. Now every endpoint is a part of `platform.unstructuredapp.io`
## `general.py`
This is now the only custom patch. In the `partition` (and `partition_async`) call, we need to swap to the right url. We do this only if the user has not already changed the default.
## `destinations.py`, `jobs.py`, etc
These are the other endpoint files that are no longer patched. After regenerating, you can see the `base_url` logic cleans itself up. Either the user passed a `server_url` in the call, or we fetch the globally configured url.
## `test_server_urls.py`
Made some tweaks to these test cases. This locks in our compatibility and asserts that we always use the right url. Users can set a custom url at the SDK init, or at the operation. We need to cover this behavior within `general.partition` since this has the special logic. Otherwise, make sure both url approaches work for any of the other platform operations.
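The url precedence described for `general.py` can be sketched roughly as below. This is an illustration under assumptions, not the patched SDK code: `resolve_partition_url` is a hypothetical helper, and the `https://` scheme on both hosts is assumed.

```python
from typing import Optional

# Hosts are named in the description above; the scheme is an assumption.
PLATFORM_URL = "https://platform.unstructuredapp.io"
PARTITION_URL = "https://api.unstructuredapp.io"


def resolve_partition_url(
    call_url: Optional[str] = None,
    client_url: Optional[str] = None,
) -> str:
    """Pick the url for a partition call (hypothetical helper)."""
    # 1. A url passed to the operation itself always wins.
    if call_url:
        return call_url
    # 2. A url set at SDK init is kept if the user changed the default.
    if client_url and client_url != PLATFORM_URL:
        return client_url
    # 3. Otherwise swap the global platform default for the partition host.
    return PARTITION_URL


print(resolve_partition_url())
print(resolve_partition_url(client_url="my_own_url"))
print(resolve_partition_url(call_url="per_call_url", client_url="my_own_url"))
```

The other platform operations would only need the first two steps, falling back to the globally configured url.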
We have a generate failure because we've pulled `general.py` out for some custom changes. (See #270) This will hit the occasional bump as new code is added to this file upstream. The steps for fixing this:
- Remove `general.py` from .genignore
- Run speakeasy generate
- Stage and commit the autogenerated changes, working around our custom code
- Commit without the `.genignore` change
I set `gen.yaml` to version 0.41.0. This will cause the next generate job to propagate the new version and publish it.
> [!IMPORTANT] > Linting report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/linting-report/2dce73368bb6e91d3ce8672cc3c3f3b3> > OpenAPI Change report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/changes-report/176ed6a25c9581cb9376db90c0e11b81> # SDK update Based on: - OpenAPI Doc - Speakeasy CLI 1.589.0 (2.664.0) https://github.com/speakeasy-api/speakeasy ## Versioning Version Bump Type: [minor] - 🤖 (automated) ## OpenAPI Change Summary ``` ├─┬Info │ └──[🔀] version (18:16) ├─┬Paths │ ├──[➖] path (1:29015)❌ │ ├──[➕] path (1859:5) │ ├─┬/api/v1/workflows/{workflow_id} │ │ ├─┬GET │ │ │ └─┬Servers │ │ │ └──[➖] servers (1:25855)❌ │ │ ├─┬PUT │ │ │ └─┬Servers │ │ │ └──[➖] servers (1:26933)❌ │ │ └─┬DELETE │ │ └─┬Servers │ │ └──[➖] servers (1:27788)❌ │ ├─┬/api/v1/destinations/{destination_id}/connection-check │ │ ├─┬GET │ │ │ └─┬Servers │ │ │ └──[➖] servers (1:6473)❌ │ │ └─┬POST │ │ └─┬Servers │ │ └──[➖] servers (1:7527)❌ │ ├─┬/api/v1/destinations/ │ │ ├─┬GET │ │ │ └─┬Servers │ │ │ └──[➖] servers (1:1286)❌ │ │ └─┬POST │ │ └─┬Servers │ │ └──[➖] servers (1:2281)❌ │ ├─┬/api/v1/sources/{source_id}/connection-check │ │ ├─┬GET │ │ │ └─┬Servers │ │ │ └──[➖] servers (1:13608)❌ │ │ └─┬POST │ │ └─┬Servers │ │ └──[➖] servers (1:14639)❌ │ ├─┬/api/v1/jobs/{job_id} │ │ └─┬GET │ │ └─┬Servers │ │ └──[➖] servers (1:16737)❌ │ ├─┬/api/v1/workflows/ │ │ ├─┬GET │ │ │ └─┬Servers │ │ │ └──[➖] servers (1:23938)❌ │ │ └─┬POST │ │ └─┬Servers │ │ └──[➖] servers (1:24886)❌ │ ├─┬/api/v1/jobs/{job_id}/details │ │ └─┬GET │ │ └─┬Servers │ │ └──[➖] servers (1:19944)❌ │ ├─┬/api/v1/jobs/ │ │ └─┬GET │ │ └─┬Servers │ │ └──[➖] servers (1:15813)❌ │ ├─┬/api/v1/workflows/{workflow_id}/run │ │ └─┬POST │ │ └─┬Servers │ │ └──[➖] servers (1:28877)❌ │ ├─┬/general/v0/general │ │ └─┬POST │ │ └─┬Servers │ │ └──[➖] servers (1:32149)❌ │ ├─┬/api/v1/jobs/{job_id}/download │ │ └─┬GET │ │ └─┬Servers │ │ └──[➖] servers (1:18991)❌ │ ├─┬/api/v1/sources/ │ │ ├─┬GET │ │ │ 
└─┬Servers │ │ │ └──[➖] servers (1:8600)❌ │ │ └─┬POST │ │ └─┬Servers │ │ └──[➖] servers (1:9565)❌ │ ├─┬/api/v1/sources/{source_id} │ │ ├─┬GET │ │ │ └─┬Servers │ │ │ └──[➖] servers (1:10545)❌ │ │ ├─┬PUT │ │ │ └─┬Servers │ │ │ └──[➖] servers (1:11619)❌ │ │ └─┬DELETE │ │ └─┬Servers │ │ └──[➖] servers (1:12502)❌ │ ├─┬/api/v1/jobs/{job_id}/failed-files │ │ └─┬GET │ │ └─┬Servers │ │ └──[➖] servers (1:20899)❌ │ ├─┬/api/v1/users/secrets │ │ └─┬POST │ │ └─┬Servers │ │ └──[➖] servers (1:30692)❌ │ ├─┬/api/v1/destinations/{destination_id} │ │ ├─┬GET │ │ │ └─┬Servers │ │ │ └──[➖] servers (1:3306)❌ │ │ ├─┬PUT │ │ │ └─┬Servers │ │ │ └──[➖] servers (1:4420)❌ │ │ └─┬DELETE │ │ └─┬Servers │ │ └──[➖] servers (1:5322)❌ │ └─┬/api/v1/jobs/{job_id}/cancel │ └─┬POST │ └─┬Servers │ └──[➖] servers (1:17596)❌ └─┬Components ├──[➖] schemas (1:75109)❌ └──[➕] schemas (3726:40) ``` | Document Element | Total Changes | Breaking Changes | |------------------|---------------|------------------| | info | 1 | 0 | | paths | 30 | 29 | | components | 2 | 1 | ## PYTHON CHANGELOG ## examples: 3.0.2 - 2025-07-07 ### 🐛 Bug Fixes - Fix missing title and description in main usage examples when using x-speakeasy-globals *(commit by [@kanwardeep](https://github.com/Kanwardeep))* ## core: 5.19.4 - 2025-07-02 ### 🐛 Bug Fixes - ensure utils import is always added when globals are present in sdk.py template *(commit by [@AshGodfrey](https://github.com/AshGodfrey))* Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
Co-authored-by: Austin Walker <austin@unstructured.io>
We have a number of integration tests that just hit SaaS with different vlm strategies. None of these are particular to testing the SDK layer, and just serve to slow us down. This is a common pattern in here, expect more test cleanup to come!
> [!IMPORTANT] > Linting report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/linting-report/5ee18d961db95d5e148e5948487d67a6> > OpenAPI Change report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/changes-report/a853e3ed6b865395f7380b5640669991> # SDK update Based on: - OpenAPI Doc - Speakeasy CLI 1.595.0 (2.670.1) https://github.com/speakeasy-api/speakeasy ## Versioning Version Bump Type: [minor] - 🤖 (automated) ## OpenAPI Change Summary ``` ├─┬Info │ └──[🔀] version (18:16) ├─┬Paths │ ├──[➖] path (1:25967)❌ │ └──[➖] path (1:25193)❌ └─┬Components ├──[➖] schemas (1:50631)❌ └──[➖] schemas (1:50177)❌ ``` | Document Element | Total Changes | Breaking Changes | |------------------|---------------|------------------| | info | 1 | 0 | | paths | 2 | 2 | | components | 2 | 2 | ## PYTHON CHANGELOG ## core: 5.19.5 - 2025-07-24 ### 🔧 Chores - make usage snippets parsable *(commit by [@ThomasRooney](https://github.com/ThomasRooney))* --------- Co-authored-by: speakeasybot <bot@speakeasyapi.dev> Co-authored-by: Austin Walker <austin@unstructured.io> Co-authored-by: Austin Walker <awalk89@gmail.com>
> [!IMPORTANT]
> Linting report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/linting-report/036f3355bedfe111d36dd97c7efaab52>
> OpenAPI Change report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/changes-report/0d8f059d854cec546287baba871b4bba>

# SDK update
Based on:
- OpenAPI Doc
- Speakeasy CLI 1.598.0 (2.674.1) https://github.com/speakeasy-api/speakeasy

## Versioning

Version Bump Type: [patch] - 🤖 (automated)

## PYTHON CHANGELOG

## core: 5.19.6 - 2025-08-01
### 🐛 Bug Fixes
- potential issue referencing models before declaration *(commit by [@mfbx9da4](https://github.com/mfbx9da4))*

Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
Use the OpenAPI overlay file to add `x-speakeasy-unknown-values` to the
models for `SourceConnectorType` and `DestinationConnectorType`. This
allows the client to send any string outside of the enum definitions,
which gives the client forward compatibility: we can create sources
with types that don't exist yet, without requiring a new client version.
Example:
Previously, the request below raised an error because the type is not in
our enum. Now the client only warns that this is an unknown value and
sends it to the server anyway.
```
res = unstructured_client.sources.create_source(
    request={
        "create_source_connector": {
            "name": "My fancy new source",
            "type": "future_source_type",
            "config": {
                ...
                ...
            }
        }
    }
)
```
The change will take effect when the client regenerates and uses the new
overlay config. Setting the version in `gen.yaml` to `0.42.2` means the
regenerated client will carry this version change.
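The "warn but pass through" behavior described above can be sketched in plain Python. This is only an illustration of the open-enum idea, not the code Speakeasy generates: `ExampleConnectorType`, its members, and `coerce_connector_type` are hypothetical names introduced for this sketch.

```python
import warnings
from enum import Enum
from typing import Union


class ExampleConnectorType(str, Enum):
    # Hypothetical members; the real enums are generated from the OpenAPI spec.
    ALPHA = "alpha"
    BETA = "beta"


def coerce_connector_type(value: str) -> Union[ExampleConnectorType, str]:
    """Return the enum member for a known value; pass unknown strings through."""
    try:
        return ExampleConnectorType(value)
    except ValueError:
        # Unknown values are not rejected: warn and hand back the raw string
        # so it can still be sent to the server.
        warnings.warn(f"Unknown connector type: {value!r}", stacklevel=2)
        return value


if __name__ == "__main__":
    print(coerce_connector_type("alpha"))
    print(coerce_connector_type("future_source_type"))
```

The design choice is that validation failures on the enum become warnings instead of errors, so a newer server-side type does not require a new client release.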
> [!IMPORTANT]
> Linting report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/linting-report/0a5f5e6fa72220658eebd2029258b3f1>
> OpenAPI Change report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/changes-report/ab67ee934e789814b24e905291d94612>

# SDK update
Based on:
- OpenAPI Doc
- Speakeasy CLI 1.598.3 (2.674.3) https://github.com/speakeasy-api/speakeasy

## Versioning

Version Bump Type: [patch] - 🤖 (automated)

## PYTHON CHANGELOG

## core: 5.19.7 - 2025-08-06
### 🐛 Bug Fixes
- add return type hint for methods returning None *(commit by [@AshGodfrey](https://github.com/AshGodfrey))*

Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
> [!IMPORTANT]
> Linting report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/linting-report/7a385740ded90a1b1df8b2685f4e1181>
> OpenAPI Change report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/changes-report/eaa1b1b1959d9dbe3784727572638377>

# SDK update

## Versioning

Version Bump Type: [patch] - 🤖 (automated)

<details>
<summary>OpenAPI Change Summary</summary>

```
└─┬Components
  ├─┬CreateDestinationConnector
  │ └─┬config
  │   └──[➕] anyOf (2768:15)
  ├─┬DestinationConnectorInformation
  │ └─┬config
  │   └──[➕] anyOf (3428:15)
  ├─┬UpdateSourceConnector
  │ └─┬config
  │   └──[➕] anyOf (6675:15)
  ├─┬UpdateDestinationConnector
  │ └─┬config
  │   └──[➕] anyOf (6597:15)
  ├─┬SourceConnectorInformation
  │ └─┬config
  │   └──[➕] anyOf (6467:15)
  └─┬CreateSourceConnector
    └─┬config
      └──[➕] anyOf (2855:15)
```

| Document Element | Total Changes | Breaking Changes |
|------------------|---------------|------------------|
| components | 6 | 0 |
</details>

## PYTHON CHANGELOG

No relevant generator changes

Based on [Speakeasy CLI](https://github.com/speakeasy-api/speakeasy) 1.601.0

---------

Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
Co-authored-by: Austin Walker <austin@unstructured.io>
- This is done by updating the version constraint in `gen.yaml` and then running `make client-generate-sdk`
- I've removed other new changes in order to keep this PR small
- By manually touching `RELEASES.md` we'll trigger a package release in this repo
> [!IMPORTANT] > Linting report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/linting-report/2db2473a26de998125d5cf53ccfaf5e1> > OpenAPI Change report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/changes-report/1b820d9c7c5a64aedcba48f48cbf1b8b> # SDK update ## Versioning Version Bump Type: [patch] - 🤖 (automated) ## Python SDK Changes: * `unstructured_client.workflows.get_workflow()`: `response.reprocess_all` **Changed** **Breaking**⚠️ * `unstructured_client.sources.create_source()`: * `request.create_source_connector.config` **Changed** **Breaking**⚠️ * `response.config.[snowflake_source_connector_config].schema` **Changed** * `unstructured_client.general.partition()`: * `request.partition_parameters` **Changed** **Breaking**⚠️ * `unstructured_client.destinations.create_destination()`: * `request.create_destination_connector.config` **Changed** **Breaking**⚠️ * `response.config` **Changed** * `unstructured_client.workflows.update_workflow()`: * `request.update_workflow.template_id` **Added** * `response.reprocess_all` **Changed** **Breaking**⚠️ * `unstructured_client.workflows.list_workflows()`: `response.[].reprocess_all` **Changed** **Breaking**⚠️ * `unstructured_client.destinations.update_destination()`: * `request.update_destination_connector.config` **Changed** **Breaking**⚠️ * `response.config` **Changed** * `unstructured_client.workflows.create_workflow()`: * `request.create_workflow.template_id` **Added** * `response.reprocess_all` **Changed** **Breaking**⚠️ * `unstructured_client.sources.update_source()`: * `request.update_source_connector.config` **Changed** **Breaking**⚠️ * `response.config.[snowflake_source_connector_config].schema` **Changed** * `unstructured_client.destinations.list_destinations()`: * `request.destination_type` **Changed** * `response.[].config` **Changed** * `unstructured_client.sources.list_sources()`: `response.[].config.[snowflake_source_connector_config].schema` 
**Changed** * `unstructured_client.sources.get_source()`: `response.config.[snowflake_source_connector_config].schema` **Changed** * `unstructured_client.templates.get_template()`: **Added** * `unstructured_client.jobs.list_jobs()`: `response.[].output_node_files.[]` **Changed** * `unstructured_client.jobs.download_job_output()`: * `request.node_id` **Changed** * `unstructured_client.jobs.create_job()`: **Added** * `unstructured_client.jobs.get_job()`: `response.output_node_files.[]` **Changed** * `unstructured_client.workflows.run_workflow()`: `response.output_node_files.[]` **Changed** * `unstructured_client.destinations.get_destination()`: `response.config` **Changed** * `unstructured_client.templates.list_templates()`: **Added** <details> <summary>OpenAPI Change Summary</summary> ``` ├─┬Info │ └──[🔀] version (18:16) ├─┬Paths │ ├──[➕] path (1356:5) │ ├──[➕] path (1410:5) │ ├─┬/api/v1/jobs/{job_id}/download │ │ └─┬GET │ │ └─┬Parameters │ │ ├──[🔀] description (1181:28) │ │ ├──[🔀] required (1182:25)❌ │ │ └─┬Schema │ │ ├──[➖] type (1:15945)❌ │ │ ├──[➖] format (1:15985)❌ │ │ ├──[🔀] description (1194:30) │ │ ├──[➕] anyOf (1185:17) │ │ └──[➕] anyOf (1189:17) │ └─┬/api/v1/jobs/ │ └──[➕] post (971:15) └─┬Components ├──[➕] schemas (2511:26) ├──[➕] schemas (2271:42) ├──[➕] schemas (2328:47) ├──[➕] schemas (6970:23) ├──[➕] schemas (6936:27) ├──[➕] schemas (6894:25) ├─┬partition_parameters │ ├──[➕] properties (7981:11) │ ├─┬table_ocr_agent │ │ ├──[➕] examples (7808:15) │ │ ├──[➕] examples (7809:15) │ │ ├──[➕] examples (7810:15) │ │ ├──[➕] enum (7814:15) │ │ ├──[➕] enum (7815:15) │ │ ├──[➕] enum (7816:15) │ │ ├──[➕] enum (7817:15) │ │ ├──[➕] enum (7818:15) │ │ ├──[➕] enum (7819:15) │ │ ├──[➕] type (7806:21)❌ │ │ ├──[🔀] title (7812:22) │ │ ├──[🔀] description (7821:28) │ │ ├──[🔀] default (7822:24)❌ │ │ ├──[➖] anyOf (1:99430)❌ │ │ ├──[➖] anyOf (1:99467)❌ │ │ └─┬Extensions │ │ └──[➕] x-speakeasy-unknown-values (7823:43) │ ├─┬hi_res_model_name │ │ ├──[🔀] title (7726:22) │ │ └──[🔀] 
description (7727:28) │ ├─┬multipage_sections │ │ ├──[➖] type (1:101553)❌ │ │ ├──[🔀] title (7934:22) │ │ ├──[🔀] description (7935:28) │ │ ├──[➖] default (1:101718)❌ │ │ ├──[➕] anyOf (7926:15) │ │ └──[➕] anyOf (7930:15) │ ├─┬split_pdf_page_range │ │ ├──[➖] items (1:105310)❌ │ │ ├──[🔀] type (8047:21)❌ │ │ ├──[🔀] title (8048:22) │ │ ├──[➖] maxItems (1:105358)❌ │ │ ├──[➖] minItems (1:105343)❌ │ │ ├──[🔀] description (8049:28) │ │ ├──[➕] default (8050:24)❌ │ │ └──[➖] example (1:105372) │ ├─┬gz_uncompressed_content_type │ │ ├──[➕] examples (7667:15) │ │ ├──[➕] type (7665:21)❌ │ │ ├──[➖] title (1:95995) │ │ ├──[➕] format (7672:23)❌ │ │ ├──[🔀] description (7673:28) │ │ ├──[➖] anyOf (1:95930)❌ │ │ └──[➖] anyOf (1:95967)❌ │ ├─┬new_after_n_chars │ │ ├──[➕] type (7938:21)❌ │ │ ├──[🔀] title (7939:22) │ │ ├──[🔀] description (7940:28) │ │ ├──[➕] default (7941:24)❌ │ │ ├──[➖] anyOf (1:101757)❌ │ │ └──[➖] anyOf (1:101795)❌ │ ├─┬split_pdf_cache_tmp_data │ │ ├──[🔀] title (8073:22) │ │ └──[🔀] description (8074:28) │ ├─┬vlm_model │ │ ├──[➖] enum (1:99128)❌ │ │ ├──[➖] enum (1:99173)❌ │ │ ├──[➖] enum (1:99257)❌ │ │ ├──[➖] enum (1:99295)❌ │ │ ├──[➖] enum (1:98875)❌ │ │ ├──[➖] enum (1:98905)❌ │ │ ├──[➖] enum (1:98988)❌ │ │ ├──[➖] enum (1:99036)❌ │ │ ├──[➖] enum (1:99084)❌ │ │ ├──[➖] enum (1:99219)❌ │ │ ├──[➖] enum (1:98845)❌ │ │ ├──[➖] enum (1:98915)❌ │ │ ├──[➖] enum (1:98933)❌ │ │ └──[➖] enum (1:98960)❌ │ ├─┬pdfminer_char_margin │ │ ├──[➕] type (7778:21)❌ │ │ ├──[🔀] title (7779:22) │ │ ├──[🔀] description (7780:28) │ │ ├──[➕] default (7781:24)❌ │ │ ├──[➖] anyOf (1:103676)❌ │ │ └──[➖] anyOf (1:103713)❌ │ ├─┬vlm_model_provider │ │ ├──[➕] examples (7829:15) │ │ └──[➕] examples (7830:15) │ ├─┬split_pdf_concurrency_level │ │ ├──[🔀] type (8087:21)❌ │ │ ├──[🔀] title (8085:22) │ │ ├──[🔀] description (8086:28) │ │ └──[🔀] default (8088:24)❌ │ ├─┬strategy │ │ ├──[➖] examples (1:98182) │ │ ├──[➖] examples (1:98190) │ │ ├──[➖] examples (1:98200) │ │ ├──[➖] enum (1:98246)❌ │ │ ├──[➖] enum (1:98256)❌ │ │ 
├──[➖] enum (1:98264)❌ │ │ ├──[➖] enum (1:98276)❌ │ │ ├──[➖] enum (1:98287)❌ │ │ ├──[➖] enum (1:98238)❌ │ │ ├──[➖] type (1:98159)❌ │ │ ├──[🔀] title (7802:22) │ │ ├──[🔀] description (7803:28) │ │ ├──[➖] default (1:98420)❌ │ │ ├──[➕] anyOf (7794:15) │ │ ├──[➕] anyOf (7798:15) │ │ └─┬Extensions │ │ └──[➖] x-speakeasy-unknown-values (1:98460)❌ │ ├─┬overlap_all │ │ ├──[🔀] type (7957:21)❌ │ │ ├──[🔀] title (7958:22) │ │ ├──[🔀] description (7959:28) │ │ └──[🔀] default (7960:24)❌ │ ├─┬pdfminer_line_overlap │ │ ├──[🔀] title (8029:22) │ │ └──[🔀] description (8030:28) │ ├─┬pdfminer_line_margin │ │ ├──[🔀] title (8016:22) │ │ └──[🔀] description (8017:28) │ ├─┬ocr_languages │ │ ├──[➖] items (1:96802)❌ │ │ ├──[➖] type (1:96784)❌ │ │ ├──[🔀] title (7953:22) │ │ ├──[🔀] description (7954:28) │ │ ├──[➖] default (1:96963)❌ │ │ ├──[➕] anyOf (7945:15) │ │ └──[➕] anyOf (7949:15) │ ├─┬max_characters │ │ ├──[➕] items (7750:22)❌ │ │ ├──[➕] type (7749:21)❌ │ │ ├──[🔀] title (7753:22) │ │ ├──[🔀] description (7754:28) │ │ ├──[➕] default (7755:24)❌ │ │ ├──[➖] anyOf (1:101309)❌ │ │ └──[➖] anyOf (1:101347)❌ │ ├─┬include_page_breaks │ │ ├──[➖] type (1:96332)❌ │ │ ├──[🔀] title (7921:22) │ │ ├──[🔀] description (7922:28) │ │ ├──[➖] default (1:96493)❌ │ │ ├──[➕] anyOf (7913:15) │ │ └──[➕] anyOf (7917:15) │ ├─┬split_pdf_page │ │ ├──[🔀] type (8069:21)❌ │ │ ├──[🔀] title (8067:22) │ │ ├──[🔀] description (8068:28) │ │ └──[🔀] default (8070:24)❌ │ ├─┬overlap │ │ ├──[➕] enum (7770:15) │ │ ├──[➕] enum (7771:15) │ │ ├──[🔀] type (7767:21)❌ │ │ ├──[🔀] title (7768:22) │ │ ├──[🔀] description (7773:28) │ │ ├──[🔀] default (7774:24)❌ │ │ └─┬Extensions │ │ └──[➕] x-speakeasy-unknown-values (7775:43) │ ├─┬languages │ │ ├──[➖] items (1:96541)❌ │ │ ├──[🔀] type (7988:21)❌ │ │ ├──[🔀] title (7989:22) │ │ ├──[🔀] description (7990:28) │ │ └──[🔀] default (7991:24)❌ │ ├─┬skip_infer_table_types │ │ ├──[➖] items (1:97662)❌ │ │ ├──[➖] type (1:97644)❌ │ │ ├──[🔀] title (7978:22) │ │ ├──[🔀] description (7979:28) │ │ ├──[➖] default 
(1:97822)❌ │ │ ├──[➕] anyOf (7970:15) │ │ └──[➕] anyOf (7974:15) │ ├─┬include_slide_notes │ │ ├──[🔀] title (7744:22) │ │ ├──[🔀] description (7745:28) │ │ └──[🔀] default (7746:24)❌ │ ├─┬pdfminer_word_margin │ │ ├──[🔀] title (8003:22) │ │ ├──[🔀] description (8004:28) │ │ └──[➖] default (1:104628)❌ │ ├─┬split_pdf_allow_failed │ │ ├──[➕] items (7785:22)❌ │ │ ├──[🔀] type (7784:21)❌ │ │ ├──[🔀] title (7788:22) │ │ ├──[🔀] description (7789:28) │ │ └──[🔀] default (7790:24)❌ │ ├─┬include_orig_elements │ │ ├──[🔀] title (7739:22) │ │ ├──[🔀] description (7740:28) │ │ └─┬ANYOF │ │ └──[🔀] type (7732:25)❌ │ ├─┬extract_image_block_types │ │ ├──[➖] items (1:95673)❌ │ │ ├──[➖] type (1:95655)❌ │ │ ├──[🔀] title (7704:22) │ │ ├──[🔀] description (7705:28) │ │ ├──[➖] default (1:95882)❌ │ │ ├──[➕] anyOf (7696:15) │ │ └──[➕] anyOf (7700:15) │ ├─┬pdf_infer_table_structure │ │ ├──[🔀] title (7964:22) │ │ ├──[🔀] description (7965:28) │ │ └──[🔀] default (7966:24)❌ │ ├─┬similarity_threshold │ │ ├──[🔀] title (8042:22) │ │ ├──[🔀] description (8043:28) │ │ └──[➕] default (8044:24)❌ │ ├─┬split_pdf_cache_tmp_data_dir │ │ ├──[🔀] type (8081:21)❌ │ │ ├──[🔀] title (8079:22) │ │ ├──[🔀] description (8080:28) │ │ └──[🔀] default (8082:24)❌ │ ├─┬unique_element_ids │ │ ├──[➖] type (1:99696)❌ │ │ ├──[🔀] title (7864:22) │ │ ├──[🔀] description (7865:28) │ │ ├──[🔀] default (7866:24)❌ │ │ ├──[➕] anyOf (7856:15) │ │ └──[➕] anyOf (7860:15) │ ├─┬output_format │ │ ├──[➖] enum (1:97039)❌ │ │ ├──[➖] enum (1:97059)❌ │ │ ├──[➕] items (7759:22)❌ │ │ ├──[🔀] type (7758:21)❌ │ │ ├──[🔀] title (7762:22) │ │ ├──[🔀] description (7763:28) │ │ ├──[🔀] default (7764:24)❌ │ │ └─┬Extensions │ │ └──[➖] x-speakeasy-unknown-values (1:97259)❌ │ ├─┬files │ │ ├──[➖] examples (1:94708) │ │ ├──[➕] items (7709:22)❌ │ │ ├──[🔀] type (7708:21)❌ │ │ ├──[➕] title (7712:22) │ │ ├──[➖] format (1:94909)❌ │ │ ├──[🔀] description (7713:28) │ │ └──[➕] default (7714:24)❌ │ ├─┬starting_page_number │ │ ├──[➕] items (8056:22)❌ │ │ ├──[➕] type (8053:21)❌ │ │ 
├──[🔀] title (8054:22) │ │ ├──[➕] maxItems (8060:25)❌ │ │ ├──[➕] minItems (8059:25)❌ │ │ ├──[🔀] description (8055:28) │ │ ├──[➕] example (8061:24) │ │ ├──[➖] anyOf (1:97862)❌ │ │ └──[➖] anyOf (1:97900)❌ │ ├─┬encoding │ │ ├──[➕] type (7982:21)❌ │ │ ├──[🔀] title (7983:22) │ │ ├──[🔀] description (7984:28) │ │ ├──[➕] default (7985:24)❌ │ │ ├──[➖] anyOf (1:95455)❌ │ │ └──[➖] anyOf (1:95492)❌ │ └─┬xml_keep_tags │ ├──[➕] examples (7828:15) │ ├──[➕] examples (7829:15) │ ├──[➕] examples (7830:15) │ ├──[➕] enum (7839:15) │ ├──[➕] enum (7840:15) │ ├──[➕] enum (7834:15) │ ├──[➕] enum (7835:15) │ ├──[➕] enum (7836:15) │ ├──[➕] enum (7837:15) │ ├──[➕] enum (7838:15) │ ├──[🔀] type (7826:21)❌ │ ├──[🔀] title (7832:22) │ ├──[🔀] description (7842:28) │ ├──[➖] default (1:100205)❌ │ └─┬Extensions │ └──[➕] x-speakeasy-unknown-values (7843:43) ├─┬DestinationConnectorInformation │ └─┬config │ ├──[➕] anyOf (3778:15) │ ├─┬ANYOF │ │ └──[🔀] $ref (2272:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (2129:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (2225:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (2795:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (3553:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (3367:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (3639:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (3890:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (3968:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (4770:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (4970:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5105:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5152:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5248:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5539:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5596:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5863:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5806:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5998:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6446:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (7318:17)❌ │ └─┬ANYOF │ └──[🔀] $ref (4171:17)❌ ├─┬AstraDBConnectorConfigInput │ ├──[➕] properties (2198:11) │ ├─┬token │ │ ├──[➖] type (1:27451)❌ │ │ ├──[🔀] title (2189:22) │ │ ├──[➕] default (2190:24)❌ │ │ ├──[➕] anyOf (2182:15) │ │ └──[➕] anyOf (2185:15) │ ├─┬collection_name │ │ ├──[🔀] type (2199:21)❌ │ 
│ ├──[🔀] title (2200:22) │ │ ├──[➖] pattern (1:27159)❌ │ │ └──[➕] default (2201:24)❌ │ ├─┬keyspace │ │ ├──[➕] type (2212:21)❌ │ │ ├──[🔀] title (2213:22) │ │ ├──[🔀] default (2214:24)❌ │ │ ├──[➖] anyOf (1:27193)❌ │ │ └──[➖] anyOf (1:27213)❌ │ └─┬flatten_metadata │ ├──[🔀] type (2176:21)❌ │ ├──[🔀] title (2177:22) │ ├──[➕] pattern (2178:24)❌ │ └──[➖] default (1:27560)❌ ├─┬WorkflowInformation │ └─┬reprocess_all │ ├──[➕] type (7451:21)❌ │ ├──[➕] default (7453:24)❌ │ ├──[➖] anyOf (1:92249)❌ │ └──[➖] anyOf (1:92270)❌ ├─┬DatabricksVolumesConnectorConfigInput │ └─┬schema │ └──[➕] default (3609:24)❌ ├─┬UpdateWorkflow │ ├──[➕] properties (7235:11) │ ├─┬workflow_type │ │ ├──[➕] title (7233:22) │ │ └─┬ANYOF │ │ └──[🔀] $ref (7223:15)❌ │ └─┬workflow_nodes │ ├──[🔀] title (7244:22) │ └─┬ANYOF │ ├──[➖] items (1:89762)❌ │ └──[🔀] type (7238:25)❌ ├─┬SnowflakeDestinationConnectorConfig │ └──[➕] required (6514:11)❌ ├─┬SnowflakeSourceConnectorConfig │ └──[➕] required (6663:11)❌ ├─┬NodeFileMetadata │ ├──[➕] required (5243:11)❌ │ ├──[➕] required (5244:11)❌ │ ├──[➕] properties (5234:11) │ └──[➕] properties (5230:11) ├─┬SnowflakeDestinationConnectorConfigInput │ └──[➕] required (6586:11)❌ ├─┬CreateDestinationConnector │ └─┬config │ ├──[➕] anyOf (3102:15) │ ├─┬ANYOF │ │ └──[🔀] $ref (2329:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (2173:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (2248:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (2850:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (3596:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (3460:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (3667:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (3916:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (3986:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (4819:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5037:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5128:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5185:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5286:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5567:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5639:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5929:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5834:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6060:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref 
(6518:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (7347:17)❌ │ └─┬ANYOF │ └──[🔀] $ref (4239:17)❌ ├─┬SnowflakeSourceConnectorConfigInput │ └──[➕] required (6742:11)❌ ├─┬CreateWorkflow │ ├──[➕] properties (3252:11) │ ├─┬workflow_type │ │ └──[🔀] $ref (3238:29)❌ │ └─┬workflow_nodes │ ├──[🔀] title (3261:22) │ └─┬ANYOF │ ├──[➖] items (1:39512)❌ │ └──[🔀] type (3255:25)❌ ├─┬UpdateDestinationConnector │ └─┬config │ ├──[➕] anyOf (7082:15) │ ├─┬ANYOF │ │ └──[🔀] $ref (2329:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (2173:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (2248:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (2850:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (3596:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (3460:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (3667:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (3916:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (3986:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (4819:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5037:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5128:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5185:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5286:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5567:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5639:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5929:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5834:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6060:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6518:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (7347:17)❌ │ └─┬ANYOF │ └──[🔀] $ref (4239:17)❌ ├─┬DatabricksVDTDestinationConnectorConfigInput │ └─┬schema │ └──[➕] default (3526:24)❌ ├─┬AstraDBConnectorConfig │ ├──[➕] properties (2150:11) │ ├─┬token │ │ ├──[➖] type (1:26879)❌ │ │ ├──[🔀] title (2144:22) │ │ ├──[➕] anyOf (2137:15) │ │ └──[➕] anyOf (2140:15) │ ├─┬collection_name │ │ ├──[🔀] type (2151:21)❌ │ │ ├──[🔀] title (2152:22) │ │ └──[➕] default (2153:24)❌ │ └─┬keyspace │ ├──[➕] type (2132:21)❌ │ ├──[🔀] title (2133:22) │ ├──[➖] anyOf (1:26681)❌ │ └──[➖] anyOf (1:26701)❌ ├─┬WorkflowJobType │ └──[➕] enum (7474:11) └─┬DestinationConnectorType └──[➕] enum (3816:11) ``` | Document Element | Total Changes | Breaking Changes | |------------------|---------------|------------------| | info | 1 | 0 | | paths | 10 | 3 | | components | 
356 | 207 | </details> ## PYTHON CHANGELOG No relevant generator changes Based on [Speakeasy CLI](https://github.com/speakeasy-api/speakeasy) 1.601.0 --------- Co-authored-by: speakeasybot <bot@speakeasyapi.dev> Co-authored-by: Jordan Homan <jordan@unstructured.io>
> [!IMPORTANT]
> Linting report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/linting-report/16e0c8159aa07f60f346209d2661d329>
> OpenAPI Change report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/changes-report/875f5043b32383cb8853d3dfa9fa5717>

# SDK update

## Versioning

Version Bump Type: [patch] - 🤖 (automated)

<details>
<summary>OpenAPI Change Summary</summary>

```
└─┬Info
  └──[🔀] version (18:16)
```

| Document Element | Total Changes | Breaking Changes |
|------------------|---------------|------------------|
| info | 1 | 0 |
</details>

## PYTHON CHANGELOG

No relevant generator changes

Based on [Speakeasy CLI](https://github.com/speakeasy-api/speakeasy) 1.601.0

---------

Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
Co-authored-by: Jordan Homan <jordan@unstructured.io>
> [!IMPORTANT] > Linting report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/linting-report/22e5ea0a87c022e301ddc0b9479c97bc> > OpenAPI Change report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/changes-report/37746683fbc81629886ad5fa3bdcde18> # SDK update ## Versioning Version Bump Type: [patch] - 🤖 (automated) ## Python SDK Changes: * `unstructured_client.destinations.create_destination()`: * `request.create_destination_connector.config.[open_search_connector_config_input]` **Added** * `response.config.[open_search_connector_config]` **Added** * `unstructured_client.destinations.get_destination()`: `response.config.[open_search_connector_config]` **Added** * `unstructured_client.destinations.list_destinations()`: * `request.destination_type` **Changed** * `response.[].config.[open_search_connector_config]` **Added** * `unstructured_client.destinations.update_destination()`: * `request.update_destination_connector.config.[open_search_connector_config_input]` **Added** * `response.config.[open_search_connector_config]` **Added** * `unstructured_client.sources.create_source()`: * `request.create_source_connector.config.[open_search_connector_config_input]` **Added** * `response.config.[open_search_connector_config]` **Added** * `unstructured_client.sources.get_source()`: `response.config.[open_search_connector_config]` **Added** * `unstructured_client.sources.list_sources()`: * `request.source_type` **Changed** * `response.[].config.[open_search_connector_config]` **Added** * `unstructured_client.sources.update_source()`: * `request.update_source_connector.config.[open_search_connector_config_input]` **Added** * `response.config.[open_search_connector_config]` **Added** <details> <summary>OpenAPI Change Summary</summary> ``` └─┬Components ├──[➕] schemas (5420:36) ├──[➕] schemas (5518:41) ├─┬UpdateDestinationConnector │ └─┬config │ ├──[➕] anyOf (7295:15) │ ├─┬ANYOF │ │ └──[🔀] $ref (5519:17)❌ │ ├─┬ANYOF │ │ 
└──[🔀] $ref (5773:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5845:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6135:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6040:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6266:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6724:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (7563:17)❌ │ └─┬ANYOF │ └──[🔀] $ref (4249:17)❌ ├─┬SourceConnectorType │ └──[➕] enum (7091:11) ├─┬UpdateSourceConnector │ └─┬config │ ├──[➕] anyOf (7376:15) │ ├─┬ANYOF │ │ └──[🔀] $ref (5519:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5681:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5950:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6396:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6495:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6597:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6875:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (4444:17)❌ │ └─┬ANYOF │ └──[🔀] $ref (7831:17)❌ ├─┬CreateSourceConnector │ └─┬config │ ├──[➕] anyOf (3195:15) │ ├─┬ANYOF │ │ └──[🔀] $ref (5519:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5681:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5950:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6396:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6495:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6597:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6875:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (4444:17)❌ │ └─┬ANYOF │ └──[🔀] $ref (7831:17)❌ ├─┬DestinationConnectorType │ └──[➕] enum (3840:11) ├─┬DestinationConnectorInformation │ └─┬config │ ├──[➕] anyOf (3787:15) │ ├─┬ANYOF │ │ └──[🔀] $ref (5421:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5745:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5802:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6069:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6012:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6204:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6652:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (7534:17)❌ │ └─┬ANYOF │ └──[🔀] $ref (4181:17)❌ ├─┬CreateDestinationConnector │ └─┬config │ ├──[➕] anyOf (3105:15) │ ├─┬ANYOF │ │ └──[🔀] $ref (5519:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5773:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (5845:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6135:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6040:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6266:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (6724:17)❌ │ ├─┬ANYOF │ │ └──[🔀] $ref (7563:17)❌ │ └─┬ANYOF │ └──[🔀] $ref 
(4249:17)❌ └─┬SourceConnectorInformation └─┬config ├──[➕] anyOf (7041:15) ├─┬ANYOF │ └──[🔀] $ref (5421:17)❌ ├─┬ANYOF │ └──[🔀] $ref (5617:17)❌ ├─┬ANYOF │ └──[🔀] $ref (5888:17)❌ ├─┬ANYOF │ └──[🔀] $ref (6329:17)❌ ├─┬ANYOF │ └──[🔀] $ref (6464:17)❌ ├─┬ANYOF │ └──[🔀] $ref (6543:17)❌ ├─┬ANYOF │ └──[🔀] $ref (6796:17)❌ ├─┬ANYOF │ └──[🔀] $ref (4324:17)❌ └─┬ANYOF └──[🔀] $ref (7784:17)❌ ``` | Document Element | Total Changes | Breaking Changes | |------------------|---------------|------------------| | components | 64 | 54 | </details> ## PYTHON CHANGELOG No relevant generator changes Based on [Speakeasy CLI](https://github.com/speakeasy-api/speakeasy) 1.601.0 Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
Resolves CVE-2025-66471 and CVE-2025-66418

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Addresses dependency security and stability updates.
>
> - **Upgrade** `urllib3` from `2.5.0` to `2.6.2` in `poetry.lock` (includes extras changes)
> - **Bump** package version to `0.42.7` in `pyproject.toml`
> - **Test fix**: add Python 3.9 workaround in `_test_contract/conftest.py` to eagerly import `unstructured_client.utils.retries` to avoid a lazy-import race `KeyError`
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit e46c9cb. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
<!-- CURSOR_SUMMARY -->
> [!NOTE]
> **Upgrade & regen**
>
> - Bumps Python SDK to `0.42.7` and updates `__user_agent__`/locks (`gen.yaml`, `.speakeasy/*.lock`)
> - Regenerates models/docs; minor union order tweaks across destination models
>
> **AstraDB connector models**
>
> - Adds pydantic `extra="allow"` with `__pydantic_extra__` and `additional_properties` accessors for `AstraDBConnectorConfig` and `AstraDBConnectorConfigInput`
> - Serialization updated to include additional fields; docs reflect new `__pydantic_extra__`
>
> **Docs & samples**
>
> - Updates `docs/sdks/destinations/README.md` and `codeSamples.yaml` to use `operations.*` typed request objects for create/update destination examples
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit e91d423. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
> [!IMPORTANT]
> Linting report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/linting-report/3ed620997acae18a1e26ead1df0f8446>
> OpenAPI Change report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/changes-report/31f32cae2ab448877e4dff6f510a4fd1>

# SDK update

## Versioning

Version Bump Type: [patch] - 🤖 (automated)

<details open>
<summary>OpenAPI Change Summary</summary>

No specification changes
</details>

## PYTHON CHANGELOG

No relevant generator changes

Based on [Speakeasy CLI](https://github.com/speakeasy-api/speakeasy) 1.601.0

Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
> [!IMPORTANT]
> Linting report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/linting-report/acc9ed4ac70b9e127a4693cc52193c86>
> OpenAPI Change report available at: <https://app.speakeasy.com/org/unstructured/unstructured5xr/changes-report/3ce57b3ff042c75da24df753a4e9de1f>

# SDK update

## Versioning

Version Bump Type: [patch] - 🤖 (automated)

<details>
<summary>OpenAPI Change Summary</summary>

```
└─┬Info
  └──[🔀] version (18:16)
```

| Document Element | Total Changes | Breaking Changes |
|------------------|---------------|------------------|
| info | 1 | 0 |
</details>

## PYTHON CHANGELOG

No relevant generator changes

Based on [Speakeasy CLI](https://github.com/speakeasy-api/speakeasy) 1.601.0

Co-authored-by: speakeasybot <bot@speakeasyapi.dev>
This PR replaces `pypdf` with `pypdfium2` when splitting a PDF file into chunks. `pypdfium2` is a more robust and faster library, and it avoids the occasional `RecursionError` that can happen with `pypdf`.
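Independent of which PDF library does the page copying, splitting comes down to deciding which pages go into which chunk. The sketch below is library-agnostic and assumes chunks are contiguous, fixed-size page ranges. `page_ranges` is a hypothetical helper for illustration, not a function from the client or from `pypdfium2`.

```python
from typing import List, Tuple


def page_ranges(total_pages: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return inclusive, 1-based (first_page, last_page) ranges covering the document."""
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        # The last chunk may be shorter than chunk_size.
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]


if __name__ == "__main__":
    # A 10-page document split into chunks of up to 4 pages.
    print(page_ranges(10, 4))  # [(1, 4), (5, 8), (9, 10)]
```

Each range can then be handed to whatever library extracts those pages into a standalone PDF for its own request.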
# SDK update ## Versioning Version Bump Type: [patch] - 🤖 (automated) [View full SDK changelog](https://app.speakeasy.com/org/unstructured/unstructured5xr/changes-report/5a3cd247d1de374a010b2d312a09a5e4) <details> <summary>OpenAPI Change Summary</summary> Based on [Speakeasy CLI](https://github.com/speakeasy-api/speakeasy) 1.601.0 Co-authored-by: Austin Walker <austin@unstructured.io>
## Summary
- Adds `httpx.RemoteProtocolError` to the list of retriable exceptions
in `retries.py` (both sync and async paths)
- When `retry_connection_errors=True`, server disconnects mid-request
(e.g. "Server disconnected without sending a response") are now retried
with backoff, matching the existing behavior for `ConnectError` and
`TimeoutException`
- Previously, `RemoteProtocolError` fell through to the catch-all
`Exception` handler and was wrapped as `PermanentError`, immediately
failing without retry
## Context
When a server crashes mid-request (e.g. SIGSEGV from thread-unsafe
native library access), the client receives an
`httpx.RemoteProtocolError("Server disconnected without sending a
response.")`. Despite `retry_connection_errors=True` being configured,
this error was not retried because the SDK only handled `ConnectError`
and `TimeoutException` as retriable transport errors.
The httpx exception hierarchy is:
```
RemoteProtocolError → ProtocolError → TransportError → RequestError → HTTPError
ConnectError → NetworkError → TransportError → RequestError → HTTPError (already retried)
TimeoutException → TransportError → RequestError → HTTPError (already retried)
```
`RemoteProtocolError` is the same class of transient transport error as
the already-retried exceptions.
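The behavior described above can be sketched with a small retry loop. This is not the SDK's `retries.py`: the exception classes are local stand-ins for the httpx ones so the example runs without httpx, and `call_with_retries`, its parameters, and the choice to wrap as `PermanentError` when retries are disabled are assumptions of this sketch.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


# Local stand-ins for the httpx exceptions named above (not the real classes).
class TransportError(Exception): ...
class ConnectError(TransportError): ...
class TimeoutException(TransportError): ...
class ProtocolError(TransportError): ...
class RemoteProtocolError(ProtocolError): ...


class PermanentError(Exception):
    """Marks a failure that must not be retried."""


# RemoteProtocolError joins the two exceptions that were already retriable.
RETRIABLE = (ConnectError, TimeoutException, RemoteProtocolError)


def call_with_retries(
    func: Callable[[], T],
    retry_connection_errors: bool,
    max_attempts: int = 3,
    initial_delay: float = 0.01,
) -> T:
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RETRIABLE as exc:
            if not retry_connection_errors:
                # Assumption: with retries disabled, fail immediately.
                raise PermanentError(str(exc)) from exc
            if attempt == max_attempts:
                raise
            time.sleep(delay)
            delay *= 2  # simple exponential backoff
        except Exception as exc:
            # Anything else is treated as non-retriable.
            raise PermanentError(str(exc)) from exc
    raise RuntimeError("max_attempts must be at least 1")
```

With `retry_connection_errors=True`, a call that fails once with `RemoteProtocolError` and then succeeds returns normally on the second attempt. As the risk note in this PR points out, retrying a mid-request disconnect can duplicate non-idempotent requests.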
## Test plan
- [x] Added unit tests for sync and async retry paths
- [ ] `RemoteProtocolError` retried when `retry_connection_errors=True`
— succeeds on 2nd attempt
- [ ] `RemoteProtocolError` raises immediately when
`retry_connection_errors=False`
- [ ] Existing `ConnectError` retry behavior preserved
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Changes retry behavior for mid-request disconnects, which can increase
duplicate-request risk for non-idempotent operations when
`retry_connection_errors=True`. Scope is limited to transport-error
handling and covered by new unit tests for sync/async paths.
>
> **Overview**
> **Retries now treat `httpx.RemoteProtocolError` as a retriable
transport failure** when `retry_connection_errors=True`, aligning it
with existing `ConnectError`/`TimeoutException` handling in both `retry`
and `retry_async`.
>
> Adds unit tests validating the new sync/async retry behavior (and the
disabled case), and bumps the SDK version to `0.42.11` with
corresponding changelog/release entries and user-agent/version updates.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
a7dc972. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Adds missing trailing spaces to `- OpenAPI Doc` line in the 0.42.11 RELEASES.md entry
- Adds missing trailing newline at end of file
- The Speakeasy publish action failed with `error parsing last release info` because the format didn't match the expected pattern

This unblocks the 0.42.11 PyPI publish.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk: documentation-only formatting tweaks (trailing spaces/newline) to satisfy Speakeasy release parsing; no runtime code changes.
>
> **Overview**
> Fixes the `2026-03-25` (`v0.42.11`) entry in `RELEASES.md` to match Speakeasy’s expected release format by restoring the trailing spaces on `- OpenAPI Doc` and ensuring the file ends with a proper final newline (so the last `PyPI v0.42.11` line is parsed correctly).
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 39fd2ae. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tc.) (#334)

## Summary

- Replaces individual `except` blocks for `ConnectError`, `RemoteProtocolError`, and `TimeoutException` with a single catch for their parent class `httpx.TransportError`
- This covers `ReadError` (TCP connection reset mid-response with empty message), `WriteError`, and all other transport-level failures
- Previously, `ReadError` fell through to the catch-all `Exception` handler and was wrapped as `PermanentError`, failing immediately without retry

## Context

Follow-up to #332. After deploying the `RemoteProtocolError` fix, we observed `httpx.ReadError` (empty message) failures when api pods crashed mid-response. The TCP connection was reset during the response read phase, which httpx classifies as `ReadError` rather than `RemoteProtocolError`.

The httpx exception hierarchy (simplified):

```
TransportError
├── ConnectError (was retried)
├── RemoteProtocolError (was retried since #332)
├── ReadError (was NOT retried, now fixed)
├── WriteError (was NOT retried, now fixed)
└── ...

TimeoutException (was retried, subclass of TransportError)
├── ConnectTimeout
├── ReadTimeout
├── WriteTimeout
└── PoolTimeout
```

Catching `TransportError` is the correct level: transport errors are typically transient and should be retried when `retry_connection_errors=True`.

## Test plan

- [x] Parametrized tests for all TransportError subclasses (sync + async)
- [ ] Each subclass retried when `retry_connection_errors=True`
- [ ] Each subclass raises immediately when `retry_connection_errors=False`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->

---

> [!NOTE]
> **Medium Risk**
> Expands which network failures are treated as retryable, which can change error/latency behavior for callers and potentially mask persistent transport issues until backoff is exhausted.
>
> **Overview**
> **Broadened retry handling for transport failures.** The retry wrapper now catches `httpx.TransportError` in both sync and async paths, so additional transport-level errors (e.g. `ReadError`, `WriteError`, and timeout subclasses) are retried when `retry_connection_errors=True` instead of being treated as permanent.
>
> Tests were updated to parameterize across multiple `TransportError` subclasses for both sync and async retry behavior, and the package version/release notes were bumped to `0.42.12`.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit bdd403c. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>

<!-- /CURSOR_SUMMARY -->

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
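
For readers who want the shape of the change without opening the diff, here is a minimal sketch of a retry wrapper that catches one parent exception class instead of listing subclasses. It is not the SDK's actual `retry` implementation: `retry_call`, its parameters, and the stand-in `TransportError`/`ReadError`/`PermanentError` classes are hypothetical and exist only to keep the example stdlib-only. With httpx installed, one would presumably pass `httpx.TransportError` as the retriable class.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class PermanentError(Exception):
    """Hypothetical stand-in for a non-retriable failure wrapper."""


# Stand-in classes mirroring the parent/child relationship in the PR text.
class TransportError(Exception):
    pass


class ReadError(TransportError):
    pass


def retry_call(
    func: Callable[[], T],
    retriable: Tuple[Type[BaseException], ...],
    retry_connection_errors: bool = True,
    max_attempts: int = 3,
    base_delay: float = 0.0,
) -> T:
    """Call func, retrying only exceptions that are instances of `retriable`."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retriable:
            # A single parent-class catch covers every subclass.
            if not retry_connection_errors or attempt == max_attempts:
                raise
            # Exponential backoff between attempts (zero delay by default here).
            time.sleep(base_delay * (2 ** (attempt - 1)))
        except Exception as exc:
            # Anything else is treated as permanent and fails immediately.
            raise PermanentError(str(exc)) from exc
    raise RuntimeError("unreachable")


calls = {"n": 0}


def flaky() -> str:
    """Fail twice with a transport-level error, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ReadError("")
    return "ok"


result = retry_call(flaky, retriable=(TransportError,))
```

Because `ReadError` subclasses `TransportError`, the call above succeeds on the third attempt. With `retry_connection_errors=False` the first `ReadError` propagates unchanged.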
<!-- CURSOR_SUMMARY -->

> [!NOTE]
> **Medium Risk**
> Medium risk because it changes packaging/build tooling and CI execution (Poetry→uv, setuptools build) and adjusts split-PDF hook timeout/cleanup behavior, which can affect test stability and request handling.
>
> **Overview**
> Migrates the project from Poetry to `uv`: CI now installs via `uv sync --locked`, the Makefile runs lint/tests with `uv run`, and publishing is switched to `uv build`/`uv publish` with a hardened `scripts/publish.sh` (strict bash + Python >=3.11 guard). Python support is narrowed to 3.11+ (CI matrix and `pylintrc`), dependency versions are updated, and `poetry.lock`/`poetry.toml` are removed in favor of a setuptools-based `pyproject.toml` with dynamic versioning.
>
> Improves split-PDF behavior and test robustness: the split hook now propagates request timeouts into chunk requests, scales the outer future timeout by concurrency “waves”, and ensures per-operation state is cleaned up on both success and dummy-request failures; corresponding unit/integration tests were updated (including relaxed equivalence checks for `hi_res` OCR outputs and longer client timeouts). Adds regression-guard unit tests to enforce key packaging/CI/publish invariants and multipart file serialization, and removes an unused/disabled encryption test suite.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 3e0d3d3. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>

<!-- /CURSOR_SUMMARY -->
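
The summary above does not give the formula for scaling the outer timeout by concurrency "waves". One plausible reading, sketched here purely as an assumption, is that chunks run in batches of at most `concurrency` requests, so the outer wait is the per-request timeout multiplied by the number of batches. The function name and parameters are hypothetical and not taken from `split_pdf_hook.py`.

```python
import math
from typing import Optional


def scaled_outer_timeout(
    num_chunks: int,
    concurrency: int,
    per_request_timeout: Optional[float],
) -> Optional[float]:
    """Scale a per-request timeout by the number of concurrency waves.

    Assumption: with `concurrency` requests in flight at once, `num_chunks`
    chunks finish in ceil(num_chunks / concurrency) sequential waves.
    """
    if per_request_timeout is None:
        # No request timeout configured, so the outer wait is unbounded too.
        return None
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    waves = max(1, math.ceil(num_chunks / concurrency))
    return per_request_timeout * waves


# 10 chunks at concurrency 4 run in 3 waves.
example = scaled_outer_timeout(num_chunks=10, concurrency=4, per_request_timeout=30.0)
```

Under this reading, a document split into 10 chunks with 4 concurrent requests and a 30 second request timeout would get a 90 second outer timeout.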
<!-- CURSOR_SUMMARY -->

> [!NOTE]
> **Medium Risk**
> Touches core split-PDF execution and retry/timeout cleanup logic; mistakes could impact partition reliability or leak resources, though changes are well-covered by expanded unit/integration tests and logging.
>
> **Overview**
> Improves split-PDF correctness and debuggability by adding **operation-aware observability** (plan/batch/chunk lifecycle logs) and propagating split metadata via `X-Unstructured-Split-*` headers into errors/logs.
>
> Hardens split execution: per-operation state is isolated, transport exceptions/cancellations are handled explicitly (with optional partial-results behavior via `split_pdf_allow_failed`), and timeout/cleanup paths now safely cancel in-flight work even when event loops are closed.
>
> Preserves chunk-level transport retries by deriving a split-specific retry config that always retries `httpx.TransportError` for chunk calls, even when SDK-level connection retries are disabled. CI/test tooling is updated (new platform integration job/target, more verbose integration output, and bumped GitHub Action versions), and the package is released as `0.43.1`.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit e65ce5b. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>

<!-- /CURSOR_SUMMARY -->
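
The idea of deriving a split-specific retry config can be illustrated with a small sketch. The `RetryConfig` dataclass below is a hypothetical stand-in with made-up fields, not the SDK's real retry configuration type, and `derive_split_retry_config` is an illustrative name. It shows only the pattern: copy the caller's settings and force connection-error retries on for chunk calls.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RetryConfig:
    """Hypothetical stand-in for an SDK retry configuration."""

    strategy: str = "backoff"
    max_elapsed_ms: int = 60_000
    retry_connection_errors: bool = False


def derive_split_retry_config(sdk_config: RetryConfig) -> RetryConfig:
    """Return a copy for chunk calls that always retries transport errors."""
    # dataclasses.replace builds a new instance, so the caller's config is untouched.
    return replace(sdk_config, retry_connection_errors=True)


sdk_config = RetryConfig(max_elapsed_ms=30_000, retry_connection_errors=False)
chunk_config = derive_split_retry_config(sdk_config)
```

Building a copy rather than mutating the original keeps the caller's SDK-level setting intact for non-split requests.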
Updated pypi trusted publishing settings too:

<img width="821" height="590" alt="image" src="https://github.com/user-attachments/assets/0c058ea8-d4e9-4ca3-ba2e-17253e89a16f" />

<!-- CURSOR_SUMMARY -->

> [!NOTE]
> **Medium Risk**
> Moderate risk because it rewires the release/publish pipeline (trigger, permissions, artifact flow, and version gating), which could break publishing if misconfigured. It reduces secret-handling risk by removing reliance on a long-lived `PYPI_TOKEN`.
>
> **Overview**
> Switches PyPI releases from the Speakeasy publish workflow + `PYPI_TOKEN` secret to a GitHub Releases-triggered pipeline that **builds with `uv`**, validates the release tag matches `unstructured_client._version`, and **publishes via trusted publishing (OIDC)** using `pypa/gh-action-pypi-publish`.
>
> Removes PyPI publishing configuration from Speakeasy (`.speakeasy/workflow*.yaml`) and stops passing `pypi_token` into the SDK generation workflow, while bumping SDK/package versioning to `0.43.2` (generator config + `_version.py`) and adding regression tests that enforce the new release workflow invariants.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 4d38845. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>

<!-- /CURSOR_SUMMARY -->
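
The tag-versus-version gate mentioned above could look roughly like the following. This is a sketch under assumptions: the helper name is hypothetical, the optional `v` prefix on tags is a guess, and the real workflow may perform the check in shell or read `unstructured_client._version` differently.

```python
def tag_matches_version(tag: str, version: str) -> bool:
    """Return True when a release tag names the same version as the package.

    Assumption: tags may carry a leading "v" (e.g. "v0.43.2") while the
    package version string does not.
    """
    normalized = tag.strip()
    if normalized.startswith("v"):
        normalized = normalized[1:]
    return normalized == version.strip()


# In a release job, the version would come from the package's version module;
# here it is hard-coded so the example stays self-contained.
package_version = "0.43.2"
ok = tag_matches_version("v0.43.2", package_version)
mismatch = tag_matches_version("v0.43.1", package_version)
```

A publish step would then abort when the check returns `False`, so a mistagged release never reaches PyPI.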
See Commits and Changes for more details.
Created by
pull[bot]