FEAT: streaming support in fetchone for varcharmax data type#219
Merged
gargsaumya merged 12 commits into main on Sep 15, 2025
Pull Request Overview
This PR adds comprehensive streaming support for VARCHAR(MAX) data types by introducing a new LOB (Large Object) streaming mechanism in the C++ bindings and updating the Python cursor layer to handle long strings more efficiently.
Key changes:
- Implements streaming-based data retrieval for large VARCHAR(MAX) columns to handle values that exceed buffer limits
- Refactors SQL type mapping to use zero column size for long strings, triggering proper LOB handling
- Adds comprehensive test coverage for VARCHAR(MAX) scenarios including boundary conditions, large values, and edge cases
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| mssql_python/pybind/ddbc_bindings.cpp | Adds FetchLobColumnData function for streaming large column data and updates SQLGetData_wrap to use streaming for VARCHAR(MAX) |
| mssql_python/cursor.py | Updates _map_sql_type to use SQL_VARCHAR/SQL_WVARCHAR with zero column size for long strings |
| tests/test_004_cursor.py | Adds comprehensive test suite for VARCHAR(MAX) covering various data sizes, edge cases, and transaction scenarios |
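To make the `cursor.py` row above more concrete, here is a minimal sketch of how a parameter mapper could signal "stream this value" by returning a column size of zero. It is not the code from this PR: `map_string_param`, `LONG_STRING_CHARS`, the ASCII check, and the minimum size of 1 for empty strings are all assumptions made for illustration. Only the numeric values of `SQL_VARCHAR` and `SQL_WVARCHAR` are the standard ODBC type codes.

```python
from typing import Tuple

SQL_VARCHAR = 12    # standard ODBC code for narrow variable-length strings
SQL_WVARCHAR = -9   # standard ODBC code for wide variable-length strings

# Hypothetical cut-off for "long" strings; the driver's real rule may differ.
LONG_STRING_CHARS = 4000


def map_string_param(value: str) -> Tuple[int, int]:
    """Return (sql_type, column_size) for a Python string parameter.

    Assumption: ASCII-only text is sent as SQL_VARCHAR, anything else as
    SQL_WVARCHAR. A column size of 0 marks the value as a long string.
    """
    sql_type = SQL_VARCHAR if value.isascii() else SQL_WVARCHAR
    if len(value) > LONG_STRING_CHARS:
        return sql_type, 0
    # Assumption: keep the size at 1 or more so that an empty string is not
    # confused with the zero-size marker used for long strings.
    return sql_type, max(len(value), 1)
```

With these assumptions, a 5,000-character ASCII string maps to `(SQL_VARCHAR, 0)`, while a short non-ASCII string keeps its own length as the column size.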
sumitmsft (Contributor) requested changes on Sep 9, 2025:
Left a few comments. Please resolve
f6b7389 to e21b47e
bewithgaurav (Collaborator) left a comment:
need a re-review post solving conflicts
11dac52 to 960edef
4ee4c77 to 960edef
78dc1e8 to 598a6be
598a6be to 7f67326
Author (Contributor):
The conflicts are now resolved. You can go ahead and re-review.
sumitmsft previously approved these changes on Sep 12, 2025
bewithgaurav previously approved these changes on Sep 15, 2025
### Work Item / Issue Reference

> [AB#38110](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/38110) [AB#34162](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/34162)

> GitHub Issue: #<ISSUE_NUMBER>

-------------------------------------------------------------------

### Summary

This pull request improves NVARCHAR data handling in the SQL Server Python bindings and adds comprehensive tests for NVARCHAR(MAX) scenarios. The main changes include switching to streaming for large NVARCHAR values, optimizing direct fetch for smaller values, and adding tests for edge cases and boundaries to ensure correctness.

**NVARCHAR data handling improvements:**

* Updated the logic in `ddbc_bindings.cpp` to use streaming for large NVARCHAR/NCHAR columns (over 4000 characters or unknown size) and direct fetch for smaller values, optimizing performance and reliability.
* Refactored data conversion for NVARCHAR fetches, using `std::wstring` for conversion and simplifying platform-specific handling for both macOS/Linux and Windows.
* Improved handling of empty strings and NULLs for NVARCHAR columns, ensuring correct Python types are returned and logging is more descriptive.

**Testing enhancements:**

* Added new tests in `test_004_cursor.py` for NVARCHAR(MAX) covering short strings, boundary conditions (4000 chars), streaming (4100+ chars), large values (100,000 chars), empty strings, NULLs, and transaction rollback scenarios to verify correct behavior across all edge cases.

**VARCHAR/CHAR fetch improvements:**

* Improved direct fetch logic for small VARCHAR/CHAR columns and fixed string conversion to use the actual data length, preventing potential issues with null-termination and buffer size. [[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R1825-R1830) [[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L1841-L1850)
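The 4000-character boundary described in this summary can be pictured with a small decision helper. This is only a sketch of the rule as stated (stream when the size is over 4000 characters or unknown), not the C++ logic in `ddbc_bindings.cpp`. The names `use_streaming` and `NVARCHAR_DIRECT_LIMIT` are hypothetical, and treating `None` or `0` as "size unknown" is an assumption.

```python
from typing import Optional

# Character limit for direct fetch, taken from the summary above.
NVARCHAR_DIRECT_LIMIT = 4000


def use_streaming(column_size: Optional[int]) -> bool:
    """Return True when a wide-character column should be streamed.

    Assumption: None or 0 stands for "size unknown", as a driver might
    report for NVARCHAR(MAX).
    """
    if not column_size:
        return True
    return column_size > NVARCHAR_DIRECT_LIMIT


# The sizes the new tests are described as covering fall on both sides.
for size in (10, 4000, 4100, 100_000):
    path = "stream" if use_streaming(size) else "direct"
    print(f"{size:>7} chars -> {path}")
```

Under this rule a column of exactly 4000 characters still takes the direct path, which is why the tests exercise both 4000 and 4100+ characters.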
fba171c
sumitmsft approved these changes on Sep 15, 2025
jahnvi480 approved these changes on Sep 15, 2025
Work Item / Issue Reference
Summary
This pull request significantly improves the handling of large object (LOB) data types (such as large strings and binary data) in the MSSQL Python driver, especially for fetching and streaming variable-length data. The changes introduce robust streaming logic for LOB columns, prevent data truncation, and ensure correct type handling for both single-row and batch fetches. Additionally, the code now detects LOB columns and automatically switches to per-row streaming when necessary, improving reliability and correctness for large datasets.
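The switch to per-row streaming mentioned in this summary can be sketched as a pre-scan over column metadata. The snippet below is illustrative only: `ColumnInfo`, `is_lob`, and `pick_fetch_strategy` are hypothetical names, the real check lives in the C++ bindings, and using a column size of 0 to mean "unbounded" is an assumption. The 8000 threshold is the `SQL_MAX_LOB_SIZE` value quoted in this PR description.

```python
from typing import Iterable, NamedTuple

SQL_MAX_LOB_SIZE = 8000  # threshold quoted in the PR description


class ColumnInfo(NamedTuple):
    """Hypothetical stand-in for the metadata a driver holds per column."""
    name: str
    column_size: int  # assumption: 0 means unbounded, e.g. VARCHAR(MAX)


def is_lob(col: ColumnInfo) -> bool:
    """Treat unbounded or very wide columns as LOBs."""
    return col.column_size == 0 or col.column_size > SQL_MAX_LOB_SIZE


def pick_fetch_strategy(columns: Iterable[ColumnInfo]) -> str:
    """Use row-by-row streaming if any column is a LOB, else batch fetch."""
    return "row_by_row" if any(is_lob(c) for c in columns) else "batch"
```

For example, a result set with an `INT` key and a `VARCHAR(100)` column would stay on the batch path, while adding a single `VARCHAR(MAX)` column would move the whole fetch to row-by-row streaming.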
LOB Streaming and Fetching Improvements:
- Added the `FetchLobColumnData` function in `ddbc_bindings.cpp` to stream LOB data (CHAR, WCHAR, and BINARY types) in chunks, correctly handling nulls, null-terminators, and platform-specific encoding. This prevents truncation and errors when fetching large columns.
- Updated `SQLGetData_wrap` to use streaming for LOB columns or when data length is unknown/too large, for both narrow and wide character types, as well as binary data. This ensures correct retrieval of all data regardless of size. [1] [2] [3]

Batch Fetch Logic Enhancements:
- Updated `FetchBatchData` to detect LOB columns and use streaming fetch for those columns, avoiding exceptions and ensuring all data is retrieved for large columns in batch operations. [1] [2] [3] [4] [5]
- Updated `FetchMany_wrap` to pre-scan columns for LOB types and, if any are found, fall back to row-by-row streaming fetch for those rows; otherwise, it proceeds with standard batch fetching.

Type Mapping and Constants:
- Updated `_map_sql_type` in `cursor.py` to map long string types to `SQL_WVARCHAR`/`SQL_VARCHAR` with length 0 for streaming, aligning with the new LOB streaming logic.
- Added `SQL_MAX_LOB_SIZE` (8000) as the threshold for LOB streaming, centralizing the logic for when to treat columns as LOBs.

These changes collectively make LOB handling more robust, reduce the risk of data truncation, and improve compatibility across platforms.
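The chunked retrieval described for LOB columns follows a common pattern: read into a fixed-size buffer repeatedly until the driver reports that nothing is left. Below is a minimal Python sketch of that loop, not the C++ `FetchLobColumnData` itself. `fetch_lob`, `make_fake_reader`, and the `(chunk, more)` return shape are hypothetical, and the fake reader stands in for the real ODBC call. The sketch also leaves out the null-terminator trimming that character data would need.

```python
from typing import Callable, List, Optional, Tuple

# A reader returns (chunk, more): chunk is None for SQL NULL, and more is
# True while data remains beyond what fit in the buffer.
ChunkReader = Callable[[int], Tuple[Optional[bytes], bool]]


def fetch_lob(read_chunk: ChunkReader, chunk_size: int = 8192) -> Optional[bytes]:
    """Collect a large value by reading fixed-size chunks until done."""
    parts: List[bytes] = []
    while True:
        chunk, more = read_chunk(chunk_size)
        if chunk is None:
            return None  # SQL NULL maps to Python None
        parts.append(chunk)
        if not more:
            return b"".join(parts)


def make_fake_reader(data: Optional[bytes]) -> ChunkReader:
    """Hypothetical stand-in for the driver: serves data in buffer-sized slices."""
    offset = 0

    def read_chunk(size: int) -> Tuple[Optional[bytes], bool]:
        nonlocal offset
        if data is None:
            return None, False
        chunk = data[offset:offset + size]
        offset += len(chunk)
        return chunk, offset < len(data)

    return read_chunk
```

Calling `fetch_lob(make_fake_reader(b"a" * 20000))` reassembles the full 20,000 bytes from three reads, and the same loop returns `None` for NULL and `b""` for an empty value without special-casing the buffer size.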