refactor(memory): Refactor memory pool adaptor #119
Open
Eyizoha wants to merge 5 commits into alibaba:main from
Purpose
Background and Motivation:
Currently, because of how the Memory Pool Adaptor is implemented, a BatchReader must outlive every ReadBatch it returns; otherwise, destroying a batch accesses a dangling pointer. Specifically, to adapt the Paimon memory pool to the Arrow or ORC memory pool interface, the BatchReader wraps the incoming Paimon memory pool in an adaptor and passes that adaptor to the Arrow or ORC APIs. As a result, the memory pool object referenced by the PoolBuffer inside the batch is not the original memory pool passed to the Paimon API, but the adaptor object held internally by the BatchReader.
Users must therefore keep the BatchReader alive until all returned batches (specifically, their internal PoolBuffers) are released. This constraint couples reader and batch in a way that is unintuitive and error-prone, and may also rule out certain resource optimizations.
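The lifetime coupling described above can be illustrated with a minimal C++ sketch. Every name in it (`Pool`, `Adaptor`, `Reader`, `Batch`, `LiveAdaptors`, `AdaptorSurvivesReader`) is a hypothetical stand-in, not one of the project's real classes; the sketch only assumes that the reader owns the adaptor and that the batch keeps a non-owning pointer to it.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical stand-ins for illustration only, not the project's real types.
namespace sketch_before {

// Counts adaptor objects that are currently alive.
int& LiveAdaptors() {
  static int count = 0;
  return count;
}

// Stands in for the Paimon memory pool passed in by the user.
struct Pool {
  std::size_t in_use = 0;
};

// Stands in for the wrapper that exposes the pool to Arrow or ORC.
class Adaptor {
 public:
  explicit Adaptor(std::shared_ptr<Pool> pool) : pool_(std::move(pool)) {
    ++LiveAdaptors();
  }
  ~Adaptor() { --LiveAdaptors(); }
  Adaptor(const Adaptor&) = delete;
  Adaptor& operator=(const Adaptor&) = delete;

  void Allocate(std::size_t n) { pool_->in_use += n; }
  void Free(std::size_t n) { pool_->in_use -= n; }

 private:
  std::shared_ptr<Pool> pool_;
};

// Stands in for a ReadBatch together with its internal PoolBuffer.
struct Batch {
  Adaptor* adaptor;  // non-owning: points at the adaptor held by the reader
  std::size_t size;
};

// Stands in for a BatchReader that creates and owns its own adaptor.
class Reader {
 public:
  explicit Reader(std::shared_ptr<Pool> pool)
      : adaptor_(std::make_unique<Adaptor>(std::move(pool))) {}

  Batch Read(std::size_t n) {
    adaptor_->Allocate(n);
    return Batch{adaptor_.get(), n};
  }

 private:
  std::unique_ptr<Adaptor> adaptor_;
};

// Reports whether any adaptor is still alive once the reader has been
// destroyed while a batch it returned is still held by the caller.
bool AdaptorSurvivesReader() {
  auto pool = std::make_shared<Pool>();
  auto reader = std::make_unique<Reader>(pool);
  Batch batch = reader->Read(64);
  reader.reset();  // destroys the adaptor together with the reader
  // batch.adaptor now dangles: calling batch.adaptor->Free(batch.size)
  // here would be undefined behavior, so the sketch deliberately does not.
  (void)batch;
  return LiveAdaptors() > 0;
}

}  // namespace sketch_before
```

In this sketch the user-supplied pool is still alive after the reader is gone, yet the batch cannot return its memory safely, because the only object it knows about is the adaptor that died with the reader.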
Solution:
To address this issue, this PR refactors the memory pool adaptor pattern by placing the adaptors directly inside the Paimon memory pool, so that each adaptor shares the pool's lifecycle. Readers can then use the adaptors directly instead of creating and holding their own. Adaptor creation is lazy and one-time, and follows a factory pattern, which keeps the ORC format library pluggable (and leaves room for other adaptors in the future).
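A rough sketch of the refactored ownership, under the assumption that the pool lazily creates one adaptor and hands out a pointer to it, might look as follows. The names (`Pool::GetArrowAdaptor`, `ArrowAdaptor`, `Reader`, `Batch`) are hypothetical and are not taken from the PR's actual diff; the factory hook mentioned above is left out for brevity, and only an Arrow-style adaptor is shown.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-ins for illustration only, not the project's real types.
namespace sketch_after {

class Pool;

// Stands in for the wrapper that exposes the pool to Arrow.
class ArrowAdaptor {
 public:
  explicit ArrowAdaptor(Pool* pool) : pool_(pool) {}
  void Allocate(std::size_t n);
  void Free(std::size_t n);

 private:
  Pool* pool_;  // back-pointer; stays valid because the pool owns this adaptor
};

// Stands in for the Paimon memory pool, which now owns its adaptor.
class Pool {
 public:
  // Creates the adaptor on first use; later calls return the same object.
  ArrowAdaptor* GetArrowAdaptor() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!arrow_) {
      arrow_ = std::make_unique<ArrowAdaptor>(this);
    }
    return arrow_.get();
  }

  std::size_t in_use = 0;

 private:
  std::mutex mutex_;
  std::unique_ptr<ArrowAdaptor> arrow_;
};

void ArrowAdaptor::Allocate(std::size_t n) { pool_->in_use += n; }
void ArrowAdaptor::Free(std::size_t n) { pool_->in_use -= n; }

// Stands in for a ReadBatch together with its internal PoolBuffer.
struct Batch {
  ArrowAdaptor* adaptor;  // points at the adaptor owned by the pool
  std::size_t size;

  void Release() {
    if (adaptor != nullptr) {
      adaptor->Free(size);
      adaptor = nullptr;
    }
  }
};

// Stands in for a BatchReader that borrows the pool's adaptor.
class Reader {
 public:
  explicit Reader(std::shared_ptr<Pool> pool) : pool_(std::move(pool)) {}

  Batch Read(std::size_t n) {
    ArrowAdaptor* adaptor = pool_->GetArrowAdaptor();
    adaptor->Allocate(n);
    return Batch{adaptor, n};
  }

 private:
  std::shared_ptr<Pool> pool_;
};

// Destroys the reader before releasing the batch, then reports how many
// bytes the pool still considers in use.
std::size_t BytesInUseAfterReaderGone() {
  auto pool = std::make_shared<Pool>();
  auto reader = std::make_unique<Reader>(pool);
  Batch batch = reader->Read(64);
  reader.reset();   // the reader goes away first
  batch.Release();  // still valid: the adaptor lives as long as the pool
  return pool->in_use;
}

// Checks that repeated calls hand out the same lazily created adaptor.
bool SameAdaptorEveryCall() {
  Pool pool;
  return pool.GetArrowAdaptor() == pool.GetArrowAdaptor();
}

}  // namespace sketch_after
```

With this arrangement the only lifetime the caller has to manage is that of the pool itself, which in this sketch the caller already holds through a `std::shared_ptr`.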
Tests
Existing tests cover this change. Additionally, manual testing has verified the decoupling of reader and batch lifecycles.
API and Format
This change introduces two new methods to the memory pool that return pointers to Arrow and ORC adaptors respectively. The change maintains forward compatibility.
Documentation
No new documentation required.