ENH: Allow irregular arrays by using padding #720
Open
basnijholt wants to merge 104 commits into main
CodSpeed Instrumentation Performance Report
Merging #720 will degrade performance by 5.77%.
Codecov Report
✅ All modified and coverable lines are covered by tests.
... and 1 file with indirect coverage changes
for more information, see https://pre-commit.ci
- Convert irregular outputs with np.ma.masked sentinels to proper MaskedArrays in _output_from_mapspec_task
- Fix DictArray.__getitem__ to handle slicing of irregular arrays with proper padding
- Update tests to use explicit mapspecs instead of add_mapspec_axis (which generates invalid mapspecs for irregular dimensions)
- Fix test expectations to match actual auto-inference behavior
- Handle both MaskedArrays and object arrays with sentinel values in reduction functions

All tests in test_irregular_arrays.py now pass.
- Add SparseIrregularArray class for memory-efficient storage of irregular arrays
- Integrate sparse array support into DictArray with auto-detection
- Auto-use sparse when total size >10k elements or density <10%
- Add comprehensive test coverage for sparse arrays (3 test files, 37 tests)
- Achieve 999x memory reduction for highly variable array sizes (e.g., 99 arrays of size 1 + 1 array of size 1000)

The sparse implementation provides an array-like interface while storing only actual data values, dramatically reducing memory usage when array sizes vary from 1 to billions of elements.
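To make the padded-versus-sparse trade-off concrete, here is a minimal sketch of the slot arithmetic for the example in the commit message (99 arrays of size 1 plus 1 array of size 1000). The names `padded_slots`, `density`, and `RaggedStore` are hypothetical and are not the SparseIrregularArray implementation from this PR; the flat values-plus-offsets layout is just one common way to store only the real data. Counting element slots only, this example needs 100,000 padded slots versus 1,099 stored values (roughly 91x); the 999x figure above presumably measures something else, which this sketch does not try to reproduce.

```python
import numpy as np


def padded_slots(lengths):
    """Number of slots a padded (rectangular) layout needs."""
    return len(lengths) * max(lengths)


def density(lengths):
    """Fraction of padded slots that hold real data."""
    return sum(lengths) / padded_slots(lengths)


class RaggedStore:
    """Hypothetical flat storage: concatenated values plus row offsets."""

    def __init__(self, rows):
        # offsets[i]:offsets[i + 1] is the slice of `values` belonging to row i
        self.offsets = np.cumsum([0] + [len(r) for r in rows])
        self.values = np.concatenate([np.asarray(r, dtype=float) for r in rows])

    def row(self, i):
        return self.values[self.offsets[i]:self.offsets[i + 1]]


# The example from the commit message: 99 rows of length 1, 1 row of length 1000
rows = [[0.0]] * 99 + [list(range(1000))]
lengths = [len(r) for r in rows]
store = RaggedStore(rows)

n_padded = padded_slots(lengths)   # slots in the padded layout
n_stored = store.values.size       # values actually stored
ratio = n_padded / n_stored        # slot-count ratio, padded vs. flat
```

With a density of about 1.1%, this case falls under the "density <10%" condition in the commit message, assuming density is defined as real values divided by padded slots.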
This reverts commit bac3c7f.
Replace inefficient masking implementation with vectorized approach using np.frompyfunc.

This significantly improves performance for irregular arrays:
- Previous: O(n) time/space with flatten, iterate, copy operations
- New: Vectorized sentinel detection with minimal memory overhead
- Simpler code: 7 lines vs 20+ lines

The new implementation correctly identifies np.ma.masked sentinels and creates proper MaskedArrays while being more efficient.
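As an illustration of the sentinel detection described above, the following is a minimal sketch and not the code from this commit. The helper name `mask_from_sentinels` is hypothetical. It uses `np.frompyfunc` to apply an identity check against `np.ma.masked` to every element of an object array, then wraps the result in a MaskedArray. Note that `np.frompyfunc` still calls the Python function once per element; it only avoids writing the flatten/iterate/copy loop by hand.

```python
import numpy as np


def mask_from_sentinels(arr):
    """Return a boolean mask that is True where `arr` holds np.ma.masked."""
    # One input, one output; the ufunc returns an object array of Python bools
    is_masked = np.frompyfunc(lambda x: x is np.ma.masked, 1, 1)
    return is_masked(arr).astype(bool)


# Object array padded to the longest row; missing slots hold the sentinel
padded = np.empty((2, 3), dtype=object)
padded[0, :] = [1, 2, 3]
padded[1, 0] = 4
padded[1, 1] = np.ma.masked
padded[1, 2] = np.ma.masked

mask = mask_from_sentinels(padded)
masked = np.ma.MaskedArray(padded, mask=mask)
```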
- Add create_mask_for_masked_values() helper to _utils.py
- Add _ensure_masked_array_for_irregular() to StorageBase
- Update DictArray.__getitem__ to return MaskedArrays for irregular data
- Add comprehensive tests for irregular array sum operations

When irregular arrays contain np.ma.masked sentinel values, they are now automatically wrapped in MaskedArrays with proper masks, providing a cleaner API for users to handle varying-length data.
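For readers unfamiliar with why MaskedArrays help here, this is a small sketch of sum operations on padded, varying-length data using only the standard `numpy.ma` API. It does not call `create_mask_for_masked_values()` or `_ensure_masked_array_for_irregular()`, whose signatures are not shown in this conversation; the padding loop and variable names are assumptions for illustration.

```python
import numpy as np

# Three rows of different lengths, padded to the longest row
rows = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
width = max(len(r) for r in rows)

data = np.zeros((len(rows), width))
mask = np.ones((len(rows), width), dtype=bool)  # True marks a padding slot
for i, r in enumerate(rows):
    data[i, : len(r)] = r
    mask[i, : len(r)] = False

padded = np.ma.masked_array(data, mask=mask)

row_sums = padded.sum(axis=1)   # padding slots are ignored in each row sum
total = padded.sum()            # sum over all real values
counts = padded.count(axis=1)   # number of real values per row
```

Because reductions skip masked entries, the padding never leaks into the result, and `count` recovers the original row lengths.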
8a023fa to 8f6f3da
df75fda to 966d99e
966d99e to 2df3a24
Contributor
✅ PR Title Formatted Correctly
The title of this PR has been updated to match the correct format. Thank you!
basnijholt
commented
Oct 2, 2025
    if args.skipped:
        if first and not args.missing and not args.existing:
            shape = args.arrays[0].full_shape
    assert "shape" in locals()
Collaborator
Author
need to verify this
Alternative to #672