ENH: Allow irregular arrays by using padding #720

Open
basnijholt wants to merge 104 commits into main from irregular-array

@basnijholt
Collaborator

Alternative to #672

codspeed-hq bot commented Apr 11, 2025

CodSpeed Instrumentation Performance Report

Merging #720 will degrade performance by 5.77%

Comparing irregular-array (31a5c84) with main (3b17489)

Summary

❌ 2 (👁 2) regressions
✅ 4 untouched

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| 👁 test_map_sequential_with_dict_storage | 214.6 ms | 227.8 ms | -5.77% |
| 👁 test_map_sequential_with_dict_storage_eager | 233.1 ms | 246.2 ms | -5.3% |

codecov bot commented Apr 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

| Files with missing lines | Coverage Δ |
|---|---|
| pipefunc/_pipefunc.py | 100.00% <100.00%> (ø) |
| pipefunc/_utils.py | 100.00% <100.00%> (ø) |
| pipefunc/map/_mapspec.py | 100.00% <100.00%> (ø) |
| pipefunc/map/_progress.py | 100.00% <100.00%> (ø) |
| pipefunc/map/_result.py | 100.00% <100.00%> (ø) |
| pipefunc/map/_run.py | 100.00% <100.00%> (ø) |
| pipefunc/map/_run_info.py | 100.00% <100.00%> (ø) |
| pipefunc/map/_storage_array/_base.py | 100.00% <100.00%> (ø) |
| pipefunc/map/_storage_array/_dict.py | 100.00% <100.00%> (ø) |
| pipefunc/map/_storage_array/_file.py | 100.00% <100.00%> (ø) |

... and 3 more

... and 1 file with indirect coverage changes

basnijholt and others added 18 commits April 10, 2025 22:50
- Convert irregular outputs with np.ma.masked sentinels to proper MaskedArrays in _output_from_mapspec_task
- Fix DictArray.__getitem__ to handle slicing of irregular arrays with proper padding
- Update tests to use explicit mapspecs instead of add_mapspec_axis (which generates invalid mapspecs for irregular dimensions)
- Fix test expectations to match actual auto-inference behavior
- Handle both MaskedArrays and object arrays with sentinel values in reduction functions

All tests in test_irregular_arrays.py now pass correctly.
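
A minimal sketch of the padding idea in plain NumPy, assuming numeric outputs: rows of different lengths go into a rectangle as wide as the longest row, and the padding slots are masked so reductions skip them. `pad_irregular` is a hypothetical helper for illustration, not pipefunc's API.

```python
import numpy as np


def pad_irregular(rows: list[list[float]]) -> np.ma.MaskedArray:
    """Pad variable-length rows into a 2-D MaskedArray (hypothetical helper).

    Slots beyond each row's length are masked, so NumPy's masked
    reductions ignore them instead of treating the padding as data.
    """
    max_len = max((len(r) for r in rows), default=0)
    data = np.zeros((len(rows), max_len), dtype=float)
    mask = np.ones((len(rows), max_len), dtype=bool)  # start fully masked
    for i, row in enumerate(rows):
        data[i, : len(row)] = row
        mask[i, : len(row)] = False  # unmask the real values
    return np.ma.MaskedArray(data, mask=mask)


padded = pad_irregular([[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]])
print(padded.shape)  # (3, 3)
print(padded.sum(axis=1).tolist())  # [6.0, 4.0, 11.0]
print(padded.mean(axis=1).tolist())  # [2.0, 4.0, 5.5]
```

The mean is where the mask matters: without it, the padded zeros in the second row would pull its mean from 4.0 down to 4/3.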

- Add SparseIrregularArray class for memory-efficient storage of irregular arrays
- Integrate sparse array support into DictArray with auto-detection
- Auto-use sparse when total size >10k elements or density <10%
- Add comprehensive test coverage for sparse arrays (3 test files, 37 tests)
- Achieve 999x memory reduction for highly variable array sizes (e.g., 99 arrays of size 1 + 1 array of size 1000)

The sparse implementation provides an array-like interface while storing only actual data values, dramatically reducing memory usage when array sizes vary from 1 to billions of elements.
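
A minimal sketch of sparse storage for a 2-D irregular array, assuming numeric values: only real values live in a dict keyed by `(row, col)`, and any other slot inside the padded shape reads back as `np.ma.masked`. `SparseIrregularSketch` and `should_use_sparse` are hypothetical names, not the PR's `SparseIrregularArray`; the thresholds come from the commit message, and applying them to the padded shape is an assumption.

```python
import numpy as np


class SparseIrregularSketch:
    """Store only the real values of an irregular 2-D array (hypothetical).

    Values are kept in a dict keyed by (row, col); any slot inside the
    padded bounding shape that has no entry is treated as masked.
    """

    def __init__(self, rows: list[list[float]]) -> None:
        self.shape = (len(rows), max((len(r) for r in rows), default=0))
        self._values: dict[tuple[int, int], float] = {
            (i, j): v for i, row in enumerate(rows) for j, v in enumerate(row)
        }

    @property
    def stored(self) -> int:
        return len(self._values)

    @property
    def density(self) -> float:
        total = self.shape[0] * self.shape[1]
        return self.stored / total if total else 1.0

    def __getitem__(self, key: tuple[int, int]) -> float:
        i, j = key
        if not (0 <= i < self.shape[0] and 0 <= j < self.shape[1]):
            raise IndexError(f"Index {key} out of bounds for shape {self.shape}")
        # Padding slots return NumPy's masked sentinel instead of a value.
        return self._values.get(key, np.ma.masked)


def should_use_sparse(shape: tuple[int, int], stored: int) -> bool:
    """Heuristic from the commit message: >10k slots or density <10%."""
    total = shape[0] * shape[1]
    return total > 10_000 or (total > 0 and stored / total < 0.10)


# The commit message's example: 99 arrays of size 1 plus one of size 1000.
rows = [[1.0]] * 99 + [[float(x) for x in range(1000)]]
sparse = SparseIrregularSketch(rows)
print(sparse.shape)  # (100, 1000)
print(sparse.stored)  # 1099
print(should_use_sparse(sparse.shape, sparse.stored))  # True
print(sparse[0, 5] is np.ma.masked)  # True
```

For the commit's example, a padded dense layout would reserve 100 × 1000 = 100,000 slots, whereas the dict holds 1,099 values.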

Replace inefficient masking implementation with vectorized approach using
np.frompyfunc. This significantly improves performance for irregular arrays:

- Previous: O(n) time/space with flatten, iterate, copy operations
- New: Vectorized sentinel detection with minimal memory overhead
- Simpler code: 7 lines vs 20+ lines

The new implementation correctly identifies np.ma.masked sentinels and
creates proper MaskedArrays while being more efficient.
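
A sketch of the sentinel-detection step described above, assuming an object array whose padding slots hold the `np.ma.masked` singleton. The name `create_mask_for_masked_values` is taken from the commit message that follows, but this body is an assumption, and `to_masked_array` is hypothetical. Note that `np.frompyfunc` still calls the Python function once per element; what it removes is the explicit flatten/iterate/copy bookkeeping, not the per-element Python call.

```python
import numpy as np

# Element-wise identity check against the np.ma.masked singleton.
# frompyfunc builds a ufunc that returns an object array, hence astype(bool).
_is_masked = np.frompyfunc(lambda x: x is np.ma.masked, 1, 1)


def create_mask_for_masked_values(arr: np.ndarray) -> np.ndarray:
    """Return a boolean mask that is True where arr holds np.ma.masked."""
    return _is_masked(arr).astype(bool)


def to_masked_array(arr: np.ndarray) -> np.ma.MaskedArray:
    """Wrap an object array with np.ma.masked sentinels as a MaskedArray."""
    return np.ma.MaskedArray(arr, mask=create_mask_for_masked_values(arr))


# Fill an object array element by element so each sentinel is stored as-is.
values = [[1, 2, 3], [4, np.ma.masked, np.ma.masked]]
arr = np.empty((2, 3), dtype=object)
for i, row in enumerate(values):
    for j, value in enumerate(row):
        arr[i, j] = value  # scalar assignment stores the object unchanged

masked = to_masked_array(arr)
print(masked.mask.tolist())  # [[False, False, False], [False, True, True]]
print(masked.compressed().tolist())  # [1, 2, 3, 4]
```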

- Add create_mask_for_masked_values() helper to _utils.py
- Add _ensure_masked_array_for_irregular() to StorageBase
- Update DictArray.__getitem__ to return MaskedArrays for irregular data
- Add comprehensive tests for irregular array sum operations

When irregular arrays contain np.ma.masked sentinel values, they are now
automatically wrapped in MaskedArrays with proper masks, providing a cleaner
API for users to handle varying-length data.

```python
arr[(slice(None), slice(None))]

with pytest.raises(IndexError, match=r"Index \(1,\) out of bounds"):
    arr[(0, 1)]
```

Code scanning / CodeQL notice (test): Statement has no effect.
basnijholt force-pushed the irregular-array branch 2 times, most recently from df75fda to 966d99e, October 2, 2025 18:03
basnijholt added a commit that referenced this pull request Oct 2, 2025
basnijholt added a commit that referenced this pull request Oct 2, 2025

github-actions bot commented Oct 2, 2025

✅ PR Title Formatted Correctly

The title of this PR has been updated to match the correct format. Thank you!

```python
if args.skipped:
    if first and not args.missing and not args.existing:
        shape = args.arrays[0].full_shape
assert "shape" in locals()
```
Collaborator Author


need to verify this
