ENH: `error_handling` options (`raise` and `continue`) by basnijholt · Pull Request #854 · pipefunc/pipefunc

basnijholt · 2025-07-07T21:55:16Z

Supersedes #842.

codspeed-hq · 2025-07-07T22:00:41Z

CodSpeed Performance Report

Merging #854 will not alter performance

_{Comparing continue-on-errors2 (ffb7f3d) with main (2a859f7)}

Summary

✅ 6 untouched

codecov · 2025-07-07T22:06:58Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

Files with missing lines	Coverage Δ
pipefunc/_error_handling.py	`100.00% <100.00%> (ø)`
pipefunc/_pipefunc.py	`100.00% <100.00%> (ø)`
pipefunc/_pipefunc_utils.py	`100.00% <100.00%> (ø)`
pipefunc/_pipeline/_base.py	`100.00% <100.00%> (ø)`
pipefunc/exceptions.py	`100.00% <100.00%> (ø)`
pipefunc/map/_adaptive_scheduler_slurm_executor.py	`100.00% <100.00%> (ø)`
pipefunc/map/_prepare.py	`100.00% <ø> (ø)`
pipefunc/map/_progress.py	`100.00% <100.00%> (ø)`
pipefunc/map/_run.py	`100.00% <100.00%> (ø)`
pipefunc/map/_run_eager.py	`100.00% <100.00%> (ø)`
... and 3 more

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

- Add pipefunc/_error_handling.py with common error handling utilities: - check_for_error_inputs(): Check if inputs contain ErrorSnapshot objects - create_propagated_error(): Create PropagatedErrorSnapshot instances - handle_error_inputs(): Combined error checking and propagation - cloudpickle_function_state/cloudunpickle_function_state(): Common pickling helpers - Refactor duplicate error detection code in pipefunc/map/_run.py to use common functions - Update ErrorSnapshot and PropagatedErrorSnapshot pickling to use common helpers - Pass error_handling parameter to prepare_run in async and eager execution modules - Add missing error_handling parameter descriptions to docstrings All tests pass with both parallel=False and parallel=True execution modes.

pipefunc/_error_handling.py

pipefunc/exceptions.py

- Consolidate development notes into DEVELOPMENT_NOTES.md - Update PLAN.md to reflect completed implementation and refactoring - Remove redundant documentation files - Document parallel execution fix and code refactoring work

…ly implemented

…fixed

- Created ErrorInfo dataclass to improve type safety and structure - Added factory methods from_full_error() and from_partial_error() - Updated PropagatedErrorSnapshot to use ErrorInfo instead of dict - Modified pickling/unpickling methods to handle the new dataclass - All error handling tests pass with both parallel=False and parallel=True

…ling - Created test_error_handling_coverage.py with tests for: - ErrorInfo dataclass creation methods - PropagatedErrorSnapshot with default reason - handle_error_inputs with raise mode - cloudpickle functions with missing keys - ErrorSnapshot save/load functionality - PropagatedErrorSnapshot pickling with complex error_info - Created test_error_handling_final_coverage.py with tests for: - PropagatedErrorSnapshot string repr with partial errors - get_root_causes with partial errors (empty list case) Both pipefunc._error_handling.py and pipefunc.exceptions.py now have 100% test coverage.

- Test error handling with dict, file_array, and zarr_memory storage - Verify that ErrorSnapshot objects are preserved in dict storage - Document expected behavior for storage backends that may not preserve error objects - Add comprehensive error handling documentation with examples

pipefunc/_error_handling.py

pipefunc/exceptions.py

…rror object storage This commit addresses two critical issues in the error handling system that were causing failures in parallel execution with file-based storage backends. ## Issue 1: Missing `in_executor` parameter in error handling The `in_executor` parameter was removed in commit 0d8e419b but was still being used in the error handling logic. This parameter is crucial for distinguishing between sequential (main process) and parallel (subprocess) execution contexts. ### Root Cause - Error handling needs to know execution context to properly wrap exceptions - In sequential mode: exceptions are raised directly (in_executor=False) - In parallel mode: exceptions are caught and wrapped as ErrorSnapshot (in_executor=True) - The parameter was accidentally removed during refactoring ### Fix Restored the conditional logic in `_run_iteration()`: ```python in_executor = executor is not None if error_handling == "raise": yield from func(*args, **kwargs, _return_mapspec_names=True, _in_executor=in_executor) ``` ## Issue 2: FileArray not preserving ErrorSnapshot objects in parallel execution FileArray storage was returning `np.ma.masked` instead of `PropagatedErrorSnapshot` objects when errors occurred in parallel execution, causing test failures. ### Root Cause Analysis 1. **Storage Logic**: FileArray uses XOR logic to determine when to dump: ```python if force_dump or (array.dump_in_subprocess ^ in_post_process): ``` 2. **The Problem**: - FileArray has `dump_in_subprocess=True` - In subprocess: `True ^ False = True` → dumps correctly - In main process: `True ^ True = False` → doesn't dump - PropagatedErrorSnapshot created in main process wasn't being stored 3. **Execution Flow**: - Sequential: ErrorSnapshot created and stored → PropagatedErrorSnapshot created and stored - Parallel: ErrorSnapshot created in subprocess → stored correctly PropagatedErrorSnapshot created in main process → NOT stored due to XOR logic ### Fix Modified `_update_array()` to always dump error objects regardless of process context: ```python from pipefunc.exceptions import ErrorSnapshot, PropagatedErrorSnapshot # Always dump error objects, or follow normal dump rules is_error_object = isinstance(_output, (ErrorSnapshot, PropagatedErrorSnapshot)) if is_error_object or force_dump or (array.dump_in_subprocess ^ in_post_process): # ... dump logic ... ``` ## Testing and Validation Created extensive test suite to understand the issue: - `test_file_array_error.py`: Verified FileArray can store/retrieve ErrorSnapshot - `test_error_propagation.py`: Confirmed error propagation works with dict storage - `test_error_propagation_trace.py`: Traced execution flow differences - `test_debug_storage.py`: Compared parallel vs sequential execution Key findings: - Dict storage worked correctly (no serialization involved) - FileArray failed only in parallel mode when storing PropagatedErrorSnapshot - The issue was specific to the XOR dumping logic, not serialization ## Impact This fix ensures: 1. Error handling works correctly in both sequential and parallel execution 2. All storage backends (dict, file_array, zarr) properly preserve error objects 3. Error propagation chain is maintained across pipeline stages 4. Tests in `test_error_handling_storage.py` now pass for all storage backends ## Technical Details - ErrorSnapshot: Captures exception info with metadata (timestamp, user, machine, IP) - PropagatedErrorSnapshot: Created when a function receives ErrorSnapshot as input - FileArray: Uses cloudpickle for serialization, returns np.ma.masked for missing files - The XOR logic `dump_in_subprocess ^ in_post_process` is preserved for normal data but bypassed for error objects to ensure they are always persisted

- Fix _default_output_picker to handle ErrorSnapshot objects properly - When a function returns an error, it's now returned as-is for all output names in a tuple - Add comprehensive tests covering tuple outputs with error handling including: - Basic tuple outputs with element-wise mapspecs - Tuple outputs with reduction mapspecs - Custom output pickers with error handling - Parallel and sequential execution - Complex multi-dimensional mapspecs - Edge cases where all outputs in a tuple get the same error This ensures that error_handling='continue' works correctly with functions that have tuple output names.

The error_handling parameter was present in run_map_async and run_map_eager_async but missing from the Pipeline.map_async method signature and documentation. This caused the docstring check to fail.

- Cache _ERROR_INFOS_NONE singleton to avoid NamedTuple creation per iteration - Cache _RESOURCES_SKIPPED singleton to avoid dataclass creation per iteration - Reorder eval_resources to check callable(func.resources) first, avoiding unnecessary dict lookups when resources is not callable (most common case) - Remove unnecessary dict(kwargs) copy in _call_user

…cache logic - Cache _ERROR_CONTEXT_RAISE_NONE singleton for the most common case (raise mode with no error info) to avoid dataclass creation per iteration - Inline _get_or_set_cache logic for cache=None case to avoid inner function definition overhead on every call - Together with previous optimizations, reduces overhead by ~6.5% locally

…inlined cache logic" This reverts commit 7b64c4c.

- Fix resume validation to reject runs with different error_handling modes to prevent silently reusing ErrorSnapshots from continue-mode runs - Fix scan_inputs_for_errors crash on 0-D object arrays by handling them specially (np.where doesn't work on 0-D arrays) - Fix output_picker exceptions bypassing continue handling by wrapping picker calls with error handling in both _pick_output and _dump_single_output Tests added: - test_empty_inputs_continue_mode - test_resume_rejects_different_error_handling_mode - test_output_picker_exception_in_continue_mode - test_output_picker_exception_raises_in_raise_mode - test_scan_inputs_for_errors_0d_object_array - test_scan_inputs_for_errors_0d_numeric_array

Add FINAL_REVIEW.md documenting all bugs fixed and issues investigated for the error handling "continue" mode feature: - 3 confirmed bugs with regression tests - 4 potential issues investigated (1 inefficiency, 2 by design, 1 inconclusive) Also add test_error_handling_potential_issues.py with tests that document the investigated potential issues and their findings.

Add assessment of new concerns from PR_REVIEW.md: - Performance optimization already in place (StorageBase dtype check) - Thread-safety: by design, test validates correct per-thread snapshots - Deep recursion: theoretical risk only for 1000+ stage pipelines - Partial array errors: documented behavior in get_root_causes() Reorganize remaining TODOs into "Investigated" vs "Still Unvalidated" sections.

…tion regex - Updated in to allow filtering for any resource scope when using with SLURM. This prevents useless job submissions for map-scope resources when upstream errors exist. - Added regression test . - Fixed a regex pattern in to match the actual error message. - Added with investigation details. - Updated to mark the issue as resolved.

Error objects were being dumped twice (once in subprocess, once in main process) because they bypassed the XOR logic in _update_array. This was inefficient I/O overhead. Fix: Remove the is_error_object bypass - error objects now follow the same XOR dump logic as normal values, ensuring exactly one dump per output. Test: test_error_objects_dump_count_with_mock (previously xfail) now passes.

Merge two redundant tests into one focused test after the `func` argument was removed from `should_filter_error_indices`. The previous tests created multiple PipeFunc objects with different resource scopes but never used them since the function no longer depends on scope.

- Add tests for output_picker exceptions with file storage (mapped functions) - Add tests for output_picker exceptions in non-mapped functions with file storage (covers _dump_single_output lines 649-659) - Add test for all-error indices being run locally instead of submitted to executor (covers _all_indices_propagate_errors path)

- Remove FINAL_REVIEW.md (PR review document) - Remove docs/dev/ (development notes) - Move test_output_picker_error_snapshot.py to tests/unit/error_handling/

github-actions · 2025-12-12T06:50:05Z

✅ PR Title Formatted Correctly

The title of this PR has been updated to match the correct format. Thank you!

basnijholt · 2025-12-12T07:14:53Z

@MitchellAcoustics try out v0.90.0! 😄

basnijholt added 2 commits July 7, 2025 14:48

ENH: error_handling options (raise and continue)

59d5771

wip

897d3f7

.

04ae7fe

LennartGevers mentioned this pull request Jul 10, 2025

BUG: FIX Pipeline.update_defaults with inputs that do not support __eq__ #857

Merged

basnijholt added 7 commits July 25, 2025 09:26

Merge remote-tracking branch 'origin/main' into continue-on-errors2

3126b8e

Add PLAN.md

d083c9f

Add plan and TDD tests

bf07c42

Get tests/map/test_error_handling.py passing

c2916fc

Pass all in tests/map/test_error_handling_basic.py

e837952

Fix parallel issue

ad7df4d

github-advanced-security bot found potential problems Jul 25, 2025

View reviewed changes

pipefunc/_error_handling.py Fixed Show fixed Hide fixed

pipefunc/exceptions.py Fixed Show fixed Hide fixed

pipefunc/exceptions.py Fixed Show fixed Hide fixed

pipefunc/exceptions.py Fixed Show fixed Hide fixed

pipefunc/exceptions.py Fixed Show fixed Hide fixed

basnijholt added 7 commits July 25, 2025 12:21

Update documentation for error handling feature

cef16d2

- Consolidate development notes into DEVELOPMENT_NOTES.md - Update PLAN.md to reflect completed implementation and refactoring - Remove redundant documentation files - Document parallel execution fix and code refactoring work

Update PLAN.md with completed implementation status and refactoring

b8134b7

Remove incorrect limitation about reduction operations - they are ful…

1515f11

…ly implemented

Remove outdated tests/IMPLEMENTATION_NOTES.md - all issues have been …

71e92d9

…fixed

github-advanced-security bot found potential problems Jul 25, 2025

View reviewed changes

pipefunc/_error_handling.py Dismissed Show dismissed Hide dismissed

pipefunc/_error_handling.py Dismissed Show dismissed Hide dismissed

pipefunc/exceptions.py Dismissed Show dismissed Hide dismissed

basnijholt added 8 commits July 25, 2025 15:10

Use in_executor again

382c767

comment

fb11879

fix: add missing error_handling parameter to Pipeline.map_async method

e0760fd

The error_handling parameter was present in run_map_async and run_map_eager_async but missing from the Pipeline.map_async method signature and documentation. This caused the docstring check to fail.

remove code that was not needed

9aaf4fd

Merge remote-tracking branch 'origin/main' into continue-on-errors2

0a3b28f

Merge branch 'main' into continue-on-errors2

e0d88f6

basnijholt added 25 commits November 25, 2025 12:18

Revert "perf: further optimize hot path with cached ErrorContext and …

361d4e0

…inlined cache logic" This reverts commit 7b64c4c.

FINAL_REVIEW.md by Claude

176564b

Add Codex concerns

48f1d2d

docs: clarify non-issues in FINAL_REVIEW.md

6dff808

docs: update PR_REVIEW.md with investigation findings

a894388

docs: remove PR_REVIEW.md (superseded by FINAL_REVIEW.md)

a43f835

docs: remove SLURM.md (findings merged into FINAL_REVIEW.md)

217f75c

test: clean up error handling potential issues tests

7c9ee39

test: restore test logic while cleaning up comments

6f41087

Merge remote-tracking branch 'origin/main' into continue-on-errors2

d71078d

chore: clean up branch for merge

c634220

- Remove FINAL_REVIEW.md (PR review document) - Remove docs/dev/ (development notes) - Move test_output_picker_error_snapshot.py to tests/unit/error_handling/

Merge branch 'main' into continue-on-errors2

6fd8491

speed test

ebd5876

clenaup

a8f19e7

rm dead

f08c61d

Try to be more dry without losing perf

ffb7f3d

basnijholt force-pushed the continue-on-errors2 branch from 0d74d31 to ffb7f3d Compare December 12, 2025 06:49

basnijholt merged commit 11f11fa into main Dec 12, 2025
37 checks passed

basnijholt deleted the continue-on-errors2 branch December 12, 2025 07:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: `error_handling` options (`raise` and `continue`)#854

ENH: `error_handling` options (`raise` and `continue`)#854
basnijholt merged 158 commits intomainfrom
continue-on-errors2

basnijholt commented Jul 7, 2025 •

edited

Loading

Uh oh!

codspeed-hq bot commented Jul 7, 2025 •

edited

Loading

Uh oh!

codecov bot commented Jul 7, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Dec 12, 2025

Uh oh!

Uh oh!

basnijholt commented Dec 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

basnijholt commented Jul 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codspeed-hq bot commented Jul 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CodSpeed Performance Report

Merging #854 will not alter performance

Summary

Uh oh!

codecov bot commented Jul 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Dec 12, 2025

✅ PR Title Formatted Correctly

Uh oh!

Uh oh!

basnijholt commented Dec 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

basnijholt commented Jul 7, 2025 •

edited

Loading

codspeed-hq bot commented Jul 7, 2025 •

edited

Loading

codecov bot commented Jul 7, 2025 •

edited

Loading