Skip to content

rasterize: slice line_props/point_props per tile in dask backends (#2020)#2023

Merged
brendancol merged 2 commits into
mainfrom
deep-sweep-performance-rasterize-2026-05-17
May 18, 2026
Merged

rasterize: slice line_props/point_props per tile in dask backends (#2020)#2023
brendancol merged 2 commits into
mainfrom
deep-sweep-performance-rasterize-2026-05-17

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2020.

Summary

  • The dask backends _run_dask_numpy and _run_dask_cupy previously embedded the full per-type line_props and point_props tables in every delayed tile task closure, even though only a small subset of geometries hits any one tile. The polygon path already handles this correctly via poly_props[pmask]; the line and point paths did not.
  • Added _slice_props_for_tile which slices props to the unique geom indices referenced by a tile and returns a remapped int32 local-index array. Wired it into both dask backends alongside the existing _segments_for_tile / _points_for_tile filters.
  • Updated the dask+cupy path to slice on the driver and let _ensure_cupy upload the per-tile slice inside the worker. The previous driver-side cupy.asarray(line_props) only saved redundant uploads when the same arrays were referenced across tasks; per-tile slicing trades that for far smaller graph payloads.

Why

Without this fix the graph payload scales as O(N_geoms * N_tiles). At 5 000 points with 8 numeric columns across 100 tiles this puts ~30 MB of point_props duplicates in the graph; for localized lines under the same shape it is ~32 MB. Workers ship a tiny pixel-extent fix.

Measured

Workload Graph size (old) Graph size (new)
5 000 points, 8 cols, 100 tiles ~30 MB 0.26 MB
10 000 localized lines, 10 cols, 100 tiles ~80 MB 1.10 MB
10 000 sprawling lines, 1 col, 100 tiles 33 MB 39 MB (no help — table is referenced everywhere by design)

Sprawling random lines that touch most tiles cannot benefit because every tile genuinely references most rows of line_props; this is fundamental. The fix targets the realistic localized case.

Tests

xrspatial/tests/test_rasterize_tile_props_slice_2020.py (9 new tests):

  • TestSlicePropsForTile: empty input keeps column shape; sliced[local_idx] round-trips to props[geom_idx]; duplicate geom indices compact to one row.
  • TestDaskGraphPayloadBounded: pickled graph for 5 000 points / 5 000 localized lines stays under a few MB.
  • TestDaskBackendOutputUnchanged: numpy vs dask outputs match exactly with lines + points, with merge='sum', and with polygon-only input.

All 184 existing rasterize tests pass. Dask + cupy parity confirmed end-to-end against the numpy backend on a mixed lines + points GeoDataFrame.

Test plan

  • pytest xrspatial/tests/test_rasterize.py xrspatial/tests/test_rasterize_accuracy.py xrspatial/tests/test_rasterize_tile_props_slice_2020.py (193 passed, 2 pre-existing skips)
  • Manual dask+cupy parity probe vs numpy on lines + points
  • CUDA _scanline_fill_gpu resource probe: 39 regs/thread, 24 KB local mem/thread (unchanged)

)

The dask backends embedded the full per-type props tables into every
delayed tile task, despite the polygon path already filtering via
poly_props[pmask]. For workloads with many localized geometries this
amplified scheduler overhead by O(N_geoms * N_tiles).

Adds _slice_props_for_tile that returns the subset of props referenced
by a tile and a remapped local-index array. Wired into _run_dask_numpy
and _run_dask_cupy alongside the existing segment / point tile filters.

Measured: 5000 points x 8 cols / 100 tiles graph shrinks ~30 MB -> 0.3 MB;
localized lines ~32 MB -> 1.1 MB. Dask + cupy parity verified.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 18, 2026
@brendancol
Copy link
Copy Markdown
Contributor Author

PR Review: rasterize: slice line_props/point_props per tile in dask backends (#2020)

Graph-payload fix that brings the line/point paths in line with what the polygon path already does. The new _slice_props_for_tile helper is small, and the round-trip property is verified by a test. A few comments below; none block merge.

Blockers

None.

Suggestions

  • xrspatial/tests/test_rasterize_tile_props_slice_2020.py: no explicit straddling-line test. The seed-42 numpy_vs_dask test probably exercises tile-crossing lines incidentally, but a targeted test (one line that crosses a known seam, asserting the burned pixels are contiguous on both sides) would catch a future regression where a geometry gets dropped from one of the two tiles its bbox overlaps. With 30 geometries on a 512x512 raster with 64x64 chunks the current coverage is statistical, not structural.

  • xrspatial/rasterize.py:1860-1864: the dask+cupy comment notes the tradeoff, but the PR body's "sprawling lines" measurement (33 MB vs 39 MB) only covers the numpy path. For the cupy path the change replaces one driver-side cupy.asarray(line_props) with per-tile _ensure_cupy calls inside workers. Worth one extra sentence in the comment saying that for sprawling-line workloads the GPU sees N_tiles small uploads instead of one big one, so the cost shifts from graph payload to PCIe traffic.

Nits

  • xrspatial/rasterize.py:1602-1603: the comment says np.unique "remaps geom_idx in-place". It does not; the inverse map is a new array. Suggest: "np.unique returns sorted unique values and an inverse map that points each entry in geom_idx to its position in unique_idx."

  • xrspatial/rasterize.py:1600-1601: the empty branch returns geom_idx untouched, so the dtype of the returned local_idx is whatever the caller passed (int32 in practice from the upstream filters), while the non-empty branch always returns int32. Worth normalising: return geom_idx.astype(np.int32, copy=False), props[:0].

  • xrspatial/rasterize.py:1728-1738 and 1893-1898: identical 6-line slice block in both backends. Could fold into a tiny helper that returns the rewritten ts / tp tuples plus the sliced props, but the current form keeps the call sites readable. Leave as-is unless a third backend lands.

  • xrspatial/tests/test_rasterize_tile_props_slice_2020.py:113: pickle.dumps(dict(graph)) materialises the graph dict before pickling. pickle.dumps(graph) works directly with HighLevelGraph, but the explicit dict() matches what dask's distributed scheduler does internally, so this is fine.

What looks good

  • Helper is symmetric with the polygon path. Round-trip is verified by test_remap_round_trips_props.
  • Empty-input branch preserves the column dimension, which downstream [:, j] indexing needs.
  • Graph-payload regression tests assert concrete byte bounds, so a future regression will fail loudly.
  • Cupy comment correctly explains why the driver-side cupy.asarray was removed.
  • All 184 prior rasterize tests pass per the PR body; 9 new tests pass locally on this branch.

Checklist

  • Algorithm matches reference (N/A: graph-payload fix, no algorithmic change)
  • Backends produce consistent results (numpy+dask parity tested; cupy parity verified manually per PR body)
  • NaN handling unchanged
  • Edge cases covered: empty geom_idx, single-geom, multi-column props
  • Dask chunk boundaries: covered by the segment-bbox filter, but no explicit straddling test (see suggestion)
  • No premature materialization or unnecessary copies
  • Benchmark not required; payload bound is locked in by the regression tests
  • README feature matrix unaffected
  • Docstrings present and accurate (one wording nit above)

Add an explicit straddling-line test that asserts a line crossing a
tile seam renders on both sides. Fix the np.unique comment (inverse
map is a new array, not in-place), normalise the empty-branch
local_idx dtype to int32, and note in the cupy comment that for
sprawling-line workloads the cost shifts from graph payload to
per-tile PCIe uploads.
@brendancol brendancol merged commit cd95746 into main May 18, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rasterize: dask line_props/point_props embedded full in every tile task

1 participant