
Use new faster auto lowpp class implementation#1244

Merged
mdboom merged 16 commits into NVIDIA:main from mdboom:fast-auto-lowpp-class2
Dec 3, 2025

Conversation

@mdboom
Contributor

@mdboom mdboom commented Nov 17, 2025

This is still somewhat a WIP, but I would like to get more CI.

@copy-pr-bot
Contributor

copy-pr-bot bot commented Nov 17, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@mdboom
Contributor Author

mdboom commented Nov 17, 2025

/ok to test


@mdboom mdboom force-pushed the fast-auto-lowpp-class2 branch from 468f640 to 06930f5 Compare November 17, 2025 16:55
@mdboom mdboom force-pushed the fast-auto-lowpp-class2 branch from 06930f5 to f01c309 Compare November 17, 2025 19:14
@mdboom
Contributor Author

mdboom commented Nov 17, 2025

/ok to test

@leofang leofang added the enhancement, P0, and cuda.bindings labels Nov 18, 2025
@leofang leofang added this to the cuda-python 13-next, 12-next milestone Nov 18, 2025
@mdboom
Contributor Author

mdboom commented Nov 21, 2025

/ok to test

@mdboom
Contributor Author

mdboom commented Nov 24, 2025

/ok to test

@mdboom
Contributor Author

mdboom commented Nov 24, 2025

/ok to test

@leofang leofang left a comment
Member

Left a few quick questions, but otherwise LGTM. I think this PR can be un-drafted now? @mdboom do you plan to add more changes? Maybe you want to piggyback the dtype fix here?

@mdboom
Contributor Author

mdboom commented Dec 1, 2025

/ok to test

@mdboom
Contributor Author

mdboom commented Dec 1, 2025

/ok to test

@mdboom
Contributor Author

mdboom commented Dec 1, 2025

/ok to test

@leofang leofang mentioned this pull request Dec 1, 2025
@mdboom
Contributor Author

mdboom commented Dec 2, 2025

I have confirmed with the tool in #1067 that there is no change in the ABI.

Unfortunately, the changes in this PR are *not* tested in CI, because this decorator appears on all of the tests that use these types. We will need to perform some manual testing.

@pytest.mark.skipif(not isSupportedFilesystem(), reason="cuFile handle_register requires ext4 or xfs filesystem")
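The guard above means the cuFile tests are skipped anywhere the filesystem is not ext4 or xfs, which is why CI never exercised these types. A minimal, hypothetical re-creation of such a predicate (the real `isSupportedFilesystem` lives in the test suite; this sketch assumes Linux and `/proc/mounts`):

```python
import os

SUPPORTED_FS = {"ext4", "xfs"}  # cuFile handle_register requirement per the skip reason

def is_supported_filesystem(path="."):
    """Return True if the filesystem backing `path` is ext4 or xfs (Linux only)."""
    # Walk up from `path` until we hit the mount point that backs it.
    p = os.path.realpath(path)
    while not os.path.ismount(p):
        p = os.path.dirname(p)
    # Look up that mount point's filesystem type in /proc/mounts.
    try:
        with open("/proc/mounts") as f:
            for line in f:
                fields = line.split()
                if len(fields) >= 3 and fields[1] == p:
                    return fields[2] in SUPPORTED_FS
    except OSError:
        pass  # not Linux, or /proc unavailable
    return False
```

A `pytest.mark.skipif(not is_supported_filesystem(), ...)` built on a predicate like this silently drops the tests on unsupported machines, which is exactly the coverage gap described above.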

@mdboom mdboom marked this pull request as ready for review December 2, 2025 20:03
@copy-pr-bot
Contributor

copy-pr-bot bot commented Dec 2, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@mdboom mdboom force-pushed the fast-auto-lowpp-class2 branch from 5507171 to 104a5bb Compare December 2, 2025 21:52
@mdboom
Contributor Author

mdboom commented Dec 2, 2025

/ok to test

return self._data.read_size_kb_hist
cdef view.array arr = view.array(shape=(32,), itemsize=sizeof(uint64_t), format="Q", mode="c", allocate_buffer=False)
arr.data = <char *>(&(self._ptr[0].read_size_kb_hist))
return arr
Member

Q: Can we wrap it as a numpy array here to avoid breaking?
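Wrapping the existing C memory as a NumPy array without copying, which is the approach the final revision takes, can be sketched with `ctypes` standing in for the Cython struct pointer (`CuFileStats` and its field layout here are illustrative, not the real bindings):

```python
import ctypes
import numpy as np

# Hypothetical stand-in for the underlying C struct: a fixed 32-slot
# uint64 histogram field, analogous to read_size_kb_hist above.
class CuFileStats(ctypes.Structure):
    _fields_ = [("read_size_kb_hist", ctypes.c_uint64 * 32)]

stats = CuFileStats()
stats.read_size_kb_hist[3] = 7

# Zero-copy view: numpy wraps the existing C memory rather than copying it,
# which is what returning a numpy array (instead of a cython.view.array)
# preserves for callers.
hist = np.ctypeslib.as_array(stats.read_size_kb_hist)
assert hist.dtype == np.uint64 and hist.shape == (32,)

hist[3] = 9  # writes propagate back to the C struct
assert stats.read_size_kb_hist[3] == 9
```

Returning a NumPy view keeps the old API surface (slicing, `dtype`, ufuncs) working for existing callers while still avoiding a copy.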

return self._data.write_size_kb_hist
cdef view.array arr = view.array(shape=(32,), itemsize=sizeof(uint64_t), format="Q", mode="c", allocate_buffer=False)
arr.data = <char *>(&(self._ptr[0].write_size_kb_hist))
return arr
Member
ditto

@mdboom mdboom marked this pull request as draft December 3, 2025 12:42
@mdboom
Contributor Author

mdboom commented Dec 3, 2025

/ok to test

@mdboom
Contributor Author

mdboom commented Dec 3, 2025

I have restored backward compatibility by returning NumPy arrays, rather than Cython arrays, from members that have numeric array types.

I tested this again on a machine that doesn't skip the cufile tests and all is passing there.

There is still one breaking change here, noted in the release notes, that IMHO is just a real bug: PerGpuStats was declared as an AUTO_LOWPP_CLASS rather than an AUTO_LOWPP_ARRAY, but it is always used as an array. You can see in the test code the weird backflips that were required to access it as an array, which go away if you just declare it as such. Unfortunately, this change is required here because I can't make PerGpuStats[0] return a NumPy array as before under the new implementation, but that never really made sense anyway.

@mdboom
Contributor Author

mdboom commented Dec 3, 2025

/ok to test

@mdboom mdboom marked this pull request as ready for review December 3, 2025 14:47
@mdboom mdboom merged commit f04df74 into NVIDIA:main Dec 3, 2025
117 of 119 checks passed
@github-actions

github-actions bot commented Dec 3, 2025

Doc Preview CI
Preview removed because the pull request was closed or merged.

@mdboom mdboom deleted the fast-auto-lowpp-class2 branch December 9, 2025 16:12