
Use new faster auto lowpp class implementation#1244

Merged
mdboom merged 16 commits into NVIDIA:main from mdboom:fast-auto-lowpp-class2
Dec 3, 2025

Conversation

@mdboom
Contributor

@mdboom mdboom commented Nov 17, 2025

This is still somewhat a WIP, but I would like to get more CI.

@copy-pr-bot
Contributor

copy-pr-bot bot commented Nov 17, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@mdboom
Contributor Author

mdboom commented Nov 17, 2025

/ok to test


@mdboom mdboom force-pushed the fast-auto-lowpp-class2 branch from 468f640 to 06930f5 Compare November 17, 2025 16:55
@mdboom mdboom force-pushed the fast-auto-lowpp-class2 branch from 06930f5 to f01c309 Compare November 17, 2025 19:14
@mdboom
Contributor Author

mdboom commented Nov 17, 2025

/ok to test

@leofang leofang added the enhancement, P0, and cuda.bindings labels Nov 18, 2025
@leofang leofang added this to the cuda-python 13-next, 12-next milestone Nov 18, 2025
@mdboom
Contributor Author

mdboom commented Nov 21, 2025

/ok to test

@mdboom
Contributor Author

mdboom commented Nov 24, 2025

/ok to test

@mdboom
Contributor Author

mdboom commented Nov 24, 2025

/ok to test

@leofang leofang left a comment
Member

Left a few quick questions, but otherwise LGTM. I think this PR can be un-drafted now? @mdboom do you plan to add more changes? Maybe you want to piggyback the dtype fix here?

@mdboom
Contributor Author

mdboom commented Dec 1, 2025

/ok to test

@mdboom
Contributor Author

mdboom commented Dec 1, 2025

/ok to test

@mdboom
Contributor Author

mdboom commented Dec 1, 2025

/ok to test

@leofang leofang mentioned this pull request Dec 1, 2025
@mdboom
Contributor Author

mdboom commented Dec 2, 2025

I have confirmed with the tool in #1067 that there is no change in the ABI.

Unfortunately, the changes in this PR are *not* tested in CI, because this decorator appears on all of the tests that use these types. We will need to perform some manual testing.

@pytest.mark.skipif(not isSupportedFilesystem(), reason="cuFile handle_register requires ext4 or xfs filesystem")
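The guard above means the cuFile tests are skipped anywhere the filesystem is not ext4 or xfs, which is why CI never exercised these types. A minimal, hypothetical re-creation of such a predicate (the real `isSupportedFilesystem` lives in the test suite; this sketch assumes Linux and `/proc/mounts`):

```python
import os

SUPPORTED_FS = {"ext4", "xfs"}  # cuFile handle_register requirement per the skip reason

def is_supported_filesystem(path="."):
    """Return True if the filesystem backing `path` is ext4 or xfs (Linux only)."""
    # Walk up from `path` until we hit the mount point that backs it.
    p = os.path.realpath(path)
    while not os.path.ismount(p):
        p = os.path.dirname(p)
    # Look up that mount point's filesystem type in /proc/mounts.
    try:
        with open("/proc/mounts") as f:
            for line in f:
                fields = line.split()
                if len(fields) >= 3 and fields[1] == p:
                    return fields[2] in SUPPORTED_FS
    except OSError:
        pass  # not Linux, or /proc unavailable
    return False
```

A `pytest.mark.skipif(not is_supported_filesystem(), ...)` built on a predicate like this silently drops the tests on unsupported machines, which is exactly the coverage gap described above.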

@mdboom mdboom marked this pull request as ready for review December 2, 2025 20:03
@copy-pr-bot
Contributor

copy-pr-bot bot commented Dec 2, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@mdboom mdboom force-pushed the fast-auto-lowpp-class2 branch from 5507171 to 104a5bb Compare December 2, 2025 21:52
@mdboom
Contributor Author

mdboom commented Dec 2, 2025

/ok to test

return self._data.read_size_kb_hist
cdef view.array arr = view.array(shape=(32,), itemsize=sizeof(uint64_t), format="Q", mode="c", allocate_buffer=False)
arr.data = <char *>(&(self._ptr[0].read_size_kb_hist))
return arr
Member

Q: Can we wrap it as a numpy array here to avoid breaking?
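Wrapping the existing C memory as a NumPy array without copying, which is the approach the final revision takes, can be sketched with `ctypes` standing in for the Cython struct pointer (`CuFileStats` and its field layout here are illustrative, not the real bindings):

```python
import ctypes
import numpy as np

# Hypothetical stand-in for the underlying C struct: a fixed 32-slot
# uint64 histogram field, analogous to read_size_kb_hist above.
class CuFileStats(ctypes.Structure):
    _fields_ = [("read_size_kb_hist", ctypes.c_uint64 * 32)]

stats = CuFileStats()
stats.read_size_kb_hist[3] = 7

# Zero-copy view: numpy wraps the existing C memory rather than copying it,
# which is what returning a numpy array (instead of a cython.view.array)
# preserves for callers.
hist = np.ctypeslib.as_array(stats.read_size_kb_hist)
assert hist.dtype == np.uint64 and hist.shape == (32,)

hist[3] = 9  # writes propagate back to the C struct
assert stats.read_size_kb_hist[3] == 9
```

Returning a NumPy view keeps the old API surface (slicing, `dtype`, ufuncs) working for existing callers while still avoiding a copy.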

return self._data.write_size_kb_hist
cdef view.array arr = view.array(shape=(32,), itemsize=sizeof(uint64_t), format="Q", mode="c", allocate_buffer=False)
arr.data = <char *>(&(self._ptr[0].write_size_kb_hist))
return arr
Member
ditto

@mdboom mdboom marked this pull request as draft December 3, 2025 12:42
@mdboom
Contributor Author

mdboom commented Dec 3, 2025

/ok to test

@mdboom
Contributor Author

mdboom commented Dec 3, 2025

I have restored backward compatibility by returning NumPy arrays, rather than Cython arrays, from members that have numeric array types.

I tested this again on a machine that doesn't skip the cufile tests and all is passing there.

There is still one breaking change here, noted in the release notes, that IMHO is just a real bug: PerGpuStats was declared as an AUTO_LOWPP_CLASS rather than an AUTO_LOWPP_ARRAY, but it is always used as an array. You can see in the test code the weird backflips that were required to access it as an array, which go away if you just declare it as such. Unfortunately, this change is required here because I can't make PerGpuStats[0] return a NumPy array as before under the new implementation, but that never really made sense anyway.

@mdboom
Contributor Author

mdboom commented Dec 3, 2025

/ok to test

@mdboom mdboom marked this pull request as ready for review December 3, 2025 14:47
@mdboom mdboom merged commit f04df74 into NVIDIA:main Dec 3, 2025
117 of 119 checks passed
@github-actions

github-actions bot commented Dec 3, 2025

Doc Preview CI
Preview removed because the pull request was closed or merged.

@mdboom mdboom deleted the fast-auto-lowpp-class2 branch December 9, 2025 16:12