asarray: Add support for NumPy scalars
#90914
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90914.
Note: links to docs will display an error until the docs builds have completed.
✅ No failures as of commit 91a2bfc.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
lezcano
left a comment
You mentioned that this method already works for NumPy scalars if you add the dtype. Wouldn't it be possible (and simpler) to extract the type from the NumPy scalar and, if the dtype was not set, set it to that of the scalar, and if the dtype was set, assert that the two dtypes are the same?
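A minimal sketch of that alternative (hypothetical helper name and structure; not the actual diff in this PR):

```python
import numpy as np
import torch

def asarray_with_inferred_dtype(scalar, dtype=None):
    """Hypothetical helper illustrating the suggestion; not this PR's implementation."""
    # Map the NumPy scalar's dtype to the corresponding PyTorch dtype by going
    # through a 0-dimensional NumPy array.
    inferred = torch.from_numpy(np.asarray(scalar)).dtype
    if dtype is None:
        dtype = inferred
    elif dtype != inferred:
        raise ValueError("explicit dtype does not match the NumPy scalar's dtype")
    # With the dtype supplied, torch.asarray already handles the scalar correctly
    # (per the observation above that it works once the dtype is given).
    return torch.asarray(scalar, dtype=dtype)
```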
I would say that is another solution to this problem. Here's a comparison between the two:
* Slicing is needed so we return a 0-dimensional tensor (see the sketch below). Looking at the steps above, I believe the two approaches are not significantly different in added logic.
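A tiny sketch of the slicing point (the pre-PR shape is an assumption based on the buffer-protocol interpretation described in this PR):

```python
import numpy as np
import torch

x = np.float64(0.5)
t = torch.asarray(x, dtype=torch.float64)
# Assumed pre-PR shape: torch.Size([1]) via the buffer protocol; post-PR it is
# already 0-dimensional, so the reshape below is a no-op in that case.
t0 = t.reshape(())  # "slice"/reshape down to the 0-dimensional tensor we want
assert t0.dim() == 0
```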
rgommers
left a comment
Thanks @ysiraichi. This LGTM, modulo a small comment on the docs. I don't have a clear preference between the two ways of implementing this; they seem pretty similar and both work. Probably easiest to stay with the current logic.
torch/_torch_docs.py
Outdated
This note is probably too prominent, and most users won't really care about the details here. The paragraph above suffices I think.
torch/_torch_docs.py
Outdated
Instead of "but a NumPy scalar, a scalar or", how about "but a Python or NumPy scalar, or"?
That sounds better, indeed.
lezcano
left a comment
Fair enough. There are many CI errors, but they don't seem related. Perhaps rebasing would make them go away?
torch/_torch_docs.py
Outdated
This is a little confusing. The current documentation defines the following priorities:
- tensor, NumPy array, DLPack capsule
- object implementing Python's buffer protocol
- scalar or sequence of scalars
Doesn't this PR want to add NumPy scalar to the first category, taking precedence over objects that implement the buffer protocol? What happens if a sequence of NumPy scalars is given? Also, this paragraph says that the datatype of the returned tensor is inferred from the scalar values, but for a NumPy scalar isn't the returned tensor's dtype mapped from the NumPy scalar's dtype?
> Doesn't this PR want to add NumPy scalar to the first category, taking precedence over objects that implement the buffer protocol?

Yes, that's right.

> What happens if a sequence of NumPy scalars is given?

They are treated as a Python sequence (i.e. datatype is inferred only if it's not explicitly specified).

> Also, this paragraph says that the datatype of the returned tensor is inferred from the scalar values, but for a NumPy scalar isn't the returned tensor's dtype mapped from the NumPy scalar's dtype?

That's correct.
What if we added a new paragraph before this last one, like:
"When obj is a NumPy scalar, the returned tensor will be a 0-dimensional tensor that lives on the CPU device and doesn't share its memory (i.e. copy=True). Its datatype won't change unless otherwise specified."
That addition LGTM, but let's see what Mike has to say.
Seems pretty reasonable. Small tweak suggestion:
"When obj is a NumPy scalar, the returned tensor will be a 0-dimensional tensor on the CPU and that doesn't share its memory (i.e. copy=True). By default datatype will be the PyTorch datatype corresponding to the NumPy's scalar's datatype."
?
Force-pushed from a6549cf to f68e282.
PTAL @mruberry
@mruberry This is a friendly reminder. Do you have some time to take a look at this PR?
test/test_tensor_creation_ops.py
Outdated
Shouldn't this also test that tensor.dtype is float64?
Yes! Good catch
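Something along these lines, as a standalone sketch of the updated check (the real test lives in test/test_tensor_creation_ops.py and may be structured differently):

```python
import numpy as np
import torch

def check_asarray_numpy_scalar():
    scalar = np.float64(0.5)
    tensor = torch.asarray(scalar)
    assert tensor.dim() == 0
    assert tensor.item() == float(scalar)
    assert tensor.dtype == torch.float64  # the dtype check suggested above

check_asarray_numpy_scalar()
```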
mruberry
left a comment
Cool! See one testing suggestion
@pytorchbot rebase
@pytorchbot successfully started a rebase job. Check the current status here.
Successfully rebased |
Force-pushed from e11d8e9 to 91a2bfc.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed; the first few are: trunk / macos-12-py3-x86-64-lite-interpreter / build. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Follow-up from: Quansight-Labs/numpy_pytorch_interop#3

This PR adds support for NumPy scalars in torch.asarray.

Before: torch.asarray treats the scalar as an object that implements the buffer protocol, and thus interprets the data as the default data type (float32).

After: torch.asarray identifies the NumPy scalar and does the "right" thing, i.e. creates a 0-dimensional tensor from the NumPy array that doesn't share its memory.
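A rough illustration of the change (the "before" line is commented out and illustrative, based on the description above, rather than an exact reproduction):

```python
import numpy as np
import torch

scalar = np.float64(0.5)

# Before this PR: the scalar went through the buffer-protocol path, so its bytes
# were reinterpreted with the default dtype (float32) unless dtype was passed.
# torch.asarray(scalar)  # -> float32 tensor with reinterpreted data

# After this PR: the NumPy scalar is recognized directly.
t = torch.asarray(scalar)
print(t.dim(), t.dtype)  # 0 torch.float64
print(t)                 # tensor(0.5000, dtype=torch.float64)
```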