ENH: Move identity to the ArrayMethod to allow customization #20970
Conversation
Force-pushed 1d873b0 to 8865cb8
This is almost ready, but it adds some (internal/experimental) API, and I am e.g. not sure about the names. I do think this is the right place though (putting anything of this into the type resolver seems like too much). One thing I now noticed is that we don't currently disable "reorderable" for object arrays or when The question is whether this should not reject reordering? Further, if I allow stride negation, I suppose So that reorderable stuff, needs one more test or so, and the API some bike-shedding. @mhvk if you have a bit of time, could you have a quick look? |
mhvk left a comment
Had a look but not quite complete, since I had some more general questions:
- For object dtype, would it make sense to not have a default either? It just seems to be asking for trouble...
- More on the implementation: should `reorderable` be separated from the identity? Conceptually, they are only somewhat related (e.g., in your string example, the identity would be an empty string, but concatenation is not reorderable). More generally, it would seem that the identity and whether a loop is reorderable can both depend on the dtype, but not necessarily in the same way (see the sketch after this list). Also, the code seems a bit contrived by combining the two. Note that I realize you are just following what was there, but you are now on a path towards making it public API, so this seems the time to try to get it right...
- Given the above, perhaps an even more general question is whether the default identity and reorderability should be provided when adding a loop? That is the place where one truly knows what they are!
- Or, even more general: is providing a function to retrieve the identity the best way, or might it be easier to provide the ability to have a custom function to do the initialization of the output? A bit like `__array_prepare__` used to be...
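To make the identity/reorderable distinction above concrete: at the Python level today, `add` has both an identity and reorderability, while `subtract` has neither, and reorderability is what gates multi-axis reductions. A small illustration of current NumPy behavior:

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)

# add is reorderable, so reducing over several axes at once is allowed:
np.add.reduce(a, axis=(0, 1))

# subtract is not reorderable, so a multi-axis reduction is rejected:
try:
    np.subtract.reduce(a, axis=(0, 1))
except ValueError as exc:
    print(exc)  # roughly: "... not reorderable, so at most one axis may be specified"
```

A string dtype's concatenation would differ from `subtract` in exactly one of the two properties: it has an identity (the empty string) but is still not reorderable.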
 * the `NPY_METH_ITEM_IS_DEFAULT` should also be set, but it is distinct.
 * By default NumPy provides a "default" for `object` dtype, but does not use
 * it as an identity.
 * The `NPY_METH_IS_REORDERABLE` flag should be set if the operatio is
typo: operation
}

/* Declared in `ufunc_object.c` */
Is this meant to eventually be moved to some .h file so that it can be more normally included? Perhaps worth a TODO comment?
    return 0;
}

if (identity_obj != Py_None) {
Nitpick, but I'd swap the logic and test `if (identity_obj == Py_None)` here, to make the flow more obvious. It could even be combined with the if statement above.
 * calling ufunc.
 */
static int
default_get_identity_function(PyArrayMethod_Context *context,
It feels weird to combine getting the identity and setting the reorderable - see main comment.
if (context->descriptors[0]->type_num != NPY_OBJECT) {
    *flags |= NPY_METH_ITEM_IS_IDENTITY;
}
int res = PyArray_Pack(context->descriptors[0], item, identity_obj);
I wonder if it is really the best to do this packing inside this function, instead of trying to interpret the result. See main comment.
I have a slight preference for using the raw data, also because I am still wondering if we might want dtypes without any scalar at all, and creating an array just to get the identity seems a bit heavy.
Alternatively, I could also pack the initial provided by the user and use the raw version for both, although this version has the nice property that allowing arrays just means changing one line to CopyObject :).
Yeah, my worry is that in most cases this is completely fine for … I suspect we could remove the reorderable problem if we force C-order iteration (slowing things down, but keeping them working). That way, for N-D without reorderable, we define the order rather than error out.
I agree that they are two different things, but the reason I put them in the same callback is that they need to be queried at (roughly) the same time. One approach could be to assume that … In that case I could pass in the information whether the reduction is empty (rather than pass out flags).
Not sure I follow. This moves the information to the … A possibility would be to use the default loop only for the current legacy loops and otherwise not provide a default at all (i.e. default to an exception).
Hmmm, I first thought this might be interesting for … One thing I don't like about …
Technically, we should ensure that we do all summations starting with -0 unless there is nothing to sum (in which case the result is clearly 0). This is a start, since the initial value is still filled in as 0 by the reduce machinery. However, it fixes the odd case where an in-place addition:

    x1 = np.array(-0.0)
    x2 = np.array(-0.0)
    x1 += x2

may go into the reduce code path (because strides are 0). We could avoid the reduce path there, but -0 here is strictly correct anyway and also a necessary step towards fixing `sum([-0., -0., -0.])`, which still requires `initial=-0.0` to be passed manually right now. There are new `xfail`-marked tests also checking the path without initial. Presumably, we should be using -0.0 as the initial value, but 0.0 as the default (if an array is empty) here. This may be more reasonably possible after numpygh-20970.

Closes numpygh-21213, numpygh-21212
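A short demonstration of the signed-zero behavior the commit message describes; `np.signbit` makes the sign of a zero visible (the middle result reflects behavior at the time of this PR):

```python
import numpy as np

# IEEE-754: adding negative zeros preserves the sign ...
assert np.signbit(-0.0 + -0.0)

# ... but the reduce machinery started its accumulation from +0.0,
# so the sign was lost (behavior before the follow-up fix):
print(np.signbit(np.sum([-0.0, -0.0, -0.0])))  # False at the time

# Passing the initial value explicitly worked around it:
assert np.signbit(np.add.reduce([-0.0, -0.0, -0.0], initial=-0.0))
```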
Force-pushed 2aa9f48 to 8cf5f64
OK, took me a bit to settle on things. But I think I am happy with it now: we pass in …
mhvk left a comment
This seems quite nice! And the array method is now quite logical. My main question is whether the caching could not be made more general by having it be a pointer to data (which also avoids copying it). Another is whether one can push the separation of reorderable and identity a bit further.
translate_given_descrs_func *translate_given_descrs;
translate_loop_descrs_func *translate_loop_descrs;
/* Chunk used by the legacy fallback arraymethod mainly */
char initial[sizeof(npy_clongdouble)];  /* initial value storage */
Would it not be better to have this be a pointer, to be allocated? How does one avoid possible initials that are too long to fit (e.g., strings)?
...
Actually, looking further, I see it is only used for numbers, so clearly it is safe from this perspective. Could you add this to the comment?
It still seems a bit more elegant to use a pointer to allocated memory (and deallocate that on deletion, obviously), but perhaps that is overkill.
p.s. Possibly one could cache the PyArray_Pack piece.
Yes, I could do an allocation; this just seemed a bit simpler for a common case. In general, I want to be able to do dynamic allocation (say, strings might have to be NULLed with some length, or an arbitrary-precision number needs to be set to 1). So I cannot cache in general, but for numbers it makes a lot of sense.
True that it is not quite general -- still, since the allocation is done elsewhere and data is copied from this location, I'm not sure there is a net saving in complexity in having what seems a rather odd, difficult-to-explain piece of PyArrayMethodObject_tag -- it just doesn't feel properly extensible/re-usable, unlike the rest.
This is the one part that still feels super clunky. I complain in part because the rest feels really nice and logical, yet here we suddenly have some opaque blob of bytes!
Looking at the code a bit more, a pointer to some memory allocated on first call would not be difficult, right? You'd just have to be careful to deallocate it in arraymethod_dealloc, like is now done with name.
No, it would be easy to do such a pointer. Maybe our difference is that I still consider PyArrayMethodObject to be fully opaque and private to NumPy?
I guess the basic issue remains that having a chunk of random bytes feels clunky, but you are right that to me the struct really seems something that could become public (or, if not, a really useful thing for developers to look at), so one should keep it neat. And if it becomes public, having a pointer seems far more useful. Probably even better if it is a pointer to something well-defined like an array scalar.
I am not too disturbed about this, it is legacy and at some point could be pushed down into the legacy code. Have you given any thought to thread safety in the code? I don't think this is going to lead to problems, since the assignment should be atomic.
Hmmm, it is always filled with the same value and when the GIL is held, so I think it is fine. Will adapt the comment to say "only", not "mainly", because you are right: reusing it outside of the "legacy" path might be prone to thread-safety issues.
    flags |= NPY_METH_IS_REORDERABLE;
}
if (identity_obj != Py_None) {
    /* NOTE: We defer, just in case it fails in weird cases: */
I don't understand this comment!
Hehe, this calls PyArray_Pack(identity_value). If you wrote a weird user dtype, maybe that actually fails for certain values (say -1). In that case, I don't want to fail at ufunc initialization time, since maybe it never worked, but nobody would call ufunc.reduce later.
OK, that makes sense! Maybe just update the comment with what you wrote above!
I decided to just delete it here and expand the other one (changes pending).
For NumPy, it is convenient right now to write generic code. If we were not doing that, the functions would just use static things anyway, i.e. be written as (very roughly):
static int
copy_initial_data<T>() {
    *(T *)initial = 0;  /* T: the concrete element type of the loop */
    return 0;
}
so yes, I didn't mean to make this available (at least for now).
numpy/core/tests/test_ufunc.py
Outdated
out = np.array(None, dtype=object)
# When the loop is float but `out` is object this does not happen:
np.add.reduce([], out=out, dtype=np.float64)
assert type(out[()]) is not int
Can we also test what it should be? (float?)
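A sketch of the stronger assertion being suggested, assuming the identity is packed for the `float64` operation dtype as this PR implements:

```python
out = np.array(None, dtype=object)
np.add.reduce([], out=out, dtype=np.float64)
# The identity should be a float64 zero cast to object, not the int 0:
assert type(out[()]) is float
```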
numpy/core/tests/test_ufunc.py
Outdated
# This is an unsafe cast, but we currently always allow that.
# Note that the double loop is picked, but the cast fails.
np.add.reduce(arr, dtype=np.intp, out=out)
# `initial=None` disables the use of an identity here.
Is this just to ensure refcounts on it do not contribute?
Expanding the comment, the point is that the cast error may happen while copying the "first values", but that only happens when there is no initial.
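For reference, a small illustration of the `initial=None` semantics relied on in this test (standard NumPy behavior):

```python
import numpy as np

# initial=None disables the identity; the first element seeds the reduction:
assert np.add.reduce([1, 2, 3], initial=None) == 6

# ... which also means an empty reduction has nothing to start from:
try:
    np.add.reduce(np.array([]), initial=None)
except ValueError as exc:
    print(exc)  # roughly: "zero-size array to reduction operation ..."
```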
Force-pushed 96cbf4a to bdfa38b
Added another test using the ScaledFloat stuff; that should also cover a whole lot more, really. Will check on coverage tomorrow. Hopefully I addressed most issues. I did not do anything about the way I cache things: yes, this is not public API, but simply something we do within NumPy (and maybe that is OK, since NumPy numbers are a bit special in that future dtypes would mostly use a single implementation per DType). Thanks for the thorough reviews, Marten!
Force-pushed 0fba540 to 8d88ea4
mhvk left a comment
@seberg - last try to argue for a pointer and allocating memory rather than having a blob! See inline for a bit more concrete suggestions.
if (PyTypeNum_ISNUMBER(context->descriptors[0]->type_num)) {
    /* For numbers we can cache to avoid going via Python ints */
    memcpy(context->method->initial, initial, context->descriptors[0]->elsize);
So, my sense still would be to just allocate the right number of bytes here, copy initial over to those, and store the pointer (maybe `context->method->cached_initial`?).
Alternatively, making it all even less opaque, I guess one could repack initial here as an array or numpy scalar (which now has the right dtype) and cache that? Then the replacement function could simply return the pointer to the data contained in that cached array, and in the deallocation all one would need is a Py_XDECREF(meth->cached_initial).
Hmmm, my sense is still that this is simpler as long as it is just a small, private optimization for now.
To me, allocating it feels like we should provide infrastructure so that users can make use of that machinery, and I am not really motivated to aim for such a mechanism at this point (seems like little gain for additional complexity).
If it was hidden in an obviously private place, not in a struct that feels like something that should become public, I'd agree!
Maybe we need a third opinion :). Yes, we may make (part of) that struct public, and then we would have to hide this away.
But I like keeping the cache, and I don't like adding the complexity of an allocation at this time...
Sure, this shouldn't hold up what is a really nice PR -- maybe if someone else can have a quick look and see whether it is worth the effort doing an allocation, that would indeed be best. In the end it is really a "does this pass the smell test" matter and my sense of smell is not necessarily the best, especially for C.
I agree this is a bit awkward, but I think it is OK. Eventually I would like to see it go away in a future version of the experimental API. Maybe that field should be the last in ArrayMethodObject, so removing it would be less disruptive.
Would be nice to move this one forward soon (and the decref PR, although unrelated). Will mark it again for triage-review, just to nudge it :).
Yeah, I could rebase to fix it (although the build is currently a bit broken due to gh-23034).
Happy for this to go in with @mattip's comments addressed. Perhaps particularly in making the names really reflect what is legacy and what is not (no hack for the new stuff!).
This also fixes/changes that the identity value is defined by the
reduce dtype and not by the result dtype. That does not make a
difference in any sane use-case, but there is a theoretical change:

    arr = np.array([])
    out = np.array(None)  # object array
    np.add.reduce(arr, out=out, dtype=np.float64)

Where the output result was previously an integer 0, due to the
output being of `object` dtype, but now it is the correct float
due to the _operation_ being done in `float64`, so that the output
should be `np.zeros((), dtype=np.float64).astype(object)`.
Also add it to the wrapped array-method (ufunc) implementation so that a Unit dtype can reasonably use an identity for reductions.
Need to look into whether to cut out the dynamic discovery of reorderability though.
It's a new version (it changed ;)), plus I took the liberty to move things around a bit ABI-wise.
As requested by Matti in review.
Co-authored-by: Matti Picus <matti.picus@gmail.com>
It should only be used by the legacy method, so also reflect that in the field name.
Force-pushed fcbafee to 52e1384
Ah, I hadn't renamed the field, so I've done that now. I will look into the header changes as a follow-up. (Also rebased, mainly for the sake of Circle CI.)
mattip left a comment
LGTM, thanks. One small nit, but that can be part of the follow-on PR to deduplicate the headers.
Since this updates the experimental dtype API and we're planning to also add a new hook for a clear function in a followup to #22924, it might make sense to batch those two API breaks together. That said, I'm happy to clean up the dtypes in …
I agree that it would be best if the internal headers and external headers could share common definitions. That will avoid future repeats of #21979 and of the inconsistency between the headers I caught in this PR.
Yeah, the cleanup PR doesn't include exposing the new API yet. If the other PR can move forward in the next days, we can effectively squash the changes. But I would prefer decoupling the two things: let's just get this over with, and deal with whatever small fallout exists for the experimental DTypes.
This also fixes/changes that the identity value is defined by the reduce dtype and not by the result dtype. That does not make a difference in any sane use-case, but there is a theoretical change:

    arr = np.array([])
    out = np.array(None)  # object array
    np.add.reduce(arr, out=out, dtype=np.float64)

Here the output result was previously an integer 0, due to the output being of `object` dtype, but now it is the correct float due to the operation being done in `float64`, so that the output should be `np.zeros((), dtype=np.float64).astype(object)`.

This is a requirement for more advanced DTypes to pick a reasonable loop. E.g. if there was a reduce for a structured dtype, the structured dtype's default/initial element is unlikely to fill in from a 0. Another example is a physical unit, which may reject casting `0` to `0 meters`, but needs the `0 meters` as identity.

(Note that depending on the order of merges, the wrapping array-method stuff requires a specialized version here to make my Unit dtype reductions work.)
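To illustrate why a bare `0` cannot serve as the identity for such dtypes, here is a toy sketch with a hypothetical `Quantity` class (illustration only; this is not NumPy API or any particular unit package):

```python
import functools

class Quantity:
    """Toy stand-in for an element of a unit dtype (hypothetical)."""
    def __init__(self, value, unit):
        self.value, self.unit = value, unit

    def __add__(self, other):
        if not isinstance(other, Quantity) or other.unit != self.unit:
            raise TypeError("can only add matching units")
        return Quantity(self.value + other.value, self.unit)

    def __repr__(self):
        return f"{self.value} {self.unit}"

data = [Quantity(1.0, "m"), Quantity(2.0, "m")]

# A bare 0 has no unit, so Quantity.__add__ would reject it; the
# identity must itself be "0 meters", i.e. an element of the dtype:
print(functools.reduce(Quantity.__add__, data, Quantity(0.0, "m")))  # 3.0 m
```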