Conversation

@seberg (Member) commented Feb 1, 2022

This also fixes/changes that the identity value is defined by the
reduce dtype and not by the result dtype. That does not make a
difference in any sane use-case, but there is a theoretical change:

arr = np.array([])
out = np.array(None)  # object array
np.add.reduce(arr, out=out, dtype=np.float64)

Previously the output was the integer 0, due to the output being of
object dtype, but now it is the correct float, due to the operation
being done in float64, so that the output should be
np.zeros((), dtype=np.float64).astype(object).
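
A minimal check of the behavior described above (assuming the post-change
behavior; cf. the test added later in this PR):

    import numpy as np

    arr = np.array([])
    out = np.array(None)  # object array
    np.add.reduce(arr, out=out, dtype=np.float64)
    # The identity is now created in the reduce dtype (float64) and then
    # cast to the object output, so the element is a float, not the int 0:
    assert isinstance(out[()], float)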


This is a requirement for more advanced DTypes to pick a reasonable loop. E.g. if there were a reduce loop for a structured dtype, the structured dtype's default/initial element is unlikely to be filled in correctly from a 0.
Another example is a physical-unit dtype, which may reject casting 0 to 0 meters, but needs 0 meters as its identity.

(Note that depending on the order of merges, the wrapping array-method stuff requires a specialized version here to make my Unit dtype reductions work.)

@seberg force-pushed the reduce-identity-array-meth branch 4 times, most recently from 1d873b0 to 8865cb8 on February 4, 2022 03:14
@seberg (Member Author) commented Feb 4, 2022

This is almost ready, but it adds some (internal/experimental) API, and I am not sure about, e.g., the names. I do think this is the right place though (putting any of this into the type resolver seems like too much).

One thing I now noticed is that we don't currently disable "reorderable" for object arrays or when initial=None is passed:

In [2]: arr = np.array([["a", "b"], ["c", "d"]], dtype=object)

In [3]: arr.sum()
Out[3]: 'abcd'

In [4]: arr.T.sum()
Out[4]: 'abcd'

In [5]: arr.T.sum(initial=None)
Out[5]: 'abcd'

The question is whether this should, in fact, reject reordering. Further, if I allow stride negation, I suppose arr = np.array(["a", "b", "c", "d"], dtype=object); arr[::-1].sum() might still give 'abcd' rather than 'dcba'? (I can't reproduce this right now.)
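
A sketch of that concern (the wrong result is hypothetical; string
concatenation is not commutative, so iteration order is observable):

    import numpy as np

    arr = np.array(["a", "b", "c", "d"], dtype=object)
    arr.sum()        # 'abcd'
    arr[::-1].sum()  # should be 'dcba'; a loop that negates strides to
                     # iterate forward could incorrectly give 'abcd'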

So that reorderable stuff needs one more test or so, and the API needs some bike-shedding. @mhvk, if you have a bit of time, could you have a quick look?

@mhvk (Contributor) left a comment:

Had a look but not quite complete, since I had some more general questions:

  1. For object dtype, would it make sense to not have a default either? It just seems asking for trouble...
  2. More on the implementation: should reorderable be separated from the identity? Conceptually, they are only somewhat related (e.g., in your string example, the identity would be an empty string, but the operation would not be reorderable). More generally, it would seem that the identity and whether a loop is reorderable can both depend on the dtype, but not necessarily in the same way. Also, the code seems a bit contrived by combining the two.
    Note that I realize you are just following what was there, but you are now on a path towards making it public API, so this seems the time to try to get it right...
  3. Given the above, perhaps an even more general question is whether the default identity and reorderability should be provided when adding a loop? That is the place where one truly knows what they are!
  4. Or, even more generally: is providing a function to retrieve the identity the best way, or might it be easier to provide the ability to have a custom function do the initialization of the output? A bit like __array_prepare__ used to be...

* the `NPY_METH_ITEM_IS_DEFAULT` should also be set, but it is distinct.
* By default NumPy provides a "default" for `object` dtype, but does not use
* it as an identity.
* The `NPY_METH_IS_REORDERABLE` flag should be set if the operatio is
Contributor:

typo: operation

}


/* Declared in `ufunc_object.c` */
Contributor:

Is this meant to eventually be moved to some .h file so that it can be more normally included? Perhaps worth a TODO comment?

return 0;
}

if (identity_obj != Py_None) {
Contributor:

Nitpick, but I'd swap the logic and test if (identity_obj == Py_None) here, to make the flow more obvious. It could even be combined with the if statement above.

* calling ufunc.
*/
static int
default_get_identity_function(PyArrayMethod_Context *context,
Contributor:

It feels weird to combine getting the identity and setting the reorderable - see main comment.

if (context->descriptors[0]->type_num != NPY_OBJECT) {
*flags |= NPY_METH_ITEM_IS_IDENTITY;
}
int res = PyArray_Pack(context->descriptors[0], item, identity_obj);
Contributor:

I wonder if it is really best to do this packing inside this function, instead of trying to interpret the result. See main comment.

Member Author:

I have a slight preference for using the raw data, also because I am still wondering if we might want dtypes without any scalar at all, and creating an array just to get the identity seems a bit heavy.

Alternatively, I could also Pack the initial provided by the user and use the raw version for both, although this version has the nice property that supporting arrays is just a matter of changing one line to CopyObject :).

@seberg (Member Author) commented Feb 6, 2022

  • For object dtype, would it make sense to not have a default either? It just seems asking for trouble...

Yeah, my worry is that in most cases this is completely fine for object arrays: arrays of decimals can be summed and reordered just fine. Removing the "default" would only change things for empty object arrays; the reorderable flag is trickier, because without it multi-dimensional reductions are currently disallowed completely.

I suspect we could remove the reorderable problem if we force C-order iteration (slowing things down, but keeping them working). That way, for N-D reductions without reorderable, we would define the order rather than error out.
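
A hypothetical illustration of "defining the order" (made-up result,
reusing the object-string example above):

    import numpy as np

    arr = np.array([["a", "b"], ["c", "d"]], dtype=object)
    # With forced C-order iteration, a non-reorderable sum of arr.T would
    # be well defined as 'a' + 'c' + 'b' + 'd' == 'acbd', instead of
    # erroring out (or silently reordering to 'abcd' as happens today).
    arr.T.sum()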

  • More on the implementation: should reorderable be separated from the identity? Conceptually, they are only somewhat related (e.g., in your string example, the identity would be an empty string, but the operation would not be reorderable). More generally, it would seem that the identity and whether a loop is reorderable can both depend on the dtype, but not necessarily in the same way. Also, the code seems a bit contrived by combining the two.
    Note that I realize you are just following what was there, but you are now on a path towards making it public API, so this seems the time to try to get it right...

I agree that they are two different things, but the reason I put them in the same callback is that they need to be queried at (roughly) the same time.

One approach could be to assume that reorderable is fully static. Then we would set it at registration time and keep it fixed forever.
I thought it might be nice to avoid that (it might give a bit more dynamism in the future), but I would be fine with dropping it (in principle it could still be added at some point, i.e. if reorderable is not set, allow the ufunc to query explicitly).

In that case I could pass in the information whether the reduction is empty (rather than pass out flags).

  • Given the above, perhaps an even more general question is whether the default identity and reorderability should be provided when adding a loop? That is the place where one truly knows what they are!

Not sure I follow. This moves the information to the ArrayMethod, i.e. to the loop. But if the loop does not provide the method, the default one is used (i.e. the one looking it up from the ufunc).

A possibility would be to use the default loop only for the current legacy loops and otherwise not provide a default at all (i.e. default to an exception).

  • Or, even more generally: is providing a function to retrieve the identity the best way, or might it be easier to provide the ability to have a custom function do the initialization of the output? A bit like __array_prepare__ used to be...

Hmmm, I first thought this might be interesting for gufuncs, but now I am confused. For reductions, we really only need a single element (unless we want to reduce a gufunc at some point). gufuncs are normal ufuncs though, in the sense that they should not really have state (they might have working space or so, but get_loop should be able to deal with that).

One thing I don't like about __array_prepare__ is that I do not want to pass in the full arrays. The ufunc does not need to know that it currently operates on an N-dimensional NumPy array. It provides a single element for the identity (in principle, this would be the core dimensions for a gufunc, but I am not sure how that would be used?).

seberg added a commit to seberg/numpy that referenced this pull request Mar 18, 2022
Technically, we should ensure that we do all summations starting with
-0 unless there is nothing to sum (in which case the result is clearly
0).
This is a start, since the initial value is still filled in as 0 by
the reduce machinery.  However, it fixes the odd case where an
inplace addition:

   x1 = np.array(-0.0)
   x2 = np.array(-0.0)
   x1 += x2

may go into the reduce code path (because strides are 0).  We could
avoid the reduce path there, but -0 here is strictly correct anyway
and also a necessary step towards fixing `sum([-0., -0., -0.])`
which still requires `initial=-0.0` to be passed manually right now.

There are new `xfail` marked tests also checking the path without
initial.  Presumably, we should be using -0.0 as initial value,
but 0.0 as default (if an array is empty) here.
This may be more reasonably possible after numpygh-20970.

Closes numpygh-21213, numpygh-21212
melissawm pushed a commit to melissawm/numpy that referenced this pull request Apr 12, 2022
@seberg added the label 'triage review' (Issue/PR to be discussed at the next triage meeting) on Jul 7, 2022
@seberg force-pushed the reduce-identity-array-meth branch 2 times, most recently from 2aa9f48 to 8cf5f64 on November 4, 2022 14:50
@seberg (Member Author) commented Nov 4, 2022

OK, took me a bit to settle on things. But I think I am happy with it now:

  • There is a single flag NPY_METH_IS_REORDERABLE. We could use this more generally, but for now it is only used for reductions. (This removes the ability to decide reorderability based on the actual dtype instance; that is probably OK, and there should be a way to add it back if it turns out to be very useful.)
  • The "legacy" ArrayMethods query the ufunc on initialization for this flag, and for whether an identity should exist.
  • I took the liberty of caching the identity for numeric loops. This shaves off ~5% of overhead, but it is very straightforward, so I like it.
  • Any non-legacy loop will default to no identity and not reorderable.

We pass in reduction_is_empty now; the implementor has to return 1 for "initial value exists" and 0 if not (-1 on critical error).
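
An illustration of why the empty/non-empty distinction matters, echoing
the -0.0 commit message quoted earlier (standard IEEE float behavior):

    import numpy as np

    # A float sum would ideally start from -0.0, so that summing only
    # negative zeros preserves the sign, but an *empty* reduction must
    # still return +0.0 -- hence the callback needs to know about emptiness:
    np.add.reduce(np.array([-0.0, -0.0]), initial=-0.0)  # -> -0.0
    np.add.reduce(np.array([]))                          # -> 0.0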

@mhvk (Contributor) left a comment:

This seems quite nice! And the array method is now quite logical. My main question is whether the caching could not be made more general by having it be a pointer to data (which also avoids copying it).

Another is whether one can push the separation of reorderable and identity a bit further.

translate_given_descrs_func *translate_given_descrs;
translate_loop_descrs_func *translate_loop_descrs;
/* Chunk used by the legacy fallback arraymethod mainly */
char initial[sizeof(npy_clongdouble)]; /* initial value storage */
Contributor:

Would it not be better to have this be a pointer, to be allocated? How does one avoid possible initials that are too long to fit (e.g., strings)?
...
Actually, looking further, I see it is only used for numbers, so clearly it is safe from this perspective. Could you add this to the comment?

It still seems a bit more elegant to use a pointer to allocated memory (and deallocate that on deletion, obviously), but perhaps that is overkill.

p.s. Possibly one could cache the PyArray_Pack piece.

Member Author:

Yes, I could do an allocation; this just seemed a bit simpler for the common case. In general, I want to be able to do dynamic allocation (say, strings might have to be NULLed with some length, or an arbitrary-precision number needs to be set to 1).

So I cannot cache in general, but for numbers it makes a lot of sense.

Contributor:

True that it is not quite general - still, since the allocation is done elsewhere and data is copied from this location, I'm not sure there is a net saving in complexity in having what seems a rather odd, difficult-to-explain piece of PyArrayMethodObject_tag -- it just doesn't feel properly extensible/re-usable, unlike the rest.

Contributor:

This is the one part that still feels super clunky. I complain in part because the rest feels really nice and logical, yet here we suddenly have some opaque blob of bytes!

Looking at the code a bit more, a pointer to some memory allocated on first call would not be difficult, right? You'd just have to be careful to deallocate it in arraymethod_dealloc, as is now done with name.

Member Author:

No, it would be easy to do such a pointer. Maybe our difference is that I still consider PyArrayMethodObject to be fully opaque and private to NumPy?

Contributor:

I guess the basic issue remains that having a chunk of random bytes feels clunky, but you are right that, to me, the struct really seems like something that could become public (or, if not, a really useful thing for developers to look at), so one should keep it neat. And if it becomes public, having a pointer seems far more useful; probably even better if it is a pointer to something well-defined like an array scalar.

Member:

I am not too disturbed about this, it is legacy and at some point could be pushed down into the legacy code. Have you given any thought to thread safety in the code? I don't think this is going to lead to problems, since the assignment should be atomic.

Member Author:

Hmmm, it is always filled with the same value, and while the GIL is held, so I think it is fine. Will adapt the comment to say "only", not "mainly", because you are right: reusing it outside of the "legacy" path might be prone to thread-safety issues.

flags |= NPY_METH_IS_REORDERABLE;
}
if (identity_obj != Py_None) {
/* NOTE: We defer, just in case it fails in weird cases: */
Contributor:

I don't understand this comment!

Member Author:

Hehe, this calls PyArray_Pack(identity_value). If you wrote a weird user dtype, maybe that actually fails for certain values (say -1). In that case, I don't want to fail at ufunc initialization time, since maybe it never worked, but nobody would call ufunc.reduce later.

Contributor:

OK, that makes sense! Maybe just update the comment with what you wrote above!

Member Author:

I decided to just delete it here and expand the other one (changes pending).

For NumPy, it is convenient right now to write generic code. If we were not doing that, the functions would just use static things anyway, i.e. be written as (very roughly):

static int
copy_initial_data(char *initial)  /* one such function per element type T */
{
    *(T *)initial = 0;
    return 0;
}

so yes, I didn't mean to make this available (at least for now).

out = np.array(None, dtype=object)
# When the loop is float but `out` is object this does not happen:
np.add.reduce([], out=out, dtype=np.float64)
assert type(out[()]) is not int
Contributor:

Can we also test what it should be? (float?)

# This is an unsafe cast, but we currently always allow that.
# Note that the double loop is picked, but the cast fails.
np.add.reduce(arr, dtype=np.intp, out=out)
# `initial=None` disables the use of an identity here.
Contributor:

Is this just to ensure refcounts on it do not contribute?

Member Author:

Expanding the comment, the point is that the cast error may happen while copying the "first values", but that only happens when there is no initial.

@seberg added the label 'triaged' (Issue/PR that was discussed in a triage meeting) and removed 'triage review' on Nov 16, 2022
@seberg force-pushed the reduce-identity-array-meth branch from 96cbf4a to bdfa38b on November 16, 2022 19:16
@seberg (Member Author) commented Nov 16, 2022

Added another test using the ScaledFloat stuff; that should also cover a whole lot more, really. Will check on coverage tomorrow. Hopefully I addressed most issues. I did not do anything about the way I cache things: yes, this is not public API, but simply something we do within NumPy (and maybe that is OK, since NumPy numbers are a bit special in that future dtypes would mostly use a single implementation per DType).

Thanks for the thorough reviews, Marten!

@seberg force-pushed the reduce-identity-array-meth branch 2 times, most recently from 0fba540 to 8d88ea4 on November 16, 2022 19:31
@mhvk (Contributor) left a comment:

@seberg - last try to argue for a pointer and allocating memory rather than having a blob! See in-line for a bit more concrete suggestions.



if (PyTypeNum_ISNUMBER(context->descriptors[0]->type_num)) {
/* For numbers we can cache to avoid going via Python ints */
memcpy(context->method->initial, initial, context->descriptors[0]->elsize);
Contributor:

So, my sense still would be to just allocate the right number of bytes here, copy over initial to those, and store the pointer (maybe context->method->cached_initial?).

Alternatively, making it all even less opaque, I guess one could repack initial here as an array or numpy scalar (which now has the right dtype) and cache that? Then the replacement function could simply return the pointer to the data contained in that cached array, and in the deallocation all one would need is a Py_XDECREF(meth->cached_initial).

Member Author:

Hmmm, my sense is still that this is simpler as long as it is just a small, private optimization for now.
To me, allocating it feels like we should provide infrastructure so that users can make use of that machinery, and I am not really motivated to aim for such a mechanism at this point (seems like little gain for additional complexity).

Contributor:

If it were hidden in an obviously private place, not in a struct that feels like something that should become public, I'd agree!

Member Author:

Maybe we need a third opinion :). Yes, we may make (part of) that struct public, and then we would have to hide this away.
But I like keeping the cache, and I don't like adding the complexity of an allocation at this time...

Contributor:

Sure, this shouldn't hold up what is a really nice PR -- maybe if someone else can have a quick look and see whether it is worth the effort of doing an allocation, that would indeed be best. In the end it is really a "does this pass the smell test" matter, and my sense of smell is not necessarily the best, especially for C.

Member:

I agree this is a bit awkward, but I think it is OK. Eventually I would like to see it go away in a future version of the experimental API. Maybe that field should be the last in ArrayMethodObject, so removing it would be less disruptive.

@seberg (Member Author) commented Jan 9, 2023

Would be nice to move this one forward soon (and the decref PR, although unrelated). Will mark it again for triage-review, just to nudge it :).

@seberg removed the label 'triaged' (Issue/PR that was discussed in a triage meeting) on Jan 9, 2023
@seberg (Member Author) commented Jan 20, 2023

  • I think the circleCI failure is due to a merge problem with older python still being used in the build

Yeah, I could rebase to fix it (although the build is currently a bit broken due to gh-23034).

@mhvk (Contributor) commented Jan 20, 2023

Happy for this to go in with @mattip's comments addressed. Perhaps particularly in making the names really reflect what is legacy and what is not (no hack for the new stuff!).

seberg and others added 18 commits January 20, 2023 15:28
Also add it to the wrapped array-method (ufunc) implementation
so that a Unit dtype can reasonably use an identity for reductions.
Need to look into whether to cut out the dynamic discovery of
reorderability though.
It's a new version (it changed ;)), plus I took the liberty of moving
things around a bit ABI-wise.
Co-authored-by: Matti Picus <matti.picus@gmail.com>
It should only be used by the legacy method, so also reflect that
in the field name.
@seberg force-pushed the reduce-identity-array-meth branch from fcbafee to 52e1384 on January 20, 2023 14:32
@seberg (Member Author) commented Jan 20, 2023

Ah, I hadn't renamed the field, so I've done that now. I will look into the header changes as a followup. (Also rebased, mainly for the sake of circle CI.)

@mattip (Member) left a comment:

LGTM, thanks. One small nit, but that can be part of the follow-on PR to deduplicate the headers.

@ngoldbaum (Member) commented:

Since this updates the experimental dtype API and we're planning to also add a new hook for a clear function in a followup to #22924, it might make sense to batch those two API breaks together. That said, I'm happy to clean up the dtypes in numpy-user-dtypes twice if we'd rather have two API breaks.

I agree that it would be best if the internal and external headers could share common definitions. That would avoid future repeats of #21979 and of the inconsistency between the headers that I caught in this PR.

@seberg (Member Author) commented Jan 20, 2023

Yeah, the cleanup PR doesn't include exposing the new API yet. If the other PR can move forward in the next few days, we can effectively squash the changes. But I would prefer decoupling the two things: let's just get this over with, and deal with whatever small fallout exists for the experimental DTypes.

@mattip merged commit 747823c into numpy:main on Jan 21, 2023
@mattip (Member) commented Jan 21, 2023

Thanks @seberg and @mhvk
