ENH: Descending partitioning#31511
Conversation
There was a problem hiding this comment.
Note that this does not create new-style ArrayMethod registration analogous to the sorts, but rather updates the (apparently fully internal) partition_map and continues using the old-style PartitionFunc
I would like to introduce the new ArrayMethod approach, but currently this is all internal so I am very happy to start with this since it also splits things up into smaller PRs.
(Plus, may help to work on top-k in parallel.)
But yeah, needs tests and docs for sure :).
(Otherwise, I think kind+descending should be fine, except that kind is only a single option anyway, so it's a bit funny and we should maybe just have stable/unstable as for the sorts, even if that is even less interesting here probably.
)
| NPY_INTROSELECT = 0, | ||
| // new style names | ||
| NPY_SELECT_DEFAULT = 0, | ||
| NPY_SELECT_DESCENDING = 1, |
There was a problem hiding this comment.
Hmmm, I wonder if we should just also make it 4, so that nsorts and nselects might just be identical (no point right now, I guess).
There was a problem hiding this comment.
Thanks, yeah, right now the kind is selected by its position in the array... not great, but would go away when we put in ArrayMethods anyway!
There was a problem hiding this comment.
Right, but this is public API, so we can't defer this if that is what you mean? (and if that means changing the other code, so be it).
There was a problem hiding this comment.
Right, yeah, we should probably sort this out just now. I've made the enum effectively identical and changed the other code (I assume we're not actually adding the kwarg yet.) Not sure what to do with the NPY_NSELECTS...
Thanks for reviewing! Yes, this seems to be a good way to break the PRs, and I definitely will follow-up very soon with ArrayMethods. I added tests, will add docs soon! I found a bug where dtypes were silently falling to (default) sort even with descending, so I disabled that ( Edit: I was able to route datetime/timedelta to the existing introselect, and the tests are also enabled now!
Agreed having stable could be nice for consistency! Maybe would be nice to add another select algorithm but stable, from which we can fallback to stable sorts. |
3cdc80b to
400ed71
Compare
|
Added docs, release note, and fixed up the |
seberg
left a comment
There was a problem hiding this comment.
LGTM, thanks. A few smaller things, but most comments are just observations/thoughts really.
I might think about the docs a bit more. I wonder if it is clearer if we just keep the old text with smaller/greater even.
(And if the clarification at the end like "Sorting the partitions split by k-th individually would yield a fully sorted array." might be helpful.)
| import _testbuffer | ||
| except ImportError: | ||
| raise pytest.skip("_testbuffer is not available") | ||
| raise pytest.mark.skip("_testbuffer is not available") |
There was a problem hiding this comment.
| raise pytest.mark.skip("_testbuffer is not available") | |
| pytest.skip("_testbuffer is not available") |
Also above, pytest.skip() raises the SkipException. Although, in the first, we might as well use a @pytest.mark.skipif() decorator and in this one a _testbuffer = pytest.importskip("_testbuffer")
(I am not clear why this is suddenly showing up, but so be it.)
There was a problem hiding this comment.
Thanks, this is done! (For the first one, not sure if the decorator would work, since there an argument is part of the condition)
| axis: SupportsIndex = -1, | ||
| kind: _PartitionKind = "introselect", | ||
| order: None = None, | ||
| kind: _PartitionKind | None = None, order: None = None, |
|
|
||
| if (sortkind == _NPY_SELECT_UNDEFINED) { | ||
| // keywords only if sortkind not passed | ||
| sortkind = descending > 0? NPY_SELECT_DESCENDING: NPY_SELECT_DEFAULT; |
There was a problem hiding this comment.
| sortkind = descending > 0? NPY_SELECT_DESCENDING: NPY_SELECT_DEFAULT; | |
| sortkind = descending > 0 ? NPY_SELECT_DESCENDING : NPY_SELECT_DEFAULT; |
(but just silly nit)
| {"|kind", &PyArray_SelectkindConverter, &sortkind}, | ||
| {"|order", NULL, &order}) < 0) { | ||
| {"|order", NULL, &order}, | ||
| {"|descending", &PyArray_OptionalBoolConverter, &descending}) < 0) { |
There was a problem hiding this comment.
| {"|descending", &PyArray_OptionalBoolConverter, &descending}) < 0) { | |
| {"$descending", &PyArray_OptionalBoolConverter, &descending}) < 0) { |
Probably might as well make it kwonly here as well.
| specified, but unspecified fields will still be used, in the | ||
| order in which they come up in the dtype, to break ties. | ||
| descending : bool, optional | ||
| Partition order. If ``True``, the left partition would contain |
There was a problem hiding this comment.
I wonder if we should just start with "Sort order" (feels like it may be simpler).
There was a problem hiding this comment.
Yeah, that's better, this is done - thanks!
| indices of elements less than or equal to the k-th element and the | ||
| right partition will contain indices of elements greater than or | ||
| equal to the k-th element. If ``False`` or ``None``, the left partition | ||
| will contain indices of elements less than or equal to the k-th element |
There was a problem hiding this comment.
Whoops, this is reversed. But I am wondering if we can make it a bit shorter/closer to the sort docs and maybe even just say "partitioned in ascending/descending order" and rely on the introduction documentation to be clear enough on it?
There was a problem hiding this comment.
Sorry, the reversal is fixed! Shortening the docs would be nice. I think the issue is really that this is moved from the kth docs to here, and it's not natural. Perhaps we can move this to k-th, but it is not really relevant there. Maybe just like the original, a single smaller/greater wording is better, but it's not really correct for descending?
| are located to the left of this element and all equal or greater are | ||
| located to its right. The ordering of the elements in the two partitions | ||
| on the either side of the k-th element in the output array is undefined. | ||
| Partially sorts the array in such a way that the value of the element in k-th |
There was a problem hiding this comment.
| Partially sorts the array in such a way that the value of the element in k-th | |
| Partially sorts the array in such a way that the value of the element in the k-th |
I wonder if we should remove "the value of". Since I think we can just say k-th element and the other docs do (but this was there before).
There was a problem hiding this comment.
Article fixed, thanks! Yeah, I think we should remove it (missed in this push sorry). Let me do that.
|
Sorry, silly errors, thanks for the thorough review! Aside from the docs, I guess the big thing is checking astropy and pandas are fine with the default change for |
|
Pushed another iteration of the docs, not sure if it turned out much better with the parentheses, but the For the |
|
Cool, I am happy to not worry too much about it then. Although, I think it may still make sense to just go with |
|
Thanks, I switched to
Yes, that seems right! Going to make that change, thanks. ( |
seberg
left a comment
There was a problem hiding this comment.
One small round of nits, but we should get this done. I think the new docs read quite well so happy there.
The things around None default, etc. mirrors sorts, so I don't want to think about the best approach _NoValue use is good now.
My agent had a few tiny test suggestions, but I don't think any of them need to be fixed particularly.
| NPY_INTROSELECT = 0, | ||
| // new style names | ||
| NPY_SELECT_DEFAULT = 0, | ||
| NPY_SELECT_DESCENDING = 1, |
There was a problem hiding this comment.
Right, but this is public API, so we can't defer this if that is what you mean? (and if that means changing the other code, so be it).
| indexed by kth of them into their sorted position at once. | ||
| Element index to partition by. The k-th value of the array will | ||
| be in the position it would be in a sorted array, all elements that | ||
| sorted array, all elements that are less than this element (or |
There was a problem hiding this comment.
Docs got a little bit garbled here. "... that sorted array".
|
|
||
| no_nans = arange + 1j * arange | ||
| im_nans = arange + 1j * nans | ||
| re_nans = nans + 1j * arange |
There was a problem hiding this comment.
My agent actually noticed that this doesn't do what it says it does (maybe not new), this was the reason why I mentioned complex(..., ...) in a sense.
Just replacing 1j * nans with complex(0, np.nan) should do the trick (and the last nans just with np.nan or complex(np.nan, 0) just to be obvious)
There was a problem hiding this comment.
Ah yeah, that's right, thanks - I fixed this, but this actually unearthed a bad bug with the complex comparators (they were always a bit broken, but descending exposed it more)... I'm going to make a PR to fix that (as it probably should be backported), so I didn't update the sorting test here, but doing so in the other PR. Sorry it got missed in the descending PR.
847f77e to
715bac8
Compare
seberg
left a comment
There was a problem hiding this comment.
Thanks, one more comment on the kind. I am not quite sure if we should write descending=None or NoValue (in the docs), but we support None now, and it reads maybe better so I am happy with it.
I think the rest is all good. It would be nice to change it to just use the correct full sort (rather than the cmp function), but I think that is a good follow-up.
So with that one narrowing down, I think we can go ahead here.
| @@ -0,0 +1,9 @@ | |||
| New `descending` keyword argument for ``numpy.partition`` and ``numpy.argpartition`` | |||
There was a problem hiding this comment.
| New `descending` keyword argument for ``numpy.partition`` and ``numpy.argpartition`` | |
| New ``descending`` keyword argument for ``numpy.partition`` and ``numpy.argpartition`` |
also below string, etc. single ticks mean that it should be a link. numpy.partition can be single ticks though, because they can link to the actual functions :).
There was a problem hiding this comment.
Ah sorry, I kept switching these! Thank you, this is done.
| NPY_INTROSELECT = 0, | ||
| // new style names | ||
| NPY_SELECT_DEFAULT = 0, | ||
| NPY_SELECT_STABLE = 2, |
There was a problem hiding this comment.
Sorry, maybe this whole "make it identical" wasn't the best idea. But it's also fine. However, we should make sure that we reject unsupported values in get_partition_func which I don't think is currently the case (could do both and just delete the stable for now, but I do think we need to reject things).
There was a problem hiding this comment.
Thanks, yeah, I just switched to a switch statement like we have for sorts, seems more explicit and clearer hopefully! I'm happy to delete stable if you like.
… available for decsending
…scending with unsupported dtypes
Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>
715bac8 to
9f05d17
Compare
Draft to add a
descendingkwarg tonp.partitionandnp.argpartition, analogous to the sorting functions, thus addresses part of #31424. Not sure if theSELECTKINDchanges constitute an ABI or C-API break, but analogous to the sorts as well. cc @seberg, thanks!Note that this does not create new-style ArrayMethod registration analogous to the sorts, but rather updates the (apparently fully internal)
partition_mapand continues using the old-stylePartitionFunc. (In fact, thePyArray_(Arg)Partitionfunctions are unchanged!) Hopefully this is minimal enough and we can just add new-style ArrayMethods (and thus enable user-dtypes to use partitions) in follow-up? (This would then also unblock work fortop_k.)Still needs docs, tests, and a release note!
AI Disclosure
No AI used.