ENH: New-style object sorting with descending support and NaN handling#31431
Conversation
seberg
left a comment
There was a problem hiding this comment.
Thanks, nice start and looks nice and simple for now!
I had to include a sentinel guard for out-of-bounds partitioning in quicksort because object comparisons can be unsafe, hopefully the constexpr avoids any performance deficit (probably will).
Hmmmm, I am a bit surprised, is this for objects that return obj < obj == True?
This seems fine, I am wondering if an angle where we just hard-code object identity to be equal from a sorting perspective wouldn't just make sense, since I think it solves this issue the same.
The biggest churn will be seeing if we can't handle error gracefully... I think that would be really nice, but might be annoying (requiring to threading error handling to every npy::cmp call...).
| int isnan_a = isnan(a); | ||
| int isnan_b = isnan(b); | ||
| if (isnan_a < 0 || isnan_b < 0) { | ||
| return 0; |
There was a problem hiding this comment.
We can rewrite this a bit, I think:
- If LT returns
true, we can swap (no further checks needed) - Since this is an
||we don't have to evaluate bothisnan()most of the time.
There was a problem hiding this comment.
Yes makes sense, thanks! I pushed a refactor.
| static int less(PyObject *a, PyObject *b) | ||
| { | ||
| /* | ||
| * work around gh-3879, we cannot abort an in-progress quicksort |
There was a problem hiding this comment.
Well, we are not using the compare function here now. So we should be able to do this, although we'll have to rely on the compiler to to optimize the error path away on the other ones.
It is annoying I admit, since right now cmp() returns true/false, and then it'll be able to return an error, so all call sites will have to deal with that.
Can you see how that pans out -- because this is one actual advantage we have here?
There was a problem hiding this comment.
Looking at this now! There's a fair bit of call sites but agree that it makes sense to allow threading errors.
| static int isnan(PyObject *a) { | ||
| if (a == NULL) { | ||
| return 1; | ||
| } |
There was a problem hiding this comment.
This is a bit annoying. Ideally, we should just translate NULL to Py_None at some point, since that is what NumPy generally does (it shouldn't normally happen though).
I think that means returning False here, though? (But we also would need the treatment earlier in the less/greater)
(In practice it doesn't matter, but I guess swapping should work with the original raw values and preserve NULL for refcounting reasons.)
There was a problem hiding this comment.
Yes, thanks - I just moved this NULL check to the top of less and greater if that works? There is some more duplication, perhaps we can even just make a _cmp function that takes in a compare op... (as less and greater only differ in the op)?
Just to write it down, as mentioned also yesterday. I think this is perfectly good even if it doesn't allow NA yet.1 Footnotes
|
I think poor orderings, e.g. intransitive comparisons, are the reason it is unsafe - and they can be present in user code of course. We do actually have tests that fail (usually even segfault) if this check is excluded - they mostly seem to explicitly check if poor orderings work.
#define IFLT(X, Y) if ((k = ISLT(X, Y)) < 0) goto fail; \
if (k) |
|
Yeah it isn't great... I really dislike Macros that include a return, but maybe it makes sense here? |
|
Thanks, pushed a macro Edit: nevermind, it seems statement expressions don't work for MSVC. I'm going to revert, sorry! Let me try to do a full refactor without macro. Edit: full refactor with error handling done! Doesn't look as pretty anymore, but perhaps more explicit for some loops anyway... :/ I left the string ones (and mergesort of course) unchanged. |
seberg
left a comment
There was a problem hiding this comment.
Thanks, FWIW, as much as it is too bad that this adds so many lines of code to propagate errors we don't need for most types, I think this is the right path.
At least unless we want to avoid implementing a specialized sort for object.
FWIW, my quick timings this is around 25% faster than what we currently have for object dtype (both random and already sorted -- and for random we may add the isnan check).
Doesn't matter all that much, but maybe it is a nice bonus. (It may be cool to add object to the benchmarks for this, but doesn't have to be here.)
12a530d to
b9f04dd
Compare
|
Thanks, I just rebased with main (to include the new sort benchmarks) and added Sort Benchmaks
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY. Overall object sorts are indeed faster, though there's a regression for ordered stable sorts for object - perhaps error handling is hurting there on a particular code path. Some other flakiness too, seem to be some regressions but smaller sorted blocks are faster? |
|
The object sort for already sorted will have a regression because of the additional check for NaN. I.e. effectively to check if we are already sorted, it's using @charris I was hoping you would chime in briefly on the API here, although I guess we use the |
|
Thanks, sorry for the delay, I checked the benchmarks and they seemed to be stable, so this was a real regression. Yes, I flipped the Sort Benchmarks
Not sure where those are coming from, perhaps the comparator. Also, I did recover With this, all objects need to implement EDIT: The |
This reverts commit 52b7c81.
… object regression
…arator This reverts commit faab2fc.
|
Sorry, I think I misunderstood, we clearly don't need the Sort Benchmarks
|
|
Thanks! Sorry quick question, but do you know what's up with the float16 benchmarks? I don't think we use it for that, but I wonder if adding (I guess this should be settling, but if this "inverted" logic would hinder this, we can also revert it...) |
|
Not quite sure! The 3x is pretty stable across runs, so it does seem to be a real regression. Adding float16-only sort benchmarks
I suspect the |
|
OK, pushed a change to the Sort Benchmarks
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY. Argsort benchmarks
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY. |
seberg
left a comment
There was a problem hiding this comment.
Thanks, as discussed a bit, I am really starting to think it was a terrible idea to think about using <= style comparisons, because if we avoid the __le__ explicitly then I am not quite sure about the implementation.
So, if that makes tests fail (for good reasons or not), then I think it might be best to just not do that cmp_eq thing. In the end, it is only interesting if it avoids NaN checks when no NaNs are involved (and even then it only might be interesting).
| `np.sort` and `np.argsort` with arrays of dtype `object` | ||
| now support passing `descending=True` to sort in descending order. | ||
| Unordered objects, i.e. `obj` such that `obj != obj`, are sorted |
There was a problem hiding this comment.
| `np.sort` and `np.argsort` with arrays of dtype `object` | |
| now support passing `descending=True` to sort in descending order. | |
| Unordered objects, i.e. `obj` such that `obj != obj`, are sorted | |
| `np.sort` and `np.argsort` with arrays of dtype ``object`` | |
| now support passing `descending=True` to sort in descending order. | |
| Unordered objects, i.e. ``obj`` such that ``obj != obj``, are now sorted |
There was a problem hiding this comment.
Thanks, fixed! I also cleaned up the release note a bit, not sure if obj != obj was actually nice, but seems more natural this way...
| b = np.concatenate((a[~nanmask][::-1], a[nanmask])) | ||
| if np.issubdtype(a.dtype, np.object_): | ||
| # cast to float for comparison, as object np.nan != np.nan | ||
| a = a.astype(float) |
There was a problem hiding this comment.
This looks wrong (i.e. the cast is before the actual sort).
| if (j < n && npy::cmp<Tag, reverse>(a[j], a[j + 1])) { | ||
| ret = npy::cmp<Tag, reverse>(a[j], a[j + 1]); | ||
| if (ret < 0) return ret; | ||
| if (j < n && ret) { |
There was a problem hiding this comment.
This may be wrong, i.e. it computes ret even if j < n isn't true. You could inline the (ret = ...) == 1 although not the prettiest maybe.
(I guess we could probably just delete heapsort in practice, but maybe not as part of this PR.)
There was a problem hiding this comment.
This is done, I think!
| return 0; | ||
| } | ||
|
|
||
| ret = PyObject_RichCompareBool(a, b, op); |
There was a problem hiding this comment.
hmmm if isnan(a) && isnan(b) then for this style of comparison, they are considered equal, I think?
I guess in practice that might not even matter for timesort, with just another sorting approach being taken (i.e. if the "already sorted" pass fails for NaNs that might be fine, but I am not quite sure, unless you are? then we should comment)...
I am now thinking I really led you astray here. I don't mind using <=, FWIW, (maybe with a small release note), we can undo if someone notices...
But, at this point it feels like it is adding a lot of annoyances, and I would be just as happy to not do it here. If someone ever wants to optimize it, they could follow up.
But, my guess is your re-factor seems to have optimized from 3 to 2 Python comparisons for the already sorted case but if we change it to something like:
def less_equal_with_nan(a, b):
if b > a:
return 0
elif b != b:
return 1
elif a != a:
return 0
return 1
which to me would seem safe, then we would again end up with 3 comparisons when a <= b is True and at that point the whole use of cmp_eq may be pretty much moot?
There was a problem hiding this comment.
Yeah, it feels a bit moot with the added complexity now, even if it is a bit optimized (from 3 to 2), and totally moot if not in the future. I don't think it warrants this much of tweaks... I've just gone and ahead and reverted these files to before the cmp_eq experiment, so we lose this baggage!
| @@ -0,0 +1,6 @@ | |||
| object array sorting supports `descending=True` | |||
There was a problem hiding this comment.
Maybe mention NaN/Nan-like objects here?
There was a problem hiding this comment.
Changed up to include, thanks!
|
Thanks for reviewing! Yeah, I think reverting is a good choice here, as there was added complexity on a few fronts. At least we know this is tricky to do now :) |
seberg
left a comment
There was a problem hiding this comment.
Made a tiny tweak, mostly removed the with errstate context from the benchmarks (because it didn't make sense to me, but who knows maybe I missed something).
Tiny tweak to the release note, but it's good enough.
One thing that is in a sense missing are tests that actual exercise the error paths, that might be a good follow-up, but I don't want to hold it off due to that.
Thanks, I'll put it in once tests pass, if there is something more, we can follow-up.
Addresses part of #31423. Adds sorting ArrayMethods for
objectthat supportdescending=Trueand new NaN-handling logic using templating. Treats any object such thatobj != objas NaN and sorts those to the end. ping @seberg, thanks!I had to include a sentinel guard for out-of-bounds partitioning in quicksort because object comparisons can be unsafe, hopefully the
constexpravoids any performance deficit (probably will). The docs are a bit drafty maybe but should be ready for initial review at least!AI Disclosure
I used LLMs a fair bit for debugging code snippets (much of which went nowhere, but they caught the out-of-bounds issue :))