ENH: New-style object sorting with descending support and NaN handling by MaanasArora · Pull Request #31431 · numpy/numpy

MaanasArora · 2026-05-14T04:51:18Z

Addresses part of #31423. Adds sorting ArrayMethods for object that support descending=True and new NaN-handling logic using templating. Treats any object such that obj != obj as NaN and sorts those to the end. ping @seberg, thanks!
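A minimal pure-Python sketch of the ordering described above, not the PR's C++ implementation: any object for which `obj != obj` is truthy is treated as NaN-like and placed last. The helper names `is_nan_like` and `sort_objects` are hypothetical, and keeping NaN-like values last when `descending=True` is an assumption based on the description.

```python
def is_nan_like(obj):
    """Return True when obj compares unequal to itself (e.g. float('nan'))."""
    return bool(obj != obj)


def sort_objects(values, descending=False):
    """Sort values, placing NaN-like objects at the end in either direction."""
    regular = [v for v in values if not is_nan_like(v)]
    nan_like = [v for v in values if is_nan_like(v)]
    # Only the regular values are reordered; NaN-like ones are appended last.
    regular.sort(reverse=descending)
    return regular + nan_like
```

For example, `sort_objects([3, float("nan"), 1, 2])` yields `1, 2, 3` followed by the NaN, and the descending call yields `3, 2, 1` followed by the NaN.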

I had to include a sentinel guard for out-of-bounds partitioning in quicksort because object comparisons can be unsafe, hopefully the constexpr avoids any performance deficit (probably will). The docs are a bit drafty maybe but should be ready for initial review at least!

AI Disclosure

I used LLMs a fair bit for debugging code snippets (much of which went nowhere, but they caught the out-of-bounds issue :))

seberg

Thanks, nice start and looks nice and simple for now!

> I had to include a sentinel guard for out-of-bounds partitioning in quicksort because object comparisons can be unsafe, hopefully the constexpr avoids any performance deficit (probably will).

Hmmmm, I am a bit surprised, is this for objects that return obj < obj == True?
This seems fine, I am wondering if an angle where we just hard-code object identity to be equal from a sorting perspective wouldn't just make sense, since I think it solves this issue the same.

The biggest churn will be seeing if we can't handle error gracefully... I think that would be really nice, but might be annoying (it requires threading error handling through every npy::cmp call...).
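The identity idea mentioned above can be sketched in pure Python. This is only an illustration under assumptions: `AlwaysLess` and `less_with_identity` are hypothetical names, and the real comparison lives in C++.

```python
class AlwaysLess:
    """Pathological object: claims to be less than everything, itself included."""

    def __lt__(self, other):
        return True


def less_with_identity(a, b):
    """Treat the very same object as equal before asking Python to compare."""
    if a is b:
        return False
    return bool(a < b)
```

With this guard, an object whose `obj < obj` is true no longer reports itself as smaller than itself, although two distinct pathological objects can still compare inconsistently.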

seberg · 2026-05-14T07:06:46Z

+        int isnan_a = isnan(a);
+        int isnan_b = isnan(b); 
+        if (isnan_a < 0 || isnan_b < 0) {
+            return 0;


We can rewrite this a bit, I think:

If LT returns true, we can swap (no further checks needed)

Since this is an || we don't have to evaluate both isnan() most of the time.

Yes makes sense, thanks! I pushed a refactor.

seberg · 2026-05-14T07:09:19Z

+    static int less(PyObject *a, PyObject *b)
+    {
+        /*
+         * work around gh-3879, we cannot abort an in-progress quicksort


Well, we are not using the compare function here now. So we should be able to do this, although we'll have to rely on the compiler to optimize the error path away on the other ones.

It is annoying I admit, since right now cmp() returns true/false, and then it'll be able to return an error, so all call sites will have to deal with that.

Can you see how that pans out -- because this is one actual advantage we have here?

Looking at this now! There's a fair bit of call sites but agree that it makes sense to allow threading errors.
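As a rough illustration of what "threading errors" through call sites means, here is a pure-Python sketch that emulates a C-style tri-state return (1, 0, or -1 on error). The names `less` and `insertion_sort` are hypothetical, and the `try/except` only stands in for a C comparison that reports failure; in the C code the Python exception would stay set rather than being swallowed.

```python
def less(a, b):
    """Return 1 if a < b, 0 if not, -1 if the comparison raised (C-style)."""
    try:
        return 1 if a < b else 0
    except Exception:
        return -1


def insertion_sort(items):
    """Sort items in place; return 0 on success, -1 if a comparison failed."""
    for i in range(1, len(items)):
        j = i
        while j > 0:
            result = less(items[j], items[j - 1])
            if result < 0:
                # Propagate the error; items is still a permutation of the input.
                return -1
            if result == 0:
                break
            items[j], items[j - 1] = items[j - 1], items[j]
            j -= 1
    return 0
```

Every call site has to check for the negative value before using the result, which is where the extra lines come from.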

seberg · 2026-05-14T07:15:32Z

+    static int isnan(PyObject *a) {
+        if (a == NULL) {
+            return 1;
+        }


This is a bit annoying. Ideally, we should just translate NULL to Py_None at some point, since that is what NumPy generally does (it shouldn't normally happen though).
I think that means returning False here, though? (But we also would need the treatment earlier in the less/greater)

(In practice it doesn't matter, but I guess swapping should work with the original raw values and preserve NULL for refcounting reasons.)

Yes, thanks - I just moved this NULL check to the top of less and greater if that works? There is some more duplication, perhaps we can even just make a _cmp function that takes in a compare op... (as less and greater only differ in the op)?
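A pure-Python sketch of the shared-helper idea, assuming the only difference between `less` and `greater` is the operator passed in. The names mirror the discussion but the bodies are illustrative, and the NULL handling is left out because it has no Python equivalent.

```python
import operator
from functools import partial


def _cmp(a, b, op):
    """Return True when a should sort before b; NaN-like objects sort last."""
    if op(a, b):
        return True
    # Reached only when op(a, b) is falsy. The `and` short-circuits, so the
    # second self-comparison runs only when b is NaN-like.
    return bool(b != b) and not bool(a != a)


less = partial(_cmp, op=operator.lt)      # ascending order
greater = partial(_cmp, op=operator.gt)   # descending order
```

Both variants place a NaN-like operand after a regular one, so NaN-like values end up last in either direction.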

seberg · 2026-05-14T08:53:14Z

> Treats any object such that obj != obj as NaN and sorts those to the end.

Just to write it down, as mentioned also yesterday. I think this is perfectly good even if it doesn't allow NA yet.¹

¹ Nathan mentioned that for pandas NA support the StringDType actually allows errors to pass because pandas bool(NA) is an error. I don't mind trying to invent a pattern that makes pandas work, but I don't want to start with try/except on a hot path and I am not sure there are other patterns that work by knowing that (NA < other) is NA. (I suspect something may be workable but one has to be exceedingly careful since e.g. (False != False) is False as well.)
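To make the NA problem concrete, here is a small sketch with a hypothetical stand-in class (`NAType` is not the pandas implementation). It mimics the behaviour described above: comparisons return NA itself, and converting NA to a boolean raises.

```python
class NAType:
    """Hypothetical stand-in for a missing-value singleton such as pandas.NA."""

    def __lt__(self, other):
        return self

    def __ne__(self, other):
        return self

    def __bool__(self):
        raise TypeError("boolean value of NA is ambiguous")

    def __repr__(self):
        return "<NA>"


NA = NAType()


def is_nan_like(obj):
    """The obj != obj test from this PR; it needs a boolean result."""
    return bool(obj != obj)
```

Here `is_nan_like(NA)` raises `TypeError` instead of returning `True`, while `is_nan_like(False)` is `False` because `False != False` is `False`, so the result of `obj != obj` alone cannot be used to detect NA without extra care.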

MaanasArora · 2026-05-15T03:10:53Z

> Hmmmm, I am a bit surprised, is this for objects that return obj < obj == True?

I think poor orderings, e.g. intransitive comparisons, are the reason it is unsafe - and they can be present in user code of course. We do actually have tests that fail (usually even segfault) if this check is excluded - they mostly seem to explicitly check if poor orderings work.
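A pure-Python sketch of why the bounds guard matters, under the assumption that the partition loop scans forward while elements compare less than the pivot. `Bogus` and `scan_right` are hypothetical names. In Python a missing guard would raise `IndexError`; in C++ the same loop would read past the end of the buffer.

```python
class Bogus:
    """Object with an inconsistent ordering: always claims to be smaller."""

    def __lt__(self, other):
        return True


def scan_right(items, start, pivot):
    """Advance while items[i] < pivot, stopping at the end of the array."""
    i = start
    # The explicit bound replaces the usual assumption that some element
    # is guaranteed to stop the scan.
    while i < len(items) and items[i] < pivot:
        i += 1
    return i
```

With well-behaved values the scan stops at the first element that is not smaller than the pivot; with `Bogus` objects only the bounds check stops it.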

> The biggest churn will be seeing if we can't handle error gracefully...

grep -r "npy::cmp" reveals there are exactly 100 occurrences of npy::cmp in the code! I guess we're excluding mergesorts, but that's just seven (so still 93). Most of them are inlined, so hard to return on error. But I agree a refactor would be very nice, even for user dtypes if we ever export (similar) templates. Perhaps a macro is a reasonable compromise here? I asked Claude to dig for this in CPython, which seems to use one:

#define IFLT(X, Y) if ((k = ISLT(X, Y)) < 0) goto fail;  \
           if (k)

(https://github.com/python/cpython/blob/461b1d96313de02992d284c1782be9aff24586c9/Objects/listobject.c#L1715-L1716)

seberg · 2026-05-15T09:41:25Z

Yeah it isn't great... I really dislike Macros that include a return, but maybe it makes sense here?
Although, I guess even then you can't have the return inside the if, so hmmmm...
But I really would prefer to not do this error checking dance :/.

MaanasArora · 2026-05-15T23:10:41Z

Thanks, pushed a macro NPY_CMP and handled any error propagation through call sites! I think it turned out a bit nice even.

Edit: never mind, it seems statement expressions don't work for MSVC. I'm going to revert, sorry! Let me try to do a full refactor without a macro.

Edit: full refactor with error handling done! Doesn't look as pretty anymore, but perhaps more explicit for some loops anyway... :/ I left the string ones (and mergesort of course) unchanged.

seberg

Thanks, FWIW, as much as it is too bad that this adds so many lines of code to propagate errors we don't need for most types, I think this is the right path.
At least unless we want to avoid implementing a specialized sort for object.

FWIW, in my quick timings this is around 25% faster than what we currently have for object dtype (both random and already sorted -- and for random we may add the isnan check).
Doesn't matter all that much, but maybe it is a nice bonus. (It may be cool to add object to the benchmarks for this, but doesn't have to be here.)

MaanasArora · 2026-05-25T07:22:57Z

Thanks, I just rebased with main (to include the new sort benchmarks) and added object to the benchmarks! Here are the results, though I'll post a re-run (and include argsort) because they were a bit flaky:

Sort Benchmarks

Change	Before [`0e18dd2`]	After [`5edfd04`] <object-sorts~1>	Ratio	Benchmark (Parameter)
+	25.5±0.2ms	37.9±0.2ms	1.48	bench_function_base.Sort.time_sort(True, False, 'object', ('ordered',))
+	14.9±0.1ms	19.4±2ms	1.3	bench_function_base.Sort.time_sort(True, True, 'float16', ('sorted_block', 1000))
+	13.6±0.06ms	17.2±0.06ms	1.27	bench_function_base.Sort.time_sort(True, False, 'uint32', ('sorted_block', 10))
+	13.6±0.03ms	17.1±0.06ms	1.26	bench_function_base.Sort.time_sort(True, False, 'int32', ('sorted_block', 10))
+	19.0±0.2ms	23.2±0.8ms	1.22	bench_function_base.Sort.time_sort(True, True, 'float16', ('sorted_block', 100))
+	14.6±0.08ms	17.3±0.1ms	1.19	bench_function_base.Sort.time_sort(False, True, 'float16', ('reversed',))
+	11.2±0.02ms	13.4±0.04ms	1.19	bench_function_base.Sort.time_sort(False, True, 'float32', ('reversed',))
+	14.6±0.09ms	17.3±0.04ms	1.18	bench_function_base.Sort.time_sort(False, False, 'float16', ('uniform',))
+	5.41±0.1ms	6.29±0.3ms	1.16	bench_function_base.Sort.time_sort(False, True, 'int64', ('ordered',))
+	8.20±0.1ms	9.14±0.4ms	1.11	bench_function_base.Sort.time_sort(True, True, 'float16', ('reversed',))
+	15.7±0.2ms	17.3±0.5ms	1.1	bench_function_base.Sort.time_sort(False, False, 'float16', ('ordered',))
+	29.1±0.7ms	32.1±0.7ms	1.1	bench_function_base.Sort.time_sort(True, True, 'float16', ('sorted_block', 10))
+	6.21±0.04ms	6.77±0.03ms	1.09	bench_function_base.Sort.time_sort(False, True, 'float32', ('ordered',))
+	36.9±0.1ms	39.9±0.06ms	1.08	bench_function_base.Sort.time_sort(False, True, 'float16', ('sorted_block', 1000))
+	54.8±0.1ms	58.5±0.2ms	1.07	bench_function_base.Sort.time_sort(False, False, 'int16', ('random',))
+	45.8±0.1ms	48.8±0.2ms	1.07	bench_function_base.Sort.time_sort(False, True, 'float16', ('sorted_block', 10))
+	7.94±0.06ms	8.52±0.05ms	1.07	bench_function_base.Sort.time_sort(True, False, 'int64', ('sorted_block', 1000))
+	39.0±0.4ms	41.3±0.2ms	1.06	bench_function_base.Sort.time_sort(False, False, 'float16', ('sorted_block', 1000))
+	40.8±0.08ms	43.4±0.09ms	1.06	bench_function_base.Sort.time_sort(False, False, 'int16', ('sorted_block', 10))
+	45.2±0.03ms	48.1±0.1ms	1.06	bench_function_base.Sort.time_sort(False, False, 'int16', ('sorted_block', 100))
+	37.5±0.07ms	39.7±0.03ms	1.06	bench_function_base.Sort.time_sort(False, False, 'int16', ('sorted_block', 1000))
+	33.0±0.2ms	35.1±0.7ms	1.06	bench_function_base.Sort.time_sort(True, False, 'object', ('uniform',))
+	5.71±0.02ms	6.04±0.07ms	1.06	bench_function_base.Sort.time_sort(True, True, 'int32', ('sorted_block', 1000))
+	12.5±0.02ms	13.2±0.2ms	1.05	bench_function_base.Sort.time_sort(False, True, 'float32', ('uniform',))
-	6.27±0.09ms	5.95±0.08ms	0.95	bench_function_base.Sort.time_sort(False, False, 'bool', ('sorted_block', 10))
-	12.7±0.05ms	12.0±0.02ms	0.95	bench_function_base.Sort.time_sort(False, False, 'int8', ('sorted_block', 1000))
-	4.97±0.05ms	4.70±0.04ms	0.95	bench_function_base.Sort.time_sort(False, False, 'uint8', ('ordered',))
-	82.6±1ms	78.6±0.2ms	0.95	bench_function_base.Sort.time_sort(True, False, 'float16', ('random',))
-	677±4ms	641±3ms	0.95	bench_function_base.Sort.time_sort(True, False, 'object', ('random',))
-	5.34±0.1ms	5.02±0.07ms	0.94	bench_function_base.Sort.time_sort(False, False, 'bool', ('ordered',))
-	5.28±0.01ms	4.95±0.03ms	0.94	bench_function_base.Sort.time_sort(False, False, 'bool', ('uniform',))
-	5.29±0.02ms	4.98±0.06ms	0.94	bench_function_base.Sort.time_sort(False, False, 'uint8', ('uniform',))
-	5.31±0.02ms	4.99±0.01ms	0.94	bench_function_base.Sort.time_sort(False, True, 'bool', ('reversed',))
-	14.7±0.02ms	13.8±0.05ms	0.94	bench_function_base.Sort.time_sort(False, True, 'float16', ('uniform',))
-	5.29±0.05ms	4.94±0.03ms	0.94	bench_function_base.Sort.time_sort(False, True, 'uint8', ('reversed',))
-	10.8±0.1ms	10.1±0.1ms	0.94	bench_function_base.Sort.time_sort(True, True, 'int64', ('sorted_block', 100))
-	5.28±0.02ms	4.92±0.02ms	0.93	bench_function_base.Sort.time_sort(False, False, 'bool', ('reversed',))
-	5.29±0.03ms	4.94±0.01ms	0.93	bench_function_base.Sort.time_sort(False, False, 'int8', ('reversed',))
-	5.32±0.03ms	4.96±0.02ms	0.93	bench_function_base.Sort.time_sort(False, True, 'bool', ('uniform',))
-	5.03±0.04ms	4.69±0.03ms	0.93	bench_function_base.Sort.time_sort(False, True, 'uint8', ('ordered',))
-	6.45±0.1ms	5.91±0.04ms	0.92	bench_function_base.Sort.time_sort(False, True, 'bool', ('sorted_block', 10))
-	17.8±2ms	16.4±0.1ms	0.92	bench_function_base.Sort.time_sort(True, True, 'float64', ('sorted_block', 100))
-	6.06±0.1ms	5.49±0.04ms	0.91	bench_function_base.Sort.time_sort(False, False, 'bool', ('sorted_block', 100))
-	5.44±0.1ms	4.98±0.05ms	0.91	bench_function_base.Sort.time_sort(False, True, 'uint8', ('uniform',))
-	647±40μs	590±4μs	0.91	bench_function_base.Sort.time_sort(True, True, 'int32', ('reversed',))
-	8.82±0.2ms	7.98±0.1ms	0.91	bench_function_base.Sort.time_sort(True, True, 'int64', ('sorted_block', 1000))
-	5.22±0.05ms	4.69±0.02ms	0.9	bench_function_base.Sort.time_sort(False, False, 'int8', ('ordered',))
-	386±10μs	349±0.8μs	0.9	bench_function_base.Sort.time_sort(False, False, 'uint32', ('uniform',))
-	18.6±0.2ms	16.3±0.1ms	0.87	bench_function_base.Sort.time_sort(True, True, 'object', ('sorted_block', 10))
-	8.80±0.1ms	7.58±0.7ms	0.86	bench_function_base.Sort.time_sort(False, True, 'int32', ('reversed',))
-	19.4±0.2ms	16.6±0.3ms	0.86	bench_function_base.Sort.time_sort(True, True, 'int64', ('sorted_block', 10))
-	26.2±2ms	22.2±0.1ms	0.85	bench_function_base.Sort.time_sort(True, True, 'float64', ('sorted_block', 10))
-	16.2±0.07ms	12.6±0.03ms	0.78	bench_function_base.Sort.time_sort(True, True, 'int32', ('sorted_block', 10))
-	20.9±0.07ms	16.0±0.1ms	0.77	bench_function_base.Sort.time_sort(True, False, 'int64', ('sorted_block', 10))
-	16.1±0.04ms	12.4±0.07ms	0.77	bench_function_base.Sort.time_sort(True, True, 'uint32', ('sorted_block', 10))
-	20.9±0.3ms	16.0±0.2ms	0.76	bench_function_base.Sort.time_sort(True, False, 'object', ('sorted_block', 10))
-	718±9ms	541±10ms	0.75	bench_function_base.Sort.time_sort(False, False, 'object', ('random',))
-	26.1±0.7ms	17.6±0.4ms	0.68	bench_function_base.Sort.time_sort(True, False, 'object', ('reversed',))
-	344±0.6ms	202±3ms	0.59	bench_function_base.Sort.time_sort(False, False, 'object', ('ordered',))
-	628±1ms	373±1ms	0.59	bench_function_base.Sort.time_sort(False, False, 'object', ('reversed',))

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

Overall, object sorts are indeed faster, though there's a regression for ordered stable sorts for object; perhaps error handling is hurting there on a particular code path. There is some other flakiness too: there seem to be some regressions, but smaller sorted blocks are faster?

seberg · 2026-06-01T07:42:40Z

The object sort for already sorted will have a regression because of the additional check for NaN. I.e. effectively to check if we are already sorted, it's using !cmp(b, a) so not b < a. But for object that means we do at least one additional check for b != b (or a, not sure).
We could consider changing that by using a <= b and avoiding the not at the cost of changing the actual comparison operator to include equality and slightly more complexity as we now need a new helper cmp_eq or so...

@charris I was hoping you would chime in briefly on the API here, although I guess we use the != logic already in the NaN functions, so I don't see much problem. Maybe you have an opinion about the already sorted regression too.
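The cost of the already-sorted check described above can be sketched in pure Python by counting rich comparisons. This is an illustration under assumptions, not the timsort code: `Counted`, `less`, `less_equal`, and the two check functions are hypothetical, and NaN-like objects are assumed to sort last.

```python
from collections import Counter


class Counted:
    """Wrapper that counts which rich comparisons are invoked."""
    calls = Counter()

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.calls["lt"] += 1
        return self.value < other.value

    def __le__(self, other):
        Counted.calls["le"] += 1
        return self.value <= other.value

    def __ne__(self, other):
        Counted.calls["ne"] += 1
        return self.value != other.value


def is_nan_like(obj):
    return bool(obj != obj)


def less(a, b):
    """a sorts strictly before b, with NaN-like objects last."""
    if a < b:
        return True
    return is_nan_like(b) and not is_nan_like(a)


def less_equal(a, b):
    """a sorts at or before b, with NaN-like objects last."""
    if a <= b:
        return True
    return is_nan_like(b)


def sorted_via_not_less(items):
    # Mirrors the !cmp(b, a) form: each pair costs one < and one != call.
    return all(not less(b, a) for a, b in zip(items, items[1:]))


def sorted_via_less_equal(items):
    # Mirrors the a <= b form: sorted pairs return after a single <= call.
    return all(less_equal(a, b) for a, b in zip(items, items[1:]))
```

For five already-sorted wrapped values, the first check makes 4 `<` and 4 `!=` calls, while the second makes only 4 `<=` calls. The trade-off is that the objects must then support `<=`.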

MaanasArora · 2026-06-05T04:02:08Z

Thanks, sorry for the delay, I checked the benchmarks and they seemed to be stable, so this was a real regression. Yes, I flipped the not in the timsort by adding greater_equal and using a cmp_eq helper (except for strings); I only did this for the already sorted check. It didn't turn out too complex actually, thanks for the suggestion! The benchmarks have improved for object, but there are 3-4x regressions for float16 uniform/ordered:

Sort Benchmarks

Change	Before [`a20ef19`]	After [`c77fbf5`]	Ratio	Benchmark (Parameter)
+	1.03±0.01ms	4.60±0.05ms	4.46	bench_function_base.Sort.time_sort(True, False, 'float16', ('uniform',))
+	1.12±0.05ms	4.20±0.06ms	3.73	bench_function_base.Sort.time_sort(True, False, 'float16', ('ordered',))
+	14.7±0.07ms	17.3±2ms	1.17	bench_function_base.Sort.time_sort(True, True, 'float16', ('sorted_block', 1000))
+	12.1±0.1ms	13.9±0.1ms	1.15	bench_function_base.Sort.time_sort(False, True, 'float64', ('reversed',))
+	15.2±0.3ms	17.3±0.09ms	1.14	bench_function_base.Sort.time_sort(False, False, 'float16', ('uniform',))
+	708±4μs	807±40μs	1.14	bench_function_base.Sort.time_sort(True, False, 'uint32', ('reversed',))
+	15.0±0.3ms	16.9±0.3ms	1.13	bench_function_base.Sort.time_sort(False, True, 'float16', ('reversed',))
+	3.91±0.03ms	4.43±0.3ms	1.13	bench_function_base.Sort.time_sort(True, True, 'int16', ('reversed',))
+	28.9±0.5ms	32.4±0.8ms	1.12	bench_function_base.Sort.time_sort(True, True, 'float16', ('sorted_block', 10))
+	18.9±0.1ms	21.2±1ms	1.12	bench_function_base.Sort.time_sort(True, True, 'float16', ('sorted_block', 100))
+	5.59±0.1ms	6.14±0.07ms	1.1	bench_function_base.Sort.time_sort(True, True, 'uint32', ('sorted_block', 1000))
+	5.65±0.03ms	6.14±0.08ms	1.09	bench_function_base.Sort.time_sort(True, True, 'int32', ('sorted_block', 1000))
+	16.2±0.3ms	17.5±0.6ms	1.08	bench_function_base.Sort.time_sort(True, False, 'float16', ('sorted_block', 1000))
+	885±7μs	952±10μs	1.08	bench_function_base.Sort.time_sort(True, False, 'float32', ('uniform',))
+	1.05±0.01ms	1.12±0.04ms	1.07	bench_function_base.Sort.time_sort(True, False, 'uint8', ('sorted_block', 100))
+	7.61±0.04ms	8.15±0.04ms	1.07	bench_function_base.Sort.time_sort(True, True, 'int32', ('sorted_block', 100))
+	13.2±0.3ms	14.0±0.5ms	1.06	bench_function_base.Sort.time_sort(True, False, 'float64', ('sorted_block', 1000))
+	347±3μs	367±2μs	1.06	bench_function_base.Sort.time_sort(True, False, 'int16', ('ordered',))
+	76.8±0.6ms	81.3±0.6ms	1.06	bench_function_base.Sort.time_sort(True, True, 'float64', ('random',))
-	23.6±3ms	22.4±0.06ms	0.95	bench_function_base.Sort.time_sort(False, False, 'uint8', ('sorted_block', 100))
-	6.52±0.1ms	6.18±0.04ms	0.95	bench_function_base.Sort.time_sort(False, True, 'bool', ('sorted_block', 10))
-	13.7±0.06ms	13.0±0.1ms	0.95	bench_function_base.Sort.time_sort(True, False, 'int32', ('sorted_block', 10))
-	15.7±0.09ms	14.8±0.09ms	0.94	bench_function_base.Sort.time_sort(False, True, 'float16', ('ordered',))
-	15.0±0.1ms	14.1±0.1ms	0.94	bench_function_base.Sort.time_sort(False, True, 'float16', ('uniform',))
-	5.31±0.02ms	5.02±0.03ms	0.94	bench_function_base.Sort.time_sort(False, True, 'int8', ('reversed',))
-	5.30±0.04ms	4.98±0.02ms	0.94	bench_function_base.Sort.time_sort(False, True, 'uint8', ('reversed',))
-	12.6±0.07ms	11.9±0.03ms	0.94	bench_function_base.Sort.time_sort(False, True, 'uint8', ('sorted_block', 1000))
-	10.7±0.3ms	10.1±0.08ms	0.94	bench_function_base.Sort.time_sort(True, True, 'object', ('sorted_block', 100))
-	5.98±0.2ms	5.50±0.04ms	0.92	bench_function_base.Sort.time_sort(False, False, 'float32', ('sorted_block', 100))
-	6.21±0.1ms	5.69±0.04ms	0.92	bench_function_base.Sort.time_sort(False, False, 'int32', ('sorted_block', 10))
-	5.14±0.04ms	4.72±0.02ms	0.92	bench_function_base.Sort.time_sort(False, True, 'int8', ('ordered',))
-	8.99±0.03ms	8.26±0.1ms	0.92	bench_function_base.Sort.time_sort(True, False, 'int32', ('sorted_block', 100))
-	8.78±0.04ms	8.10±0.1ms	0.92	bench_function_base.Sort.time_sort(True, False, 'uint32', ('sorted_block', 100))
-	11.0±1ms	9.83±0.1ms	0.9	bench_function_base.Sort.time_sort(False, False, 'bool', ('random',))
-	6.82±3ms	6.10±0.06ms	0.9	bench_function_base.Sort.time_sort(False, False, 'bool', ('sorted_block', 10))
-	25.0±2ms	22.5±0.2ms	0.9	bench_function_base.Sort.time_sort(False, False, 'int8', ('sorted_block', 10))
-	6.07±0.2ms	5.44±0.05ms	0.9	bench_function_base.Sort.time_sort(False, True, 'bool', ('sorted_block', 1000))
-	5.83±0.1ms	5.22±0.1ms	0.9	bench_function_base.Sort.time_sort(False, True, 'int64', ('ordered',))
-	5.92±0.6ms	5.26±0.07ms	0.89	bench_function_base.Sort.time_sort(False, False, 'bool', ('uniform',))
-	5.64±0.4ms	5.00±0.02ms	0.89	bench_function_base.Sort.time_sort(False, False, 'uint8', ('reversed',))
-	13.5±1ms	12.0±0.08ms	0.89	bench_function_base.Sort.time_sort(False, False, 'uint8', ('sorted_block', 1000))
-	6.41±0.2ms	5.68±0.08ms	0.89	bench_function_base.Sort.time_sort(False, True, 'bool', ('sorted_block', 100))
-	6.89±0.07ms	6.13±0.1ms	0.89	bench_function_base.Sort.time_sort(True, False, 'int32', ('sorted_block', 1000))
-	6.76±0.02ms	6.03±0.04ms	0.89	bench_function_base.Sort.time_sort(True, False, 'uint32', ('sorted_block', 1000))
-	26.0±1ms	23.2±0.3ms	0.89	bench_function_base.Sort.time_sort(True, True, 'float64', ('sorted_block', 10))
-	6.00±0.03ms	5.29±0.09ms	0.88	bench_function_base.Sort.time_sort(False, True, 'int16', ('reversed',))
-	24.8±2ms	21.6±0.1ms	0.87	bench_function_base.Sort.time_sort(False, False, 'int8', ('sorted_block', 100))
-	13.7±1ms	12.0±0.03ms	0.87	bench_function_base.Sort.time_sort(False, False, 'int8', ('sorted_block', 1000))
-	5.05±0.06ms	4.37±0.3ms	0.87	bench_function_base.Sort.time_sort(False, True, 'uint8', ('ordered',))
-	34.4±5ms	29.6±0.08ms	0.86	bench_function_base.Sort.time_sort(False, False, 'int8', ('random',))
-	6.45±0.7ms	5.42±0.04ms	0.84	bench_function_base.Sort.time_sort(False, False, 'bool', ('sorted_block', 1000))
-	6.09±0.05ms	5.14±0.04ms	0.84	bench_function_base.Sort.time_sort(False, True, 'int16', ('ordered',))
-	6.36±1ms	5.26±0.2ms	0.83	bench_function_base.Sort.time_sort(False, False, 'bool', ('ordered',))
-	20.9±0.1ms	17.2±0.2ms	0.83	bench_function_base.Sort.time_sort(True, False, 'object', ('sorted_block', 10))
-	6.46±2ms	5.25±0.05ms	0.81	bench_function_base.Sort.time_sort(False, False, 'bool', ('reversed',))
-	21.2±0.5ms	17.1±0.2ms	0.81	bench_function_base.Sort.time_sort(True, False, 'int64', ('sorted_block', 10))
-	16.3±0.1ms	12.9±0.09ms	0.79	bench_function_base.Sort.time_sort(True, True, 'int32', ('sorted_block', 10))
-	16.5±0.07ms	13.0±0.5ms	0.79	bench_function_base.Sort.time_sort(True, True, 'uint32', ('sorted_block', 10))
-	7.08±0.8ms	5.53±0.3ms	0.78	bench_function_base.Sort.time_sort(False, False, 'bool', ('sorted_block', 100))
-	5.99±1ms	4.68±0.05ms	0.78	bench_function_base.Sort.time_sort(False, False, 'uint8', ('ordered',))
-	6.18±2ms	4.39±0.01ms	0.71	bench_function_base.Sort.time_sort(False, False, 'int8', ('ordered',))
-	26.8±0.7ms	19.0±0.6ms	0.71	bench_function_base.Sort.time_sort(True, False, 'object', ('ordered',))
-	27.6±0.9ms	18.7±1ms	0.68	bench_function_base.Sort.time_sort(True, False, 'object', ('reversed',))
-	8.88±1ms	5.78±0.02ms	0.65	bench_function_base.Sort.time_sort(False, False, 'int16', ('reversed',))
-	7.91±4ms	4.95±0.08ms	0.63	bench_function_base.Sort.time_sort(False, False, 'int8', ('uniform',))
-	364±30ms	212±0.5ms	0.58	bench_function_base.Sort.time_sort(False, False, 'object', ('ordered',))
-	663±50ms	382±2ms	0.58	bench_function_base.Sort.time_sort(False, False, 'object', ('reversed',))
-	973±200ms	558±10ms	0.57	bench_function_base.Sort.time_sort(False, False, 'object', ('random',))
-	33.6±0.3ms	13.0±0.3ms	0.39	bench_function_base.Sort.time_sort(True, False, 'object', ('uniform',))

Not sure where those are coming from, perhaps the comparator. Also, I did recover cmp rather than cmp_eq for descending, which probably makes sense as we use the inverted version (which usually returns early rather than goes all the way.)

With this, all objects need to implement <= and >= to work, as caught by the test_sort_bad_ordering test (which I updated to add a __le__ method to the bogus class for now). I guess that could be annoying, should we add a fallback? (If we do so in the comparators, it would be silently slower, but I don't know if we should alter the sorts clearly.)

EDIT: The float16 stuff seems to be random compiler choices again, given there really is no branching difference... changing the _equal comparators a bit optimized it more for me, but no point in pushing I guess.

… error

This reverts commit 52b7c81.

…functions

… object regression

…arator This reverts commit faab2fc.

MaanasArora · 2026-06-08T10:11:14Z

Sorry, I think I misunderstood, we clearly don't need the __le__/__ge__ methods to do this; the inverted op is enough! Just pushed a change doing that instead. Benchmarks against main now:

Sort Benchmarks

Change	Before [`a20ef19`]	After [`ca5446d`]	Ratio	Benchmark (Parameter)
+	1.10±0.4ms	3.58±0.4ms	3.27	bench_function_base.Sort.time_sort(True, False, 'float16', ('uniform',))
+	1.16±0.07ms	3.52±0.1ms	3.03	bench_function_base.Sort.time_sort(True, False, 'float16', ('ordered',))
+	1.17±0ms	3.52±0.04ms	3	bench_function_base.Sort.time_sort(True, True, 'float16', ('ordered',))
+	1.16±0.01ms	3.08±0.3ms	2.65	bench_function_base.Sort.time_sort(True, True, 'float16', ('uniform',))
+	889±30μs	1.19±0.1ms	1.34	bench_function_base.Sort.time_sort(True, True, 'float32', ('uniform',))
+	13.5±0.09ms	17.2±0.08ms	1.27	bench_function_base.Sort.time_sort(True, False, 'uint32', ('sorted_block', 10))
+	22.0±0.7ms	26.0±0.2ms	1.18	bench_function_base.Sort.time_sort(True, False, 'float64', ('sorted_block', 10))
+	14.9±0.07ms	17.0±0.1ms	1.15	bench_function_base.Sort.time_sort(False, True, 'float16', ('uniform',))
+	12.0±0.04ms	13.8±0.1ms	1.15	bench_function_base.Sort.time_sort(False, True, 'float64', ('reversed',))
+	15.7±0.2ms	17.4±0.07ms	1.11	bench_function_base.Sort.time_sort(False, False, 'float16', ('ordered',))
+	5.20±0.05ms	5.79±0.09ms	1.11	bench_function_base.Sort.time_sort(False, False, 'int16', ('reversed',))
+	352±2μs	385±5μs	1.09	bench_function_base.Sort.time_sort(False, False, 'int32', ('uniform',))
+	17.0±0.3ms	18.3±0.06ms	1.08	bench_function_base.Sort.time_sort(False, False, 'float16', ('reversed',))
+	39.2±0.4ms	42.3±0.09ms	1.08	bench_function_base.Sort.time_sort(False, False, 'float16', ('sorted_block', 1000))
+	19.0±0.1ms	20.6±0.2ms	1.08	bench_function_base.Sort.time_sort(True, True, 'float16', ('sorted_block', 100))
+	49.4±0.2ms	52.8±0.04ms	1.07	bench_function_base.Sort.time_sort(False, False, 'float16', ('sorted_block', 10))
+	18.7±0.2ms	20.1±0.08ms	1.07	bench_function_base.Sort.time_sort(True, False, 'float32', ('sorted_block', 10))
+	480±2μs	513±20μs	1.07	bench_function_base.Sort.time_sort(True, False, 'uint32', ('ordered',))
+	5.71±0.02ms	6.08±0.1ms	1.07	bench_function_base.Sort.time_sort(True, True, 'int32', ('sorted_block', 1000))
+	50.4±0.3ms	53.5±0.2ms	1.06	bench_function_base.Sort.time_sort(False, False, 'float16', ('sorted_block', 100))
+	45.7±0.06ms	48.2±0.1ms	1.06	bench_function_base.Sort.time_sort(False, True, 'float16', ('sorted_block', 10))
-	22.7±0.09ms	21.6±0.04ms	0.95	bench_function_base.Sort.time_sort(False, False, 'int8', ('sorted_block', 100))
-	5.03±0.04ms	4.75±0.01ms	0.95	bench_function_base.Sort.time_sort(False, False, 'uint8', ('ordered',))
-	12.6±0.04ms	12.0±0.09ms	0.95	bench_function_base.Sort.time_sort(False, False, 'uint8', ('sorted_block', 1000))
-	9.49±0.1ms	8.98±0.1ms	0.95	bench_function_base.Sort.time_sort(False, True, 'int64', ('reversed',))
-	23.0±0.1ms	21.9±0.04ms	0.95	bench_function_base.Sort.time_sort(False, True, 'int8', ('sorted_block', 10))
-	8.50±0.06ms	8.04±0.07ms	0.95	bench_function_base.Sort.time_sort(False, True, 'uint32', ('reversed',))
-	24.0±0.09ms	22.4±0.04ms	0.94	bench_function_base.Sort.time_sort(False, False, 'int8', ('sorted_block', 10))
-	12.7±0.05ms	11.9±0.01ms	0.94	bench_function_base.Sort.time_sort(False, False, 'int8', ('sorted_block', 1000))
-	5.27±0.04ms	4.94±0.02ms	0.94	bench_function_base.Sort.time_sort(False, True, 'int8', ('reversed',))
-	12.6±0.05ms	11.8±0.02ms	0.94	bench_function_base.Sort.time_sort(False, True, 'uint8', ('sorted_block', 1000))
-	80.0±0.9ms	75.3±0.2ms	0.94	bench_function_base.Sort.time_sort(True, True, 'float16', ('random',))
-	5.65±0.03ms	5.27±0.05ms	0.93	bench_function_base.Sort.time_sort(False, True, 'uint32', ('uniform',))
-	6.10±0.1ms	5.61±0.09ms	0.92	bench_function_base.Sort.time_sort(False, False, 'bool', ('sorted_block', 100))
-	6.11±0.2ms	5.64±0.02ms	0.92	bench_function_base.Sort.time_sort(False, True, 'bool', ('sorted_block', 100))
-	8.87±0.2ms	8.16±0.2ms	0.92	bench_function_base.Sort.time_sort(False, True, 'int32', ('reversed',))
-	5.06±0.05ms	4.65±0.02ms	0.92	bench_function_base.Sort.time_sort(False, True, 'uint8', ('ordered',))
-	77.0±7ms	70.7±0.2ms	0.92	bench_function_base.Sort.time_sort(True, False, 'int32', ('random',))
-	30.2±0.5ms	27.8±0.1ms	0.92	bench_function_base.Sort.time_sort(True, True, 'float16', ('sorted_block', 10))
-	20.1±0.7ms	18.6±0.9ms	0.92	bench_function_base.Sort.time_sort(True, True, 'float32', ('sorted_block', 10))
-	15.5±0.1ms	14.2±0.05ms	0.91	bench_function_base.Sort.time_sort(False, True, 'float16', ('ordered',))
-	857±30μs	779±20μs	0.91	bench_function_base.Sort.time_sort(True, False, 'int64', ('uniform',))
-	5.86±0.01ms	5.26±0.1ms	0.9	bench_function_base.Sort.time_sort(False, True, 'int16', ('reversed',))
-	5.20±0.04ms	4.67±0.04ms	0.9	bench_function_base.Sort.time_sort(False, True, 'int8', ('ordered',))
-	6.01±0.2ms	5.38±0.03ms	0.89	bench_function_base.Sort.time_sort(False, True, 'bool', ('sorted_block', 1000))
-	18.4±0.3ms	16.2±0.2ms	0.88	bench_function_base.Sort.time_sort(True, True, 'object', ('sorted_block', 10))
-	5.38±0.04ms	4.64±0.05ms	0.86	bench_function_base.Sort.time_sort(False, False, 'int8', ('reversed',))
-	6.03±0.03ms	5.21±0.2ms	0.86	bench_function_base.Sort.time_sort(False, True, 'int16', ('ordered',))
-	19.7±0.3ms	16.8±0.2ms	0.85	bench_function_base.Sort.time_sort(True, True, 'int64', ('sorted_block', 10))
-	4.30±2ms	3.54±0.01ms	0.82	bench_function_base.Sort.time_sort(True, False, 'bool', ('sorted_block', 1000))
-	7.51±2ms	6.14±0.03ms	0.82	bench_function_base.Sort.time_sort(True, False, 'int32', ('sorted_block', 1000))
-	1.45±0.3ms	1.17±0.07ms	0.8	bench_function_base.Sort.time_sort(True, True, 'float64', ('uniform',))
-	21.3±0.2ms	16.9±0.2ms	0.79	bench_function_base.Sort.time_sort(True, False, 'int64', ('sorted_block', 10))
-	16.2±0.09ms	12.8±0.2ms	0.79	bench_function_base.Sort.time_sort(True, True, 'int32', ('sorted_block', 10))
-	16.2±0.04ms	12.7±0.2ms	0.78	bench_function_base.Sort.time_sort(True, True, 'uint32', ('sorted_block', 10))
-	17.0±6ms	12.8±0.05ms	0.75	bench_function_base.Sort.time_sort(True, False, 'int32', ('sorted_block', 10))
-	2.15±0.4ms	1.60±0ms	0.75	bench_function_base.Sort.time_sort(True, False, 'int8', ('sorted_block', 1000))
-	729±9ms	537±10ms	0.74	bench_function_base.Sort.time_sort(False, False, 'object', ('random',))
-	1.06±0.1ms	747±80μs	0.7	bench_function_base.Sort.time_sort(False, False, 'int64', ('uniform',))
-	12.0±7ms	8.16±0.3ms	0.68	bench_function_base.Sort.time_sort(True, False, 'int32', ('sorted_block', 100))
-	26.3±7ms	16.8±0.2ms	0.64	bench_function_base.Sort.time_sort(True, False, 'object', ('sorted_block', 10))
-	39.5±7ms	23.5±0.5ms	0.6	bench_function_base.Sort.time_sort(True, False, 'object', ('uniform',))
-	344±0.8ms	202±1ms	0.59	bench_function_base.Sort.time_sort(False, False, 'object', ('ordered',))
-	633±20ms	366±2ms	0.58	bench_function_base.Sort.time_sort(False, False, 'object', ('reversed',))
-	31.0±6ms	17.6±2ms	0.57	bench_function_base.Sort.time_sort(True, False, 'object', ('reversed',))

seberg · 2026-06-08T10:32:34Z

Thanks! Sorry quick question, but do you know what's up with the float16 benchmarks? I don't think we use it for that, but I wonder if adding NPY_FINLINE to the isnan and lt_nonan helpers might nudge the compiler to inline?
(Guessing that it is failing to do so suddenly, the other fluctuations seem basically "random", but the 3x is a bit surprising.)

(I guess this should be settling, but if this "inverted" logic would hinder this, we can also revert it...)

MaanasArora · 2026-06-08T10:49:57Z

Not quite sure! The 3x is pretty stable across runs, so it does seem to be a real regression. Adding NPY_FINLINE didn't really help unfortunately:

float16-only sort benchmarks

Change	Before [`c80693c`]	After [`12d1d249`]	Ratio	Benchmark (Parameter)
+	1.17±0.01ms	4.41±0.8ms	3.78	bench_function_base.Sort.time_sort(True, True, 'float16', ('uniform',))
+	1.18±0ms	3.47±0.7ms	2.94	bench_function_base.Sort.time_sort(True, True, 'float16', ('ordered',))
+	15.0±0.4ms	17.5±0.4ms	1.17	bench_function_base.Sort.time_sort(True, True, 'float16', ('sorted_block', 1000))
+	15.1±0.2ms	17.3±0.03ms	1.15	bench_function_base.Sort.time_sort(False, False, 'float16', ('uniform',))
+	14.9±0.2ms	16.8±0.1ms	1.13	bench_function_base.Sort.time_sort(False, True, 'float16', ('reversed',))
+	36.8±0.2ms	38.6±0.1ms	1.05	bench_function_base.Sort.time_sort(False, True, 'float16', ('sorted_block', 1000))
-	8.51±0.2ms	8.06±0.09ms	0.95	bench_function_base.Sort.time_sort(True, False, 'float16', ('reversed',))
-	15.8±0.2ms	14.9±0.2ms	0.94	bench_function_base.Sort.time_sort(False, True, 'float16', ('ordered',))
-	14.9±0.8ms	14.0±0.2ms	0.94	bench_function_base.Sort.time_sort(False, True, 'float16', ('uniform',))

I suspect the !less in less_equal (and same for greater) is causing some random compiler-specific optimization of the nan-boolean cases, as before !ret came after. Let me experiment with it a bit...

MaanasArora · 2026-06-08T12:17:33Z

OK, pushed a change to the less_equal and greater_equal functions by expanding them, which helped speed up on my machine at least, I think the regressions are gone mostly - benchmarks below! Added a release note as well.
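To make the "inverted" versus "expanded" forms concrete, here is a minimal Python sketch of the two ways to write a NaN-aware `less_equal` for floats. It is not the PR's C++ code: the thread does not show the expanded form, so `less_equal_expanded` below is an assumption about what such an expansion could look like. It only checks that both spellings agree when NaN is ordered after every number.

```python
import itertools

NAN = float("nan")


def less(a, b):
    """a sorts strictly before b, with NaN ordered after every number."""
    return a < b or (b != b and a == a)


def less_equal_inverted(a, b):
    """a <= b written as the negation of the strict comparison."""
    return not less(b, a)


def less_equal_expanded(a, b):
    """Hypothetical expanded form: the same predicate without calling less()."""
    return a <= b or b != b


values = [-1.0, 0.0, 2.5, float("inf"), NAN]
mismatches = [
    (a, b)
    for a, b in itertools.product(values, repeat=2)
    if less_equal_inverted(a, b) != less_equal_expanded(a, b)
]
print(mismatches)  # → []
```

Both forms define the same ordering, so any timing difference between them would come from how the compiler treats each spelling rather than from the logic.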

Sort Benchmarks

Change	Before [`c80693c`]	After [`a84f7b6`]	Ratio	Benchmark (Parameter)
+	13.5±0.05ms	17.0±0.06ms	1.26	bench_function_base.Sort.time_sort(True, True, 'int32', ('sorted_block', 10))
+	6.45±0.02ms	7.72±0.1ms	1.2	bench_function_base.Sort.time_sort(False, True, 'float64', ('ordered',))
+	5.04±0.02ms	6.02±0.09ms	1.2	bench_function_base.Sort.time_sort(False, True, 'int16', ('ordered',))
+	7.96±0.05ms	9.50±0.1ms	1.19	bench_function_base.Sort.time_sort(True, False, 'object', ('sorted_block', 1000))
+	14.6±0.09ms	17.3±1ms	1.19	bench_function_base.Sort.time_sort(True, True, 'float16', ('sorted_block', 1000))
+	5.21±0.01ms	6.15±0.01ms	1.18	bench_function_base.Sort.time_sort(False, True, 'int16', ('reversed',))
+	7.96±0.2ms	9.31±0.1ms	1.17	bench_function_base.Sort.time_sort(True, False, 'int64', ('sorted_block', 1000))
+	14.9±0.06ms	17.1±0.03ms	1.14	bench_function_base.Sort.time_sort(False, False, 'float16', ('ordered',))
+	16.0±0.2ms	18.1±0.03ms	1.13	bench_function_base.Sort.time_sort(False, False, 'float16', ('reversed',))
+	9.96±0.1ms	11.2±0.1ms	1.13	bench_function_base.Sort.time_sort(True, False, 'int64', ('sorted_block', 100))
+	5.62±0.02ms	6.35±0.1ms	1.13	bench_function_base.Sort.time_sort(True, False, 'uint32', ('sorted_block', 1000))
+	11.0±0.3ms	12.5±0.3ms	1.13	bench_function_base.Sort.time_sort(True, True, 'int64', ('sorted_block', 100))
+	6.40±0.1ms	7.17±0.06ms	1.12	bench_function_base.Sort.time_sort(False, True, 'float32', ('ordered',))
+	10.0±0.09ms	11.2±0.1ms	1.12	bench_function_base.Sort.time_sort(True, False, 'object', ('sorted_block', 100))
+	8.52±0.03ms	9.45±0.08ms	1.11	bench_function_base.Sort.time_sort(True, True, 'int64', ('sorted_block', 1000))
+	25.2±0.5ms	27.8±0.3ms	1.1	bench_function_base.Sort.time_sort(True, False, 'object', ('ordered',))
+	511±2ms	560±20ms	1.09	bench_function_base.Sort.time_sort(False, False, 'object', ('uniform',))
+	11.6±0.03ms	12.7±0.1ms	1.09	bench_function_base.Sort.time_sort(False, True, 'float64', ('reversed',))
+	40.7±0.04ms	44.5±0.2ms	1.09	bench_function_base.Sort.time_sort(False, True, 'int16', ('sorted_block', 10))
+	12.8±0.1ms	13.8±0.09ms	1.08	bench_function_base.Sort.time_sort(False, True, 'float64', ('uniform',))
+	45.1±0.03ms	48.8±0.1ms	1.08	bench_function_base.Sort.time_sort(False, True, 'int16', ('sorted_block', 100))
+	37.4±0.04ms	40.5±0.07ms	1.08	bench_function_base.Sort.time_sort(False, True, 'int16', ('sorted_block', 1000))
+	15.6±0.2ms	16.9±0.1ms	1.08	bench_function_base.Sort.time_sort(True, False, 'object', ('sorted_block', 10))
+	7.63±0.01ms	8.21±0.4ms	1.08	bench_function_base.Sort.time_sort(True, False, 'uint32', ('sorted_block', 100))
+	702±4μs	760±7μs	1.08	bench_function_base.Sort.time_sort(True, True, 'float32', ('reversed',))
+	54.4±0.1ms	58.4±0.2ms	1.07	bench_function_base.Sort.time_sort(False, True, 'int16', ('random',))
+	29.5±0.2ms	31.5±0.6ms	1.07	bench_function_base.Sort.time_sort(True, False, 'float16', ('sorted_block', 10))
-	8.90±0.1ms	8.41±0.07ms	0.95	bench_function_base.Sort.time_sort(False, True, 'int64', ('reversed',))
-	792±90μs	753±4μs	0.95	bench_function_base.Sort.time_sort(True, False, 'int64', ('uniform',))
-	318±8μs	300±0.7μs	0.94	bench_function_base.Sort.time_sort(True, False, 'bool', ('uniform',))
-	4.82±0.05ms	4.50±0.02ms	0.93	bench_function_base.Sort.time_sort(False, True, 'int32', ('ordered',))
-	23.0±0.4ms	21.5±0.05ms	0.93	bench_function_base.Sort.time_sort(False, True, 'int8', ('sorted_block', 100))
-	324±10μs	303±0.2μs	0.93	bench_function_base.Sort.time_sort(True, True, 'bool', ('uniform',))
-	5.48±0.09ms	5.06±0.01ms	0.92	bench_function_base.Sort.time_sort(False, False, 'int16', ('uniform',))
-	1.14±0.03ms	1.05±0.01ms	0.92	bench_function_base.Sort.time_sort(True, False, 'float64', ('ordered',))
-	15.5±0.3ms	14.1±0.03ms	0.91	bench_function_base.Sort.time_sort(False, True, 'float16', ('ordered',))
-	8.78±0.2ms	7.96±0.1ms	0.91	bench_function_base.Sort.time_sort(False, True, 'int32', ('reversed',))
-	13.2±0.07ms	11.9±0.02ms	0.9	bench_function_base.Sort.time_sort(False, True, 'int8', ('sorted_block', 1000))
-	8.74±0.2ms	7.84±0.02ms	0.9	bench_function_base.Sort.time_sort(False, True, 'uint32', ('reversed',))
-	1.12±0.06ms	1.00±0.01ms	0.89	bench_function_base.Sort.time_sort(True, False, 'float64', ('reversed',))
-	26.5±0.4ms	23.5±0.2ms	0.89	bench_function_base.Sort.time_sort(True, True, 'float64', ('sorted_block', 10))
-	667±50μs	596±3μs	0.89	bench_function_base.Sort.time_sort(True, True, 'uint32', ('reversed',))
-	4.92±0.06ms	4.35±0.01ms	0.88	bench_function_base.Sort.time_sort(False, True, 'uint32', ('ordered',))
-	1.24±0.1ms	1.06±0.03ms	0.86	bench_function_base.Sort.time_sort(True, True, 'float64', ('uniform',))
-	1.05±0.01ms	875±3μs	0.84	bench_function_base.Sort.time_sort(True, False, 'float16', ('uniform',))
-	1.42±0.2ms	1.20±0.03ms	0.84	bench_function_base.Sort.time_sort(True, True, 'float64', ('reversed',))
-	1.09±0.01ms	884±8μs	0.81	bench_function_base.Sort.time_sort(True, False, 'float16', ('ordered',))
-	16.2±0.03ms	12.7±0.05ms	0.79	bench_function_base.Sort.time_sort(True, False, 'uint32', ('sorted_block', 10))
-	661±2ms	518±3ms	0.78	bench_function_base.Sort.time_sort(False, False, 'object', ('random',))
-	5.81±0.01ms	4.55±0.5ms	0.78	bench_function_base.Sort.time_sort(True, False, 'int32', ('sorted_block', 1000))
-	1.15±0ms	880±3μs	0.77	bench_function_base.Sort.time_sort(True, True, 'float16', ('ordered',))
-	1.15±0.01ms	882±10μs	0.77	bench_function_base.Sort.time_sort(True, True, 'float16', ('uniform',))
-	16.1±0.08ms	12.4±0.03ms	0.77	bench_function_base.Sort.time_sort(True, True, 'uint32', ('sorted_block', 10))
-	32.2±0.3ms	24.0±0.4ms	0.75	bench_function_base.Sort.time_sort(True, False, 'object', ('uniform',))
-	25.3±0.1ms	17.1±0.09ms	0.68	bench_function_base.Sort.time_sort(True, False, 'object', ('reversed',))
-	326±0.5ms	202±0.2ms	0.62	bench_function_base.Sort.time_sort(False, False, 'object', ('ordered',))
-	595±2ms	369±0.4ms	0.62	bench_function_base.Sort.time_sort(False, False, 'object', ('reversed',))

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

Argsort benchmarks

Change	Before [`c80693c`]	After [`6d83c91`]	Ratio	Benchmark (Parameter)
+	16.9±0.4ms	22.7±0.8ms	1.35	bench_function_base.Sort.time_argsort(True, False, 'int32', ('sorted_block', 10))
+	683±10μs	913±200μs	1.34	bench_function_base.Sort.time_argsort(True, True, 'int64', ('ordered',))
+	8.35±0.09ms	10.7±1ms	1.28	bench_function_base.Sort.time_argsort(True, False, 'int32', ('sorted_block', 1000))
+	12.2±0.4ms	15.2±0.3ms	1.25	bench_function_base.Sort.time_argsort(False, True, 'float32', ('reversed',))
+	581±20μs	721±100μs	1.24	bench_function_base.Sort.time_argsort(True, False, 'int32', ('uniform',))
+	575±7μs	703±70μs	1.22	bench_function_base.Sort.time_argsort(True, False, 'int32', ('ordered',))
+	11.7±0.4ms	14.3±0.9ms	1.22	bench_function_base.Sort.time_argsort(True, False, 'uint32', ('sorted_block', 100))
+	523±4μs	636±80μs	1.22	bench_function_base.Sort.time_argsort(True, False, 'uint8', ('uniform',))
+	802±3μs	971±200μs	1.21	bench_function_base.Sort.time_argsort(True, True, 'uint32', ('reversed',))
+	6.07±0.07ms	7.14±0.3ms	1.18	bench_function_base.Sort.time_argsort(False, True, 'int64', ('ordered',))
+	6.97±0.4ms	8.17±0.6ms	1.17	bench_function_base.Sort.time_argsort(False, True, 'float32', ('ordered',))
+	5.22±0.1ms	6.12±0.3ms	1.17	bench_function_base.Sort.time_argsort(True, False, 'uint8', ('random',))
+	521±9μs	606±30μs	1.16	bench_function_base.Sort.time_argsort(True, False, 'uint8', ('ordered',))
+	6.74±0.1ms	7.72±0.5ms	1.15	bench_function_base.Sort.time_argsort(False, True, 'bool', ('sorted_block', 10))
+	12.3±0.2ms	14.2±0.5ms	1.15	bench_function_base.Sort.time_argsort(True, False, 'int32', ('sorted_block', 100))
+	19.6±0.3ms	22.6±0.7ms	1.15	bench_function_base.Sort.time_argsort(True, False, 'object', ('ordered',))
+	578±8μs	656±20μs	1.14	bench_function_base.Sort.time_argsort(True, False, 'uint32', ('ordered',))
+	16.7±0.5ms	18.9±2ms	1.13	bench_function_base.Sort.time_argsort(False, False, 'float16', ('ordered',))
+	519±5μs	588±8μs	1.13	bench_function_base.Sort.time_argsort(True, False, 'bool', ('ordered',))
+	8.10±0.2ms	9.13±0.1ms	1.13	bench_function_base.Sort.time_argsort(True, False, 'uint32', ('sorted_block', 1000))
+	578±3μs	652±40μs	1.13	bench_function_base.Sort.time_argsort(True, False, 'uint32', ('uniform',))
+	2.65±0.06ms	3.01±0.1ms	1.13	bench_function_base.Sort.time_argsort(True, False, 'uint8', ('sorted_block', 1000))
+	23.5±0.3ms	26.5±0.7ms	1.13	bench_function_base.Sort.time_argsort(True, True, 'object', ('sorted_block', 10))
+	10.7±0.04ms	11.9±0.9ms	1.11	bench_function_base.Sort.time_argsort(False, True, 'int64', ('reversed',))
+	16.3±0.4ms	18.2±0.2ms	1.11	bench_function_base.Sort.time_argsort(True, False, 'uint32', ('sorted_block', 10))
+	3.24±0.01ms	3.56±0.4ms	1.1	bench_function_base.Sort.time_argsort(True, False, 'bool', ('sorted_block', 100))
+	13.5±0.3ms	15.0±0.3ms	1.1	bench_function_base.Sort.time_argsort(True, False, 'object', ('sorted_block', 100))
+	7.16±0.08ms	7.88±0.2ms	1.1	bench_function_base.Sort.time_argsort(True, False, 'uint8', ('reversed',))
+	9.19±0.3ms	9.99±0.7ms	1.09	bench_function_base.Sort.time_argsort(True, False, 'object', ('sorted_block', 1000))
+	5.87±0.07ms	6.37±0.3ms	1.08	bench_function_base.Sort.time_argsort(False, False, 'bool', ('uniform',))
+	15.2±0.2ms	16.5±0.2ms	1.08	bench_function_base.Sort.time_argsort(False, False, 'int8', ('sorted_block', 1000))
+	5.99±0.04ms	6.48±0.2ms	1.08	bench_function_base.Sort.time_argsort(False, True, 'bool', ('sorted_block', 1000))
+	19.1±0.04ms	20.4±0.6ms	1.07	bench_function_base.Sort.time_argsort(False, False, 'float64', ('reversed',))
+	6.02±0.05ms	6.42±0.2ms	1.07	bench_function_base.Sort.time_argsort(False, True, 'bool', ('ordered',))
+	11.9±0.1ms	12.8±0.4ms	1.07	bench_function_base.Sort.time_argsort(False, True, 'bool', ('random',))
+	56.3±0.3ms	60.1±0.7ms	1.07	bench_function_base.Sort.time_argsort(False, True, 'int64', ('sorted_block', 1000))
+	4.15±0.02ms	4.43±0.2ms	1.07	bench_function_base.Sort.time_argsort(True, False, 'bool', ('sorted_block', 1000))
+	13.6±0.06ms	14.5±0.08ms	1.07	bench_function_base.Sort.time_argsort(True, False, 'float64', ('sorted_block', 1000))
+	20.5±0.2ms	21.8±0.4ms	1.07	bench_function_base.Sort.time_argsort(True, False, 'object', ('sorted_block', 10))
+	31.5±0.7ms	33.4±0.9ms	1.06	bench_function_base.Sort.time_argsort(False, False, 'int32', ('random',))
+	507±3ms	539±1ms	1.06	bench_function_base.Sort.time_argsort(False, False, 'object', ('uniform',))
+	84.8±0.4ms	89.7±3ms	1.06	bench_function_base.Sort.time_argsort(False, True, 'float32', ('random',))
+	14.8±0.1ms	15.6±0.1ms	1.05	bench_function_base.Sort.time_argsort(False, False, 'float16', ('uniform',))
+	4.22±0.05ms	4.44±0.3ms	1.05	bench_function_base.Sort.time_argsort(True, False, 'int16', ('sorted_block', 10))
+	88.4±0.2ms	92.9±1ms	1.05	bench_function_base.Sort.time_argsort(True, False, 'int32', ('random',))
+	35.9±0.1ms	37.9±0.4ms	1.05	bench_function_base.Sort.time_argsort(True, True, 'float16', ('sorted_block', 10))
-	62.0±0.2ms	59.1±0.6ms	0.95	bench_function_base.Sort.time_argsort(False, True, 'int32', ('sorted_block', 10))
-	97.8±2ms	92.9±0.5ms	0.95	bench_function_base.Sort.time_argsort(True, False, 'float32', ('random',))
-	5.96±0.05ms	5.60±0.05ms	0.94	bench_function_base.Sort.time_argsort(False, True, 'int8', ('ordered',))
-	8.06±0.3ms	7.47±0.2ms	0.93	bench_function_base.Sort.time_argsort(True, True, 'int8', ('reversed',))
-	12.7±0.6ms	11.9±0.1ms	0.93	bench_function_base.Sort.time_argsort(True, True, 'uint32', ('sorted_block', 100))
-	8.12±0.2ms	7.48±0.2ms	0.92	bench_function_base.Sort.time_argsort(True, True, 'uint8', ('reversed',))
-	1.17±0.01ms	1.07±0.03ms	0.91	bench_function_base.Sort.time_argsort(True, False, 'float16', ('uniform',))
-	11.3±0.4ms	9.87±0.2ms	0.87	bench_function_base.Sort.time_argsort(False, True, 'int32', ('reversed',))
-	21.7±0.7ms	18.8±0.2ms	0.87	bench_function_base.Sort.time_argsort(True, True, 'float64', ('sorted_block', 100))
-	17.3±0.4ms	14.9±0.1ms	0.86	bench_function_base.Sort.time_argsort(False, True, 'float16', ('uniform',))
-	6.96±0.3ms	5.86±0.1ms	0.84	bench_function_base.Sort.time_argsort(False, True, 'uint32', ('ordered',))
-	3.09±0.2ms	2.57±0.04ms	0.83	bench_function_base.Sort.time_argsort(True, True, 'uint8', ('sorted_block', 10))
-	1.29±0.01ms	1.07±0.03ms	0.82	bench_function_base.Sort.time_argsort(True, False, 'float16', ('ordered',))
-	1.32±0.01ms	1.07±0.02ms	0.81	bench_function_base.Sort.time_argsort(True, True, 'float16', ('uniform',))
-	28.5±0.3ms	22.8±0.5ms	0.8	bench_function_base.Sort.time_argsort(True, False, 'object', ('uniform',))
-	7.39±0.3ms	5.87±0.2ms	0.79	bench_function_base.Sort.time_argsort(False, True, 'int32', ('ordered',))
-	1.42±0.1ms	1.05±0.02ms	0.74	bench_function_base.Sort.time_argsort(True, True, 'float16', ('ordered',))
-	3.70±0.6ms	2.72±0.07ms	0.74	bench_function_base.Sort.time_argsort(True, True, 'uint8', ('sorted_block', 1000))
-	591±10ms	359±2ms	0.61	bench_function_base.Sort.time_argsort(False, False, 'object', ('reversed',))
-	325±2ms	195±3ms	0.6	bench_function_base.Sort.time_argsort(False, False, 'object', ('ordered',))
-	19.5±0.3ms	10.9±0.09ms	0.56	bench_function_base.Sort.time_argsort(True, False, 'object', ('reversed',))

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

seberg

Thanks, as discussed a bit, I am really starting to think it was a terrible idea to think about using <= style comparisons, because if we avoid the __le__ explicitly then I am not quite sure about the implementation.

So, if that makes tests fail (for good reasons or not), then I think it might be best to just not do that cmp_eq thing. In the end, it is only interesting if it avoids NaN checks when no NaNs are involved (and even then it only might be interesting).

seberg · 2026-06-08T13:22:41Z

+`np.sort` and `np.argsort` with arrays of dtype `object`
+now support passing `descending=True` to sort in descending order.
+Unordered objects, i.e. `obj` such that `obj != obj`, are sorted


Suggested change

-`np.sort` and `np.argsort` with arrays of dtype `object`
-now support passing `descending=True` to sort in descending order.
-Unordered objects, i.e. `obj` such that `obj != obj`, are sorted
+`np.sort` and `np.argsort` with arrays of dtype ``object``
+now support passing `descending=True` to sort in descending order.
+Unordered objects, i.e. ``obj`` such that ``obj != obj``, are now sorted

Thanks, fixed! I also cleaned up the release note a bit, not sure if obj != obj was actually nice, but seems more natural this way...
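For readers skimming the release note, the behaviour it describes can be modelled in pure Python. This is only an illustrative sketch: `is_nan_like` and `sort_nan_last` are hypothetical helper names, and the model mirrors the expectation built in the PR's test (non-NaN values reversed, NaN-like values kept at the end) rather than the actual C++ sort.

```python
import math


def is_nan_like(obj):
    """True for objects that are unequal to themselves, e.g. float('nan')."""
    return bool(obj != obj)


def sort_nan_last(values, descending=False):
    """Order the comparable items and keep NaN-like items at the end."""
    ordered = sorted((v for v in values if not is_nan_like(v)), reverse=descending)
    unordered = [v for v in values if is_nan_like(v)]
    return ordered + unordered


data = [3, float("nan"), 1, 2]
ascending = sort_nan_last(data)
descending = sort_nan_last(data, descending=True)
print(ascending)   # → [1, 2, 3, nan]
print(descending)  # → [3, 2, 1, nan]
```

Note that the NaN-like value stays last in both directions; only the comparable part of the data is reversed.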

seberg · 2026-06-08T14:04:12Z

+        b = np.concatenate((a[~nanmask][::-1], a[nanmask]))
+        if np.issubdtype(a.dtype, np.object_):
+            # cast to float for comparison, as object np.nan != np.nan
+            a = a.astype(float)


This looks wrong (i.e. the cast is before the actual sort).

Fixed, thanks!

seberg · 2026-06-08T14:07:53Z

-            if (j < n && npy::cmp<Tag, reverse>(a[j], a[j + 1])) {
+            ret = npy::cmp<Tag, reverse>(a[j], a[j + 1]);
+            if (ret < 0) return ret;
+            if (j < n && ret) {


This may be wrong, i.e. it computes ret even if j < n isn't true. You could inline the (ret = ...) == 1 although not the prettiest maybe.

(I guess we could probably just delete heapsort in practice, but maybe not as part of this PR.)

This is done, I think!
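The point about not evaluating the comparison when `j < n` is false can be sketched in Python. This is a simplified, 0-indexed model under stated assumptions, not the NumPy heapsort: `less` is a hypothetical comparator that returns 1, 0, or -1 (error), and the bounds check always runs before the comparator is called so that no out-of-range element is ever compared.

```python
def less(a, b):
    """Three-valued comparator: 1 if a < b, 0 if not, -1 if comparing fails."""
    try:
        return 1 if a < b else 0
    except TypeError:
        return -1


def sift_down(a, start, end, cmp):
    """Restore the max-heap below `start`; return -1 on comparison error."""
    root = start
    while (child := 2 * root + 1) < end:
        # Compare the two children only when the right child is in bounds.
        if child + 1 < end:
            ret = cmp(a[child], a[child + 1])
            if ret < 0:
                return ret
            if ret:
                child += 1
        ret = cmp(a[root], a[child])
        if ret < 0:
            return ret
        if not ret:
            break
        a[root], a[child] = a[child], a[root]
        root = child
    return 0


def heapsort(a, cmp=less):
    """Sort `a` in place; return 0 on success or -1 if a comparison failed."""
    n = len(a)
    for start in range(n // 2 - 1, -1, -1):
        if sift_down(a, start, n, cmp) < 0:
            return -1
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        if sift_down(a, 0, end, cmp) < 0:
            return -1
    return 0


numbers = [3, 1, 2, 5, 4]
status_ok = heapsort(numbers)
mixed = [1, "a", 2]
status_err = heapsort(mixed)
```

Here `status_ok` is 0 with `numbers` sorted in place, while the mixed list makes the comparator fail and the error code propagates out immediately.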

seberg · 2026-06-08T14:43:34Z

+            return 0;
+        }
+
+        ret = PyObject_RichCompareBool(a, b, op);


Hmmm, if isnan(a) && isnan(b), then for this style of comparison they are considered equal, I think?
I guess in practice that might not even matter for timsort, with just another sorting approach being taken (i.e. if the "already sorted" pass fails for NaNs that might be fine, but I am not quite sure; unless you are? Then we should comment)...

I am now thinking I really led you astray here. I don't mind using <=, FWIW, (maybe with a small release note), we can undo if someone notices...
But, at this point it feels like it is adding a lot of annoyances, and I would be just as happy to not do it here. If someone ever wants to optimize it, they could follow up.

But, my guess is your re-factor seems to have optimized from 3 to 2 Python comparisons for the already sorted case but if we change it to something like:

def less_equal_with_nan(a, b):
    if b > a:
        return 0
    elif b != b:
        return 1
    elif a != a:
        return 0
    return 1

which to me would seem safe, then we would again end up with 3 comparisons when a <= b is True and at that point the whole use of cmp_eq may be pretty much moot?

Yeah, it feels a bit moot with the added complexity now, even if it is a bit optimized (from 3 to 2), and totally moot if not in the future. I don't think it warrants this much tweaking... I've just gone ahead and reverted these files to before the cmp_eq experiment, so we lose this baggage!
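The comparison-count argument can be checked with a small Python sketch. Two things here are assumptions rather than anything from the PR: the `Counted` wrapper is a hypothetical helper that counts rich comparisons, and the first test is read as `a > b` (meaning "a sorts strictly after b"), which is the reading under which the "3 comparisons when a <= b is True" count works out.

```python
class Counted:
    """Wraps a value and counts every rich comparison made on it."""

    calls = 0

    def __init__(self, value):
        self.value = value

    def __gt__(self, other):
        Counted.calls += 1
        return self.value > other.value

    def __ne__(self, other):
        Counted.calls += 1
        return self.value != other.value


def less_equal_with_nan(a, b):
    # Assumption: the first test means "a sorts strictly after b".
    if a > b:
        return 0
    elif b != b:  # b is NaN-like, so a <= b holds
        return 1
    elif a != a:  # a is NaN-like and b is not
        return 0
    return 1


def count(a, b):
    """Return (result, number of Python-level comparisons used)."""
    Counted.calls = 0
    result = less_equal_with_nan(Counted(a), Counted(b))
    return result, Counted.calls


nan = float("nan")
print(count(1, 2))    # → (1, 3)
print(count(2, 1))    # → (0, 1)
print(count(1, nan))  # → (1, 2)
print(count(nan, 1))  # → (0, 3)
```

In the common already-sorted case with no NaNs, `a <= b` is true and all three comparisons run, which is the cost the thread weighs against the plain strict comparison.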

seberg · 2026-06-08T14:44:17Z

@@ -0,0 +1,6 @@
+object array sorting supports `descending=True`


Maybe mention NaN/NaN-like objects here?

Changed up to include, thanks!

MaanasArora · 2026-06-08T15:27:34Z

Thanks for reviewing! Yeah, I think reverting is a good choice here, as there was added complexity on a few fronts. At least we know this is tricky to do now :)

seberg

Made a tiny tweak, mostly removed the with errstate context from the benchmarks (because it didn't make sense to me, but who knows maybe I missed something).
Tiny tweak to the release note, but it's good enough.

One thing that is in a sense missing are tests that actually exercise the error paths. That might be a good follow-up, but I don't want to hold it off due to that.

Thanks, I'll put it in once tests pass, if there is something more, we can follow-up.

MaanasArora changed the title ~~ENH: New-style object sorting with NaN handling~~ ENH: New-style object sorting with descending support and NaN handling May 14, 2026

MaanasArora force-pushed the object-sorts branch from 8e06384 to 7d15e9a Compare May 14, 2026 06:43

seberg reviewed May 14, 2026

MaanasArora force-pushed the object-sorts branch from b5fb9d5 to f8619f6 Compare May 15, 2026 02:55

seberg reviewed May 17, 2026

Comment thread numpy/_core/src/common/numpy_tag.hpp

Comment thread numpy/_core/src/npysort/quicksort.hpp Outdated

MaanasArora force-pushed the object-sorts branch 2 times, most recently from 12a530d to b9f04dd Compare May 25, 2026 07:15

seberg mentioned this pull request Jun 3, 2026

BUG: Handle nan values on histogram's slow path. #31461

Open

MaanasArora force-pushed the object-sorts branch from 55678b9 to 94e5d4a Compare June 5, 2026 01:43

MaanasArora force-pushed the object-sorts branch from ca5446d to cb2a40a Compare June 8, 2026 10:00

MaanasArora added 13 commits June 8, 2026 06:09

ENH: New-style ArrayMethod object sorting with NaN handling

27207a1

BUG: Add sentinel guard for out-of-bound argsorts

6989dbc

BUG: Correct return value on rich compare error

cd542e2

REF: Rewrite object tag comparisons

ae05df2

REF: New _cmp function to reuse in object less, greater, -1 for…

650dcd6

… error

ENH: Error handling and early exit in sorts using NPY_CMP macro

87bb30a

Revert "ENH: Error handling and early exit in sorts using NPY_CMP macro"

9255027

This reverts commit 52b7c81.

ENH: Handle errors from cmp in sort functions

90d0857

BUG: Switch NULLs to None in object comparison null handling

72aed9d

REF: Simplify object comparison handling in quicksort and aquicksort …

4865d3a

…functions

BENCH: Add object dtype to sort benchmarks

ff962da

REF: Fix indentation

2ddcb7d

BUG: Fix swapped parametrizations

c81cb08

MaanasArora added 2 commits June 8, 2026 06:09

ENH: Add greater_equal to simple dtypes and use in timsort to avoid…

707143a

… object regression

ENH: Fix use of Py_LE/GE and instead invert op for object sort comp…

d8a7577

…arator This reverts commit faab2fc.

MaanasArora added 2 commits June 8, 2026 07:38

ENH: Optimize less_equal and greater_equal for npy_half type

a84f7b6

DOC: Add release note

6d83c91

MaanasArora force-pushed the object-sorts branch from cb2a40a to 6d83c91 Compare June 8, 2026 12:03

seberg reviewed Jun 8, 2026

MaanasArora added 7 commits June 8, 2026 11:04

Revert "ENH: Optimize less_equal and greater_equal for npy_half…

2c2ac85

… type" This reverts commit a84f7b6.

Revert "ENH: Fix use of Py_LE/GE and instead invert op for object s…

a99ba57

…ort comparator" This reverts commit d8a7577.

Revert "ENH: Add greater_equal to simple dtypes and use in timsort …

f2133e2

…to avoid object regression" This reverts commit 707143a.

DOC: Clarify release note

0fc1796

TST: Fix casting of object arrays in sorting nan test

792194f

BUG: Fix comparison logic in heapsort and aheapsort implementations

efe1704

STYLE: Remove unnecessary blank line

cea79da

MaanasArora and others added 2 commits June 8, 2026 11:33

DOC: Update release notes to clarify object array sorting behavior wi…

c279e21

…th NaN-like objects

Remove presumably unnecessary with errstate and small tweak

3a4f554

seberg approved these changes Jun 9, 2026

seberg merged commit c0c20aa into numpy:main Jun 9, 2026
85 of 87 checks passed
