Skip to content

Conversation

@sgasse
Copy link
Contributor

@sgasse sgasse commented Mar 27, 2020

Harmonize the signature of np.mean with np.sum by adding a where
argument, closes gh-15818.

@sgasse sgasse force-pushed the add_where_to_mean branch from 6795608 to cb24386 Compare March 28, 2020 19:57
@sgasse sgasse force-pushed the add_where_to_mean branch from cb24386 to 046c7fd Compare March 28, 2020 22:22
@eric-wieser
Copy link
Member

Thanks for the var support, this looks surprisingly straightforward.

@sgasse sgasse force-pushed the add_where_to_mean branch from 046c7fd to 8abc301 Compare March 29, 2020 08:53
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check which cases ufunc.reduce (c function name is something like GenericReduction) has this optimization for, and try the same ones. In particular, I'm trying to remember if we optimized for np.bool_.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the PyArray_GenericReduceFunction in numpy/core/src/multiarray/number.c however that function looks more like a wrapper, being only 15 lines long. Could you point me in the right direction?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @eric-wieser is indicating at this function : PyUFunc_GenericReduction in numpy/core/src/umath/ufunc_object.c . I wasn't able to find any special casing for np.bool except that where = True is being treated as if no where is provided.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK thank you for checking @anirudh2290.

@eric-wieser eric-wieser added this to the 1.19.0 release milestone Mar 29, 2020
@sgasse sgasse force-pushed the add_where_to_mean branch from 8abc301 to 5d72822 Compare March 29, 2020 11:53
@sgasse sgasse force-pushed the add_where_to_mean branch from 771126a to 6ca4f60 Compare April 18, 2020 09:14
Comment on lines 245 to 272
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't do the right thing for arrays containing np.inf. I recommend we skip ptp for now, and attempt it in a future PR - it's sufficiently different from mean and var that I think it's worth handling separately, so that we can get the rest merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I will remove changes made to ptp and upload a commit without it.

@sgasse sgasse force-pushed the add_where_to_mean branch from adb7428 to 08b0ccc Compare April 28, 2020 16:42
@sgasse
Copy link
Contributor Author

sgasse commented Apr 28, 2020

@eric-wieser I removed all changes related to ptp. Please let me know if other changes are required before the commit can be merged. Looking at the failed checks, it appears to me that they are unrelated to my changes. However I build locally on Linux and cannot test a build on Windows.

@sgasse sgasse force-pushed the add_where_to_mean branch from 08b0ccc to 25aac08 Compare May 6, 2020 16:19
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kwarg will have some performance implications for small arrays <1000 elements or so, until my parsing improvements would be merged. But just a note, to be honest there is not much we can do about it. It seems passing by position is thwarted by the initial argument (though that could possibly be changed).

@eric-wieser do you want to persue this for 1.19? I will push it off for now.

@seberg seberg modified the milestones: 1.19.0 release, 1.20.0 release May 14, 2020
@sgasse
Copy link
Contributor Author

sgasse commented May 22, 2020

@eric-wieser would you have time to take a last quick glance at this so it could maybe go in?

@seberg seberg self-requested a review May 22, 2020 13:51
@seberg seberg added the 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. label Jul 16, 2020
Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, sorry for not moving this forward earlier. @eric-wieser you were happy with this? It seems good to me, would do another more careful path before merging maybe, but seems all good.

@sgasse would you be willing to add an improvement release note for this? since these are all reduction like functions, adding an argument that all reductions accept seems like a straight forward API decision.

@sgasse
Copy link
Contributor Author

sgasse commented Jul 17, 2020

Hey @seberg , nice to hear back on this pull request :)
Sure, later/tomorrow, I can update all the versionadded tags in my changes. Where exactly can I add the improvement release note?

@seberg
Copy link
Member

seberg commented Jul 17, 2020

Thanks, releaes notes would go into doc/release/upcoming_changes, there is a readme in that file detailing the process.

Harmonize the signature of np.mean, np.var np.std, np.any, np.all,
and their respective nd.array methods with np.sum by adding a where
argument, see numpygh-15818.
@sgasse sgasse force-pushed the add_where_to_mean branch from ba45f07 to 4ec1dbd Compare July 18, 2020 11:15
@sgasse
Copy link
Contributor Author

sgasse commented Jul 18, 2020

Alright, thank you @seberg for the location of of the release notes. I added a document there and updated all the versionadded tags. Maybe you can have a look at the release note document.

@seberg
Copy link
Member

seberg commented Aug 6, 2020

Sorry, I was AFK for a bit. The release notes seem long but fine. @eric-wieser if there was anything you felt still doing here please have a quick look. I somewhat remembered that you almost had signed this off. But maybe that was not the case.

As a general note, the slowdown is:

In [2]: %timeit np.mean(arr)                                                                                            
10.8 µs ± 575 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

vs.

In [3]: %timeit np.mean(arr)                                                                                            
17.7 µs ± 132 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

For arr.any() the difference is 2.4µs vs. 3µs after. Since this adds a reduce argument to reduce-like operations. I am generally in favor of adding this, but it is a fairly large speed impact for small arrays there with mean, and I am uncertain that it will be easy to micro-optimize away.

@charris
Copy link
Member

charris commented Nov 21, 2020

Ping everyone here. Sounds like this is ready.

@seberg
Copy link
Member

seberg commented Nov 21, 2020

I am OK with putting it in as is. The only thing that made me pause in the first place was the slowdown, which is a silly reason probably.

@sgasse
Copy link
Contributor Author

sgasse commented Nov 22, 2020

@charris @seberg nice to hear back - should I rebase this or is it fine as is? I see the next major version is still 1.20, so this does not need to be updated, right?

@charris charris merged commit 269f2f2 into numpy:master Nov 22, 2020
@charris
Copy link
Member

charris commented Nov 22, 2020

In it goes, thanks @sgasse .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

01 - Enhancement 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. component: numpy._core

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ENH: Add a where argument to np.mean

8 participants