
I have an MxN array of values taken from an experiment. Some of these values are invalid and have been set to 0 to indicate as much. I can construct a mask of valid/invalid values using

mask = (mat1 == 0) & (mat2 == 0)

which produces an MxN array of bool. It should be noted that the masked locations do not neatly follow columns or rows of the matrix - so simply cropping the matrix is not an option.

Now, I want to take the mean along one axis of my array (e.g. end up with a 1xN array) while excluding the invalid values from the mean calculation. Intuitively I thought

 np.mean(mat1[mask], axis=1)

should do it, but mat1[mask] produces a 1D array containing just the elements where the mask is True - which doesn't help when I only want a mean across one dimension of the array.

Is there a 'python-esque' or numpy way to do this? I suppose I could use the mask to set masked elements to NaN and use np.nanmean - but that still feels kind of clunky. Is there a way to do this 'cleanly'?
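For example, boolean indexing flattens even a tiny array:

import numpy as np

a = np.arange(6).reshape(2, 3)
print(a[a > 2])  # [3 4 5] -- 1D, the row structure is gone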

2 Answers


I think the best way to do this would be something along the lines of:

masked = np.ma.masked_where((mat1 == 0) & (mat2 == 0), array_to_mask)

Then take the mean with

masked.mean(axis=1)
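For reference, a minimal runnable version of this approach (the small arrays below are made up for illustration; mask the positions where both inputs read 0, then average mat1):

import numpy as np

# made-up 3x4 measurement arrays; 0 in both marks an invalid reading
mat1 = np.array([[1., 0., 3., 4.],
                 [5., 6., 0., 8.],
                 [9., 0., 11., 12.]])
mat2 = np.array([[2., 0., 2., 2.],
                 [2., 2., 0., 2.],
                 [2., 0., 2., 2.]])

# mask the positions where both arrays read 0
masked = np.ma.masked_where((mat1 == 0) & (mat2 == 0), mat1)

# row means, ignoring the masked entries
print(masked.mean(axis=1))  # approximately [2.667 6.333 10.667]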

1 Comment

Worked perfectly! I didn't know about masked arrays - thank you!

One similarly clunky but efficient way is to multiply your array by the mask, setting the masked values to zero. Then, of course, you have to divide by the number of non-masked values manually - hence the clunkiness. But this works with integer-valued arrays, something that can't be said for the NaN approach. It also seems to be the fastest for both small and larger arrays, including when compared to the masked-array solution in the other answer:

import numpy as np

def nanny(mat, mask):
    mat = mat.astype(float)        # astype returns a copy, so the original is untouched
    mat[~mask] = np.nan            # overwrite invalid entries with NaN
    return np.nanmean(mat, axis=0) # mean ignoring NaNs

def manual(mat, mask):
    # zero masked values, divide by number of nonzeros
    return (mat*mask).sum(axis=0)/mask.sum(axis=0)

# set up dummy data for testing
N, M = 400, 400
mat1 = np.random.randint(0, N, (N, M))
mask = np.random.randint(0, 2, (N, M)).astype(bool)  # True = valid here (opposite of the question's mask convention)

print(np.array_equal(nanny(mat1, mask), manual(mat1, mask))) # True
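To check the speed claim on your own machine, here is a quick sketch using timeit that reuses the definitions above (exact numbers will vary with your setup):

import timeit

# time each approach on the dummy data above
for fn in (nanny, manual):
    per_call = timeit.timeit(lambda: fn(mat1, mask), number=100) / 100
    print(f"{fn.__name__}: {per_call * 1e3:.3f} ms per call")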

4 Comments

What would be the manual approach when dealing with floats?
@m_power I think the manual version should also work for floats. My motivation was more that in the float case you can just use NaNs for invalid values and call np.nanmean, which is likely to be faster because it's a single numpy function call. But OP already knew this (see the last part of their question), which is why I focussed on the manual version that might be necessary for integral arrays. That said, the accepted answer's approach with masked arrays might be better overall if you need the masked data in multiple places - it depends on your use case.
Thanks! I'm using np.nanmean (for an array of float with some NaNs), but I was looking to see if there was a faster approach.
@m_power If you already have an array of floats, I'd expect np.nanmean to be fastest, but admittedly I haven't played with such problems. The function seems to be implemented in Python, so you can try doing what it does with fewer checks if this is really your bottleneck: github.com/numpy/numpy/blob/main/numpy/lib/nanfunctions.py#L863
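For what it's worth, "doing what it does with fewer checks" might look something like this hand-rolled sketch for a float array containing NaNs (illustrative only, not benchmarked here):

import numpy as np

def lean_nanmean(a, axis=0):
    # mean along `axis`, ignoring NaNs: zero them out for the sum,
    # then divide by the count of non-NaN entries.
    # An all-NaN slice gives 0/0 -> nan plus a RuntimeWarning,
    # matching np.nanmean's behavior for that case.
    valid = ~np.isnan(a)
    return np.where(valid, a, 0.0).sum(axis=axis) / valid.sum(axis=axis)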
