Calculate mean per few columns in Pandas Dataframe

Question

I have a Pandas dataframe, Data:

ID | A1| A2| B1| B2 
ID1| 2 | 1 | 3 | 7 
ID2| 4 | 6 | 5 | 3

I want to calculate the row-wise mean of columns A1 and A2, and separately of columns B1 and B2. My desired output:

ID | A1A2 mean | B1B2 mean
ID1| 1.5       | 5
ID2| 5         | 4

I can compute the mean of all columns together, but I cannot find a function that produces my desired output.
Is there a built-in method for this in Python?

jezrael · Accepted Answer · 2019-07-12 10:52:20Z


Use DataFrame.groupby with a lambda function that takes the first letter of each column name, then compute the mean per group. If the first column is not already the index, call DataFrame.set_index first:

df=df.set_index('ID').groupby(lambda x: x[0], axis=1).mean().add_suffix('_mean').reset_index()
print (df)
    ID  A_mean  B_mean
0  ID1     1.5     5.0
1  ID2     5.0     4.0
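With axis=1, the function passed to groupby is called once per column label, so x is 'A1', 'A2', 'B1' or 'B2' and x[0] is its first letter. Recent pandas releases have deprecated the axis=1 argument of groupby (to my knowledge starting with version 2.1), so the sketch below is a possible equivalent that transposes, groups on the index, and transposes back. The sample DataFrame is reconstructed from the question, and the names wide, keys and result are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "ID": ["ID1", "ID2"],
    "A1": [2, 4],
    "A2": [1, 6],
    "B1": [3, 5],
    "B2": [7, 3],
})

wide = df.set_index("ID")

# What the grouping function sees: one column label at a time.
keys = [label[0] for label in wide.columns]  # first letter of each label

# Transpose so the column labels become the index, group rows by
# first letter, average, then transpose back to one row per ID.
result = (
    wide.T
        .groupby(lambda label: label[0])
        .mean()
        .T
        .add_suffix("_mean")
        .reset_index()
)
print(result)
```

This should give the same A_mean and B_mean columns as the one-liner above, without relying on axis=1.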

Another solution is to extract the first letter of each column name by indexing with str[0]:

df = df.set_index('ID')

print (df.columns.str[0])
Index(['A', 'A', 'B', 'B'], dtype='object')

df = df.groupby(df.columns.str[0], axis=1).mean().add_suffix('_mean').reset_index()
print (df)
    ID  A_mean  B_mean
0  ID1     1.5     5.0
1  ID2     5.0     4.0

Or:

df = (df.set_index('ID')
        .groupby(df.columns[1:].str[0], axis=1)
        .mean()
        .add_suffix('_mean')
        .reset_index())

Verify the solution (starting again from the original DataFrame):

a = df.filter(like='A').mean(axis=1)
b = df.filter(like='B').mean(axis=1)

df = df[['ID']].assign(A_mean=a, B_mean=b)
print (df)
    ID  A_mean  B_mean
0  ID1     1.5     5.0
1  ID2     5.0     4.0

EDIT:

If the column names differ and you need to specify them in lists:

a = df[['A1','A2']].mean(axis=1)
b = df[['B1','B2']].mean(axis=1)

df = df[['ID']].assign(A_mean=a, B_mean=b)
print (df)
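When there are more than two groups, the explicit-list approach can be driven by a dictionary instead of one variable per group. The following is a minimal sketch under that assumption; the groups mapping and its keys are hypothetical names chosen for illustration, and the sample data is reconstructed from the question.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "ID": ["ID1", "ID2"],
    "A1": [2, 4],
    "A2": [1, 6],
    "B1": [3, 5],
    "B2": [7, 3],
})

# Hypothetical mapping: output column name -> source columns to average.
groups = {
    "A_mean": ["A1", "A2"],
    "B_mean": ["B1", "B2"],
}

# Row-wise mean of each listed set of columns.
means = {name: df[cols].mean(axis=1) for name, cols in groups.items()}

# Keep the ID column and attach one mean column per group.
result = df[["ID"]].assign(**means)
print(result)
```

Because the columns are listed explicitly, this does not depend on the column names sharing a prefix.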

edited Jul 12, 2019 at 10:52

answered Jul 12, 2019 at 10:12

jezrael


8 Comments

studentcoder Over a year ago

what is x here? Actually I need to calculate mean of two columns separately

studentcoder Over a year ago

It will calculate a single mean for all columns. But I need to calculate mean of a1 and a2 together, and b1 and b2 together

studentcoder Over a year ago

It calculates mean of all columns like before. Here I have 4 columns, and need to calculate mean of a1 and a2; mean of b1 and b2. I apply your technique which calculates mean of a1,a2,b1,b2

studentcoder Over a year ago

If you group by columns.str[0], the problem will be when the column names are totally different from each other

studentcoder Over a year ago

Your last edit works! Excellent way! Thanks a lot ! But I thought it should have some straight way like SELECT SQL technique
