
I have a Pandas dataframe, Data:

ID | A1| A2| B1| B2 
ID1| 2 | 1 | 3 | 7 
ID2| 4 | 6 | 5 | 3

I want to calculate the row-wise mean of columns A1 and A2, and separately the row-wise mean of B1 and B2. My desired output:

ID | A1A2 mean | B1B2 mean
ID1| 1.5       | 5
ID2| 5         | 4

I can compute the mean of all columns together, but I cannot find a function that produces my desired output.
Is there a built-in method for this in Python?

1 Answer

Use DataFrame.groupby with a lambda function that returns the first letter of each column name, then take the mean of each group. If the first column is not already the index, call DataFrame.set_index first:

df=df.set_index('ID').groupby(lambda x: x[0], axis=1).mean().add_suffix('_mean').reset_index()
print (df)
    ID  A_mean  B_mean
0  ID1     1.5     5.0
1  ID2     5.0     4.0

Another solution is to extract the first letter of each column name by indexing with str[0]:

df = df.set_index('ID')

print (df.columns.str[0])
Index(['A', 'A', 'B', 'B'], dtype='object')

df = df.groupby(df.columns.str[0], axis=1).mean().add_suffix('_mean').reset_index()
print (df)
    ID  A_mean  B_mean
0  ID1     1.5     5.0
1  ID2     5.0     4.0

Or:

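# df is the original frame here, with ID still a regular column,
# so columns[1:] skips ID and keeps A1, A2, B1, B2
df = (df.set_index('ID')
        .groupby(df.columns[1:].str[0], axis=1)
        .mean()
        .add_suffix('_mean')
        .reset_index())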
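
The groupby calls above pass axis=1, which, as far as I know, is deprecated since pandas 2.1. A minimal sketch of the same first-letter grouping that avoids axis=1 by transposing, grouping the rows, and transposing back is shown here. The sample frame is rebuilt from the question, and the name result is only an illustrative choice:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["ID1", "ID2"],
    "A1": [2, 4],
    "A2": [1, 6],
    "B1": [3, 5],
    "B2": [7, 3],
})

# Transpose so the original column names become the row index,
# group those labels by their first letter, average each group,
# then transpose back to the original orientation.
result = (
    df.set_index("ID")
      .T
      .groupby(lambda name: name[0])
      .mean()
      .T
      .add_suffix("_mean")
      .reset_index()
)
print(result)
```

Transposing copies the data, so for very wide or very large frames the explicit column-list approach may be the cheaper option.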

Verify the solution, starting again from the original df (with ID as a regular column):

a = df.filter(like='A').mean(axis=1)
b = df.filter(like='B').mean(axis=1)

df = df[['ID']].assign(A_mean=a, B_mean=b)
print (df)
    ID  A_mean  B_mean
0  ID1     1.5     5.0
1  ID2     5.0     4.0

EDIT:

If the column names are different and you need to specify them in lists:

a = df[['A1','A2']].mean(axis=1)
b = df[['B1','B2']].mean(axis=1)

df = df[['ID']].assign(A_mean=a, B_mean=b)
print (df)
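
When there are more than a couple of groups, the explicit lists can be kept in one place. The following is a rough sketch of that idea, assuming a hypothetical dict named groups that maps each output column name to the source columns to average; neither the dict nor the name result comes from the original answer:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["ID1", "ID2"],
    "A1": [2, 4],
    "A2": [1, 6],
    "B1": [3, 5],
    "B2": [7, 3],
})

# Hypothetical mapping: output column name -> columns to average.
# The column names can be anything; no shared prefix is needed.
groups = {
    "A_mean": ["A1", "A2"],
    "B_mean": ["B1", "B2"],
}

# Compute one row-wise mean per group, then attach them all to the ID column.
means = {name: df[cols].mean(axis=1) for name, cols in groups.items()}
result = df[["ID"]].assign(**means)
print(result)
```

This keeps the grouping independent of how the columns are named, which is the case raised in the comments about column names that differ completely from each other.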

8 Comments

what is x here? Actually I need to calculate mean of two columns separately
It will calculate a single mean for all columns. But I need to calculate mean of a1 and a2 together, and b1 and b2 together
It calculates mean of all columns like before. Here I have 4 columns, and need to calculate mean of a1 and a2; mean of b1 and b2. I apply your technique which calculates mean of a1,a2,b1,b2
If you group by columns.str[0], the problem will be when the column names are totally different from each other
Your last edit works! Excellent way! Thanks a lot ! But I thought it should have some straight way like SELECT SQL technique