2

My raw data is as such

    level0  level1  level2
0   0       A       foo
1   0       A       bar
2   0       B       foo
3   0       B       foo
4   0       B       foo
5   0       B       bar
6   1       A       foo
7   1       A       bar

And then I try to group the counts as such df.groupby(['level0', 'level1', 'level2']).size()

to get this

level0  level1  level2
0       A       foo            1
                bar            1
        B       foo            3
                bar            1
1       A       foo            1
                bar            1
        B       foo            1
                bar            1
2       A       foo            1
                bar            1
        B       foo            1
                bar            1

And now I would simply like to divide the counts of level2 in a ratio of bar/foo to get something like this:

level0  level1      {bar counts/foo counts}
0       A           1.00
        B           0.33
1       A           1.00
        B           1.00
2       A           1.00
        B           1.00

Perhaps I even setup the original groupby wrong but also tried .div, .apply, etc but having a mental block. Thank you for any guidance!

3 Answers 3

1

unstack and eval:

tmp = df.groupby(['level0', 'level1', 'level2']).size()

out = (tmp.unstack().eval('bar/foo')
          .reset_index(name='bar/foo')
      )

Or div:

tmp = (df.groupby(['level0', 'level1', 'level2'])
         .size().unstack()
      )

out = (tmp['bar']
       .div(tmp['foo'])
       .reset_index(name='bar/foo')
      )

Output:

   level0 level1   bar/foo
0       0      A  1.000000
1       0      B  0.333333
2       1      A  1.000000
Sign up to request clarification or add additional context in comments.

1 Comment

@Scott no need for a walrus, out = df.groupby(['level0', 'level1', 'level2']).size().unstack().eval('bar/foo') is enough. I just split it in two since this was OP's original code ;)
1

Code

# your groupby code
tmp = df.groupby(['level0', 'level1', 'level2']).size() 

# get desired output by xs and div
out = tmp.xs('bar', level=2).div(tmp.xs('foo', level=2))

out:

level0  level1
0       A         1.000000
        B         0.333333
1       A         1.000000

Comments

0
import pandas as pd

# Sample DataFrame
data = {
    'col0': [0, 0, 0, 0, 0, 0, 1, 1],
    'col1': ['A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
    'col2': ['X', 'Y', 'X', 'X', 'X', 'Y', 'X', 'Y']
}
df = pd.DataFrame(data)
print(df)

res = df.groupby(['col0','col1','col2']).size().unstack(fill_value = 0).eval('ratio = Y/X')\
.fillna(0).groupby(['col0','col1'])['ratio']\
.max().reset_index().rename(columns = {'ratio' : 'Y/X'})
print(res)
'''
   col0 col1       Y/X
0     0    A  1.000000
1     0    B  0.333333
2     1    A  1.000000
'''

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.