I've got a DataFrame built from the records of a Django queryset, which I pivot on 2 columns to get a dashboard view of it. I manage to get the global sums by row and by column for the whole table, but I'm trying to get the sum within each value of the first pivot column (a per-row subtotal for each group of the first column level).

I know absolutely nothing about pandas, but I'm learning.

My DataFrame looks like this:

    type                    amount      source  fund
0   Ressource Humaine CDD   -36470.36   Expense fund2
1   Mission                 -1686.47    Expense fund2
2   Fonctionnement          -817465.91  Expense fund1
3   Fonctionnement          1118691.65  Budget  fund1
4   Fonctionnement          -6000       Expense fund3
5   Fonctionnement          -23621.83   Expense fund2
6   Frais de Gestion        -53499      Expense fund2
7   Fonctionnement          15000       Budget  fund3
8   Frais de Gestion        53499       Budget  fund2
9   Fonctionnement          186718.78   Budget  fund2
10  Mission                 1686.47     Budget  fund2
11  Ressource Humaine CDD   38676.53    Budget  fund2


To get an overview of availability by fund in my dashboard, I pivot it like this:

piv = cpd.pivot_table(index="type", columns=["fund", "source"], values="amount", aggfunc='sum', margins=True, margins_name='Sum')

which gives:

fund                    fund1                   fund2                       fund3
source                  Budget      Expense     Budget      Expense         Budget      Expense
type
Fonctionnement          1118691.65  -817465.91  186718.78   -23621.83       15000.00    -6000.00
Frais de Gestion        NaN         NaN         53499.00    -53499.00       NaN         NaN
Mission                 NaN         NaN         1686.47     -1686.47        NaN         NaN
Ressource Humaine CDD   NaN         NaN         38676.53    -36470.36       NaN         NaN

(the 'Sum' totals are missing from this listing, but I do get them)

I would like to end up with something like this:
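For anyone wanting to reproduce this, here is a minimal sketch that rebuilds the sample rows shown above and runs the same pivot call. The names `cpd` and `piv` come from the call above; the `rows` list is only an illustration, since in the real project the frame comes from a Django queryset rather than a literal list.

```python
import pandas as pd

# Sample rows copied from the listing above: (type, amount, source, fund).
rows = [
    ("Ressource Humaine CDD", -36470.36, "Expense", "fund2"),
    ("Mission", -1686.47, "Expense", "fund2"),
    ("Fonctionnement", -817465.91, "Expense", "fund1"),
    ("Fonctionnement", 1118691.65, "Budget", "fund1"),
    ("Fonctionnement", -6000.0, "Expense", "fund3"),
    ("Fonctionnement", -23621.83, "Expense", "fund2"),
    ("Frais de Gestion", -53499.0, "Expense", "fund2"),
    ("Fonctionnement", 15000.0, "Budget", "fund3"),
    ("Frais de Gestion", 53499.0, "Budget", "fund2"),
    ("Fonctionnement", 186718.78, "Budget", "fund2"),
    ("Mission", 1686.47, "Budget", "fund2"),
    ("Ressource Humaine CDD", 38676.53, "Budget", "fund2"),
]
cpd = pd.DataFrame(rows, columns=["type", "amount", "source", "fund"])

# Same pivot as in the question: one row per type, columns grouped by fund, then source.
piv = cpd.pivot_table(
    index="type",
    columns=["fund", "source"],
    values="amount",
    aggfunc="sum",
    margins=True,
    margins_name="Sum",
)
print(piv)
```

Combinations that never occur in the data (for example Mission in fund1) come out as NaN, which is why the subtotal columns I'm after should also be NaN there.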

fund                    fund1                                       fund2                                   fund3
source                  Budget      Expense         total fund1     Budget      Expense     total fund2     Budget      Expense     total fund3
type
Fonctionnement          1118691.65  -817465.91      301 226€        186718.78   -23621.83   163 097€        15000.00    -6000.00    9 000€
Frais de Gestion        NaN         NaN             NaN             53499.00    -53499.00   0               NaN         NaN         NaN 
Mission                 NaN         NaN             NaN             1686.47     -1686.47    0               NaN         NaN         NaN
Ressource Humaine CDD   NaN         NaN             NaN             38676.53    -36470.36   2 207€          NaN         NaN         NaN

I've seen some tips using pandas concat for multi-index pivots (e.g. Pivot table subtotals in Pandas).

I'm trying to loop over the columns or read the header or... but I can't get any further as I'm a complete noob!

How could I insert/append a subtotal column in the middle of each group, and how do I calculate this subtotal?

2 Answers

You can do a normal pivot and then compute and append the per-fund totals:

import pandas as pd

# do a normal pivot (df is the un-pivoted frame, called cpd in the question)
df = df.pivot_table(
    index="type",
    columns=["fund", "source"],
    values="amount",
    aggfunc="sum",
)

# compute "sum" dataframes
dfs = []
for c in df.columns.get_level_values(0).unique():
    s = df.loc[:, c].sum(axis=1, skipna=False)
    dfs.append(pd.DataFrame(s, index=s.index, columns=[(c, f"Total {c}")]))

# concat them together, sort the columns:
out = pd.concat([df, pd.concat(dfs, axis=1)], axis=1)
out = out[sorted(out.columns)]
print(out)

Prints:

fund                        fund1                             fund2                          fund3                    
source                     Budget    Expense Total fund1     Budget   Expense Total fund2   Budget Expense Total fund3
type                                                                                                                  
Fonctionnement         1118691.65 -817465.91   301225.74  186718.78 -23621.83   163096.95  15000.0 -6000.0      9000.0
Frais de Gestion              NaN        NaN         NaN   53499.00 -53499.00        0.00      NaN     NaN         NaN
Mission                       NaN        NaN         NaN    1686.47  -1686.47        0.00      NaN     NaN         NaN
Ressource Humaine CDD         NaN        NaN         NaN   38676.53 -36470.36     2206.17      NaN     NaN         NaN
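A note on `skipna=False` in the loop above: it is what keeps the total NaN for rows where a fund has no data, instead of showing 0. Here is a small sketch on a toy frame (the values and the names `toy`, `default_sum`, `strict_sum` and `at_least_one` are made up for illustration, not taken from the question) comparing it with the default and with `min_count=1`:

```python
import numpy as np
import pandas as pd

# Toy frame, hypothetical values: row "b" has no data at all, row "c" has one gap.
toy = pd.DataFrame(
    {"Budget": [100.0, np.nan, 50.0], "Expense": [-40.0, np.nan, np.nan]},
    index=["a", "b", "c"],
)

# Default: NaN is skipped, so an all-NaN row sums to 0.0.
default_sum = toy.sum(axis=1)

# skipna=False: any NaN in the row makes the whole total NaN.
strict_sum = toy.sum(axis=1, skipna=False)

# min_count=1: NaN only when the row has no valid value at all.
at_least_one = toy.sum(axis=1, min_count=1)

print(pd.DataFrame({"default": default_sum, "skipna=False": strict_sum, "min_count=1": at_least_one}))
```

With the data in the question both cells of a fund are either present or missing together, so `skipna=False` and `min_count=1` would give the same totals; they only differ when a fund has a Budget but no Expense (or the reverse) for some type.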

2 Comments

Thanks Andrej, It works super great. Now I have to understand how your code works!
@Ben Try to add debug prints to the code :) The idea is simple: to add "sum"-dataframes to original pivoted dataframe by concatenating.
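Following the "debug prints" suggestion, here is a rough sketch that walks through a single pass of the loop for one fund. The small frame is hand-built to look like an already pivoted table (two types, two funds, values taken from the question); the names `block` and `total` are only for illustration.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the pivoted table: two-level columns (fund, source).
columns = pd.MultiIndex.from_tuples(
    [("fund1", "Budget"), ("fund1", "Expense"), ("fund2", "Budget"), ("fund2", "Expense")],
    names=["fund", "source"],
)
df = pd.DataFrame(
    [[1118691.65, -817465.91, 186718.78, -23621.83],
     [np.nan, np.nan, 1686.47, -1686.47]],
    index=pd.Index(["Fonctionnement", "Mission"], name="type"),
    columns=columns,
)

c = "fund1"

# Selecting on the first column level drops it: the columns are now Budget / Expense.
block = df.loc[:, c]
print(block)

# Row-wise sum of that fund's columns; NaN cells propagate because skipna=False.
s = block.sum(axis=1, skipna=False)
print(s)

# Wrap the Series in a one-column frame whose label is the tuple (fund, "Total <fund>").
total = pd.DataFrame(s, index=s.index, columns=[(c, f"Total {c}")])
print(total.columns)

# Append the new column on the right, then reorder the columns by sorting the tuples.
out = pd.concat([df, total], axis=1)
out = out[sorted(out.columns)]
print(out)
```

One thing the prints make visible: the total lands at the end of its fund only because "Total fund1" sorts alphabetically after "Budget" and "Expense". With other source names (or a label such as "All") the sorted order could put it elsewhere, and the columns would need to be ordered explicitly.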

I also used the approach of adding sum rows back in. But with a couple of extra features:

  • Summaries for each level (3-levels of multi-index dataframe)
  • Looking up the column names from a dict

import pandas as pd

# df is assumed to be the source DataFrame, with columns named as in `col`
col = {'a':'Column A','b':'Column B','c':'Column C', 'v1':'Value 1', 'v2':'Value 2'}

# Create Pivot Table
r1 = pd.pivot_table(df, index=[col['a'],col['b'],col['c']], values=[col['v1'],col['v2']],
  margins=True, margins_name='All', aggfunc='sum')

# Create first-level sums (the parentheses let the method chain span several lines)
r1s2 = (
  r1.drop('All', level=0).groupby([col['a']]).sum()
  .assign(**{col['b']:'subtotal', col['c']:'(A)'})
  .set_index([col['b'],col['c']], append=True)
)

# Create second-level sums
r1s = (
  r1.drop('All', level=0).groupby([col['a'],col['b']]).sum()
  .assign(**{col['c']:'subtotal (B)'})
  .set_index(col['c'], append=True)
)

# Merge the three results together
r1 = pd.concat([r1,r1s2,r1s]).sort_index()
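Since the snippet above does not define `df`, here is a self-contained sketch of the same idea on made-up data. The sample values and the names `pivot`, `level_a`, `level_b` and `result` are hypothetical, and `groupby(level=...)` is used here to make explicit that the grouping is on index levels.

```python
import pandas as pd

col = {"a": "Column A", "b": "Column B", "c": "Column C", "v1": "Value 1", "v2": "Value 2"}

# Made-up sample data, for illustration only.
df = pd.DataFrame({
    col["a"]: ["x", "x", "x", "y"],
    col["b"]: ["p", "p", "q", "p"],
    col["c"]: ["m", "n", "m", "m"],
    col["v1"]: [1, 2, 3, 4],
    col["v2"]: [10, 20, 30, 40],
})

# Three-level pivot with a grand-total row labelled ("All", "", "").
pivot = pd.pivot_table(
    df,
    index=[col["a"], col["b"], col["c"]],
    values=[col["v1"], col["v2"]],
    margins=True,
    margins_name="All",
    aggfunc="sum",
)
detail = pivot.drop("All", level=0)  # leave the grand total out of the subtotals

# Subtotal per Column A: fill the two missing index levels with fixed labels.
level_a = (
    detail.groupby(level=col["a"]).sum()
    .assign(**{col["b"]: "subtotal", col["c"]: "(A)"})
    .set_index([col["b"], col["c"]], append=True)
)

# Subtotal per (Column A, Column B): only the last level needs a label.
level_b = (
    detail.groupby(level=[col["a"], col["b"]]).sum()
    .assign(**{col["c"]: "subtotal (B)"})
    .set_index(col["c"], append=True)
)

# All three frames now share the same 3-level index, so they can be stacked and sorted.
result = pd.concat([pivot, level_a, level_b]).sort_index()
print(result)
```

Where each subtotal row ends up within its group depends on how its label sorts against the real values, so the labels may need adjusting for other data.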
