Dataframe behavior: Pandas 1.1.5 vs 2.3.0

Question

I recently had to update the virtual environment for one of my libraries from Python 3.7 to 3.10, which also involved updating Pandas from 1.1.5 to 2.3.0.

In the previous virtual environment, this used to work fine:

my_index is a string, specifically "SARON"
fixing_df is a dataframe as follows:

         rate_name   date    fixing
    0        SARON  44200 -0.725865
    1        SARON  44201 -0.724515
    2        SARON  44202 -0.725798
    3        SARON  44203 -0.723893
    4        SARON  44204 -0.723406
    ...        ...    ...       ...
    1124     SARON  45824  0.203260
    1125     SARON  45825  0.203843
    1126     SARON  45826  0.203602
    1127     SARON  45827  0.185057
    1128     SARON  45828 -0.049324

The following code worked with no issues:

index_df = dict(tuple(fixing_df.groupby(['name'])))[my_index]

After updating the virtual environment, the line above started raising KeyError: 'SARON', and the following workaround fixed it:

my_index = (my_index,)  # wrap the key in a one-element tuple
index_df = dict(tuple(fixing_df.groupby(['name'])))[my_index]

In other words, the dictionary key somehow changed its format after switching the virtual environment.

Whilst I temporarily fixed the issue as described above, I am still trying to understand what's really causing this strange behavior. Would anyone be able to point me in the right direction?

Always include the full error message, because it contains other useful information.
– furas, Commented Jun 25 at 13:15
Maybe first use print() to see what you have in fixing_df.groupby(['name']), then in tuple(fixing_df.groupby(['name'])), and then in dict(tuple(fixing_df.groupby(['name']))). Run it with both pandas 1.x and 2.x. Maybe the problem is not the environment but how the new pandas works; maybe they decided to change something.
– furas, Commented Jun 25 at 13:17

Daweo · Accepted Answer · 2025-06-25 14:44:06Z


After some testing with pandas 2 I found that

import pandas as pd
df = pd.DataFrame({'name':['Able','Able','Baker'],'value':[1,3,5]})
for key, value in df.groupby('name'):  # groupby argument is str
    print(type(key))  # gives <class 'str'>
for key, value in df.groupby(['name']):  # groupby argument is list
    print(type(key))  # gives <class 'tuple'>

So the behavior depends on the type of the argument passed to groupby: a string gives scalar keys, while a list (even one of length one) gives tuple keys.

(tested in pandas 2.1.4)
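Applied to the code in the question, this suggests grouping by the column name as a plain string, so the dictionary keys stay strings. The following is only a sketch under assumptions: the sample frame is made up, "OTHER" is a placeholder rate name, and the column is assumed to be called 'name' as in the groupby call.

```python
import pandas as pd

# Made-up sample data; "OTHER" is a placeholder, not a rate from the question.
fixing_df = pd.DataFrame({
    "name": ["SARON", "SARON", "OTHER"],
    "date": [44200, 44201, 44200],
    "fixing": [-0.725865, -0.724515, 0.1],
})
my_index = "SARON"

# Grouping by a string (not a one-element list) keeps the keys as plain strings.
groups = dict(tuple(fixing_df.groupby("name")))
index_df = groups[my_index]

# If only one group is needed, get_group avoids building the whole dict.
same_df = fixing_df.groupby("name").get_group(my_index)
```

Both lookups should return the same rows, and neither needs the key wrapped in a tuple.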

edited Jun 25 at 14:44

answered Jun 25 at 13:26

Daweo


LMC Jul 7 at 19:42

Documented in the pandas 2.0.0 "What's new" notes: "When providing a list of columns of length one to DataFrame.groupby(), the keys that are returned by iterating over the resulting DataFrameGroupBy object will now be tuples of length one (GH 47761)"
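If the same code has to run on both sides of that change, one option is to unwrap length-one tuple keys while building the dictionary. This is a rough sketch, not an official pandas recipe; groups_by_scalar_key is a hypothetical helper name, and the data is the small frame from the answer above.

```python
import pandas as pd

def groups_by_scalar_key(df, column):
    """Hypothetical helper: map each group key (as a scalar) to its sub-frame."""
    out = {}
    for key, group in df.groupby([column]):
        # Newer pandas yields ('Able',) here; older versions yield 'Able'.
        if isinstance(key, tuple) and len(key) == 1:
            key = key[0]
        out[key] = group
    return out

df = pd.DataFrame({"name": ["Able", "Able", "Baker"], "value": [1, 3, 5]})
grouped = groups_by_scalar_key(df, "name")
able_df = grouped["Able"]
```

Grouping by the plain string is simpler when you control the call site; the unwrapping approach is mainly useful when the list form has to stay.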
