
I recently had to update the virtual environment for one of my libraries from Python 3.7 to 3.10, which also involved updating Pandas from 1.1.5 to 2.3.0.

In the previous virtual environment, this used to work fine:

  • my_index is a string, specifically "SARON"
  • fixing_df is a dataframe as follows:
         rate_name   date    fixing
    0        SARON  44200 -0.725865
    1        SARON  44201 -0.724515
    2        SARON  44202 -0.725798
    3        SARON  44203 -0.723893
    4        SARON  44204 -0.723406
    ...        ...    ...       ...
    1124     SARON  45824  0.203260
    1125     SARON  45825  0.203843
    1126     SARON  45826  0.203602
    1127     SARON  45827  0.185057
    1128     SARON  45828 -0.049324

The following code worked with no issues:

index_df = dict(tuple(fixing_df.groupby(['name'])))[my_index]

After updating the virtual environment, the line above raises KeyError: 'SARON', and the following workaround fixes it:

my_index = (my_index,)    
index_df = dict(tuple(fixing_df.groupby(['name'])))[my_index]

In other words, the dictionary keys changed format after switching the virtual environment: they are now one-element tuples such as ('SARON',) instead of plain strings.

Whilst I temporarily fixed the issue as described above, I am still trying to understand what's really causing this strange behavior. Would anyone be able to point me in the right direction?
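If the code has to run on both pandas 1.x and 2.x, one option is to avoid depending on the key format at all. The sketch below uses a small, hypothetical stand-in for fixing_df built from the first rows shown above, and assumes the grouping column is called `name` (as in the groupby call). It compares three lookups that should return the same rows on either version: grouping by a plain string, `GroupBy.get_group`, and a boolean mask.

```python
import pandas as pd

# Hypothetical stand-in for fixing_df; the grouping column is assumed to be "name".
fixing_df = pd.DataFrame({
    "name": ["SARON", "SARON", "SARON"],
    "date": [44200, 44201, 44202],
    "fixing": [-0.725865, -0.724515, -0.725798],
})
my_index = "SARON"

# Grouping by a bare column name (not a list) keeps scalar keys,
# so the original dict lookup works without wrapping my_index in a tuple.
index_df = dict(tuple(fixing_df.groupby("name")))[my_index]

# Alternatives that do not build a dict of every group:
index_df_get = fixing_df.groupby("name").get_group(my_index)
index_df_mask = fixing_df[fixing_df["name"] == my_index]

# All three keep the original row index and columns.
print(index_df.equals(index_df_get), index_df.equals(index_df_mask))
```

When only one group is needed, the boolean mask is usually the simplest choice, since it does not materialize every group first.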

  • What does fixing_df look like? Commented Jun 25 at 13:06
  • Always post the full error message, because it contains other useful information. Commented Jun 25 at 13:15
  • Maybe first use print() to see what you get from fixing_df.groupby(['name']), then from tuple(fixing_df.groupby(['name'])), and then from dict(tuple(fixing_df.groupby(['name']))). Run it with both pandas 1.x and 2.x. Maybe the problem is not the environment but how the new pandas works; maybe they decided to change something. Commented Jun 25 at 13:17
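Following the suggestion in the last comment, here is a sketch that prints each intermediate stage of the original one-liner. The three-row fixing_df is a hypothetical stand-in built from the first rows in the question, with the grouping column assumed to be called `name`. Running it under pandas 1.x and 2.x should show that the stages only differ in the type of the group keys.

```python
import pandas as pd

# Hypothetical stand-in for fixing_df; the grouping column is assumed to be "name".
fixing_df = pd.DataFrame({
    "name": ["SARON", "SARON", "SARON"],
    "date": [44200, 44201, 44202],
    "fixing": [-0.725865, -0.724515, -0.725798],
})

grouped = fixing_df.groupby(["name"])  # stage 1: a DataFrameGroupBy object
pairs = tuple(grouped)                 # stage 2: tuple of (key, sub-DataFrame) pairs
lookup = dict(pairs)                   # stage 3: dict mapping key -> sub-DataFrame

print("pandas", pd.__version__)
print(type(grouped).__name__)
print([key for key, _ in pairs])       # the group keys; their type is what changed
print(list(lookup))
```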

1 Answer


After some testing with pandas 2, I found that:

import pandas as pd
df = pd.DataFrame({'name':['Able','Able','Baker'],'value':[1,3,5]})
for key, value in df.groupby('name'):  # groupby argument is str
    print(type(key))  # gives <class 'str'>
for key, value in df.groupby(['name']):  # groupby argument is list
    print(type(key))  # gives <class 'tuple'>

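So the behavior depends on the type of the argument passed to groupby: a string gives scalar keys, while a list, even one containing a single column, gives tuple keys.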

(tested in pandas 2.1.4)
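It may help to compare the list form with grouping by two columns, which has always produced tuple keys; pandas 2 makes a one-element list behave the same way, just with tuples of length one. A sketch reusing the DataFrame from the answer, with an extra `kind` column added purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Able", "Able", "Baker"],
    "kind": ["x", "y", "x"],  # illustrative extra column
    "value": [1, 3, 5],
})

# A list of two columns yields tuple keys on every pandas version.
two_col_keys = [key for key, _ in df.groupby(["name", "kind"])]
# A list of one column yields 1-tuples on pandas >= 2.0 (scalars on 1.x).
one_col_keys = [key for key, _ in df.groupby(["name"])]
# A bare string yields scalar keys.
str_keys = [key for key, _ in df.groupby("name")]

print(two_col_keys)
print(one_col_keys)
print(str_keys)
```

If the list form has to stay (for example because the column list is passed in from elsewhere), the scalar can be recovered from a 1-tuple key with `key[0]`.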


1 Comment

This is documented in What's new in 2.0.0: "When providing a list of columns of length one to DataFrame.groupby(), the keys that are returned by iterating over the resulting DataFrameGroupBy object will now be tuples of length one" (GH 47761).
