Python: Issue with reassigning columns to DataFrame

Question

I have a DataFrame with multiple columns. I am trying to normalize all the columns except for one, price.

I found code that works on a sample DataFrame I created, but when I run it on my original DataFrame it raises ValueError: Columns must be same length as key

Here is the code I am using:

df_final_1d_normalized = df_final_1d.copy()

cols_to_norm = df_final_1d.columns[df_final_1d.columns!='price']
df_final_1d_normalized[cols_to_norm] = df_final_1d_normalized[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

The issue is with reassigning the columns to themselves in the third line of code.

Specifically, this works:

df_final_1d_normalized[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

But this does not:

df_final_1d_normalized[cols_to_norm] = df_final_1d_normalized[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

Here is a sample DataFrame you can use to confirm that the code works on other DataFrames:

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['A'] = [1,2,3,4, np.nan, np.nan]
df['B'] = [2,4,2,4,5,np.nan]
df['C'] = [np.nan, np.nan, 4,5,6,3]
df['D'] = [np.nan, np.nan, np.nan, 5,4,9]

df_norm = df.copy()
cols_to_norm = df.columns[df.columns!="D"]
df_norm[cols_to_norm] = df_norm[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

What could be causing the error?

coco18 · Accepted Answer · 2023-02-10 19:34:10Z

If I understand correctly, you don't need a lambda function. You can just write:

df_final_1d_normalized[cols_to_norm] = (df_final_1d_normalized[cols_to_norm] - df_final_1d_normalized[cols_to_norm].min())/(df_final_1d_normalized[cols_to_norm].max() - df_final_1d_normalized[cols_to_norm].min())

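This does the job: min() and max() return one value per column, and the subtraction and division are aligned on the column labels, so each column is scaled by its own range.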

Here is the example from the question:

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['A'] = [1,2,3,4, np.nan, np.nan]
df['B'] = [2,4,2,4,5,np.nan]
df['C'] = [np.nan, np.nan, 4,5,6,3]
df['D'] = [np.nan, np.nan, np.nan, 5,4,9]

df_norm = df.copy()
cols_to_norm = df.columns[df.columns!="D"]
df_norm[cols_to_norm] = (df_norm[cols_to_norm] - df_norm[cols_to_norm].min()) / (df_norm[cols_to_norm].max() - df_norm[cols_to_norm].min())
df_norm

The result is then:

    A           B           C           D
0   0.000000    0.000000    NaN         NaN
1   0.333333    0.666667    NaN         NaN
2   0.666667    0.000000    0.333333    NaN
3   1.000000    0.666667    0.666667    5.0
4   NaN         1.000000    1.000000    4.0
5   NaN         NaN         0.000000    9.0
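
As a quick sanity check, the sketch below compares the lambda-based version from the question with the vectorized expression on the same sample data. It assumes pandas and NumPy are installed; the names by_apply, sub and vectorized are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, np.nan, np.nan],
    "B": [2, 4, 2, 4, 5, np.nan],
    "C": [np.nan, np.nan, 4, 5, 6, 3],
    "D": [np.nan, np.nan, np.nan, 5, 4, 9],
})
cols_to_norm = df.columns[df.columns != "D"]

# Column-by-column min-max scaling with apply, as in the question.
by_apply = df[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

# The same scaling written as whole-DataFrame arithmetic, as in this answer.
sub = df[cols_to_norm]
vectorized = (sub - sub.min()) / (sub.max() - sub.min())

# Raises AssertionError if the two results differ
# (NaNs in matching positions are treated as equal).
pd.testing.assert_frame_equal(by_apply, vectorized)
```

Since both forms produce the same values here, switching from one to the other is unlikely to change whether the assignment succeeds. If the error persists, the cause probably lies in the original DataFrame rather than in the normalization expression.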

edited Feb 10, 2023 at 19:34

answered Feb 10, 2023 at 19:07

4 Comments

MathMan 99 Over a year ago

I am still getting the same error: ValueError: Columns must be same length as key

coco18 Over a year ago

@MathMan99 I am using your code and it is working:

df_norm[cols_to_norm] = (df_norm[cols_to_norm] - df_norm[cols_to_norm].min()) / (df_norm[cols_to_norm].max() - df_norm[cols_to_norm].min())

MathMan 99 Over a year ago

I'll dig deeper on my end to see what the issue could be. The code works on the sample DataFrame. It's the reassigning part that's giving me the error.

coco18 Over a year ago

@MathMan99 here is an answer for a similar problem: stackoverflow.com/questions/46585193/…
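
When the same code works on a sample DataFrame but fails on the real one, a common cause of this error is duplicated column labels. That is only an assumption about df_final_1d, since the question does not show its columns. The sketch below builds a small hypothetical frame (df_dup, with two columns both named "A") to show how to check for duplicates, plus one possible way to drop the repeats before normalizing.

```python
import pandas as pd

# Hypothetical data: the label "A" appears twice. This is an assumption
# about what the original DataFrame might look like, not a known fact.
df_dup = pd.DataFrame(
    [[1.0, 10.0, 2.0, 100.0],
     [2.0, 20.0, 4.0, 200.0],
     [3.0, 30.0, 6.0, 300.0]],
    columns=["A", "A", "B", "price"],
)

cols_to_norm = df_dup.columns[df_dup.columns != "price"]

# Diagnostic 1: are all column labels unique?
labels_unique = df_dup.columns.is_unique

# Diagnostic 2: which labels occur more than once?
repeated = df_dup.columns[df_dup.columns.duplicated()].unique().tolist()

# Diagnostic 3: with repeated labels, selecting by label returns more
# columns than there are keys, so the two sides of the assignment differ.
n_keys = len(cols_to_norm)
n_selected = df_dup[cols_to_norm].shape[1]
print(labels_unique, repeated, n_keys, n_selected)

# One possible fix: keep the first occurrence of each label, then normalize.
df_clean = df_dup.loc[:, ~df_dup.columns.duplicated()].copy()
cols = df_clean.columns[df_clean.columns != "price"]
df_clean[cols] = (df_clean[cols] - df_clean[cols].min()) / (
    df_clean[cols].max() - df_clean[cols].min()
)
```

Whether dropping or renaming the repeated columns is appropriate depends on the data: renaming keeps every value, while dropping discards the later copies.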
