I have a DataFrame with multiple columns. I am trying to normalize all the columns except for one, price.

I found code that works on a sample DataFrame I created, but when I run it on my original DataFrame it raises ValueError: Columns must be same length as key

Here is the code I am using:

df_final_1d_normalized = df_final_1d.copy()

cols_to_norm = df_final_1d.columns[df_final_1d.columns!='price']
df_final_1d_normalized[cols_to_norm] = df_final_1d_normalized[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

The problem is the assignment back to the same columns in the third line of code.

Specifically, this works: df_final_1d_normalized[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

But this does not work: df_final_1d_normalized[cols_to_norm] = df_final_1d_normalized[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

Here is a sample DataFrame, in case you want to confirm that the code does work on other DataFrames:

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['A'] = [1,2,3,4, np.nan, np.nan]
df['B'] = [2,4,2,4,5,np.nan]
df['C'] = [np.nan, np.nan, 4,5,6,3]
df['D'] = [np.nan, np.nan, np.nan, 5,4,9]

df_norm = df.copy()
cols_to_norm = df.columns[df.columns!="D"]
df_norm[cols_to_norm] = df_norm[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

What could the error be?

1 Answer

If I understand correctly, you don't need a lambda function. You can just write:

subset = df_final_1d_normalized[cols_to_norm]
df_final_1d_normalized[cols_to_norm] = (subset - subset.min()) / (subset.max() - subset.min())

This applies the same min-max normalization to each selected column.

Here is the example from the question:

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['A'] = [1,2,3,4, np.nan, np.nan]
df['B'] = [2,4,2,4,5,np.nan]
df['C'] = [np.nan, np.nan, 4,5,6,3]
df['D'] = [np.nan, np.nan, np.nan, 5,4,9]

df_norm = df.copy()
cols_to_norm = df.columns[df.columns!="D"]
df_norm[cols_to_norm] = (df_norm[cols_to_norm] - df_norm[cols_to_norm].min()) / (df_norm[cols_to_norm].max() - df_norm[cols_to_norm].min())
df_norm

The result is then:

    A           B           C           D
0   0.000000    0.000000    NaN         NaN
1   0.333333    0.666667    NaN         NaN
2   0.666667    0.000000    0.333333    NaN
3   1.000000    0.666667    0.666667    5.0
4   NaN         1.000000    1.000000    4.0
5   NaN         NaN         0.000000    9.0
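
If the error only shows up on the real DataFrame, one thing worth checking (this is a guess, nothing in the question confirms it) is whether the original frame has duplicated column names. In that case, selecting by label returns more columns than there are labels in the key, which is the kind of mismatch the error message describes. Whether the assignment actually raises can depend on the pandas version. The frame below is made up for illustration and is not the asker's data:

```python
import pandas as pd

# Hypothetical frame with a duplicated column name "A"; not the asker's data.
df = pd.DataFrame(
    [[1.0, 10.0, 100.0], [2.0, 20.0, 200.0], [3.0, 30.0, 300.0]],
    columns=["A", "A", "price"],
)

cols_to_norm = df.columns[df.columns != "price"]

# Each duplicated label matches every column carrying that name,
# so the selection is wider than the key used to select it.
key_width = len(cols_to_norm)
selected_width = df[cols_to_norm].shape[1]
has_duplicates = bool(df.columns.duplicated().any())
print(has_duplicates, key_width, selected_width)

# One possible fix: keep only the first column for each name, then normalize.
deduped = df.loc[:, ~df.columns.duplicated()].copy()
cols = deduped.columns[deduped.columns != "price"]
deduped[cols] = (deduped[cols] - deduped[cols].min()) / (
    deduped[cols].max() - deduped[cols].min()
)
print(deduped)
```

Dropping the repeated columns is only one option. If both columns hold data you need, renaming them so every name is unique avoids the problem without losing anything.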

4 Comments

I am still getting the same error ValueError: Columns must be same length as key
@MathMan99 I am using your code and it is working: df_norm[cols_to_norm] = (df_norm[cols_to_norm] - df_norm[cols_to_norm].min()) / (df_norm[cols_to_norm].max() - df_norm[cols_to_norm].min())
I'll dig deeper on my end to see what the issue could be. The code works on the sample DataFrame. It's the reassigning part that's giving me the error.
@MathMan99 here is an answer for a similar problem: stackoverflow.com/questions/46585193/…
