I tried to exclude a few outliers from a pandas DataFrame, but after calling my function the table is exactly the same as before. I can't figure out why.

# excluding outliers

def exclude_outliers(DataFrame, col_name):
    interval = 2.5*DataFrame[col_name].std()
    mean = DataFrame[col_name].mean()
    m_i = mean + interval 
    DataFrame = DataFrame[DataFrame[col_name] < m_i]
 

outlier_column = ['util_linhas_inseguras', 'idade', 'vezes_passou_de_30_59_dias', 'razao_debito', 'salario_mensal', 'numero_linhas_crdto_aberto',
                  'numero_vezes_passou_90_dias', 'numero_emprestimos_imobiliarios', 'numero_de_vezes_que_passou_60_89_dias', 'numero_de_dependentes']

for col in outlier_column:
    exclude_outliers(df_train, col)

df_train.describe()
  • Can you paste some sample data that we can copy to run the code? Commented Apr 26, 2021 at 22:49

1 Answer

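As written, your function doesn't return anything. The assignment on its last line only rebinds the local name DataFrame to a new, filtered object, so the df_train you pass in from your for loop is never changed. Two changes fix this.

First, return the filtered DataFrame at the end of your function: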

def exclude_outliers(DataFrame, col_name):
    ...  # Function filters the DataFrame
    # Add this line to return the filtered DataFrame
    return DataFrame

Then modify your for loop so that it reassigns df_train on each iteration:

for col in outlier_column:
    # Now we update the DataFrame on each iteration
    df_train = exclude_outliers(df_train, col)
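
Putting both changes together, here is a minimal runnable sketch. The sample data is made up for illustration, and the parameter names frame and n_std are my own choices rather than anything from your code; the filter itself is the same mean + 2.5 * std upper bound you already use.

```python
import pandas as pd


def exclude_outliers(frame, col_name, n_std=2.5):
    """Return only the rows whose col_name value is below mean + n_std * std."""
    upper = frame[col_name].mean() + n_std * frame[col_name].std()
    # Boolean indexing builds a new DataFrame; the caller's object is untouched
    return frame[frame[col_name] < upper]


# Made-up sample data (an assumption): one extreme value in each column
df_train = pd.DataFrame({
    "idade": [30, 32, 35, 31, 33, 34, 29, 36, 30, 400],
    "salario_mensal": [2000, 2100, 1900, 2050, 1950, 2000, 2100,
                       1_000_000, 2000, 2050],
})

rows_before = len(df_train)

# Calling without assigning the result throws the filtered frame away
exclude_outliers(df_train, "idade")
rows_after_discard = len(df_train)

# Reassigning on each iteration keeps the filtered result
for col in ["idade", "salario_mensal"]:
    df_train = exclude_outliers(df_train, col)

rows_after = len(df_train)
print(rows_before, rows_after_discard, rows_after)
```

Note that each pass recomputes the mean and standard deviation on the rows that survived the previous columns, so the order of the columns in the list can affect which rows are dropped.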