I would like some suggestions on the best way to normalize my dataset for use as ML input.

My dataset looks like this:

    -------------------------------------------------------------------------
    |   date   | holiday | weekday |  type  | max_temp | min_temp |   qty   |
    -------------------------------------------------------------------------
 1  | 01/31/22 |    0    |   tue   | casual |   35.25  |  23.44   |  1,358  |
 2  | 07/02/21 |    1    |   mon   | member |   34.33  |   7.29   |  1,358  |    
 3  | 03/12/20 |    0    |   sat   | casual |   12.21  |   2.18   |  1,358  |    
... 
 n

I'm using Python to clean the data, and I intend to use this dataset to train linear regression, random forest, and XGBoost models to predict the last column (qty).

Any suggestions on best practices for preparing my data?

  • From previous experience, I know it's possible to turn all the data into dummies, but is that really the best practice? I'm thinking of creating extra columns filled with 0 and 1 for the different classes of the categorical data, but I'm not sure about increasing the size of the dataset. Commented Sep 15, 2022 at 23:22
  • The question is too broad. I recommend the book Hands-On Machine Learning by Aurélien Géron. Regards. Commented Sep 15, 2022 at 23:57
  • After some searching, I found that a good way is to convert the categorical data into dummy variables with "pd.get_dummies()". I will try that. Commented Sep 16, 2022 at 0:59
  • scikit-learn has dedicated encoders for that. Have a look at OrdinalEncoder, OneHotEncoder, and LabelEncoder. Each one has a specific use. Regards. Commented Sep 16, 2022 at 1:48
  • Thank you, @LuisAlejandroVargasRamos. For my dataset, I think the most suitable approach is to turn the categorical data into dummies with "one hot encoder". I tried that in my notebook and it looks good. I'm now going to check whether the ML model runs OK with that approach. Commented Sep 16, 2022 at 2:29
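
The two encoding routes mentioned in the comments can be sketched as follows. This is a minimal illustration and not a definitive pipeline: the toy frame copies the three sample rows from the question, and it assumes that qty is stored as text with thousands separators, that weekday and type are the only categorical columns, and that the date column is simply left out of the features.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy frame mirroring the question's columns (values copied from its sample rows).
df = pd.DataFrame({
    "date": ["01/31/22", "07/02/21", "03/12/20"],
    "holiday": [0, 1, 0],
    "weekday": ["tue", "mon", "sat"],
    "type": ["casual", "member", "casual"],
    "max_temp": [35.25, 34.33, 12.21],
    "min_temp": [23.44, 7.29, 2.18],
    "qty": ["1,358", "1,358", "1,358"],
})

# Assumption: qty arrives as a string such as "1,358", so strip the comma first.
df["qty"] = df["qty"].str.replace(",", "", regex=False).astype(int)

categorical = ["weekday", "type"]                 # assumed categorical columns
numeric = ["holiday", "max_temp", "min_temp"]     # assumed numeric columns
features = df[categorical + numeric]              # date is left out in this sketch

# Route 1: pandas dummies, one 0/1 column per category value.
dummies = pd.get_dummies(features, columns=categorical)

# Route 2: scikit-learn OneHotEncoder inside a ColumnTransformer.
# sparse_threshold=0 makes the transformer return a dense array.
encoder = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",  # numeric columns pass through unchanged
    sparse_threshold=0,
)
X = encoder.fit_transform(features)
y = df["qty"]
```

With handle_unknown="ignore", a category that was not seen during fitting is encoded as all zeros at prediction time instead of raising an error, which is one reason to prefer the fitted encoder over get_dummies when the same transformation has to be reapplied to new data.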
