Data Preprocessing Comes First

Photo by M. B. M. on Unsplash
  1. Should I treat my problem as regression or classification?
  2. Which model would be best for my dataset?
  3. I heard that random forest works better. Is that true?
  • Does your dataset have any outliers?
  • What are the relationships among your dataset's features? How strong are they?
  • What is your target variable? Does your data need any encoding?
  • Does your data need scaling?
  • Is your target variable balanced?
  • Are your independent variables sufficient, or do they need some feature engineering?
  • Which features are most important for your target variable?
  • And finally, which model are you going for? Is your data well suited to the model you selected?
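
A few of the questions above can be checked with a handful of lines before any model is chosen. The sketch below is only one possible way to do it, using the Python standard library: Tukey's IQR rule for outliers, Pearson correlation for the strength of a linear relation, class shares for target balance, and standardization for scaling. The helper names (`iqr_outliers`, `pearson`, `class_shares`, `standardize`) and the toy columns `age`, `income`, and `target` are hypothetical and made up for illustration; in practice you would more likely reach for pandas or scikit-learn.

```python
from collections import Counter
from statistics import mean, quantiles, stdev


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def class_shares(labels):
    """Share of each class in the target, to spot imbalance."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def standardize(values):
    """Rescale to zero mean and unit (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical toy data, invented for this example only.
age = [22, 25, 27, 30, 31, 33, 35, 38, 40, 95]
income = [21, 24, 28, 30, 33, 34, 36, 40, 41, 43]
target = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

print("outliers in age:", iqr_outliers(age))
print("corr(age, income):", round(pearson(age, income), 2))
print("class shares:", class_shares(target))
print("mean of scaled age:", round(mean(standardize(age)), 6))
```

None of these checks picks a model for you, but the answers narrow the choice: heavy outliers or unscaled features matter a lot for distance-based and linear models, and a skewed target changes which metrics are worth trusting.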

