In Machine Learning, and in Data Science in general, understanding the data is paramount. This understanding can come from many sources and techniques: domain expertise (including subject-matter experts), exploratory analysis, certain Machine Learning techniques, and feature engineering. In fact, most Machine Learning and statistical analyses depend strongly on how the data is prepared, which makes feature engineering essential to any serious Machine Learning project.
“Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.”
In this talk we will discuss what feature engineering and feature selection are, how to select important features in a real-world dataset, and how to develop a simple but powerful ensemble to measure feature importance and perform feature selection.
Familiarity with intermediate concepts of the Python programming language is required to follow the implementation steps. General knowledge of the basic concepts of Machine Learning and data cleaning will be useful, but not strictly necessary, to follow the discussion on feature selection and feature engineering.
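As a rough illustration of what an ensemble for feature importance might look like, the sketch below scores each feature with three simple measures, averages the per-measure ranks, and keeps the top-ranked features. It is not the ensemble presented in the talk. The scorers (absolute Pearson correlation, absolute Spearman correlation, and the variance explained by a median split), the function names, and the synthetic dataset are all assumptions chosen to keep the example self-contained.

```python
import random
from statistics import mean


def pearson(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return abs(cov / (sx * sy)) if sx and sy else 0.0


def ranks(values):
    """Rank positions (0 = smallest). Ties are broken by order of appearance."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, index in enumerate(order):
        out[index] = rank
    return out


def spearman(x, y):
    """Absolute Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def stump_gain(x, y):
    """Fraction of the target variance explained by splitting x at its median."""
    cut = sorted(x)[len(x) // 2]
    left = [b for a, b in zip(x, y) if a < cut]
    right = [b for a, b in zip(x, y) if a >= cut]
    if not left or not right:
        return 0.0
    my, ml, mr = mean(y), mean(left), mean(right)
    total = sum((b - my) ** 2 for b in y)
    within = sum((b - ml) ** 2 for b in left) + sum((b - mr) ** 2 for b in right)
    return 1 - within / total if total else 0.0


def ensemble_importance(columns, y, scorers=(pearson, spearman, stump_gain)):
    """Average each feature's rank across all scorers (higher = more important)."""
    names = list(columns)
    totals = {name: 0.0 for name in names}
    for scorer in scorers:
        scores = [scorer(columns[name], y) for name in names]
        for name, rank in zip(names, ranks(scores)):
            totals[name] += rank / len(scorers)
    return totals


def select_features(importance, k):
    """Keep the k features with the highest ensemble importance."""
    return sorted(importance, key=importance.get, reverse=True)[:k]


# Synthetic data (hypothetical): two informative features and two noise features.
rng = random.Random(0)
n = 200
columns = {
    "signal_a": [rng.gauss(0, 1) for _ in range(n)],
    "signal_b": [rng.gauss(0, 1) for _ in range(n)],
    "noise_a": [rng.gauss(0, 1) for _ in range(n)],
    "noise_b": [rng.gauss(0, 1) for _ in range(n)],
}
y = [
    3 * a + 2 * b + rng.gauss(0, 0.5)
    for a, b in zip(columns["signal_a"], columns["signal_b"])
]

importance = ensemble_importance(columns, y)
selected = select_features(importance, k=2)
print(importance)
print(selected)
```

Averaging ranks rather than raw scores keeps any one measure from dominating because of its scale. In practice the hand-written scorers would likely be replaced by model-based importances from an established library.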