In this tutorial, we will use H2O's automated machine learning algorithm (AutoML) to build models on a subset of the loan-level data from Fannie Mae and Freddie Mac. We will solve a binary classification problem (predicting whether a loan is delinquent) and explore a regression use case (predicting interest rates on the same dataset). We will use the h2o Python module in JupyterLab.
Choosing the best machine learning model and tuning it can be time-consuming and exhausting, and it often takes expertise to know which parameters to tune. The field of Automated Machine Learning (AutoML) focuses on solving this problem. AutoML is useful both for experts, by automating the process of choosing and tuning a model, and for non-experts, by helping them create high-performing models in a short time.

H2O is an open-source, distributed machine learning platform with APIs in Python, R, Java, and Scala. H2O AutoML automates the machine learning workflow, including model training, hyperparameter optimization, and model search and selection under time, space, and resource constraints. It further improves model performance by stacking an ensemble of the trained models.
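To make the workflow described above concrete, here is a toy, pure-Python sketch of budgeted model search, a leaderboard, and ensembling. It is only an illustration and is not how H2O AutoML works internally: the "models" are hypothetical one-feature k-nearest-neighbor regressors, the search is a random search over the single hyperparameter `k`, and the ensemble simply averages the top three models' predictions instead of training a metalearner, which is what H2O's stacking does. The names `toy_automl`, `knn_predict`, and `mse` are invented for this sketch.

```python
import random
import time


def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(preds, ys):
    """Mean squared error between predictions and true values."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def toy_automl(train, valid, max_models=5, max_runtime_secs=1.0, seed=1):
    """Random search over k under a model-count and wall-clock budget.

    Returns a leaderboard of (validation MSE, k) pairs, best first, and the
    validation MSE of an ensemble that averages the top three models.
    """
    rng = random.Random(seed)
    candidates = list(range(1, len(train) + 1))  # possible k values
    rng.shuffle(candidates)
    deadline = time.monotonic() + max_runtime_secs
    xs = [x for x, _ in valid]
    ys = [y for _, y in valid]

    leaderboard = []
    for k in candidates[:max_models]:
        if time.monotonic() > deadline:
            break  # stop searching once the time budget is spent
        preds = [knn_predict(train, x, k) for x in xs]
        leaderboard.append((mse(preds, ys), k, preds))
    if not leaderboard:
        return [], None

    leaderboard.sort(key=lambda row: row[0])  # lowest validation error first
    # Simplified ensemble (not true stacking): average the best three models.
    top = leaderboard[:3]
    blended = [sum(row[2][i] for row in top) / len(top) for i in range(len(ys))]
    return [(score, k) for score, k, _ in leaderboard], mse(blended, ys)
```

The sketch keeps the overall shape of AutoML: a budget on both the number of models and wall-clock time, a leaderboard ranked on held-out data, and an ensemble built from the leaders. Because squared error is convex, the averaged ensemble's error can never exceed the mean error of the models it averages.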
- Basic knowledge of Machine Learning
- Familiarity with Python
- H2O installed on a local machine or in a cloud environment
- Quick H2O installation (requires Java and h2o Python module)
- Task 0: Introduction to Automated Machine Learning, H2O and H2O AutoML (15 min)
- Task 1: Importing libraries, initializing H2O, importing data (5 min)
- Task 2: Data Preparation and Transformations (5 min)
- Task 3: H2O AutoML Classification and Model Evaluation (Interpretation) (15 min)
- Task 4: H2O AutoML Regression and Model Evaluation (Interpretation) (15 min)
- Task 5: H2O AutoML Classification in Flow (10 min)
- Task 6: H2O AutoML Regression in Flow (15 min)
- Task 7: Q&A (10 min)
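Tasks 1 and 3 follow a common h2o pattern: start a cluster, import data, mark the target as categorical for classification, split the data, and let AutoML train and rank models. The function below is a minimal sketch of that pattern, assuming the h2o Python module and Java are installed. The function name `run_automl`, its parameters, the 80/20 split, and the 300-second budget are illustrative choices, not the tutorial's actual settings; the h2o imports are placed inside the function so the definition loads even where h2o is missing.

```python
def run_automl(path, target, classification=True, exclude=(),
               max_runtime_secs=300, seed=1):
    """Hypothetical helper: load a CSV into H2O and run AutoML on it.

    Returns the H2OAutoML object and the leader's performance on a test split.
    """
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # start (or connect to) a local H2O cluster; requires Java
    data = h2o.import_file(path)

    if classification:
        # An enum (categorical) target makes AutoML treat this as classification.
        data[target] = data[target].asfactor()

    train, test = data.split_frame(ratios=[0.8], seed=seed)
    # Use every other column as a feature, minus any the caller excludes.
    features = [c for c in data.columns if c != target and c not in exclude]

    aml = H2OAutoML(max_runtime_secs=max_runtime_secs, seed=seed)
    aml.train(x=features, y=target, training_frame=train)

    print(aml.leaderboard)  # all trained models, best first
    return aml, aml.leader.model_performance(test)
```

As a usage example, with a hypothetical file name and assumed column names, `run_automl("loan_level.csv", "DELINQUENT")` would cover the classification case and `run_automl("loan_level.csv", "ORIGINAL_INTEREST_RATE", classification=False, exclude=("DELINQUENT",))` the regression case. The `exclude` parameter is there so columns that would leak the target can be dropped. For binary classification, AutoML ranks the leaderboard by AUC by default.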