Feature Importance and Ensemble Methods: a new perspective

YouTube

Description

Ensemble methods perform extremely well in terms of prediction, but they are not easy to interpret. Feature importance should reflect not only how many times a feature has been used in the weak learners, but also how much that feature contributes to the result. A detailed example and implementation are provided in a Python Jupyter notebook for "xgboost", the extreme gradient boosting library.

Abstract

I - Feature importance in ensemble algorithms - state of the art

Feature importance in sklearn/xgboost: basically counts the occurrences of a feature in all the weak learners
Construction of the trees in xgboost: if the trees are deep enough, every feature is going to be used
Global feature importance is misleading: a given feature might be critical for one subpopulation but completely irrelevant for another (e.g. multi-class classification)
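
To make the first point above concrete, here is a minimal sketch of what a count-based importance looks like, next to a gain-weighted one for contrast. The toy ensemble, the feature names and the gain values are all made up for illustration; this is not xgboost's internal code, only a plain-Python picture of the idea.

```python
from collections import Counter, defaultdict

# Hypothetical toy ensemble: each tree is a list of (feature, gain) splits.
# Feature names and gain values are invented for this illustration.
trees = [
    [("age", 10.0), ("income", 2.0), ("noise", 0.1)],
    [("age", 8.0), ("noise", 0.2), ("noise", 0.1)],
    [("income", 3.0), ("noise", 0.1), ("noise", 0.1)],
]


def split_count_importance(trees):
    """Share of splits that use each feature (occurrence counting)."""
    counts = Counter(feature for tree in trees for feature, _ in tree)
    total = sum(counts.values())
    return {feature: count / total for feature, count in counts.items()}


def gain_importance(trees):
    """Share of the total split gain attributed to each feature."""
    gains = defaultdict(float)
    for tree in trees:
        for feature, gain in tree:
            gains[feature] += gain
    total = sum(gains.values())
    return {feature: gain / total for feature, gain in gains.items()}


by_count = split_count_importance(trees)
by_gain = gain_importance(trees)

# Rank features from most to least important under each definition.
print(sorted(by_count, key=by_count.get, reverse=True))
print(sorted(by_gain, key=by_gain.get, reverse=True))
```

In this toy setup the "noise" feature is used in many deep, low-gain splits, so it ranks first by count while "age" ranks first by gain.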

II - Xgboost real feature importance

Prediction influence: first splits influence the prediction more than last splits, so the importance of a feature must be weighted by the discrimination it provides
Point-to-point feature importance: following the path of a given prediction, it is possible to weigh the importance of every used feature
A relevant assessment of feature importance: explanation of a given prediction, and aggregation on a set of data points
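
The path-following idea can be sketched as follows, under stated assumptions: each node of a tree stores the mean prediction of the training points that reach it, and the contribution of a split is the change in that value when moving from the node to the child taken. The tree layout (nested dicts), the `path_contributions` and `aggregate` helpers and all numbers are hypothetical and are not taken from the talk's notebook.

```python
# Hypothetical toy regression tree stored as nested dicts.
# "value" is the assumed mean prediction at each node.
tree = {
    "value": 0.5, "feature": "age", "threshold": 30,
    "left": {"value": 0.25},
    "right": {
        "value": 0.75, "feature": "income", "threshold": 50,
        "left": {"value": 0.5},
        "right": {"value": 1.0},
    },
}


def path_contributions(tree, x):
    """Follow x down the tree; credit each split feature with the value change."""
    node = tree
    bias = node["value"]
    contributions = {}
    while "feature" in node:  # leaves have no "feature" key
        feature = node["feature"]
        child = node["left"] if x[feature] < node["threshold"] else node["right"]
        delta = child["value"] - node["value"]
        contributions[feature] = contributions.get(feature, 0.0) + delta
        node = child
    return bias, contributions


def aggregate(tree, data):
    """Mean absolute contribution of each feature over a set of data points."""
    totals = {}
    for x in data:
        _, contributions = path_contributions(tree, x)
        for feature, value in contributions.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    return {feature: total / len(data) for feature, total in totals.items()}


older = {"age": 40, "income": 80}
younger = {"age": 20, "income": 80}

bias, contribs = path_contributions(tree, older)
prediction = bias + sum(contribs.values())  # equals the value of the leaf reached
summary = aggregate(tree, [older, younger])
```

The bias plus the contributions always adds up to the value of the leaf reached, so each prediction is fully explained. For the second point, "income" is never used on its path and gets no credit, which illustrates why a single global ranking can hide differences between subpopulations.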

III - Implementation and examples

Point-to-point feature importance illustration and implementation explanation
Evolution of feature importance with respect to learning iterations
Noisy variables cancellation
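
One possible way to look at the evolution of feature importance over learning iterations is to accumulate per-point contributions tree by tree. The sketch below assumes an additive ensemble of depth-1 trees whose node values are known; the trees, the feature names "signal" and "noise", and the `importance_by_round` helper are hypothetical and only illustrate the bookkeeping.

```python
def path_contributions(tree, x):
    """Credit each split feature with the change in node value along x's path."""
    node, contributions = tree, {}
    while "feature" in node:  # leaves have no "feature" key
        feature = node["feature"]
        child = node["left"] if x[feature] < node["threshold"] else node["right"]
        delta = child["value"] - node["value"]
        contributions[feature] = contributions.get(feature, 0.0) + delta
        node = child
    return contributions


def stump(feature, left, right):
    """Hypothetical depth-1 tree with a root value of 0 and a split at 0.5."""
    return {"value": 0.0, "feature": feature, "threshold": 0.5,
            "left": {"value": left}, "right": {"value": right}}


# Made-up boosting rounds: corrections shrink, and the last round uses "noise".
trees = [
    stump("signal", -1.0, 1.0),
    stump("signal", -0.5, 0.5),
    stump("noise", -0.125, 0.125),
]

data = [{"signal": s, "noise": n} for s in (0, 1) for n in (0, 1)]


def importance_by_round(trees, data):
    """Cumulative mean absolute contribution per feature after each round."""
    totals, history = {}, []
    for tree in trees:
        for x in data:
            for feature, value in path_contributions(tree, x).items():
                totals[feature] = totals.get(feature, 0.0) + abs(value) / len(data)
        history.append(dict(totals))  # snapshot after this round
    return history


history = importance_by_round(trees, data)
for round_index, snapshot in enumerate(history, start=1):
    print(round_index, snapshot)
```

In this toy run the "noise" feature appears only in the last round and with a small weight, so its cumulative importance stays far below that of "signal" even though it is used in one tree out of three.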

IV - Limits and ways forward

A word on correlated variables
Is there a compromise between performance and interpretability?
