
Common pitfalls leading to wrongly estimated model performance

Description

Overfitting is something every data scientist is aware of, and techniques like cross-validation can help detect it. Unfortunately, regular cross-validation still fails to detect certain errors: its statistical assumptions can be violated, and intricate feature engineering can lead to target leakage. The goal of this talk is to learn more about proper experimental setup and better validation approaches.

Overfitting is a common issue and something every data scientist is aware of. Techniques like cross-validation produce metrics that approximate the performance of a model on unseen data. Unfortunately, regular cross-validation often still fails to meet the assumptions required for unbiased performance estimation. Certain statistical assumptions can be violated, and intricate feature engineering can introduce obscure target leakage that leads to biased estimates.
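One well-known way such leakage arises is when a preprocessing step sees the targets of the validation folds. The sketch below is only an illustration, not material from the talk: it assumes scikit-learn, uses synthetic noise data, and the variable names are hypothetical. It compares feature selection done once on all data with the same selection refit inside each training fold.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Assumption: synthetic data in which the features carry no signal at all.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5000))   # pure noise features
y = np.tile([0, 1], 25)           # balanced labels, unrelated to X

# Leaky setup: the selector is fit on every row, so it has already seen
# the labels of the rows that later serve as validation folds.
X_selected = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_score = cross_val_score(
    LogisticRegression(max_iter=1000), X_selected, y, cv=5
).mean()

# Leak-free setup: the pipeline refits the selector on each training fold only.
honest_model = make_pipeline(
    SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000)
)
honest_score = cross_val_score(honest_model, X, y, cv=5).mean()

print(f"selection before CV: {leaky_score:.2f}")
print(f"selection inside CV: {honest_score:.2f}")
```

Because the features are noise, an unbiased estimate should sit near chance level; the leaky variant typically reports a much higher accuracy even though no real signal exists.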

The statistical assumptions that we will talk about are the i.i.d. assumption and the absence of concept drift. The i.i.d. assumption means that samples are independent and identically distributed. The absence of concept drift means that the data is stationary along the time dimension of data collection: the relationship between the features and the targets does not depend on when a sample was collected.

Validation schemes are meant to simulate reality as closely as possible. We will look at the theory behind training, validation and test sets before discussing issues with standard cross-validation. Possible solutions include nested cross-validation, time window validation and grouped validation. While the only true verification happens in production, we will also look into approaches that minimize the risk of missing target leakage in the validation phase.
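As a rough sketch of grouped and time window validation, the example below assumes scikit-learn's `GroupKFold` and `TimeSeriesSplit` as stand-ins for these schemes; the talk does not name specific tools, and the toy data and the helper `folds_sharing_groups` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold, TimeSeriesSplit

# Hypothetical data: 12 rows in time order, from 4 groups of 3 rows each
# (for example, 4 customers with 3 records per customer).
X = np.arange(12).reshape(-1, 1)
groups = np.repeat([0, 1, 2, 3], 3)

def folds_sharing_groups(splitter, **split_kwargs):
    """Count folds in which some group appears in both train and test."""
    shared = 0
    for train_idx, test_idx in splitter.split(X, **split_kwargs):
        if set(groups[train_idx]) & set(groups[test_idx]):
            shared += 1
    return shared

# Shuffled K-fold ignores group membership, so a group can straddle the split.
kfold_shared = folds_sharing_groups(KFold(n_splits=4, shuffle=True, random_state=0))

# GroupKFold keeps each group entirely in train or entirely in test.
group_shared = folds_sharing_groups(GroupKFold(n_splits=4), groups=groups)

# TimeSeriesSplit only trains on rows that precede the test rows.
time_ordered = all(
    train_idx.max() < test_idx.min()
    for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X)
)

print("K-fold folds with shared groups:", kfold_shared)
print("GroupKFold folds with shared groups:", group_shared)
print("TimeSeriesSplit respects time order:", time_ordered)
```

Grouped splitting addresses dependence between samples of the same entity, while the time-ordered split mimics a deployment in which the model only ever sees the past.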

The goal of this talk is to learn more about the intuition behind proper experimental setup, potential pitfalls to keep in mind and tools to minimize the associated risks.

