
Dataframe Validation In Python - A Practical Introduction


Because machine learning models rely on data to make their predictions, evaluating data quality is a crucial part of any ML pipeline. As engineers and data scientists, we should validate our data in the same way we validate our code. Data errors can lead to bad and costly decisions, inaccurate predictions, and wasted time. There is an abundance of libraries that perform various kinds of data integrity checks; I will focus specifically on dataframe validation.

In this talk, I will present the problem and give a practical overview (accompanied by Jupyter Notebook code examples) of three libraries that aim to address it:

By the end of this talk, you will understand the importance of data validation and have a sense of how to integrate data validation principles into an ML pipeline.
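
To give a rough sense of what validating a dataframe "like code" can look like, here is a minimal sketch using plain pandas checks. It is not taken from the talk and does not use any of the libraries the talk covers; the column names (`age`, `email`), the 0-120 age range, and the `validate_users` helper are all hypothetical, chosen only for illustration.

```python
import pandas as pd


def validate_users(df: pd.DataFrame) -> list:
    """Return a list of human-readable validation errors (empty if valid).

    Hypothetical helper: the columns and rules are assumptions for this sketch.
    """
    errors = []

    # Schema check: the required columns must be present.
    required = {"age", "email"}
    missing = required - set(df.columns)
    if missing:
        errors.append(f"missing columns: {sorted(missing)}")
        return errors  # the remaining checks need these columns

    # Completeness check: no null emails.
    if df["email"].isna().any():
        errors.append("email contains null values")

    # Range check: every age must fall within 0-120 (nulls fail this check).
    if not df["age"].between(0, 120).all():
        errors.append("age outside the range 0-120")

    # Uniqueness check: non-null emails must not repeat.
    if df["email"].dropna().duplicated().any():
        errors.append("email contains duplicates")

    return errors


good = pd.DataFrame({"age": [31, 45], "email": ["a@example.com", "b@example.com"]})
bad = pd.DataFrame({"age": [31, -2], "email": ["a@example.com", None]})

print(validate_users(good))
print(validate_users(bad))
```

Collecting every failure into a list, instead of raising on the first one, lets a pipeline step report all problems in a batch at once. Dedicated validation libraries typically let you declare such rules as a schema instead of hand-writing each check.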
