Description
This talk takes you on a whirlwind tour of data quality tools in Python. Data quality has been a hot topic for a while, and there are several open source Python packages out there that each cover a different aspect of the broad concept we call "data quality". But which one is the right tool for your use case? What do these tools do (or not do), and how do you know which one to pick?
We'll first dive into a classification of the different types of data quality tools in the space, then make brief stops at some of the most prominent packages, including pydqc, datagristle, bulwark, dvc, dedupe, and Great Expectations, with a hands-on demo of each. The audience will walk away with a better understanding of "what's out there" and a little boost to get started using their tool of choice!