Day 2, 11:30-12:00
Imagine you are a data engineer in charge of a data pipeline. What do you think matters most for that pipeline? I think it is definitely data quality! However, the more complex your pipeline becomes, the harder it is to maintain that quality. For example, what if the format of some source data changes without anyone noticing? What if a program update introduces a bug? Such things cause data issues, and it can take a long time to find them. Or even worse, your stakeholders may find the issues before you do! Great Expectations helps you solve such problems. It is a Python-based open-source library for validating, documenting, and profiling your data. It allows you to define the shape of your data, test the data against that definition, and document the results. In this talk, I will introduce you to Great Expectations and share my experience with it. Let's make your data pipeline robust with Great Expectations!
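The idea of defining the shape of data and testing against it can be sketched in plain Python. This is only an illustration and does not use the Great Expectations API: the helper names (check_not_null, check_between, validate), the column names, and the sample rows are all made up for this example.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def check_not_null(rows: List[Row], column: str) -> Dict[str, Any]:
    """Hypothetical check: every row has a non-None value in `column`."""
    failing = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {
        "check": f"{column} is not null",
        "success": not failing,
        "failing_rows": failing,
    }


def check_between(rows: List[Row], column: str, low: float, high: float) -> Dict[str, Any]:
    """Hypothetical check: every non-null value in `column` lies in [low, high]."""
    failing = [
        i
        for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]
    return {
        "check": f"{column} is between {low} and {high}",
        "success": not failing,
        "failing_rows": failing,
    }


def validate(rows: List[Row]) -> List[Dict[str, Any]]:
    """Run the (made-up) rules that describe the expected shape of the data."""
    return [
        check_not_null(rows, "user_id"),
        check_between(rows, "amount", 0, 10_000),
    ]


# Made-up sample batch: row 1 has a missing user_id, row 2 a negative amount.
rows = [
    {"user_id": 1, "amount": 120},
    {"user_id": None, "amount": 80},
    {"user_id": 3, "amount": -5},
]

results = validate(rows)
for result in results:
    status = "PASS" if result["success"] else "FAIL"
    print(f"{status}: {result['check']} (failing rows: {result['failing_rows']})")
```

Each check returns a small result record instead of raising an exception, so a pipeline can run every rule, report all failures at once, and keep the results as documentation of what was tested.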
Great Expectations: A Python-based open-source library for validating, documenting, and profiling your data.
Speaker: Keisuke Nishitani
A data engineer and Python programmer in Osaka, Japan. Working on a data pipeline built on Amazon Redshift, Amazon S3, and AWS Lambda. Interested in data workflow frameworks, data analysis, and data visualization.