For many, wine is a taste acquired over many years; others use data. In this talk, I tell the story of how I uncovered the constituents of a good wine using Data Visualization, while discussing the nuances of Exploratory Data Analysis (EDA) – the process of taking a first look at data.
Idea Behind The Talk
With the rise of tools allowing for smooth implementation of powerful algorithms, it is tempting to skip EDA. However, EDA is just as important as any other part of a data project; if you don't know your data well enough, you can end up doing very shallow work, e.g., building inaccurate models, choosing the wrong variables, using resources inefficiently, or all of the above. Sometimes, EDA uncovers more than a confirmatory study alone would have.
Exploratory Data Analysis is what one should do when first encountering a dataset. However, it's not a one-off process: there are setbacks and multiple iterations, and the process sets the tone for a more formal analysis of the data at hand. Using a story-like format, the presentation covers the setbacks one faces when performing a real data study.
The motivation behind creating this talk is to impart the idea of Exploratory Data Analysis, and how Data Visualizations help uncover patterns (not limited to the findings of the wine study). I believe that sharing a real story, built around the idea of "how to reach and infer from a specific plot", would help the audience understand data visualization better than talking about the syntactic sugar of a particular visualization library. Moreover, the ideas can be generalized to any other visualization library.
Outline of the Talk
- History of Wine & Data Science
- Introduction to Exploratory Data Analysis (EDA)
- Why Data Visualization? – Anscombe's Quartet
- The Grammar of Graphics: Why I Used ggplot2?
Wine Project: Finding Constituents of Good Red & White Wines
- About the Project & How to Replicate It?
- How to Quantify ‘Artistic’ Measures?
- Principles of Data Visualization to Uncover Patterns
- Inspecting Data Using Univariate, Bivariate & Multivariate Plots
Aspects of EDA Not Used In the Wine Project
- How to Prepare Your Dataset? – Data Aggregation
- How to Remove Outliers Using Data Visualization
- How to Decide the Best Fit During EDA
- When to Transform Variables