
Kandinsky: Using KMeans (and friends) to play with the colors of photograph(s)

Description

Clustering is tricky, yet essential to many machine learning initiatives. The what, the how, and the why confound us each time we look at the data, whether the task is customer segmentation (or cohort) analysis, finding centers of influence, or breaking a population down into groups so that a different model can be built for each.

Studying clustering algorithms like KMeans using toy datasets is insufficient (and often tedious) because it does not let you experience real-world problems: for example, centroids that don't settle, or ending up with too many or too few clusters. Which distance measure should we use, and when? How should we prepare (normalize? standardize?) the dataset for clustering?

Also, few real-world scenarios are "visual" unless we plot a graph or two, and plotting fails once we deal with higher dimensions.

What if we could use a non-trivial but visual data source, such as the colors and pixels of a photograph, where we can see both the data that went in and the resulting clusters?
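To make the idea concrete, here is a minimal sketch of clustering pixel colors with a plain k-means loop. It is not the talk's own code: the function name `kmeans_colors`, the evenly spaced initialization, and the six hand-written RGB tuples standing in for a photograph are all assumptions made for illustration. It uses only the standard library; in practice one would more likely load a real image and hand its pixels to a library implementation of KMeans.

```python
def kmeans_colors(pixels, k, iterations=20):
    """Cluster (r, g, b) tuples into k groups with a plain k-means loop.

    Hypothetical helper for illustration only. Assumes the input has at
    least k distinct colors.
    """
    # Deterministic, simplistic initialization (an assumption of this sketch):
    # take k evenly spaced colors from the sorted list of distinct colors.
    distinct = sorted(set(pixels))
    centroids = [distinct[i * len(distinct) // k] for i in range(k)]
    labels = [0] * len(pixels)

    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance in RGB space.
        labels = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pixel, centroids[c])),
            )
            for pixel in pixels
        ]
        # Update step: move each centroid to the mean color of its members.
        new_centroids = []
        for c in range(k):
            members = [pixels[i] for i, label in enumerate(labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(channel) / len(members) for channel in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # centroids have settled
        centroids = new_centroids

    return centroids, labels


# Toy stand-in for a photograph: three reddish and three bluish pixels.
pixels = [
    (250, 10, 10), (240, 20, 15), (10, 10, 245),
    (20, 5, 250), (245, 15, 5), (15, 20, 240),
]
centroids, labels = kmeans_colors(pixels, k=2)

# "Posterize": replace every pixel with the rounded color of its cluster centroid.
posterized = [tuple(round(ch) for ch in centroids[label]) for label in labels]
```

Because every input pixel and every output centroid is a color, both can be looked at directly: `posterized` is the same "image" redrawn with only `k` colors, which is what makes the effect of changing `k` or the distance measure visible.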

The obvious takeaways of this talk, in my experience, are that Data Science and Data Engineering practitioners gain a deeper understanding of what is going on inside clustering algorithms in a fun, very "visual", and engaging manner, and build a better intuition for the best approach to take when solving a problem.

