Hierarchical Data Clustering in Python


Summary

Clustering data is an increasingly important task for many data scientists. This talk explores the challenge of hierarchically clustering text data for summarisation. We'll look at some of the solutions now available to Python users, namely the relevant scikit-learn modules and Elasticsearch with the Carrot2 plugin, and at the visualisations each approach produces.
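The talk's own code is not reproduced here. As a minimal sketch of the scikit-learn route, assuming TF-IDF features and Ward linkage (choices made for illustration, not taken from the talk), a handful of made-up documents can be clustered bottom-up with `AgglomerativeClustering`:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

# Illustrative toy corpus (not from the talk): two "programming" and two "search" texts.
docs = [
    "python machine learning library",
    "machine learning with python",
    "elasticsearch search engine index",
    "full text search with elasticsearch",
]

# TF-IDF rows are L2-normalised by default, so Euclidean distance between rows
# is a monotonic function of cosine distance. Ward linkage needs a dense array.
X = TfidfVectorizer().fit_transform(docs).toarray()

# Ward linkage merges, at each step, the pair of clusters whose union
# least increases the total within-cluster variance.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)
print(labels)
```

On this toy corpus the two programming documents and the two search documents should end up in separate clusters; the label numbers themselves are arbitrary.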

Description

Background: methods for clustering text data and the challenge of data summarisation
Hierarchical clustering: agglomerative vs divisive
sklearn.cluster and sklearn.metrics modules
Elasticsearch + Carrot2 plugin
Performance comparison, and assessment of scalability and ease of use
Static visualisation using Matplotlib, interactive visualisation using FoamTree
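To make the agglomerative vs divisive distinction concrete, the following sketch clusters a small hand-made 2-D dataset both ways and scores each result with `sklearn.metrics.silhouette_score`. scikit-learn's `AgglomerativeClustering` works bottom-up; the top-down side is a hypothetical helper, `divisive`, that repeatedly bisects the largest cluster with 2-means (a bisecting k-means variant chosen here for illustration, not necessarily the method covered in the talk).

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score

# Illustrative data only: three well-separated groups of three points each.
X = np.array([
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
    [5.0, 5.0], [5.2, 4.9], [4.9, 5.1],
    [10.0, 0.0], [10.1, 0.2], [9.8, 0.1],
])

def divisive(X, n_clusters, random_state=0):
    """Hypothetical top-down sketch: split the largest cluster in two until done."""
    labels = np.zeros(len(X), dtype=int)  # start with everything in one cluster
    while labels.max() + 1 < n_clusters:
        # Choose the cluster with the most members and bisect it with 2-means.
        target = np.bincount(labels).argmax()
        idx = np.where(labels == target)[0]
        split = KMeans(n_clusters=2, n_init=10,
                       random_state=random_state).fit_predict(X[idx])
        # Members assigned to the second half get a fresh cluster id.
        labels[idx[split == 1]] = labels.max() + 1
    return labels

# Bottom-up: start from singletons and repeatedly merge the closest clusters.
agg_labels = AgglomerativeClustering(n_clusters=3, linkage="average").fit_predict(X)
# Top-down: start from one cluster and repeatedly split.
div_labels = divisive(X, n_clusters=3)

# Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters.
print("agglomerative:", silhouette_score(X, agg_labels))
print("divisive:     ", silhouette_score(X, div_labels))
```

Bisecting with 2-means keeps each top-down step cheap, but, as with agglomerative merges, an early split is never revisited later.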
