
Hierarchical Data Clustering in Python

Summary

Clustering data is an increasingly important task for many data scientists. This talk explores the challenge of hierarchically clustering text data for summarisation. We'll look at some good solutions now available to Python users, including the relevant scikit-learn modules and Elasticsearch with the carrot2 plugin, and compare visualisations from both approaches.
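As a minimal sketch of the scikit-learn route, the snippet below turns a handful of made-up documents into TF-IDF vectors, groups them with AgglomerativeClustering, and scores the grouping with silhouette_score from sklearn.metrics. The toy corpus, the choice of Ward linkage and the cluster count of 3 are assumptions for illustration, not necessarily what the talk uses.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical toy corpus: three pairs of related "documents".
docs = [
    "python pandas dataframe tutorial",
    "pandas dataframe groupby in python",
    "elasticsearch index mapping and queries",
    "elasticsearch cluster shards and queries",
    "matplotlib plotting dendrogram figure",
    "matplotlib figure plotting tutorial",
]

# TF-IDF rows are L2-normalised, so Euclidean distance tracks cosine distance.
# Ward linkage needs a dense array, hence .toarray().
X = TfidfVectorizer().fit_transform(docs).toarray()

# Agglomerative (bottom-up) clustering: start from single documents and
# repeatedly merge the pair of clusters that least increases within-cluster variance.
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)

# Silhouette score in [-1, 1]: higher means tighter, better-separated clusters.
score = silhouette_score(X, labels)
print(labels, round(score, 3))
```

Ward linkage is used here because it works with the default Euclidean metric in every recent scikit-learn release; average or complete linkage with a cosine metric is a common alternative for text.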

Description

  • Background: methods for clustering text data and the challenge of data summarisation
  • Hierarchical clustering: agglomerative vs divisive
  • The sklearn.cluster and sklearn.metrics modules
  • Elasticsearch + carrot2 plugin
  • Performance comparisons and an assessment of scalability and ease of use
  • Static visualisation using Matplotlib, interactive visualisation using FoamTree

