Natural Language Processing with NLTK and Gensim


Speakers: Tony Ojeda, Benjamin Bengfort, Laura Lorenz

In this tutorial, we will begin by exploring the features of the NLTK library. We will then focus on building a language-aware data product: a topic identification and document clustering algorithm built from a web crawl of blog sites. The clustering algorithm will start with a simple Lesk K-Means clustering, and will then be improved with an LDA analysis using the popular Gensim library.
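
To give a rough sense of the clustering step described above, here is a minimal sketch of K-Means over bag-of-words vectors. It is not the tutorial's code: it uses only the Python standard library rather than NLTK or Gensim, the four toy documents stand in for crawled blog posts, and the function names (tokenize, vectorize, kmeans) and the choice of starting centroids are assumptions made for illustration. The LDA refinement is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for crawled blog posts.
DOCS = [
    "python code library python function",
    "library function code module python",
    "recipe oven bake flour sugar",
    "bake sugar flour recipe butter",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(docs):
    """Turn each document into a length-normalized term-count vector."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vec = [float(counts[term]) for term in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, initial_centroids, iterations=10):
    """Plain K-Means: assign each vector to its nearest centroid, then
    move each centroid to the mean of its members, and repeat."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: distance(v, centroids[k]))
            for v in vectors
        ]
        for k in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


vocab, vectors = vectorize(DOCS)
# Seeding with two fixed documents keeps this sketch deterministic;
# real K-Means implementations usually pick starting points randomly.
labels = kmeans(vectors, initial_centroids=[vectors[0], vectors[2]])
print(labels)
```

On this toy corpus the two programming-related documents land in one cluster and the two baking-related documents in the other. A real pipeline would add stopword removal, stemming or lemmatization, and TF-IDF weighting before clustering.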

Slides can be found at: and
