Gensim is a fairly popular NLP library for Python. In addition to implementing several popular algorithms, it has utilities that make working with the corpus itself easier.
In this talk I'd like to give an overview of Gensim, followed by two examples. The first illustrates topic modeling with LDA; the second shows a somewhat novel use of Word2Vec to understand user preferences.
Overview: The overview will follow the general arc of an NLP project:
- Reading the corpus, done here with Gensim's streaming API.
- Transformations: often a conversion to bag-of-words (BOW), and potentially something like TF-IDF.
- Training the model from the corpus.
- Working with the result, for analysis or otherwise.

Examples: The first will be a straightforward application: topic discovery on a corpus, followed by analysis of the resulting topics to look for patterns. Next I'll cover how to use Gensim's Word2Vec implementation to better understand customer preferences.
Slides available here: http://blog.trenthauck.com/portfolio/presentation.pdf