Tell Me Something I Don't Know: Analyzing OkCupid Profiles


In this talk, we present an approach that combines natural language processing with machine learning to explore the relationship between free-text self-descriptions and demographics in OkCupid profile data. We discuss feature representation, clustering and topic modeling approaches, and feature selection and modeling strategies. We find that we can predict a user's demographic attributes from their profile essays, and we conclude by sharing some unexpected insights into deception.

