Analysing user behaviour - from histograms to random forests (PyData)


The goal is to give the audience a roadmap for analysing user data with Python-friendly tools. I will touch on many stages of the data science pipeline, from data cleansing to building predictive data products at scale. I will start gently with pandas and DataFrames, move on to machine learning techniques such as k-means and random forests in scikit-learn, and then introduce Spark for running the same kinds of analysis at scale. I will focus on use cases rather than detailed implementation. The talk draws on my own experience and concentrates on user behaviour in games and mobile apps.
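
To make the first steps of this roadmap concrete, here is a minimal sketch of the histogram, k-means and random forest stages on a single machine. It is not taken from the talk: the data is synthetic, and the column names (`sessions_per_week`, `avg_session_minutes`, `retained`) and the two user groups are assumptions invented purely for illustration. The Spark stage is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

# Synthetic, made-up user data: two hypothetical behaviour groups
# (casual users and heavy users), 200 users each.
rng = np.random.default_rng(0)
n = 200
users = pd.DataFrame({
    "sessions_per_week": np.concatenate(
        [rng.normal(2, 0.5, n), rng.normal(10, 1.0, n)]),
    "avg_session_minutes": np.concatenate(
        [rng.normal(5, 1.0, n), rng.normal(30, 3.0, n)]),
})
# Hypothetical label: did the user come back after the first week?
users["retained"] = np.concatenate(
    [np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# 1. Histogram: bucket counts for a single feature.
counts, bin_edges = np.histogram(users["sessions_per_week"], bins=10)

# 2. k-means: group users into two segments by behaviour.
features = users[["sessions_per_week", "avg_session_minutes"]]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
users["segment"] = kmeans.fit_predict(features)

# 3. Random forest: predict the label from the behaviour features
#    and inspect which features the model relies on.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(features, users["retained"])
importances = pd.Series(forest.feature_importances_,
                        index=features.columns)
```

In practice the features would come from cleaned event logs rather than random draws, and a model would be evaluated on held-out users rather than on the data it was fitted to.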


