The pandas and scikit-learn packages combine to form a powerful toolkit for data analytics. In this talk, we will use them together to analyse NBA games and try to predict the winner of each match. There is plenty of data available to support good predictions; the key is getting it into the right format and building the right model.
We will go through importing data from the web, cleaning it up, creating new features, and building a predictive model. We then evaluate how well the model performs on recent NBA data. The model is a random forest, an ensemble of decision trees.
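
As a rough illustration of that workflow, here is a minimal sketch rather than the code used in the talk. It assumes a results table with hypothetical columns `HomeTeam`, `VisitorTeam`, `HomePts` and `VisitorPts`, and it generates random stand-in data so that it runs without a download; in practice the table would be loaded from a real results file, for example with `pandas.read_csv`. The "won previous game" features are also just one assumed choice of feature.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Random stand-in data (NOT real NBA results); column names are hypothetical.
rng = np.random.default_rng(0)
teams = ["A", "B", "C", "D", "E", "F"]
n_games = 300
home = rng.choice(teams, size=n_games)
visitor = np.array([rng.choice([t for t in teams if t != h]) for h in home])
games = pd.DataFrame({
    "HomeTeam": home,
    "VisitorTeam": visitor,
    "HomePts": rng.integers(80, 130, size=n_games),
    "VisitorPts": rng.integers(80, 130, size=n_games),
})

# Cleaning: the random scores can tie, which a real game cannot, so drop ties.
games = games[games["HomePts"] != games["VisitorPts"]].reset_index(drop=True)

# Target: did the home team win?
games["HomeWin"] = games["HomePts"] > games["VisitorPts"]

# New features: did each team win its previous game? (False if no prior game.)
last_win = {}
home_last, visitor_last = [], []
for row in games.itertuples(index=False):
    home_last.append(last_win.get(row.HomeTeam, False))
    visitor_last.append(last_win.get(row.VisitorTeam, False))
    home_won = bool(row.HomeWin)
    last_win[row.HomeTeam] = home_won
    last_win[row.VisitorTeam] = not home_won
games["HomeLastWin"] = home_last
games["VisitorLastWin"] = visitor_last

X = games[["HomeLastWin", "VisitorLastWin"]].astype(int).to_numpy()
y = games["HomeWin"].to_numpy()

# Model: a random forest, scored with 5-fold cross-validated accuracy.
clf = RandomForestClassifier(n_estimators=100, random_state=14)
scores = cross_val_score(clf, X, y, scoring="accuracy", cv=5)
mean_accuracy = scores.mean()
print(f"Mean accuracy: {mean_accuracy:.3f}")
```

Because the stand-in scores are random, the accuracy printed here says nothing about real games; the point is only the shape of the pipeline, from raw results to features to a cross-validated random forest.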