Probabilistic Programming Data Science with PyMC3

Description

PyData London 2016

Probabilistic programming is a new paradigm that greatly increases the number of people who can successfully build statistical models and machine learning algorithms, and makes experts radically more effective. This talk will provide an overview of PyMC3, a new probabilistic programming package for Python featuring intuitive syntax and next-generation sampling algorithms.

Machine learning is the driving force behind many recent revolutions in data science. Comprehensive libraries provide the data scientist with many turnkey algorithms that make very weak assumptions about the actual distribution of the data being modeled. While this black-box property makes machine learning algorithms applicable to a wide range of problems, it also limits the amount of insight that can be gained by applying them.

The field of statistics, on the other hand, often approaches problems individually and hand-tailors statistical models to specific problems. Performing inference on these models, however, is often mathematically very challenging. It requires time spent deriving equations, as well as simplifying assumptions (such as the normality assumption) to make inference mathematically tractable.

Probabilistic programming is a new programming paradigm that provides the best of both worlds: models tailored to the problem, with inference that is largely automated. Recent methodological advances in sampling algorithms such as Markov chain Monte Carlo (MCMC), together with huge increases in processing power, allow for almost complete automation of the inference process. Probabilistic programming thus greatly increases the number of people who can successfully build statistical models and machine learning algorithms, and makes experts radically more effective. Data scientists can create complex generative Bayesian models tailored to the structure of the data and the specific problem at hand, without the burden of mathematical tractability or the limitations imposed by mathematical simplifications.
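
To make the idea of automated inference a little more concrete, here is a minimal sketch of MCMC in plain Python. It is not PyMC3 code and does not reflect PyMC3's API: it is a hand-rolled random-walk Metropolis sampler, far simpler than the samplers PyMC3 provides. The model (a normal likelihood with known noise and a wide normal prior on the mean), the synthetic data, and all names such as `log_posterior` and `metropolis` are assumptions chosen purely for illustration.

```python
import math
import random


def log_posterior(mu, data, prior_sd=10.0, noise_sd=1.0):
    """Unnormalized log posterior of the mean `mu`.

    Assumed model (illustrative only):
        mu ~ Normal(0, prior_sd)
        x  ~ Normal(mu, noise_sd) for each observation x
    """
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_likelihood = sum(-0.5 * ((x - mu) / noise_sd) ** 2 for x in data)
    return log_prior + log_likelihood


def metropolis(data, n_samples=5000, step=0.5, seed=0):
    """Random-walk Metropolis sampler for the posterior of `mu`."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    samples = []
    for _ in range(n_samples):
        # Propose a nearby value by adding Gaussian noise.
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        samples.append(mu)
    return samples


# Synthetic observations with a true mean of 3.0 (made up for this sketch).
data_rng = random.Random(42)
data = [data_rng.gauss(3.0, 1.0) for _ in range(50)]

samples = metropolis(data)
burned = samples[1000:]  # discard early samples taken before the chain settles
posterior_mean = sum(burned) / len(burned)
print("estimated posterior mean:", posterior_mean)
```

The sampler only ever evaluates the unnormalized log posterior, so no posterior had to be derived by hand. Swapping in a different prior or likelihood means changing `log_posterior` and nothing else, which is the property that probabilistic programming packages build on.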




Notebook: https://gist.github.com/anonymous/9287a213fe188a79d7d7774eef79ad4d

Slides: https://docs.google.com/presentation/d/1QNxSjDHJbFL7vFwQHHheeGmBHEJAo39j28xdObFY6Eo/edit

Twitter: https://twitter.com/twiecki
