Pandas is an extremely powerful library for data analysis. With that power comes complexity. This tutorial will focus on the core features of pandas, which handle most data munging tasks. The emphasis will be on practical applications, illustrating solutions to common problems using real-world data.
The motivation of this tutorial mirrors that of pandas itself: practicality. A brief discussion on the problems pandas tries to solve will help frame the rest of the tutorial. We'll aim for an intuitive understanding of each new method and data structure. This will help keep us from getting overwhelmed by the options available as we expand our data munging toolkit. The start of the talk will focus on the core operations of
- Selecting and Indexing
- Reshaping and Tidy Data
- Summarization
- Grouped operations
- Merging and Joining

These operations can be combined into "pandastic" method chains that flow seamlessly from data IO to analysis.
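As a rough sketch of what such a chain might look like, the example below strings the core operations together in one expression. The data, the column names, and the `STATES` lookup table are hypothetical and made up for illustration; they are not taken from the tutorial notebooks.

```python
import io

import pandas as pd

# Hypothetical data, inlined as CSV text so the sketch is self-contained.
RAW_CSV = """city,year,temp_jan,temp_jul
Seattle,2013,5.0,18.0
Seattle,2014,6.0,19.0
Seattle,2015,7.0,21.0
Portland,2014,4.0,20.0
Portland,2015,6.0,22.0
"""

# Hypothetical lookup table, used only to demonstrate a merge.
STATES = pd.DataFrame({"city": ["Seattle", "Portland"], "state": ["WA", "OR"]})

summary = (
    pd.read_csv(io.StringIO(RAW_CSV))      # data IO
    .query("year >= 2014")                 # selecting rows
    .melt(                                 # reshaping wide columns to tidy rows
        id_vars=["city", "year"],
        var_name="month",
        value_name="temp",
    )
    .merge(STATES, on="city", how="left")  # merging in the state labels
    .groupby("state", as_index=False)["temp"]
    .mean()                                # grouped summarization
    .sort_values("state")
    .reset_index(drop=True)
)

print(summary)
```

Each step returns a new DataFrame, so the chain reads top to bottom in the order the work happens, with no intermediate variables to keep track of.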
Time permitting, we'll look at some of the more specialized areas of pandas, including Categoricals, time-series analysis, Hierarchical Indexes, chunked / out-of-core processing, and data pipelines.
Learning to use a library the size of pandas is a huge commitment. What's more, your goal is rarely achieved with pandas alone. Rather, pandas gets you to the point where you can begin your interesting analysis. We'll build the foundation to get you quickly past the data munging and on to the analysis.
Materials:
- slides: http://www.slideshare.net/PyData/pandas-head-to-tail-slidestom-augspurger
- GitHub repo: https://github.com/tomaugspurger/pydataseattle
- nbviewer link to notebooks: http://nbviewer.ipython.org/github/TomAugspurger/PyDataSeattle/tree/master/notebooks/