Christian Staudt (@C_L_Staudt)
I am an independent data scientist with a background in computer science and in-depth experience in algorithms, data analysis, high-performance computing, and software engineering. My current interests include machine learning and data visualization.
Pythonistas have access to an extensive collection of tools for data analysis. The space of tools is best understood as an ecosystem: Libraries build upon each other, and a good library fills an ecological niche by doing certain jobs well. This is a guided tour of the Python data science ecosystem.
The Python Ecosystem for Data Science: A Guided Tour
Python is on its way to becoming the lingua franca of data science, and Pythonistas have access to an impressive and extensive collection of tools for data analysis. In this landscape, a data scientist needs to see the forest for the trees: the space of tools is best understood as an ecosystem, where libraries build upon each other and where a good library fills an ecological niche by doing certain jobs well. This talk is a guided tour of the Python data science ecosystem. More than a list of libraries, it aims to provide some structure, classifying tools by type of data, size of data, and type of analysis. On our tour, we visit a number of areas, including working with tabular data (numpy, pandas, dask, ...) and graph data (e.g. networkx), statistics (e.g. statsmodels), machine learning (scikit-learn, ...), and data visualization (matplotlib, seaborn, bokeh, ...). Aspiring data scientists, and everyone else working with data, should find this useful for selecting the right tools for their next data-driven project.
Recorded at PyCon.DE 2017 Karlsruhe: https://de.pycon.org/
Video editing: Sebastian Neubauer & Andrei Dan
Tools: Blender, Avidemux & Sonic Pi