
Football (soccer) data analysis: A Pedagogic introduction – Indranil Ghosh (PyCon Taiwan 2021)


Day 2, 10:35-11:20


Nowadays, data is key to solving challenges in most fields, such as astronomy, applied maths, health care, and even sports. Data is everywhere, and it is highly beneficial to leverage it to build models that make sense of it and tell stories. Sports science, and football (soccer) in particular, is enriched by data analysis that helps us understand the game better and predict outcomes. Many people want to delve into football (soccer) data analysis and get their hands dirty. This talk aims to help them do so by pedagogically introducing the basic analytical methods, so that they can overcome the initial barriers of the field and start working on football data analyses and visualizations.


This talk introduces the following concepts in football data analysis:

I will start by showing how to fetch open-access football event data from the StatsBomb API with Python,

Next, I will draw a football pitch with the mplsoccer Python module, which then serves as the canvas for most of our football data visualizations,

I will then cover simple data visualizations such as shot maps, pass maps, and their corresponding heat maps,

Next, I will show how to visualize the pass network of a particular team in a particular game on this pitch. We will then analyze this pass network with the NetworkX Python module, which is commonly used for complex network analysis in mathematics. We will learn how to calculate pass degree distributions for the players, how to find the most central player in the pass network by calculating the "centrality" of each player node, and so on,

After that, I will show how to apply computational-geometry concepts such as convex hulls, Voronoi diagrams, and Delaunay triangulations to open-access football tracking data, using the Python packages scipy.spatial and mplsoccer, so that we can analyze how many passing options were available to a player at a particular moment of a game, or how a group of players divided up space on the pitch at a particular moment, etc., and

Finally, I will talk about how to analyze Expected Goals (xG) using open data from StatsBomb.
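
As a rough illustration of the convex-hull idea from the computational-geometry item above, here is a minimal, dependency-free sketch. It is not the talk's implementation: the talk uses scipy.spatial, whereas this block swaps in Andrew's monotone chain algorithm so that it runs with the standard library alone. The player coordinates are invented for illustration, and the helper names (`convex_hull`, `polygon_area`) are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    # Build the lower boundary from left to right.
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Build the upper boundary from right to left.
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical player positions (x, y) at one instant, invented for illustration.
players = [(30, 20), (30, 60), (60, 10), (60, 70), (45, 40), (50, 30), (70, 40)]

hull = convex_hull(players)   # players on the outer boundary of the group
area = polygon_area(hull)     # space enclosed by the group, in squared pitch units
print(hull, area)
```

Players that do not appear in `hull` lie inside the shape formed by their teammates, and `area` gives one simple measure of how much space the group covers at that instant.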

I will end the talk by pointing the audience to the references I used when getting started with football (soccer) data analysis.

One use case for soccer data analysis is using network analysis to find the central player of a particular team who has performed consistently over the last few matches. This player can be considered one of the backbones of that team and is recommended for selection in future games.
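
The use case above can be sketched in a few lines. The talk itself uses NetworkX; the block below is a dependency-free stand-in that computes a normalized degree centrality, that is, the number of distinct passing partners divided by (n - 1), which is the normalization NetworkX's `degree_centrality` applies to undirected graphs. The position labels and the pass list are invented for illustration, and passes are treated as undirected links, which is a simplifying assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def degree_centrality(passes: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized degree centrality of each player in an undirected pass network."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for passer, receiver in passes:
        if passer == receiver:
            continue  # ignore self-loops
        neighbours[passer].add(receiver)
        neighbours[receiver].add(passer)

    n = len(neighbours)
    if n < 2:
        return {player: 0.0 for player in neighbours}
    # Repeated passes between the same pair count once (distinct partners only).
    return {player: len(nb) / (n - 1) for player, nb in neighbours.items()}


# Hypothetical (passer, receiver) pairs from one match, invented for illustration.
passes = [
    ("GK", "CB"), ("CB", "CM"), ("CM", "LW"), ("CM", "RW"),
    ("CM", "ST"), ("LW", "ST"), ("RW", "ST"), ("CB", "CM"),
]

centrality = degree_centrality(passes)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])
```

Running the same calculation over several matches and checking whether the same player keeps ranking first is one simple way to approach the "consistent central player" question. Weighting links by pass counts, or using other centrality measures, would be natural refinements.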

Slides not uploaded by the speaker. HackMD:

Speaker: Indranil Ghosh

I am a first-year Ph.D. student in applied mathematics at the School of Fundamental Sciences, Massey University. My research is on dynamical systems and robust chaos. I have a master's in Physics from Jadavpur University. I am mostly interested in dynamical systems, computational mathematics, optimization, quantum computing, etc. I am fascinated by open source software development and write code mostly in Python and R, and sometimes Fortran. I have developed the R package "QGameTheory", an open-source R tool for working with the basics of quantum computing and game theory simulations. I love presenting what I have learned on national and international platforms and have presented at conferences on Python, R, open source software, etc. Website:

