Once upon a time, coding for supercomputers meant lots of Fortran. Users were invariably physicists, engineers and mathematicians who already knew how to code. Today, almost all fields of research have big data and ever more complex analysis, necessitating a move from the desktop to large-scale compute. Now there's far more diversity, and happily, much Python being run on supercomputers across the world.
I'm going to talk about the general architecture of supercomputers, the parallel programming patterns that suit them, and how to implement those patterns in Python. This covers the traditional message-passing approaches, as well as modern tools like NumPy, Dask, Cython and Numba that let us squeeze out performance competitive with low-level languages while being a whole lot more fun to write.
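To give a taste of the performance gap, here's a minimal sketch comparing a pure-Python loop with the same computation vectorised in NumPy; the function names and the toy workload are just illustrative, and exact timings will vary by machine:

```python
import time

import numpy as np


def step_python(xs):
    # Pure-Python loop: one interpreted multiply-add per element.
    return [3.0 * x + 1.0 for x in xs]


def step_numpy(xs):
    # Same computation, dispatched to compiled loops inside NumPy.
    return 3.0 * xs + 1.0


n = 1_000_000
data = np.arange(n, dtype=np.float64)

t0 = time.perf_counter()
py_result = step_python(data.tolist())
t_py = time.perf_counter() - t0

t0 = time.perf_counter()
np_result = step_numpy(data)
t_np = time.perf_counter() - t0

# Both versions compute the same values; only the speed differs.
assert np.allclose(py_result, np_result)
print(f"pure Python: {t_py:.3f}s, NumPy: {t_np:.3f}s")
```

On a typical laptop the NumPy version is orders of magnitude faster, which is the same trick that makes single-node Python code viable before you ever reach for multiple nodes.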
Supercomputers are usually managed, multi-user environments, so setting up the software and packages your code needs takes some thought. You can use what's pre-installed for you, install your own packages in a virtual environment, or go all-out and put everything in a container. We'll go over what's involved and the trade-offs of each approach.
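On a shared system with several Pythons floating around, a quick sanity check of which interpreter and environment you're actually running is often the first debugging step. A minimal sketch using only the standard library:

```python
import sys

# Which interpreter is running, and from which environment root?
print("interpreter:", sys.executable)
print("environment:", sys.prefix)

# Inside a virtual environment, sys.prefix points at the venv while
# sys.base_prefix still points at the interpreter it was created from.
in_venv = sys.prefix != sys.base_prefix
print("inside a virtual environment:", in_venv)
```

Running this from a login node versus from inside your venv or container makes it obvious whether your jobs will pick up the packages you think they will.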
Finally, while academics might have ready access to a supercomputer, most of us do not. Never fear! I'll go over how to build your own in the cloud for a couple of bucks an hour.