I'm going to talk about the 4 main levels of parallelism in modern
- multiple (virtual) machines
- multiple processes
- multiple threads
- multiple coroutines (in effect green threads), aka asyncio
I'll cover why you might use each of them, how to go about doing so with
Python, and some of the pitfalls you might fall into along the way.
To do so, I'll give short code examples for each level:
- using multiple hosts with RQ, and also the possibility of RPC
- multiprocessing and threading using their respective modules from
the Python standard library
- asyncio demonstrated with aiohttp
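As a rough idea of what the standard library examples might look like, here is a minimal sketch that runs the same toy task first in threads and then in a process pool. The `square` function and the two `run_in_*` helpers are hypothetical names made up for illustration; only `threading.Thread`, `threading.Lock` and `multiprocessing.Pool` come from the standard library.

```python
import multiprocessing
import threading


def square(n):
    """Toy task, hypothetical and for illustration only."""
    return n * n


def run_in_threads(numbers):
    """Square each number in its own thread and return results in input order."""
    results = {}
    lock = threading.Lock()

    def worker(n):
        value = square(n)
        # Guard the shared dict explicitly rather than relying on the GIL.
        with lock:
            results[n] = value

    threads = [threading.Thread(target=worker, args=(n,)) for n in numbers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results[n] for n in numbers]


def run_in_processes(numbers):
    """Square each number in a pool of two worker processes."""
    # The function must be defined at module level so workers can import it.
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(square, numbers)


if __name__ == "__main__":
    # The guard matters for multiprocessing on platforms that spawn workers.
    print(run_in_threads([1, 2, 3]))
    print(run_in_processes([1, 2, 3]))
```

The two helpers do the same job, but the threaded version shares memory (hence the lock), while the process version has to pickle arguments and results to pass them between processes.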
That sounds great, but there are "gotchas" you should know about before
you get started, for example:
- multiple machines can actually be multiple virtual machines on the
- effectively communicating between processes is hard; how can we go
about making it easier?
- the limitations of threading and the GIL
- run_in_executor - do we ever really need to use multiprocessing or
threading directly again?
- using asyncio for both networking between hosts and communication
between processes - you end up using two different kinds of
concurrency at the same time. That can be confusing, but also awesome.
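To make the run_in_executor point concrete, here is a minimal sketch, assuming a hypothetical blocking function `blocking_io` that stands in for any call that cannot be awaited. It hands the blocking work to a thread pool from inside a coroutine, so threads are used without touching the threading module directly.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_io(delay):
    """Hypothetical blocking call, e.g. a library with no async support."""
    time.sleep(delay)
    return delay


async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=3) as pool:
        # Each call runs in a pool thread; the event loop stays free meanwhile.
        futures = [loop.run_in_executor(pool, blocking_io, 0.1) for _ in range(3)]
        # gather() returns results in the order the futures were passed in.
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Swapping `ThreadPoolExecutor` for `concurrent.futures.ProcessPoolExecutor` should give the multiprocessing equivalent for CPU-bound work, provided the function and its arguments can be pickled.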
I'll finish off by showcasing a library I built, arq, a job queueing
and RPC library for Python that uses asyncio and Redis.