The Failure of Python Object Serializations: Why HPC in Python is Broken and How to Fix it
Parallel and asynchronous computing in python is crippled by pickle's poor object serialization. Dill, a more robust serialization package, strives to serialize all of python. Dill has been used to enable state persistence and recovery, global caching, and the coordination of distributed parallel calculations across a network of the world's largest computers.
Parallel and asynchronous computing in python is crippled by pickle's poor object serialization. However, a more robust serialization package would drastically improve the situation. To leverage the cores found in modern processors we need to communicate functions between different processes -- and that means callables must be serialized without pickle barfing. Similarly, parallel and distributed computing with MPI, GPUs, sockets, and across other process boundaries all need serialized functions (or other callables). So why is pickling in python so broken? Python's ability to leverage these awesome communication technologies is limited by python's own inability to be a fully serializable language. In actuality, serialization in python is quite limited, and for really no good reason.
Many raise security concerns for full object serialization, however it can be argued that it is not pickle's responsibility to do proper authentication. In fact, one could apply rather insecure serialization of all objects the objects were all sent across RSA-encrypted ssh-tunnels, for example.
Dill is a serialization package that strives to serialize all of python. We have forked python's multiprocessing to use dill. Dill can also be leveraged by mpi4py, ipython, and other parallel or distributed python packages. Dill serves as the backbone for a distributed parallel computing framework that is being used to design the next generation of large-scale heterogeneous computing platforms, and has been leveraged in large-scale calculations of risk and uncertainty. Dill has been used to enable state persistence and recovery, global caching, and the coordination of distributed parallel calculations across a network of the world's largest computers.