
Open Source is Better Together: GPU Python Libraries Unite


Today, CPUs are hitting their computational limits, and GPUs are increasingly being used to satisfy the compute demands of users. In the past, this meant low-level programming in C/C++, but today there is a rich ecosystem of open source software with Python APIs and interfaces. This talk will highlight the journey of developing open source software on top of, and integrating with, this ecosystem.

  1. PyData Ecosystem
    • Pandas, Numpy, SciPy, SKLearn, Dask, Cython, etc.
    • Highly interoperable with everything standardizing around Numpy / Pandas
    • Highly productive
    • Compute limited
  2. Apache Big Data Ecosystem
    • Spark, Beam, Flink, Hive, Impala, etc.
    • Semi-interoperable, but very technology-dependent
    • Semi-productive
    • Still compute limited
  3. GPUs
    • Thrust, CUB, NCCL, OpenUCX, etc.
    • Not very interoperable
    • Not productive
    • Not compute limited!
  4. Apache Arrow
    • Standards for memory layouts
    • Cross language compatible
    • Potential to bridge the PyData, Apache Big Data, and GPU ecosystems!
  5. Combining the compute of GPUs with the productivity of the PyData ecosystem and the interoperability of Apache Arrow
    • Built on top of the OSS C/C++ GPU ecosystem: Thrust, CUB, NCCL, OpenUCX
    • Integrated with the OSS Python GPU ecosystem: Numba, CuPy, PyTorch
    • Built on top of and integrated with the OSS PyData ecosystem: Pandas, Numpy, Dask, Cython
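To make the "standards for memory layouts" point concrete, here is a minimal, GPU-free sketch of an Arrow-style column in pure Python: a contiguous value buffer paired with a validity bitmap. This only mirrors the layout conceptually; real Arrow buffers are padded, 64-byte aligned, and managed natively.

```python
from array import array

def make_arrow_like_column(values):
    """Build an Arrow-style column from a Python list.

    Returns a contiguous float64 value buffer plus a validity bitmap:
    None entries are marked null (bit cleared) and their slot zeroed.
    Arrow numbers validity bits from the least-significant bit.
    """
    data = array("d", (0.0 if v is None else float(v) for v in values))
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # set bit i => value i is valid
    return data, bitmap

data, bitmap = make_arrow_like_column([1.5, None, 3.0])
# bitmap[0] == 0b101: slots 0 and 2 valid, slot 1 null
```

Because every producer and consumer agrees on this layout, columns can be handed between libraries (or between CPU and GPU memory) without copying or converting.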
  6. Ecosystem Interoperability
    • Standards / Protocols
      • Numpy __array_function__ protocol
      • __cuda_array_interface__ protocol
      • DLPack
    • User Experience
      • Follow the same Python APIs that users are already comfortable, productive, and happy with
    • Performance
      • Deliver 10-1000x the performance with nearly zero code changes
    • Scaling
      • Scale the same way as the existing PyData ecosystem with Dask
      • Improve Dask for everyone with lower-level communication acceleration
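As an illustration of how these protocols enable interoperability, here is a minimal sketch of the dict that the __cuda_array_interface__ protocol exposes. `DeviceArrayStub` is a hypothetical class invented for this example, and a host buffer stands in for a device allocation so the structure can be shown without a GPU; any two libraries that agree on this dict can share the underlying memory zero-copy.

```python
import ctypes

class DeviceArrayStub:
    """Illustrative producer of the __cuda_array_interface__ protocol.

    A real implementation would wrap a GPU allocation; here host memory
    stands in so the dict layout can be shown without CUDA.
    """

    def __init__(self, shape, typestr="<f4", itemsize=4):
        self.shape = shape
        self.typestr = typestr
        n = 1
        for dim in shape:
            n *= dim
        self._buf = (ctypes.c_byte * (n * itemsize))()  # stand-in buffer

    @property
    def __cuda_array_interface__(self):
        return {
            "shape": self.shape,      # tuple of array dimensions
            "typestr": self.typestr,  # NumPy-style dtype string, e.g. "<f4"
            "data": (ctypes.addressof(self._buf), False),  # (pointer, read-only?)
            "strides": None,          # None means C-contiguous
            "version": 2,             # protocol version
        }

arr = DeviceArrayStub((4, 3))
iface = arr.__cuda_array_interface__
```

A consumer such as Numba or CuPy reads this dict off a producer's object and wraps the pointer directly, so data moves between libraries without a copy.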
  7. Struggles
    • CI
      • Travis-CI doesn’t cut it for GPUs, and there is no easy-to-use, off-the-shelf alternative
    • Programming Paradigm Mindset
      • Thinking in terms of vectorized operations instead of loops / iterations
    • Amdahl’s Law
      • New bottlenecks that we didn’t previously have to worry about
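The Amdahl's Law point above can be made concrete with a quick calculation: even a 100x speedup on 90% of a workload caps the overall gain near 10x, which is why stages that used to be negligible (I/O, serialization, host-device transfer) become the new bottlenecks once the compute is accelerated.

```python
def amdahl_speedup(parallel_fraction, factor):
    """Overall speedup when a fraction p of the runtime is sped up by `factor`.

    Amdahl's Law: speedup = 1 / ((1 - p) + p / factor).
    """
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / factor)

# Accelerating 90% of a workload by 100x yields only ~9.2x overall:
print(round(amdahl_speedup(0.9, 100), 1))  # → 9.2
```

The unaccelerated 10% dominates: even with an infinitely fast GPU, this workload could never exceed a 10x end-to-end speedup.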
  8. Conclusion
  9. Q/A

