Your Escape Plan From NumPy + Cython


Cheng-Lin Yang

If you have worked as a data scientist or researcher for long enough, you have probably seen NumPy code that ran quickly on small datasets in a testing environment but performed poorly on real-world datasets (100x larger or more). In this talk, I will introduce three Pythonic solutions that can drastically improve NumPy performance without requiring large changes to your code.

The talk begins with logsumexp, a function widely used in machine learning. I will show how it is implemented in pure NumPy and use that implementation as a benchmark, so that we can compare it against the three proposed solutions at the end of the talk.
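For readers unfamiliar with the function, logsumexp(x) = log(sum_i exp(x_i)). The sketch below is one common pure-NumPy formulation, not necessarily the exact benchmark code used in the talk; the function name and the choice to reduce over all elements are assumptions made for illustration.

```python
import numpy as np


def logsumexp(x):
    """Compute log(sum(exp(x))) over all elements of x in a stable way."""
    x = np.asarray(x, dtype=np.float64)
    x_max = np.max(x)
    # Subtracting the maximum keeps every exponent <= 0, so exp() cannot
    # overflow; adding x_max back afterwards restores the exact value.
    return x_max + np.log(np.sum(np.exp(x - x_max)))


if __name__ == "__main__":
    # A naive np.log(np.sum(np.exp(values))) would overflow to inf here.
    values = np.array([1000.0, 1000.0])
    print(logsumexp(values))
```

The max-subtraction step is the usual reason this function is written by hand or taken from a library rather than composed directly from np.exp and np.log.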

Then the three solutions, CuPy, Numba, and Pythran, will be presented in separate sections. In each section, I will briefly introduce the solution and show how to apply it to our benchmark code.
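To give a rough idea of what applying one of these tools can look like, here is a minimal sketch using Numba's njit decorator on a loop-based logsumexp for 1-D float64 arrays. It is an illustration under stated assumptions, not the talk's code: the function name logsumexp_loops is hypothetical, and the fallback decorator exists only so the sketch still runs (at plain Python speed) when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # JIT-compiles the decorated function
except ImportError:
    # Assumption for this sketch: fall back to a no-op decorator so the
    # example remains runnable without Numba installed.
    def njit(func):
        return func


@njit
def logsumexp_loops(x):
    """Stable log(sum(exp(x))) for a non-empty 1-D float64 array."""
    # First pass: find the maximum element.
    x_max = x[0]
    for i in range(1, x.shape[0]):
        if x[i] > x_max:
            x_max = x[i]
    # Second pass: accumulate exp(x_i - max) without temporary arrays.
    total = 0.0
    for i in range(x.shape[0]):
        total += np.exp(x[i] - x_max)
    return x_max + np.log(total)


if __name__ == "__main__":
    data = np.random.default_rng(0).normal(size=1_000_000)
    print(logsumexp_loops(data))
```

Explicit loops like these are slow in plain Python but are the style that a JIT compiler handles well, since no intermediate arrays need to be allocated. The first call includes compilation time, so any timing comparison should measure a later call.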

At the end of the talk, I will compare these solutions from different aspects:

  • How much each solution improves performance
  • How easy each solution is to apply to your existing code (including ease of debugging)
  • Limitations of each solution
  • Which solution to try first in a given scenario

Last but not least, I will show the audience a relatively new but interesting solution, Transonic, so they can try it on their own side projects.

Sat Sep 5 16:00:00 2020 at Obvious

