If you've been a data scientist or researcher long enough, you have probably seen NumPy code that ran quickly on small datasets in a testing environment but performed poorly on real-world datasets (100x larger or more). In this talk, I will introduce three Pythonic solutions that can drastically improve NumPy performance without requiring much change to your code.
At the beginning of the talk, I will illustrate logsumexp, a function widely used in machine learning. I will show how it is implemented in pure NumPy and use that implementation as a benchmark, so we can compare it against the three proposed solutions at the end of the talk.
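As a rough idea of what such a baseline might look like, here is a minimal pure-NumPy sketch of logsumexp, i.e. log(sum_i exp(x_i)). The function name and the max-shift trick for numerical stability are assumptions made for illustration; the version shown in the talk may differ.

```python
import numpy as np


def logsumexp(x):
    """Compute log(sum(exp(x))) over all elements of x.

    Illustrative sketch only: subtracting the maximum before
    exponentiating avoids overflow for large inputs.
    """
    x = np.asarray(x, dtype=np.float64)
    m = np.max(x)
    # exp(x - m) is at most 1, so the sum cannot overflow.
    return m + np.log(np.sum(np.exp(x - m)))
```

A naive `np.log(np.sum(np.exp(x)))` overflows once the inputs reach a few hundred, which is why the shifted form is commonly used in practice.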
Then, the three solutions (CuPy, Numba, and Pythran) will be presented in separate sections. In each section, I will briefly introduce the solution and show how to apply it to our benchmark code.
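To give a flavor of what "applying a solution" can mean, below is a hedged sketch using Numba's `njit` decorator on a loop-based logsumexp for a non-empty 1-D float array. The function name `logsumexp_loop` is hypothetical, and the fallback decorator (used only when Numba is not installed) is an assumption added so the snippet still runs as plain Python; it is not necessarily how the talk applies Numba.

```python
import math

import numpy as np

try:
    from numba import njit  # JIT-compiles the decorated function
except ImportError:
    # Assumption for illustration: without Numba, run the same code uncompiled.
    def njit(func):
        return func


@njit
def logsumexp_loop(x):
    """log(sum(exp(x))) for a non-empty 1-D float array, written with explicit loops."""
    # First pass: find the maximum, used as a shift for numerical stability.
    m = x[0]
    for i in range(x.shape[0]):
        if x[i] > m:
            m = x[i]
    # Second pass: accumulate the shifted exponentials.
    total = 0.0
    for i in range(x.shape[0]):
        total += math.exp(x[i] - m)
    return m + math.log(total)
```

Explicit loops like these are slow in ordinary Python but are the kind of code a JIT compiler can turn into fast machine code, so the function body stays unchanged and only the decorator is added.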
At the end of the talk, I will compare these solutions from different aspects:
- How much each solution improves performance
- How easy each solution is to apply to your existing code (including ease of debugging)
- Limitations of each solution
- Which solution to try first in a given scenario
Last but not least, I will introduce Transonic, a relatively new but interesting solution, so that the audience can give it a try on their own side projects.
Python, PyCon, PyConAU, PyConline
Sat Sep 5 16:00:00 2020 at Obvious