Description
We present a case study in porting a Python-based cosmology data processing application to GPU-accelerated compute nodes, achieving a 20x improvement in per-node throughput on current and future supercomputers at the National Energy Research Scientific Computing Center (NERSC). We describe our iterative approach to porting and optimizing the application, using CuPy and Numba CUDA for GPU acceleration and NVIDIA Nsight Systems for performance analysis. We discuss the lessons learned during this work to guide the team's future efforts and to inform other science teams looking to leverage GPU acceleration in their Python-based data processing applications.