
Using PyNIO and MPI for Python to help solve a big data problem for the CESM

Summary

The Community Earth System Model produces orders of magnitude more data than earlier models, and the old data handling methods are no longer adequate. We discuss how PyNIO together with MPI for Python has provided the most efficient solution yet tested for the task of converting the raw output of the model to NetCDF files suitable both for archiving and for convenient use by scientists.

Description

Like most climate models, the CESM (Community Earth System Model) steps through time as a particular model scenario evolves and, at set intervals, writes the state of all the important variables to a single NetCDF file for each component of the model (atmosphere, ocean, land, and sea ice). Each file therefore contains all the variables for one component at a single time step. Because the data volume is large, it is impractical to handle all the data for a complete model run as a single aggregation. A consensus has therefore evolved that the data must be reorganized into files that each contain a single variable over some convenient time period.

Finding a solution that can take advantage of multi-core architectures to do this job efficiently has not been easy. Recently, researchers at NCAR conducted a set of benchmark tests to find the best tool for the job. Contenders included NCO (NetCDF Operators, the current incumbent for the task); an in-house Fortran code using the parallel I/O library PIO; a serial Python script using PyNIO; a version of the PyNIO script adapted to work with mpi4py in a very simple manner; CDO; NCL; and Pagoda. Surprisingly, PyNIO parallelized with mpi4py generally outperformed the other contenders by a large margin, and will now be tested as a replacement for the existing NCO scripts.

This talk will look at the simple mpi4py and PyNIO code that achieves this result, discuss the reasons why the performance gain varies from case to case, and suggest ways to improve performance in challenging cases. Along the way, PyNIO's capabilities and recent improvements will be explained. In addition, other possible contenders for this role, in particular NetCDF4-Python coupled with mpi4py in a similar fashion, will be benchmarked using the same test suite.
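The reorganization described above, from one file per time step holding every variable to one series per variable, can be illustrated with a small sketch. This is not the script from the talk. It assumes that the "very simple" parallelization amounts to dividing the variables among processes in round-robin fashion, and it uses plain Python dictionaries as hypothetical stand-ins for the NetCDF files. The names `variables_for_rank` and `to_time_series` are invented for illustration. In an mpi4py script, the rank and size would presumably come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`, and the files would be opened with PyNIO rather than built in memory.

```python
# Sketch only: plain Python stand-ins for the NetCDF files and the MPI ranks.
# The round-robin split is an assumption, not the method confirmed by the talk.

def variables_for_rank(variable_names, rank, size):
    """Round-robin assignment: rank r takes variables r, r+size, r+2*size, ..."""
    return sorted(variable_names)[rank::size]


def to_time_series(time_slices, variable_names):
    """For each variable, collect its value from every time-slice record."""
    return {name: [record[name] for record in time_slices]
            for name in variable_names}


# Hypothetical time-slice "files": one dict per time step, holding every variable.
time_slices = [
    {"T": 280.0, "U": 5.0, "V": -1.0},
    {"T": 281.5, "U": 4.5, "V": -0.5},
    {"T": 279.0, "U": 6.0, "V": 0.0},
]

size = 2  # number of processes; comm.Get_size() in an mpi4py script
all_names = list(time_slices[0].keys())

# Here the ranks are simulated in a loop; under MPI each rank would run
# only its own share, independently of the others.
results = {}
for rank in range(size):
    mine = variables_for_rank(all_names, rank, size)
    results[rank] = to_time_series(time_slices, mine)
```

Because each variable is handled by exactly one process, the processes in this sketch need no communication with one another. How well such a split balances the load would depend on how much the variables differ in size, which may be one reason performance varies from case to case.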
