"category": "PyCon US 2011",
"Daniel W. Barnette"
"title": "Supercomputer and Cluster Application Performance Analysis using Python",
"description": "PyCon 2011: Supercomputer and Cluster Application Performance Analysis using\nPython\n\nPresented by Daniel W. Barnette, PhD\n\nSandia National Labs analyzes high-performance computing environments to\noptimize application performance, analyze system architectures, and provide\ndesign guidance for future systems. We discuss 1) generating performance data\nacross multiple systems using mini-applications, and 2) using our open source\nPython tools Pylot/Co-Pylot to store and analyze data using a MySQL database\nserver.\n\nAbstract\n\nSandia National Laboratories analyzes large-scale, state-of-the-art high\nperformance computing environments for the Department of Energy (DOE),\nDepartment of Defense (DoD), and other government agencies. Execution\nefficiency is vital when dealing with datasets that require billions of\nelements or when running simulations that take millions of core-hours to\ncomplete.\n\nOne approach to investigating execution efficiency is to instrument our large-\nscale applications and platforms to generate timings and other performance\ndata. Although effective in mature computing environments, working directly\nwith large-scale applications is cumbersome, time consuming, and even\nimpossible in the early stages of computer system analysis and design.\nFurthermore, the software and data sets of these applications may be\nrestricted, limiting our abilities to collaborate.\n\nIn order to enhance our analysis capabilities far upstream from when large-\nscale applications can be used and when working with external collaborators,\nwe have developed a collection of mini-applications that capture the essence\nof our much larger scientific codes, are readily applicable to both large and\nsmall systems, and whose run-time information can accurately reveal problems\nassociated with execution efficiency.\n\nGenerating the data is only half the problem, though. We need the ability to\ncapture platform-relevant mini-app performance data at the convenience of the\ntesters when and where they generate the data. We also need the ability to\nsearch through, filter, and visualize the resulting performance measurement\ndatasets in detail to identify and understand trends and patterns.\n\nSandia National Laboratories has developed a performance analysis suite\nprimarily consisting of two tools written in Python, Pylot and Co-Pylot. Co-\nPylot is a relatively simple interface that enables easy batch transfer of\nperformance data to a remote MySQL database server for persistent storage.\n\nOnce stored, the performance data is extracted, organized, filtered, and\nanalyzed using Pylot, a more functionally complex interface. Pylot is used to\npresent user-selected MySQL database fields in a variety of views including\nstatistical data, bar and pie charts, Cartesian or log-log or semi-log plots,\nreference curves for comparisons, and Kiviat diagrams (also called radar\ncharts) for multivariate datasets.\n\nA built-in storage buffer provides the ability to store, compare, and analyze\ndata from multiple databases. This capability is critical for studying\nperformance variations of a code running on a particular architecture,\ncomparing application performance across architectures, or comparing multiple\napplications on one or more architectures. Values in up to four database\nfields at a time can be mathematically combined to generate a new temporary\nfield to provide complete generality while accessing a database. Further,\nPylot provides the ability to easily move MySQL databases and tables between\ncomputers, including the analyst\u2019s laptop. This coherency of databases across\nmultiple analysis platforms can be used, for example, to avoid network latency\nissues associated with accessing remote servers. It also serves as a\ndistributed backup system.\n\nAn outline of this presentation follows:\n\n 1. Applications at Sandia National Laboratories (6 mins) \n * Simulation size and runtime of typical large Sandia applications \n * Difficulties of using large-scale applications in early computer system design and analysis \n * Mantevo mini-apps \u2013 small, self-contained programs that embody essential performance characteristics of key applications. \n 2. Gathering data (4 mins) \n * What information Mantevo mini-apps provide \n * Co-Pylot \u2013 getting your data into a remote database \n 3. Supercomputer and Cluster application analysis (10 mins) \n * Pylot \u2013 demo of accessing and graphing MySQL data as a method for analyzing performance \n * Diagnosing performance issues \n * Comparing different systems and different runs \n 4. Future Extensions of Pylot (5 mins) \n * Capturing compile-time and execution info \n * Efforts to move parts of Pylot to the web \n\n",
"copyright_text": "Creative Commons Attribution-NonCommercial-ShareAlike 3.0",