ESS Project: FY98 Annual Report 

Testbeds


System Performance Evaluation Project

Objective

The System Performance Evaluation project works with the large-scale science simulation codes produced by the nine ESS Grand Challenge Investigator teams and with the ESS testbed computer systems. The objective is to understand the behavior of these codes on the scalable parallel testbed computer systems and, to a lesser extent, to understand their behavior on other parallel systems such as the CESDIS Beowulf systems. We expect to work with about 10 to 15 different science codes in total. Our interest is in using measurement tools to understand how these large science codes stress the parallel system and how the parallel system responds to these stresses, as illustrated by the graphic below.

[Figure: diagram showing how the Grand Challenge codes and the ESS parallel testbeds relate]

In particular, we wish to find ways to:

The results of this work are published in various journals and conference proceedings.

Approach

Our approach is to use the science codes as they are submitted by the science teams to meet performance milestones. We use various measurement tools to understand the static structure of each code and its dynamic behavior when executed with a typical data set (also provided by the science team). Typically, a code is instrumented to collect the desired statistics and timings and then run on the testbed system using various numbers of processing nodes. The results are analyzed, and if more data is required, the instrumentation is modified and the code rerun.
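As a rough illustration of the analysis step, the sketch below turns wall-clock timings collected at several node counts into speedup and parallel efficiency figures. It is not taken from the project's tools: the function name `scaling_table` and the timing values are hypothetical, chosen only to show the arithmetic.

```python
def scaling_table(timings):
    """Return (nodes, seconds, speedup, efficiency) rows.

    timings maps a node count to the measured wall-clock seconds.
    Speedup and efficiency are relative to the smallest node count run.
    """
    base_nodes = min(timings)
    base_time = timings[base_nodes]
    rows = []
    for nodes in sorted(timings):
        speedup = base_time / timings[nodes]
        # Efficiency of 1.0 means time fell in proportion to added nodes.
        efficiency = speedup * base_nodes / nodes
        rows.append((nodes, timings[nodes], speedup, efficiency))
    return rows


# Hypothetical timings for illustration only, not measured results.
example_timings = {8: 400.0, 16: 210.0, 32: 120.0, 64: 80.0}

for nodes, seconds, speedup, efficiency in scaling_table(example_timings):
    print(f"{nodes:4d} nodes  {seconds:8.1f} s  "
          f"speedup {speedup:5.2f}  efficiency {efficiency:5.2f}")
```

A falling efficiency column is one signal that more data is needed, for example finer-grained timings around communication or synchronization points.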

The insights gained from studying a particular code often suggest ways to improve its performance. These insights are fed back to the science team to aid them in further code development. Results may also be useful to Silicon Graphics in improving their hardware and software systems, so results are often forwarded to the in-house Silicon Graphics team and the in-house computational scientists.

Measurements of Interest
Part of the research effort is to determine what aspects of science code structure and behavior have the greatest effect on performance. To this end, we are measuring some of the following elements in each code:

Tools Used
These studies use a variety of tools for instrumenting and measuring various characteristics of the science codes and their behavior. The primary tool to date has been a software system called Godiva (GODdard Instrumentation Visualizer and Analyzer) developed by this project. We also use the Silicon Graphics software tools on the CRAY T3E.

Accomplishments

The Godiva software instrumentation tool, developed during FY97 and described below, remained stable at Version 3.4 during FY98. It served as the primary tool for understanding the behavior of ESS codes and systems in this project. The software was ported to theHIVE, a 128-processor Beowulf-class parallel system at Goddard, and was used to compare aspects of that machine's performance with that of the large-scale CRAY T3E system at Goddard. Two papers describing aspects of the Godiva design were presented at conferences during the year (see the URL at the end of this section for complete references).

During FY98, a major goal was the development of a comprehensive method of quantifying the stresses produced by large science codes on parallel computer systems. This effort was successful, and the new methods are now being tested on ESS codes. In addition, at the end of FY98 the Godiva software was extensively modified to measure these stress patterns and to separate those measurements from the performance response measurements of the underlying computer system. This new system, Godiva 4.0, is now operational and will be used to support this project in FY99.

Godiva Software Instrumentation Tool
Godiva has proven to be a useful new tool for the study of large science codes. Using Godiva, a wide variety of aspects of a code may be instrumented to observe dynamic behavior as the program executes. Of particular importance to date has been the ability to study cache behavior on the CRAY T3E; computation rates in selected code segments; parallel communication and synchronization profiles using MPI, PVM, or shmem library calls; and load balance among processors.
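To make "load balance among processors" concrete, the sketch below computes one common summary figure: the ratio of the busiest processor's time to the mean across all processors. This is a generic metric and an assumption on our part, not necessarily what Godiva reports; the function name `load_imbalance` and the sample values are hypothetical.

```python
def load_imbalance(busy_seconds):
    """Return max / mean of per-processor busy times.

    A value of 1.0 means perfectly even load; larger values mean the
    busiest processor is holding the others back.
    """
    if not busy_seconds:
        raise ValueError("need at least one processor timing")
    mean = sum(busy_seconds) / len(busy_seconds)
    if mean == 0:
        raise ValueError("all busy times are zero")
    return max(busy_seconds) / mean


# Hypothetical per-processor busy times in seconds, for illustration only.
print(load_imbalance([10.0, 10.0, 10.0, 30.0]))
```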

Godiva's approach to code instrumentation is as follows. First, selected parts of the code are annotated to study the characteristics of interest. These annotations use syntax specified in the Godiva Users Manual. Annotations appear as comments to a Fortran or C compiler. The annotated code is fed through the Godiva preprocessor, which generates Fortran or C source code, with calls to the Godiva run-time library inserted at appropriate points. The generated source program is then compiled and linked with the Godiva run-time library. Execution of the program generates a trace file on each processor. The trace file contains statistics collected on the fly during execution. After execution is complete, a Godiva postprocessor is used to generate tables, graphs, and histograms from the trace files produced by the processing nodes.
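The preprocessing step can be pictured with a toy example. The sketch below is not Godiva's preprocessor and does not use Godiva's annotation syntax, which is defined in the Godiva Users Manual. The `!$INSTR BEGIN/END` comment form and the `rt_begin`/`rt_end` run-time routine names are invented here purely to show the idea of turning comment annotations into library calls.

```python
import re

# Hypothetical annotation form: a Fortran comment line such as
#   !$INSTR BEGIN main_loop
# A compiler ignores it; the preprocessor recognizes it.
ANNOTATION = re.compile(r"^\s*!\$INSTR\s+(BEGIN|END)\s+(\w+)\s*$", re.IGNORECASE)


def preprocess(source):
    """Replace each annotation comment with a call to a run-time routine.

    The generated routine names (rt_begin, rt_end) are placeholders for
    whatever the instrumentation library actually provides.
    """
    out = []
    for line in source.splitlines():
        match = ANNOTATION.match(line)
        if match:
            action = match.group(1).lower()
            label = match.group(2)
            out.append(f"      call rt_{action}('{label}')")
        else:
            out.append(line)
    return "\n".join(out)


annotated = """\
      program demo
!$INSTR BEGIN main_loop
      do i = 1, n
         a(i) = b(i) + c(i)
      end do
!$INSTR END main_loop
      end program demo"""

print(preprocess(annotated))
```

Because the annotations are comments, the same source file still compiles unchanged when the preprocessor is not used, which is the practical benefit of the comment-based approach described above.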

Currently, Godiva supports about 30 different annotation types in the source program. These annotations may be used to generate about 20 different forms of output tables and graphs. For more information about the Godiva software system, visit the URL shown at the bottom of this page.

Godiva has been developed as a personal research tool, not intended for general distribution, but it is made available to other ESS team researchers as appropriate. Because it is a personal research tool, it undergoes frequent change to meet the demands and new directions of this project.

Significance

By understanding and quantifying the stresses produced on parallel computer systems by large science codes as well as the performance responses of the computer systems, this research is intended to lead to improvements in both the codes and the computer hardware and software systems. The methods used for measurement also are expected to lead to improvements in computer benchmarking and performance analysis techniques.

Status/Plans

During FY99, the new methods for analyzing large science codes and the new Godiva 4.0 software instrumentation options will be applied to intensive study of ESS science codes and to performance comparisons of ESS testbed computer systems.

Point of Contact

Terrence W. Pratt
Center of Excellence in Space Data and Information Sciences (CESDIS)
NASA Goddard Space Flight Center
pratt@cesdis.gsfc.nasa.gov
301-286-0880
http://ess.gsfc.nasa.gov/system_eval.html