NASA Grand Challenge Applications and Enabling Scalable Computing Testbed in Support of High Performance Computing.
Final Report: Four Dimensional Data Assimilation

  

Peter Lyster, Principal Investigator, NASA Data Assimilation Office
July 4, 2000

NASA Grand Challenge Applications and Enabling Scalable Computing Testbed in Support of High Performance Computing. PI Project: Four Dimensional Data Assimilation
Title: Final Report
Agreement Number: NCCS5-150

P.M. Lyster

NASA/GSFC Data Assimilation Office (DAO), Greenbelt, MD
University of Maryland Earth Systems Science Interdisciplinary Center (ESSIC)

Email to lys@dao.gsfc.nasa.gov
http://ct.gsfc.nasa.gov/lys/lys.html

Front Cover: Three-dimensional Visualization of Methane Distribution in the Stratosphere for September 1992. The results were generated from the Kalman filter using the GSFC Cray T3E, and were included in the video: Images of Earth and Space: SC97 Edition.


Abstract

This project on software for data assimilation has five main achievements: (1) the introduction of distributed-memory parallel algorithms to the DAO for the first time -- for both the GCMs and the analysis software; (2) advances in the understanding, with documentation, of the software and computational complexity of data assimilation systems; (3) improvements in the wall-clock performance of the operational Physical-space Statistical Analysis System (PSAS) by a factor of four; (4) development of the distributed-memory parallel PSAS with significant improvements in scalability and performance; and (5) development and scientific validation of a parallel Lagrangian Kalman filter for constituent assimilation that achieved its performance goals. The Lagrangian filter could not have been developed without high end computing capability.

The multivariate production algorithm at the DAO is the Goddard Earth Observing System Data Assimilation System (GEOS DAS). The analysis component of data assimilation systems continues to require considerable research on software complexity and performance. For GEOS DAS the compute-intensive part of the analysis is performed by the PSAS, which involves complex databases and covariance models. See, for example, the panel discussion on software at the Third WMO Symposium on Data Assimilation in Meteorology and Oceanography, Quebec City, Canada, 7-11 June 1999. The 50 and 100 gigaflop/s milestones for the GEOS DAS were negotiated out of our agreement. We have gained significant understanding of the software complexity and performance of the GEOS DAS, and of the PSAS in particular; this is discussed in the text and attachments.

Contents

  1. Introduction

  2. Discussion of the Key Elements of the Project

  3. Attachments

1. Introduction

This is the second phase of the DAO's High Performance Computing and Communications (HPCC) Grand Challenge Principal Investigator project (1997-1999). Early in 1998, the DAO and the HPCC Project agreed that it was premature to establish 50 and 100 gigaflop/s milestones for the end-to-end GEOS DAS; at the time, considerably more work was still needed to stabilize the components of the core system, namely the General Circulation Model (GCM) and the Physical-space Statistical Analysis System (PSAS). In particular, the DAO agreed to provide working versions of its parallel GEOS-3 GCM and the PSAS, and reports describing the performance aspects of these codes. Note that the GEOS-3 GCM is not the same model as the fvGCM that is being developed for the new Data Assimilation System at the DAO. For background, the reader is referred to earlier submissions to the HPCC Project on Peter Lyster's web page http://ct.gsfc.nasa.gov/lys/lys.html.

2. Discussion of the Key Elements of the Project

Before discussing details of the performance of algorithms, it is important to recognize that the quality of the scientific software holds primacy in efforts such as data assimilation. Quality involves portability, maintainability, and extensibility to new science, as well as performance. At the DAO the software must also meet the needs of an operational environment. In this report, issues of scientific software quality in a multi-developer environment are ever present.

The most important measure of performance for scientific code is the time-to-solution, because shortening this value leads to faster turn-around for scientific study. For those who need repeated simultaneous experiments, e.g., ensemble studies, a secondary measure is the ability to run concurrent small jobs, possibly on the same machine or in a distributed heterogeneous environment. The fundamental limit to time-to-solution is the single-processor speed of the software. For parallel computing a second issue is the scalability, namely the extent to which using extra processors (i.e., more resources) effectively reduces the time-to-solution. Amdahl's law simply quantifies the effect of the non-parallelized part of an algorithm on scalability -- the theoretical maximum number of processors that may be used effectively is approximately the inverse of the fraction of the non-parallelized part of the algorithm. In addition to the limitations of unparallelized code, multi-processor communications costs and load imbalance also degrade scalability.
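As a minimal illustration of the Amdahl's law statement above, the following sketch computes the ideal speedup and parallel efficiency for a given non-parallelized fraction. The function names and the 5% serial fraction are assumptions chosen for illustration only; they are not taken from DAO code or measurements, and the model ignores communications cost and load imbalance.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Ideal speedup on `processors` CPUs when `serial_fraction` of the
    single-processor run time cannot be parallelized (Amdahl's law)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def parallel_efficiency(serial_fraction: float, processors: int) -> float:
    """Speedup divided by processor count; 1.0 means perfect scaling."""
    return amdahl_speedup(serial_fraction, processors) / processors


if __name__ == "__main__":
    s = 0.05  # hypothetical serial fraction, for illustration only
    for n in (1, 8, 64, 512):
        print(f"{n:4d} processors: speedup {amdahl_speedup(s, n):6.2f}, "
              f"efficiency {parallel_efficiency(s, n):5.2f}")
```

In this idealized model the speedup can never exceed 1/s, and at n = 1/s processors it has already fallen to roughly half of that ceiling, which is why the inverse of the serial fraction serves as a practical upper bound on the useful processor count.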

The following briefly discusses the history and phases of the project, then discusses the key results of our work: design, implementation and performance of the parallel GCM, PSAS, and the Lagrangian/Kalman filter.

1. The project phases that were needed to build the system were: hiring and training key developers of the parallel GCM and PSAS, completing the GEOS-3 Software Development Plan, completing the design of the GEOS-3 DAS, and developing the GEOS-3 DAS. At the beginning of the project, the PI estimated the development time for the distributed-memory parallel PSAS to be 4 person-years. In retrospect, the PI performed an analysis of the PSAS development plan using the Constructive Cost Model (COCOMO) tool and found the estimated time to be 30 person-years. It actually took 8 person-years. The development time for a parallel GCM would be expected to be shorter since the community knowledge of this type of algorithm is more extensive.

2. The GEOS-3 DAS is composed of several different modules, each requiring different parallelization strategies. In particular, the GCM requires different parallelization strategies for: the transport with special treatment of poles; the Shapiro filter; the high-latitude spectral filter; the polar rotation; and the Physics and History routines (only the last one is embarrassingly parallel). Also, the timing profile of GEOS-3 DAS is flat, meaning that there are many submodules that each account for a substantial share of the CPU cost of the algorithm. The PSAS has two major components, the innovation equation solver and the analysis equation. It is also a complex parallel database that handles more than 100,000 observations of a range of data types (i.e., it is multivariate) from a range of instruments, and generates multivariate covariance models. These and related issues are discussed in more detail in the article, The Computational Complexity of Atmospheric Data Assimilation, as well as the Documentation of the PSAS, which are attached to this report. Of note is the cumulative effect of unparallelized code, difficult-to-parallelize code, the I/O, and the communications-latency cost of parallel submodules, which collectively give rise to an Amdahl's bottleneck. For example, if only 1% of the algorithm is not efficiently parallelized then the end-to-end system will not scale above 100 processors. The upper limit of scalability of GEOS-3 DAS is not known; however, it is unlikely to exceed several hundred processors in the near future.
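The cumulative effect described in item 2 can be sketched as follows: with a flat timing profile, small poorly parallelized pieces in many modules add up to one effective serial fraction, whose inverse bounds the useful processor count. The module names, run-time shares, and parallel fractions below are entirely hypothetical placeholders, not measurements of GEOS-3 DAS, and the helper names are assumptions made for this sketch.

```python
# Hypothetical flat profile: (module, share of single-processor run time,
# fraction of that module that parallelizes efficiently).
# These numbers are illustrative placeholders, NOT GEOS-3 DAS measurements.
PROFILE = [
    ("module_a", 0.25, 0.995),
    ("module_b", 0.15, 0.98),
    ("module_c", 0.25, 1.0),
    ("module_d", 0.30, 0.99),
    ("module_e", 0.05, 0.90),
]


def effective_serial_fraction(profile):
    """Sum the non-parallelized share of run time over all modules."""
    total_share = sum(share for _, share, _ in profile)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("run-time shares must sum to 1")
    return sum(share * (1.0 - parallel) for _, share, parallel in profile)


def scalability_ceiling(profile):
    """Approximate upper bound on useful processors: 1 / serial fraction."""
    serial = effective_serial_fraction(profile)
    return float("inf") if serial == 0.0 else 1.0 / serial


if __name__ == "__main__":
    print(f"effective serial fraction: {effective_serial_fraction(PROFILE):.5f}")
    print(f"approximate ceiling: {scalability_ceiling(PROFILE):.0f} processors")
```

No single module in such a profile dominates the serial fraction, so raising the ceiling requires improving several modules at once, which is consistent with the difficulty noted above for a system with a flat timing profile.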

3. Attachments

The following documents present theory, software, and performance issues of core algorithms at the DAO: the GCM, the PSAS, and the Lagrangian filter.