Silicon Graphics (SGI) is collaborating with NASA and selected Grand Challenge Investigator teams to implement highly scalable applications on the CRAY T3E series of scalable systems. We will use advanced systems and operating software to implement a wide range of models, based on standard languages and libraries, at new levels of performance. Achieving these performance levels will allow the teams to model the science being studied more accurately.
To reach performance levels exceeding 50 and 100 gigaFLOPS, SGI has installed a CRAY T3E system at Goddard Space Flight Center (GSFC). The CRAY T3E, a massively scalable architecture, is capable of utilizing up to 2,048 processors. Currently, the GSFC testbed includes 1,024 processors supporting the Grand Challenge teams as well as NASA-directed programs. As new generations of the CRAY T3E become available, they are periodically used to perform milestone tests and to conduct specific experiments.
Applications are implemented using SGI-developed scalable operating systems, compilers, and development tools to achieve scaling and performance. Onsite application analysts at GSFC consult with each of the teams on methods for scaling and optimization. Further technical assistance is provided by the SGI software division in Eagan, MN.
During FY98, teams met performance milestones at speeds ranging from 50 gigaFLOPS to over 100 gigaFLOPS. A summary of notable achievements follows:
Simulations of the Earth's Core and Mantle Dynamics
Peter Olson, Johns Hopkins University
Using a mesh finer than any used before, the TERRA mantle code achieved 86.6 gigaFLOPS on the GSFC T3E and 99 gigaFLOPS on a T3E-900, whose processors are 50 percent faster than those of the GSFC T3E.
DYNAMO, a pseudospectral code for simulating the geodynamo, has been completely revamped to run on the T3E. The code has achieved over 100 gigaFLOPS on 512 nodes of the GSFC T3E and will soon be coupled with TERRA to obtain realistic boundary conditions from the Earth's mantle. Owing to a new domain-decomposition strategy, DYNAMO can also now handle far larger grids.
SAR Interferometry and Imaging Science
David Curkendall, Jet Propulsion Laboratory
The Scalable Synthetic Aperture Radar (SAR) suite achieved a sustained performance on 512 nodes of a T3E equivalent to 25 times that of the 32-node Thinking Machines CM-5. This code will be used to process the huge volume of data produced by modern SAR instruments.
Four Dimensional Data Assimilation
Peter Lyster, University of Maryland
Implementation efforts for a full distributed-memory version of the Data Assimilation Office End-to-End Product commenced.
An Earth System Model: Atmosphere/Ocean Dynamics and Tracers Chemistry
Roberto Mechoso, University of California, Los Angeles
The Mechoso team has coupled their atmospheric and oceanic models and has successfully run the combination on up to 1,024 nodes of the T3E. The negotiated four-times speedup over the Level 1 milestone has been achieved, without the data-broker paradigm.
Rayleigh-Benard-Marangoni Problems in a Microgravity Environment
Graham Carey, The University of Texas at Austin
With the MGFLO code, solving the thermal and flow equations in an iteratively decoupled fashion, the team achieved 112 gigaFLOPS on 1,024 nodes of a T3E-900. Solving the thermal and flow systems in a fully coupled fashion, MGFLO achieved 118.7 gigaFLOPS on 1,024 nodes of the GSFC T3E.
The capability to model the transport of a single chemical species in decoupled thermal-flow problems has been added to the MGFLO code. This capability is currently being expanded to simulate multiple reacting chemical species in fully coupled flows.
Turbulent Convection and Dynamos in Stars
Andrea Malagoli, The University of Chicago
All three codes (MHD-PPMC, MPS, and HPS) have achieved in excess of 100 gigaFLOPS on 1,024 nodes of GSFC's T3E and over 150 gigaFLOPS on a T3E-1200. These codes are used to study how small-scale turbulence in stellar interiors interacts with large-scale rotation and magnetic fields.
Solar Activity and Heliospheric Dynamics
John Gardner, Naval Research Laboratory
FCTMHD3D has been restructured to use the adaptive mesh refinement library developed at GSFC. This version of the code runs roughly 600 times faster on the T3E than on one CPU of the CRAY C90. The new adaptive capability represents an improvement of many orders of magnitude in time-to-solution over previous versions of the code.
Multiscale Modeling of the Heliosphere
Tamas Gombosi, University of Michigan
The BATS-R-US code achieved 13.0 gigaFLOPS on 512 nodes of a T3D. This simulation placed one grid block on each processor. Using full adaptive mesh refinement, the code then achieved 64.4 gigaFLOPS on 512 nodes of the GSFC T3E and 212.4 gigaFLOPS on 1,024 nodes of a T3E-1200.
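To compare the three BATS-R-US figures on a common footing, the reported totals can be divided by the node counts. The short Python sketch below does only that arithmetic; the function and variable names are illustrative and are not part of BATS-R-US. Note that the T3D run placed one grid block on each processor while the T3E runs used full adaptive mesh refinement, so the per-node rates are not a like-for-like hardware comparison.

```python
# Reported BATS-R-US results from the text: (system, nodes, gigaFLOPS).
# The names below are illustrative only; they do not come from the code itself.
RESULTS = [
    ("T3D", 512, 13.0),
    ("GSFC T3E", 512, 64.4),
    ("T3E-1200", 1024, 212.4),
]


def per_node_megaflops(gigaflops: float, nodes: int) -> float:
    """Average sustained rate per node, in megaFLOPS."""
    return gigaflops * 1000.0 / nodes


def summarize(results):
    """Map each system name to its average per-node rate in megaFLOPS."""
    return {name: per_node_megaflops(gf, nodes) for name, nodes, gf in results}


if __name__ == "__main__":
    for name, rate in summarize(RESULTS).items():
        print(f"{name}: {rate:.1f} megaFLOPS per node")
```

On these figures the average sustained rate works out to about 25.4 megaFLOPS per node on the T3D, 125.8 on the GSFC T3E, and 207.4 on the T3E-1200.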
Relativistic Astrophysics and Gravitational Wave Astronomy
Paul Saylor, University of Illinois at Urbana-Champaign
The first fully coupled relativistic hydrodynamics and space-time evolution code, GR3D, has achieved well in excess of 100 gigaFLOPS in both 64- and 32-bit precision. This code is now being used to study neutron star mergers (collisions) and other relativistic phenomena.
For additional information about selected teams' gigaFLOPS milestone achievements, look for a September 1998 feature article on the SGI home page.
Successful use of highly scaled models adds a new dimension to the science of interest to NASA. Using the CRAY T3E scalable systems helps drive the computational goals for the Investigator teams. Previous generations of supercomputers relied on a limited number of exceptionally fast, though expensive, processors and memory. The CRAY T3E system approaches the supercomputing challenge by employing as many as 2,048 RISC microprocessors, each with its own dedicated DRAM memory. By distributing the application across hundreds, and even thousands, of processors, the system can reach performance levels unattainable on classical supercomputer systems.
The challenge, of course, is to achieve effective distribution of the task and associated interprocessor communication. Long latencies for communication among processors impede the ability of science models to scale in performance. The ESS Round-2 teams, in concert with SGI analysts, are leading the way to effective implementation of Earth and space science models on the CRAY T3E massively scalable system.
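The effect of communication latency on scaling can be illustrated with a toy model. The Python sketch below is not taken from any ESS team code or from SGI tools: it assumes a fixed amount of computation divided evenly among p processors, plus a communication term that grows with log2(p), as in a tree-style reduction. All names and parameter values are hypothetical.

```python
import math


def run_time(p: int, compute_seconds: float, latency_seconds: float) -> float:
    """Toy model: evenly divided work plus a log2(p) communication cost.

    compute_seconds and latency_seconds are hypothetical inputs,
    not measurements from any T3E run.
    """
    comm = latency_seconds * math.log2(p) if p > 1 else 0.0
    return compute_seconds / p + comm


def speedup(p: int, compute_seconds: float, latency_seconds: float) -> float:
    """Speedup on p processors relative to one processor under the toy model."""
    return run_time(1, compute_seconds, latency_seconds) / run_time(
        p, compute_seconds, latency_seconds
    )


def efficiency(p: int, compute_seconds: float, latency_seconds: float) -> float:
    """Fraction of ideal (linear) speedup actually obtained."""
    return speedup(p, compute_seconds, latency_seconds) / p


if __name__ == "__main__":
    # Hypothetical workload: 100 s of serial compute, two latency settings.
    for latency in (0.0001, 0.01):
        for p in (64, 512, 1024):
            print(f"latency={latency} s, p={p}: "
                  f"efficiency={efficiency(p, 100.0, latency):.2f}")
```

Under this model, efficiency falls as either the processor count or the latency grows, which is why reducing communication cost matters most at the largest node counts.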
SGI has met all milestones required to date. Most 50 gigaFLOPS milestone efforts have been completed, and several 100 gigaFLOPS milestone tests have been conducted. Work on the remaining 50 and 100 gigaFLOPS milestones will continue through FY99.
A major new NASA-directed program has been initiated through the ESS community's use of the system. The NASA Seasonal to Interannual Prediction Project (NSIPP) uses up to 512 processors of the CRAY T3E system for this strategic mission.
A primary goal for massively scalable systems is to perform with the reliability of traditional large-scale systems. During FY98, the CRAY T3E testbed was available 95 percent of the time. Further improvements in system reliability are expected through FY99.
Tom Formhals
Silicon Graphics, Inc.
Chantilly, VA
tformhals@sgi.com
703-227-8519
Tom Clune
Silicon Graphics, Inc.
NASA Goddard Space Flight Center
tclune@sgi.com
301-286-4635
Spencer Swift
Silicon Graphics, Inc.
NASA Goddard Space Flight Center
swift@sgi.com
301-286-2829