The 3-D compressible Rayleigh-Taylor (RT) instability is used as a test application for an in-house Flux-Corrected Transport (FCT) code on the CRAY T3E.
The RT instability is a basic fluid instability that arises in a number of physical situations, such as supernovae and inertial confinement fusion. It occurs when a heavy fluid overlying a light fluid undergoes an interchange instability. The Eulerian hydrodynamics equations for compressible flows are solved via a finite-volume scheme, with FCT used for shock capturing and keeping discontinuities sharp. The domain is decomposed spatially using ghost cells for communications, which are accomplished via MPI.
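To illustrate the ghost-cell idea described above, here is a minimal serial sketch in Python. It is not the FCT code's communication layer: no MPI is used, the "ranks" are plain lists, and the function names (`decompose`, `exchange_ghosts`, `centered_difference`) and the 1-D periodic layout are assumptions made for illustration. It shows only that, once ghost cells are filled from the neighbouring subdomains, a stencil applied block by block reproduces the result on the undivided grid.

```python
# Serial stand-in for a ghost-cell (halo) exchange; no MPI is used here.
# All names and the 1-D periodic layout are illustrative assumptions.

def decompose(field, nranks):
    """Split a 1-D field into nranks equal blocks, each padded with
    one ghost cell on either side (len(field) must divide evenly)."""
    n = len(field) // nranks
    return [[0.0] + field[r * n:(r + 1) * n] + [0.0] for r in range(nranks)]

def exchange_ghosts(blocks):
    """Fill each block's ghost cells from its periodic neighbours' edge cells.
    In a message-passing code this copy would be a send/receive pair."""
    nranks = len(blocks)
    for r, block in enumerate(blocks):
        left = blocks[(r - 1) % nranks]
        right = blocks[(r + 1) % nranks]
        block[0] = left[-2]    # last interior cell of the left neighbour
        block[-1] = right[1]   # first interior cell of the right neighbour

def centered_difference(block):
    """Apply a 3-point stencil to the interior cells only."""
    return [0.5 * (block[i + 1] - block[i - 1]) for i in range(1, len(block) - 1)]

field = [float(i * i % 7) for i in range(12)]
blocks = decompose(field, nranks=3)
exchange_ghosts(blocks)

# Stencil applied per block, then concatenated in rank order.
local = [v for b in blocks for v in centered_difference(b)]

# The same stencil applied to the undivided periodic field.
n = len(field)
reference = [0.5 * (field[(i + 1) % n] - field[i - 1]) for i in range(n)]
```

Here `local` and `reference` hold the same values. A wider stencil would need correspondingly more ghost layers per side.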

The above figure shows density contours at a later time corresponding to a set of random perturbations of the interface. The case was computed on 256 nodes of the GSFC T3E and achieved 71 megaFLOPS per node (18 gigaFLOPS on 256 PEs). A timing run of the code achieved 108 megaFLOPS per node on a T3E-1200. The code's I/O is based on Cray's direct-access-file parallel I/O, which proved to be a significant challenge to implement. The code's performance was broken down by component subroutine, as shown in the following figure.
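As a quick consistency check on the rates quoted above, the aggregate figure follows from the per-node rate and the PE count. The variable names below are illustrative, and the ratio is simply the quotient of the two quoted per-node rates, not an additional measurement.

```python
# Per-node rates quoted in the text (megaFLOPS).
mflops_per_pe_gsfc = 71.0    # GSFC T3E run on 256 PEs
mflops_per_pe_1200 = 108.0   # timing run on a T3E-1200
pes = 256

# Aggregate rate on 256 PEs, converted from megaFLOPS to gigaFLOPS.
aggregate_gflops = mflops_per_pe_gsfc * pes / 1000.0

# Ratio of the two quoted per-node rates.
per_node_ratio = mflops_per_pe_1200 / mflops_per_pe_gsfc
```

The aggregate comes to about 18.2 gigaFLOPS, matching the rounded 18 gigaFLOPS quoted, and the per-node ratio is roughly 1.5.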
The high-order flux calculation takes the greatest proportion of the time, followed by the limiting procedure. Communication accounts for the next-largest share, although a significant part of that is time spent completing outstanding sends and receives. In the code, these have been overlapped with some computation, although not to a large extent. The low-order flux calculations, the update to the next time level, and the equation-of-state calculations each take about the same time. Applying boundary conditions takes little time. Finally, the synchronization barriers at each half-step account for a significant portion. An interesting observation is that without some of these synchronizations, performance actually degrades across processors, with some PEs falling out of step with the others. This is likely the result of asynchronous messaging delays cascading across PEs. The consequence is that the barriers actually increase performance.
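The components named in the breakdown above (low-order flux, high-order flux, limiting, and update to the next time level) correspond to the stages of a generic FCT step. The sketch below shows those stages for 1-D linear advection on a periodic grid, assuming an upwind low-order flux, a Lax-Wendroff high-order flux, and the original Boris-Book limiter. These choices, and the names `limit` and `fct_step`, are assumptions for illustration. The 3-D compressible code described here may use different fluxes and a different limiter, and it also includes equation-of-state and boundary-condition stages that this scalar example omits.

```python
def sign(x):
    return 1.0 if x >= 0.0 else -1.0

def limit(a, d_up, d_down):
    """Boris-Book limiter: clip the antidiffusive flux `a` against the
    neighbouring differences of the low-order solution."""
    s = sign(a)
    return s * max(0.0, min(abs(a), s * d_up, s * d_down))

def fct_step(u, c):
    """One FCT step for u_t + a u_x = 0 (a > 0) on a periodic grid.
    c is the Courant number a*dt/dx, assumed to lie in (0, 1]."""
    n = len(u)
    # Low-order (upwind) flux through face i+1/2, scaled by dt/dx.
    f_low = [c * u[i] for i in range(n)]
    # High-order (Lax-Wendroff) flux through face i+1/2, scaled by dt/dx.
    f_high = [c * u[i] + 0.5 * c * (1.0 - c) * (u[(i + 1) % n] - u[i])
              for i in range(n)]
    # Low-order update (index -1 wraps around, giving periodicity).
    u_td = [u[i] - (f_low[i] - f_low[i - 1]) for i in range(n)]
    # Limiting procedure applied to the antidiffusive flux at each face.
    a_lim = []
    for i in range(n):
        a = f_high[i] - f_low[i]
        d_up = u_td[(i + 2) % n] - u_td[(i + 1) % n]
        d_down = u_td[i] - u_td[i - 1]
        a_lim.append(limit(a, d_up, d_down))
    # Update to the next time level with the limited correction.
    return [u_td[i] - (a_lim[i] - a_lim[i - 1]) for i in range(n)]

# Advect a square pulse for 40 steps at Courant number 0.5.
u0 = [1.0 if 8 <= i < 16 else 0.0 for i in range(40)]
u = u0
for _ in range(40):
    u = fct_step(u, c=0.5)
```

Because every stage is written in flux form, the sum of `u` is conserved up to round-off. The limiter zeroes the correction wherever the neighbouring differences of the low-order solution disagree in sign with the flux, which is what keeps discontinuities sharp without adding ripples.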
This message-passing code achieves significant performance on the T3E without major modifications (most of the required changes relate to the aforementioned I/O; the others relate to barriers and synchronization). Most of the performance was obtained via judicious combinations of compiler flags. This approach leaves the code intact for porting to other machines.
A report on these findings is being prepared.
Anil Deane
University of Maryland
deane@ipst.umd.edu
301-405-4866
and
NASA Goddard Space Flight Center
deane@laplace.gsfc.nasa.gov
301-286-7803