
The objective of this study is to develop an efficient parallel code to solve Euler's equations for compressible gas dynamics on the Convex Exemplar SPP-1000 scalable shared-memory parallel computer. The gas dynamics algorithm selected for this project is the Piecewise-Parabolic Method (PPM), which provides excellent numerical resolution and opportunity for parallelism.
The algorithm is parallelized using domain decomposition, where the grid is subdivided into rectangular tiles, and one or more tiles are assigned to each processor of the parallel computer. Each tile consists of a section of "real" zones surrounded by a frame of "ghost" points four zones wide, which are required since the algorithm uses a nine point stencil in each direction. The boundary conditions are handled using the ghost zones. The only communication required between processors involves exchanging the values of a few variables at these ghost points between adjacent tiles along each dimension. This communication must be performed twice during each iteration.
The gas dynamics variables (density, velocity, energy, and pressure) are stored in four-dimensional arrays, which are assigned to far_shared memory. The third and fourth indices of these arrays designate the coordinates of the tile, and the first two indices give the local coordinates of a point within the tile. The update is accomplished by first setting the appropriate values of the variables in the "ghost" points. The values around the outer edges of the grid are determined from the physical boundary conditions. The remaining ``ghost'' points are filled in by exchanging data with adjacent tiles. The next step is to copy the data from the four-dimensional arrays into two-dimensional scratch arrays, which are designated as thread_private. The computations required to update the values in each tile are then performed with no need for any communication between processors. Finally, the new values are stored back into the four-dimensional arrays. This localization of the computations results in better utilization of the data cache and minimized the amount of communication required between processors.
A previous portable PVM based implementation of the above algorithm was modified using shared memory directives for the Convex Exemplar. Several versions of the code were implemented. These utilize different memory class addressing schemes in order to determine the best memory layout for the shared data structures. The computational parts of the code were entirely rewritten to make better utilization of the registers and the data cache. These changes resulted in a factor of 1.5 improvement in performance over the previous version. In addition, tests were performed to determine the optimal tile size. Small tiles produce better cache performance but require more communication and parallelization overhead. The best performance seemed to be results from tile sizes of approximately 32 x 32. A calculation on a 480 x 1920 grid using the new version of the code delivers 836 megaFLOPS on 15 processors of the Convex Exemplar. The speedup factor attained on 15 processors was 14.1, which corresponds to an efficiency of 94%. Although additional work needs to be done to optimize the performance of the PVM version of the code, preliminary indications are that the shared-memory version is significantly faster and scales better with increasing number of processors.

Fixed and Scaled Speedups of PPM on the Convex Exemplar
U. Ranawake, B. Fryxell, and J. Dorband, "Performance Evaluation of Piecewise Parabolic Method on Convex Exemplar SPP-1000", CESDIS Technical Report # 96-184. (Also accepted for a poster exhibit at Supercomputing '96.)
PPM is an important computational technique first developed for simulations of various astrophysical systems, such as supernovae, accretion disks, and supersonic jets. It is now beginning to find uses in a wide variety of other fields as well. Although the cost to update each grid point is relatively high, the algorithm is among the highest resolution techniques currently available for compressible gas dynamics and is extremely efficient when measured on a basis of time to solution with a given accuracy. It is also a computationally intensive technique that benefits from the use of massively parallel computers.
Future plans include implementing an improved version of the PVM code by replacing the computational part of the original implementation with the subroutines from the new cache-optimized version and rewriting the communication part of the PVM code by using faster communication primitives of the PVM library such as pvm_psend and pvm_precv. In addition, scaling experiments for both the shared memory and PVM versions will be attempted for larger numbers of processors.
Bruce Fryxell
George Mason University
fryxell@neutrino.gsfc.nasa.gov
301-286-8567
Udaya A. Ranawake
UMBC/CESDIS
udaya@neumann.gsfc.nasa.gov
301-286-3046
John Dorband
Goddard Space Flight Center
dorband@nibbles.gsfc.nasa.gov
301-286-9419
Table of Contents | Section Contents -- Applications | Subsection Contents -- Inhouse Computational Scientists