Univ. of Chicago/Argonne National Laboratory - R. Rosner, F. Cattaneo, A. Dubey, W. Gropp, A. Malagoli, R. Stevens
Univ. of Colorado at Boulder - J. Toomre, N. Brummell, O. McBryan, C. Baillie, D. Grunwald
National Center for Atmospheric Research - P. Fox
Michigan State University - R. Stein
We describe the work done within the grant period below. Each project is labeled by a lead scientist who had principal responsibility for that aspect of our overall project.
Andrea Malagoli, Fausto Cattaneo, and Anshu Dubey have developed an application code for scalable parallel architectures to solve the equations of compressible hydrodynamics for a gas in which the thermal conductivity changes as a function of temperature. The code has been developed to study the highly turbulent convective envelopes of stars like the Sun, but simple modifications make it suitable for a much wider class of problems in astrophysical fluid dynamics.
The algorithm consists of two components: (a) a finite difference (higher--order Godunov method) for compressible hydrodynamics, and (b) a Crank--Nicholson method based on nonlinear (multigrid) method to treat the nonlinear thermal diffusion operator. These are combined together using a time splitting formulation to solve the full set of equations. The code has been developed using the PETSc library (Gropp & Smith 1994) to insure portability across architectures without loss of efficiency. A detailed performance analysis has been obtained (see below).
The following publication contains the description of the Chicago PPM+Multigrid code (Chicago Convection Code CCC):
Malagoli, A., Dubey, A., Cattaneo, F., Levine, D., "A Portable and Efficient Parallel Code for Astrophysical Fluid Dynamics," in Proceedings of Parallel CFD 95, Pasadena, June 1995. [Refereed conference proceedings]
Major restructuring of the 3-D Hybrid Pseudo Spectral (HPS) code, which involves pseudo-spectral representations in horizontal and finite-differences in vertical and used to study compressible convection constrained by rotation, has been completed in order to achieve high performance on the IBM SP2 class of massively parallel machines. Full MHD has been also incorporated into the HPS code, along with boundary conditions to accommodate penetration into stable surroundings. Two major issues have been addressed:
HPS involves repeated FFT operations, the implementation of which can be approached by either a strategy involving FFTs carried out in planes within processors followed by global transposes, or alternatively performing the FFTs across processors while overlapping communication and computation. Both approaches have been implemented and yield comparable performance when efficiently implemented on the SP2.
Efficient usage of the cache is critical for HPS, since such algorithms involve a modest number of operations per memory reference. Careful observation and monitoring of data flow through the cache has led to a reordering of the operations resulting in a two-fold increase in per node performance. A novel approach for construction of the nonlinear terms within cache memory has been a crucial step.
The full HPS code now runs at 40 Mflops/node on the IBM SP2 and scales roughly linearly with the number of processors provided the problem size is sufficiently large. Convection modeling with a 256^3 spatial grid runs at a sustained 4.5 Gflops on 128 processors, whereas one with 512^3 resolution runs at 8.1 Gflops on 256 processors, with the scaling influenced by interprocessor communication latency and I/O constraints.
A new version of the 3-D Spherical Harmonic Shell (SHS) code has been developed that is fully pseudospectral (spherical harmonics on shells and Chebyshev in the radial) in order to study global aspects of solar anelastic convection influenced by rotation. The use of Fortran 90 has led to a highly modular code which admits dynamic partitioning of the data space. To achieve scalable, high performance on IBM SP2 class of machines which is essential for studying highly turbulent flows, two major issues have been addressed:
SHS involves highly irregular data configurations that mandate sophisticated techniques to achieve good load balancing necessary for scalability. Usage of pointers and dynamic memory allocation have been of extreme importance in fine-graining the computational task.
Efficient implementation of Legendre transforms is critical for SHS, as their computation comprises over 90% of the floating point operations on large problems. By performing these transforms in-processor with a highly optimized routine, large gains in performance have been achieved.
The full SHS code now runs at 50 Mflops/node on the IBM SP2 and scales linearly with the number of processors for large problem sizes. Convection simulations with spherical harmonic degree extending up to values of 511 while employing 129 radial modes have yielded a sustained performance of 6.0 Gflops on 128 processors.
The aim of this portion of the project is to bring parallel computing to bear on the combined convective and radiative transport problem, an issue that dominates the physics of the surface layers of the Sun and other stars (where convective transport and radiative transport exchange roles).
As a particular example, Stein and collaborators have looked at the behavior of the Na lines in the presence of waves in their simulation. They find that the Doppler velocities reproduce the fluid velocities well, but the radiation temperatures are very different than the gas temperature.
They have also started a surface magneto-convection run: they have started relaxing at low resolution and adjusting the dissipation parameters to "get things smooth" but no more than necessary. They imposed an initial uniform field of 50 G. When the field gets carried to the granular boundaries there is not enough flux to completely fill the lanes (as there was at 500 G in their old runs), so the field is very intermittent, with concentrations reaching 2 kilogauss.
A collection of multi-dimensional FFTs has been developed by Anshu Dubey (Chicago). These FFTs are designed for the development of two and three-dimensional pseudospectral codes for turbulence simulations. Some versions use a traditional transpose-based implementation where the FFTs along each spatial direction are performed locally on the processors and a parallel transpose is used.
A more general FFT has also been developed that allows the FFT in each direction to be performed on data distributed across several processors (this method is particularly suitable for very large arrays that cannot be kept local).
The on-line paper "Parallel Multidimensional FFT" by Anshu Dubey describes the distributed FFT code.
The FFT algorithm described here has been used in the pseudospectral MHD code developed by Fausto Cattaneo for dynamo calculations. The average speed of the code for a 128^3 problem is 40 Mflops on the IBM SP2 in Maui. A short description of the project is available.
It is widely recognized that there is no hope of ever realizing a numerical scheme that can fully simulate the full range of physical scales on which convection in astrophysical objects such as stars operate. However, there are indications from geophysical fluid dynamics that it may be possible to construct schemes for modeling fluid behavior at small spatial scales, and then to incorporate such models for "sub-grid resolution" dynamics in simulations of the large-scale behavior of astrophysical systems.
Typically, in the case of stellar convection, turbulence dominates at very small spatial scales (high wavenumbers). Since we are limited to considering wavenumber ranges spanning at most three decades, we impose some constraints on the system to be modeled. Either we limit the dynamics at the smallest scales to be purely dissipative, or we model the dynamics of the unresolved part of the system. The former is the approach taken with direct numerical simulations (DNS), while the latter is what is done in large eddy simulations (LES). Considerable advantages exist for the utilization of LES simulations, with proven Sub-Grid-Scale (SGS) turbulence formulations, for the problem of stellar convection since we are then able to model, at a given spatial resolution, problems that would require considerably higher resolutions (i.e., beyond even the projected memory capacity for the next generation of supercomputers).
Under the support of this program, numerical simulations of highly stratified, compressible convection were performed and analyzed for the first time which included a number of new Subgrid-Scale (SGS) models for turbulent transport (Canuto, V. M. 1994, (Astrophys. J., 428, 729). The simplest SGS models use an expression that depends only on the shear in the velocity field, and these models are in widespread use in the engineering and terrestrial atmospheric communities. In astrophysics, however, there are additional important physical effects that must be included in SGS models: buoyancy and stratification. The buoyancy effects are maximal in upflow regions that are broader scale than the sites of strong shear, the narrow downflow regions. Thus the two contributions to the coefficients of viscosity and thermal diffusivity arise from different spatial locations and are of different scales. The contribution of buoyancy effects in the model produces detectable differences in energy spectra and the time dependence of some global quantities.
The present SGS model formulations have the advantage of being local (to the same extent that a finite difference scheme is) and do not require global averaging operations, thus they lend themselves to efficient programming, either vector or parallel. The model implementation was tested in an existing convection code, which uses an Alternating Direction Implicit Technique on a shared memory supercomputer (CRAY Y/MP-8). The code for the SGS models was multitasked and averaged 116 Mflops/processor with a processor utilization factor of 7.91 out of 8, sustained.
Realistic numerical simulations of solar convection are difficult because the temporal and spatial variation of density in a turbulent compressible medium is large, thus it is necessary to use the fully compressible Navier-Stokes and Maxwell equations instead of other approximations that may suppress important effects (like buoyancy). The time step limitation imposed by having to resolve both sound and Alfven waves for a deep atmosphere is also very severe, thus an implicit method of time integration can be preferable in some circumstances.
The ADISM method was designed to meet these requirements. An important advantage of the ADISM approach is that the CPU per time step grows only linearly with the number of grid points, just as for an explicit scheme. The staggered mesh system can prevent the development of numerical instabilities in the direction of gravity and avoids redundant computational effort.
As a minor part of this project a message passing version of the ADISM finite difference method was developed. A prototype version was written in VMS-Linda-C and was converted to utilize PVM-style message passing. It was tested in a primitive stage on a multiprocessor SGI Challenge with good performance; averaging 80% utilization. Further improvements are underway.
In order to develop a methodology for performance characterization and analysis, we focused on the CCC code (see 1.1 above). This code has been instrumented, and parallel performance data have been collected on the IBM SP2 at Maui. The MEDEA tool for performance characterization and analysis has been used to obtain detailed qualitative and quantitative information about the CCC code speedup behavior as a function of the problem size and of the number of processors, and about the sources of performance degradation.
This work has been carried out mainly by Daniele Tessera (who has visited the University of Chicago from the University of Pavia, Italy) in collaboration with Andrea Malagoli and Maria Calzarossa. The paper "Performance Analysis of a Parallel Hydrodynamic Application" by Daniele Tessera, Maria Calzarossa, and Andrea Malagoli has been submitted for publication to the Journal of Supercomputing Applications.
The CCC code has been included in a suite of I/O-intensive applications for use by the Scalable I/O Project (W. Gropp, E. Lusk, R. Thakur).
R. Hudson and A. Malagoli have developed a prototype virtual environment that makes use of the CAVE to perform interactive volume visualization of 3-D turbulence data from simulations. The application makes use of a virtual reality environment (a CAVE or an ImmersaDesk) connected to a scalable parallel computer via a high-speed network. The data are either computed in real time or precomputed on the parallel computer, then transferred to the virtual reality environment, where fast volume visualization is accomplished using sophisticated 3-D texture-mapping hardware (the Reality Engine of a SGI Onyx).
The application has been demonstrated as part of the I-WAY project at Supercomputing '95 in San Diego, CA.
The paper "Networked Virtual Reality for Real-Time 4D Navigation of Astrophysical Turbulence Data" by Randy Hudson and Andrea Malagoli has been presented at High Performance Computing '96 Conference in New Orleans.
Animation movies of the convection calculations by Andrea Malagoli have been used in Calvin Hamilton's educational Web publication "Views of the Solar System." (Many thanks to Los Alamos National Laboratory for providing the main server.)