ESS Project: FY98 Annual Report 

Applications


A Scalable Parallel Cell-Projection Volume Rendering Algorithm for Visualizing Data on Unstructured Meshes

Objective

Three-dimensional aerodynamics calculations often use unstructured volume meshes to model objects with complex geometry. Figure 1 shows an unstructured mesh for an aircraft's wing with an attachment; only the surface mesh is shown to avoid visual clutter. Unstructured meshes have become popular because applying finer meshes only to regions that require high accuracy can reduce both computing time and storage space tremendously. However, this adaptive approach results in computational meshes whose data cells (often tetrahedra in 3-D and triangles in 2-D) are highly irregular in both size and shape. The lack of a simple indexing scheme for these complex meshes makes visualization calculations on them very expensive.


Figure 1: The surface mesh of an aircraft's wing configuration.

Furthermore, in a distributed computing environment, irregularities in cell size and shape make balanced load distribution difficult. The goal of this research is to develop a fast, efficient parallel volume rendering algorithm for visualizing this type of data on massively parallel distributed-memory supercomputers consisting of a large number of very powerful processors, and to use NASA-sponsored applications to help define the requirements and evaluate the results.

Approach

We have focused on scalability and flexibility as the two key design criteria for the rendering algorithm. We have therefore used cell projection instead of ray casting for volume rendering, which provides maximum flexibility in the data distribution and rendering steps. Effective static load balancing has been achieved with a round-robin distribution of data cells among the processors. A spatial partitioning tree is used to guide the rendering, optimize the image compositing step, and reduce memory consumption. Communication cost is reduced by buffering messages and by overlapping communication with rendering calculations as much as possible.
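The effect of round-robin distribution on load balance can be illustrated with a small sketch. It is not the renderer's code: the per-cell cost model, the cell counts, and all function names are assumptions made for illustration. It supposes that expensive cells sit next to each other in the cell list, as they might in a locally refined region of the mesh, and compares contiguous block assignment with round-robin assignment.

```python
def synthetic_costs(num_cells, num_expensive, expensive_cost=10, cheap_cost=1):
    """Hypothetical cost model: the first num_expensive cells are costly to render."""
    return [expensive_cost] * num_expensive + [cheap_cost] * (num_cells - num_expensive)


def block_loads(costs, num_procs):
    """Total cost per processor when each gets one contiguous chunk of cells."""
    chunk = (len(costs) + num_procs - 1) // num_procs
    return [sum(costs[p * chunk:(p + 1) * chunk]) for p in range(num_procs)]


def round_robin_loads(costs, num_procs):
    """Total cost per processor when cell i goes to processor i % num_procs."""
    return [sum(costs[p::num_procs]) for p in range(num_procs)]


def imbalance(loads):
    """Ratio of the heaviest load to the mean load (1.0 is perfect balance)."""
    return max(loads) / (sum(loads) / len(loads))


costs = synthetic_costs(num_cells=1000, num_expensive=100)
print(round(imbalance(block_loads(costs, 10)), 2))        # 5.26
print(round(imbalance(round_robin_loads(costs, 10)), 2))  # 1.0
```

In this toy case the block assignment puts every expensive cell on one processor, while round-robin assignment spreads them evenly. Real meshes will not balance this perfectly.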

Based on our previous test results on the Intel Paragon and IBM SP2, we have made two changes. First, we have adopted a finer level of partitioning in image space: we use pixel interleaving rather than scanline interleaving. Second, we have fine-tuned the buffer size and polling frequency based on the previous test data. Our current focus is to obtain the highest possible performance on the SGI/CRAY T3E and the SGI Origin 2000, both operated at NASA.
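The difference between the two image partitioning schemes can be sketched as follows. The report does not give the exact pixel-to-processor mapping, so the two owner functions below are assumed forms (scanline y to processor y mod P, and row-major pixel index to processor index mod P), and all names are hypothetical. The example counts how many processors share the work when the projected data covers only a band of 32 scanlines in a 400-pixel-wide image.

```python
from collections import Counter


def scanline_owner(x, y, width, num_procs):
    """Assumed scanline interleaving: a whole scanline belongs to one processor."""
    return y % num_procs


def pixel_owner(x, y, width, num_procs):
    """Assumed pixel interleaving: consecutive pixels go to consecutive processors."""
    return (y * width + x) % num_procs


def pixels_per_proc(owner, x0, x1, y0, y1, width, num_procs):
    """Count the pixels of the region [x0, x1) x [y0, y1) owned by each processor."""
    counts = Counter()
    for y in range(y0, y1):
        for x in range(x0, x1):
            counts[owner(x, y, width, num_procs)] += 1
    return counts


# A band of 32 full scanlines in a 400-pixel-wide image, 128 processors.
scan = pixels_per_proc(scanline_owner, 0, 400, 100, 132, 400, 128)
pix = pixels_per_proc(pixel_owner, 0, 400, 100, 132, 400, 128)
print(len(scan), max(scan.values()))  # 32 400
print(len(pix), max(pix.values()))    # 128 100
```

Under these assumptions, scanline interleaving leaves 96 of the 128 processors without any pixels from the band, while pixel interleaving gives every processor an equal share.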

Accomplishments

Our previous tests on the IBM SP2 demonstrated two frames per second and 70 percent parallel efficiency for a 400 x 400-pixel image using 128 processors and a dataset containing 500,000 tetrahedral cells. The following test results were obtained with the same dataset. Figure 2 shows the impact of pixel-interleaved image partitioning on rendering performance on the Intel Paragon.
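As a rough reading of the SP2 figures, the short calculation below converts the quoted efficiency into a speedup. It assumes the usual definition of parallel efficiency, E = T1 / (P * Tp), which the report does not state explicitly; the function names are hypothetical, and the single-processor time is an implied estimate, not a measured result.

```python
def speedup(efficiency, num_procs):
    """Speedup implied by parallel efficiency E = speedup / P."""
    return efficiency * num_procs


def implied_serial_time(frames_per_second, efficiency, num_procs):
    """Single-processor time per frame implied by the parallel frame rate."""
    parallel_time = 1.0 / frames_per_second
    return parallel_time * speedup(efficiency, num_procs)


print(speedup(0.70, 128))                   # speedup on 128 processors
print(implied_serial_time(2.0, 0.70, 128))  # seconds per frame on one processor
```

Under that definition, 70 percent efficiency on 128 processors corresponds to a speedup of about 89.6, and two frames per second corresponds to roughly 44.8 seconds per frame on a single processor.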

Figure 2: Scanline interleaving (above) vs. pixel interleaving (below) on the Intel Paragon.

Figure 3 compares the impact of using different buffer sizes and polling frequencies on the CRAY T3E.

Figure 3: Speed-up numbers on the CRAY T3E.

Figure 4 shows the breakdown of overall rendering time on the CRAY T3E using 128 processors. Because of the faster CRAY T3E processors and good load balance, we can achieve a rendering rate of seven frames per second.

Figure 4: Breakdown of overall rendering time on the T3E.

Figure 5 shows that the same renderer performs poorly on the Origin 2000. At this point, we suspect that the poor performance is due to SGI's MPI implementation of message passing.

Figure 5: Breakdown of overall rendering time on the Origin 2000.

Significance

Our work has been inspired by the trend toward larger numbers of processors in large-scale scientific computing platforms, as typified by the teraFLOPS-scale architectures being installed for the NASA HPCC Computational Aerosciences Project and the U.S. Department of Energy's ASCI program. Appropriate visualization tools are needed to allow visual analysis, at the highest possible resolution, of the large amounts of data generated by simulations running on these systems, either as a runtime monitoring process or as a post-process. We have focused on unstructured meshes and on scalability to support applications that use these systems.

Status/Plans

The new generation of massively parallel systems, such as the CRAY T3E and Origin 2000, provides us with excellent platforms for examining scalability issues in the context of a more modern architecture. Because of the asynchronous nature of our algorithm, we need an efficient termination algorithm. We have decided to discard a centralized control approach and are currently designing a binary combining approach instead. The next task is to investigate the poor performance we observed on the Origin 2000, for which we will perform a more comprehensive study. In addition, we will port the message-passing-based renderer to take advantage of the shared-memory capability of the Origin 2000.
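The design of the binary combining approach is not described here, so the following is only a sketch of the general idea under stated assumptions: processors are arranged in a binary tree by rank (a hypothetical heap-style layout), each one holds counts of messages sent and received, and the counts are summed up the tree to the root. The simulation runs in a single process, all names are illustrative, and it does not model messages that are still in flight while the counts are being combined, which a real termination protocol would have to handle.

```python
def parent(rank):
    """Parent of a non-root rank in the assumed heap-style binary tree."""
    return (rank - 1) // 2


def children(rank, num_procs):
    """Children of a rank in the assumed heap-style binary tree."""
    return [c for c in (2 * rank + 1, 2 * rank + 2) if c < num_procs]


def tree_combine(counters):
    """Sum per-rank (sent, received) counters up the tree.

    Returns (total_sent, total_received, tree_depth).
    """
    combined = list(counters)
    # Children have higher ranks than their parent, so visiting ranks from
    # high to low folds each subtree into its root before moving upward.
    for rank in range(len(combined) - 1, 0, -1):
        p = parent(rank)
        combined[p] = (combined[p][0] + combined[rank][0],
                       combined[p][1] + combined[rank][1])
    depth = len(combined).bit_length() - 1  # floor(log2(P)) levels below the root
    return combined[0][0], combined[0][1], depth


def max_fan_in(num_procs):
    """Largest number of messages any single rank receives in one combine."""
    return max(len(children(r, num_procs)) for r in range(num_procs))


counters = [(r, 127 - r) for r in range(128)]  # synthetic counts for 128 ranks
sent, received, depth = tree_combine(counters)
print(sent == received, depth, max_fan_in(128))  # True 7 2
```

With 128 processors, a centralized controller would receive 127 reports, whereas in this tree no processor receives more than two and the result reaches the root after seven levels.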

Point of Contact

Kwan-Liu Ma
Institute for Computer Applications in Science and Engineering (ICASE)
NASA Langley Research Center
kma@icase.edu
757-864-2195
http://www.icase.edu/~kma