
Physical I/O Requests for NASA Earth and Space Science Applications

Background

Device driver instrumentation can reveal system characteristics that are of great importance to system designers and application developers, yet are not observable at the commonly used application instrumentation level. With this capability, it is possible to track the physical requests that are generated directly by users, those that are generated by the system in response to users, and the pure system requests needed to manage the overall resources.

We report on an empirical study that used device driver instrumentation to investigate the physical input/output requests generated under a selected set of NASA Earth and Space Science (ESS) workloads.

Experimental Set Up

Selected System: Beowulf, the Linux cluster of PCs at CESDIS/NASA GSFC, which is based on the network of workstations (NOW) approach, was targeted for this study primarily because of its open software architecture. Beowulf consists of 16 486-based PCs and runs Linux as its operating system and PVM for message passing.

Selected NASA ESS Workloads: Our current work includes Wavelet Decomposition, which is used for NASA Earth imagery applications such as image compression and registration; 512 x 512 x 8 images from the Landsat Thematic Mapper were used. Our workload set also includes the Piecewise Parabolic Method (PPM), which solves Euler's equations for compressible gas dynamics on structured, logically rectangular grids for computational astrophysical simulations such as supernova explosions and non-spherical accretion flows. Our experiments used four 240x480 grids per processor. The last member of this workload set is N-body, which is used to study a wide variety of dynamic astrophysical systems, ranging from small clusters of stars to galaxies and the formation of large structures in the universe.

Highlights for this Year

Refined the device driver instrumentation control to interoperate seamlessly with the Beowulf system. Instrumentation and measurement collection can now be turned on and off without rebooting or otherwise disturbing the system.

Completed and refined the PVM-based ports of the three ESS codes (Wavelets, PPM, and N-body) to be used as workloads in the experimental studies.

Collected, presented, and analyzed extensive physical I/O request measurements for these applications.

Experiments and Results

In order to gain a full understanding of the I/O workload and the related dominant factors and behaviors in such environments, five sets of experiments were designed and conducted. The observations are detailed in figures 1-8. The five experiments, along with the observations, are described below.

Baseline Workload: In this set of experiments we observed the I/O requirements with no explicit application workload running. This revealed the nature of the I/O requirements under such conditions in terms of request sizes and locality of requests. Small, infrequent requests (1 Kbyte each) were observed in the lower-numbered sectors (see figure 1). This was consistent with kernel and system routine activities, and 1 Kbyte was clearly the physical block size.

PPM Workload: In this set of experiments the I/O requirements of this astrophysics simulation application were observed. Again, the measurements focused on the size of requests and their sector numbers. Request sizes were used to differentiate among the different types of requests. Under the hardware and software configuration used, a request of 1 Kbyte or more but less than 4 Kbytes was clearly identified as a small explicit request, and a request of exactly 4 Kbytes was identified as paging. In addition to what was observed in the baseline case, very infrequent paging was also observed in these experiments (see figure 2).
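The size-based classification described above can be sketched in a few lines of Python. This is only an illustration and not the instrumentation code used in the study: the function and constant names are hypothetical, and the "large explicit" category for requests above 4 Kbytes is taken from the Wavelet observations reported elsewhere in this report.

```python
from collections import Counter

BLOCK_SIZE = 1024  # bytes; physical block size reported for this configuration
PAGE_SIZE = 4096   # bytes; paging request size reported for this configuration


def classify_request(size_bytes):
    """Label one physical request by its size (hypothetical helper)."""
    if size_bytes < BLOCK_SIZE:
        # Smaller than one physical block: not covered by the report.
        return "unclassified"
    if size_bytes < PAGE_SIZE:
        return "small explicit"
    if size_bytes == PAGE_SIZE:
        return "paging"
    return "large explicit"


def summarize(sizes):
    """Count how many requests fall into each category."""
    return Counter(classify_request(s) for s in sizes)
```

For example, `summarize([1024, 1024, 4096, 16384])` counts two small explicit requests, one paging request, and one large explicit request.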

Wavelet Workload: Due to the input, output, and scratch image data manipulations, very intensive paging activity was seen, followed by a lull while the working data set was established and processing got underway, and then additional paging for working with the output images. The explicit large requests for reading the input image data were observed as a sequence of 16 Kbyte physical requests (see figure 3).

N-body Workload: This set of experiments was conducted in a similar fashion to the ones described above. As in the case of PPM, infrequent small requests were seen. Slightly more paging activity than with PPM was observed (see figure 4).

Collective Workload: In this set of experiments, all of the above workloads were run simultaneously and measurements were collected (see figures 5-8). The observations closely resembled a linear combination of what was seen in the individual experiments, with the exception that input image data were physically read in 32 Kbyte chunks. This indicates that as more processes run, more cache blocks are allocated for I/O and larger physical requests become possible.
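One way to check the "linear combination" observation is to add up the request-size histograms of the individual runs and compare the sum with the histogram of the collective run. The sketch below assumes each histogram is a mapping from request size in bytes to request count. The helper names are hypothetical, and the report does not state how the comparison was actually made.

```python
from collections import Counter


def combine_histograms(*histograms):
    """Sum per-workload histograms (request size -> count)."""
    total = Counter()
    for hist in histograms:
        total.update(hist)
    return total


def histogram_difference(observed, expected):
    """Return {size: observed - expected} for sizes whose counts differ."""
    sizes = sorted(set(observed) | set(expected))
    return {
        size: observed.get(size, 0) - expected.get(size, 0)
        for size in sizes
        if observed.get(size, 0) != expected.get(size, 0)
    }
```

With this layout, a shift such as the one reported above would show up as a deficit at 16 Kbytes and a surplus at 32 Kbytes in the difference.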

Lessons Learned

This work has clearly shown that device driver instrumentation can distinguish among the different activities in the system: small explicit requests (less than the page size), paging (4 Kbytes each in this case), and large objects (such as images). Further, it was shown that ESS codes have high spatial I/O access locality, with 90% of accesses going into 10% of the space. Temporal locality, on the other hand, was measured as the frequency of accesses and was observed to be as high as 6 repeated accesses per second. In general, the astrophysics simulation codes (PPM and N-body) have similar I/O characteristics and have shown very small I/O requirements for the problem sizes used. The Wavelet code, however, required a lot of paging due to the use of many different files for output and scratch pad manipulations, and could benefit from some tuning to improve data locality. It is therefore advised that a strategy for file usage and explicit I/O requests be developed for this code to that end. On the system side, Linux tends to allow larger physical requests when more processes are running, by allocating additional blocks for I/O. It is therefore recommended that Linux file caching be further investigated and optimized to suit the wide variability in the physical requests of the NASA ESS domain.
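The two locality figures quoted above could be computed from a request trace along the following lines. This is a minimal sketch under stated assumptions, not the method used in the study: the trace layout, the division of the disk into equal regions, the one-second bucketing, and all function names are hypothetical.

```python
from collections import Counter


def spatial_locality(sectors, total_sectors, num_regions=100, space_fraction=0.10):
    """Fraction of accesses that land in the busiest `space_fraction` of regions.

    `sectors` is a list of accessed sector numbers; the sector range
    [0, total_sectors) is split into `num_regions` equal regions.
    """
    if not sectors:
        return 0.0
    region_size = -(-total_sectors // num_regions)  # ceiling division
    per_region = Counter(s // region_size for s in sectors)
    hot_regions = max(1, round(num_regions * space_fraction))
    busiest = sorted(per_region.values(), reverse=True)[:hot_regions]
    return sum(busiest) / len(sectors)


def peak_repeat_rate(events):
    """Largest number of accesses to one sector within a one-second bucket.

    `events` is an iterable of (timestamp_in_seconds, sector) pairs.
    """
    per_second = Counter((int(t), sector) for t, sector in events)
    return max(per_second.values(), default=0)
```

A result of 0.9 from `spatial_locality` with `space_fraction=0.10` would correspond to the "90% of accesses into 10% of space" figure.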

Direction

In our future work we intend to evolve our device driver instrumentation into a multilevel instrumentation platform. Multilevel I/O instrumentation refers to the capability of monitoring and collecting measurements of I/O activities at the application level as well as at a variety of system software and hardware levels. Observing the I/O activities at one specific level has its own merits. However, fusing the information gathered from different levels has the potential to characterize activities that are not directly observable from any one instrumentation level. In addition, multilevel instrumentation can allow the intrusion effects of the instrumentation overhead to be filtered out.

References

[Berry96] Mike Berry and Tarek El-Ghazawi. "Parallel Input/Output Characteristics of NASA Science Applications." Proceedings of the International Parallel Processing Symposium (IPPS '96), IEEE Computer Society Press. Honolulu, April 1996.

[El-Ghaz95] Tarek El-Ghazawi. "Characteristics of the MasPar Parallel I/O System." Frontiers '95, IEEE Computer Society, McLean, VA, February 1995.

[El-Ghaz96] Tarek El-Ghazawi and Jacqueline Le Moigne. "Wavelet Decomposition on High-Performance Computing Systems." Proceedings of the 25th International Conference on Parallel Processing (ICPP '96). Bloomingdale, IL, August 1996. (in press)

[Fryxell88] B. Fryxell and R. Taam. "Numerical Simulations of Non-Axisymmetric Accretion Flow." Astrophysical Journal, 335:862-880, 1988.

[Meajil96] Abdullah Meajil, Tarek El-Ghazawi, and Thomas Sterling. "A Quantitative Approach for Architecture-Invariant Workload Characterization." PARA96, Copenhagen, August 1996. To appear in Lecture Notes in Computer Science, Springer-Verlag, Berlin, August 1996.

[Nastea96] S. Nastea, O. Frieder, and T. El-Ghazawi. "Parallel Input/Output Impact on Sparse Matrix Compression." Proceedings of the Data Compression Conference (DCC '96), IEEE Computer Society Press. Snowbird, April 1996.

[Olson94] K. Olson and J. Dorband. "An Implementation of a Tree Code on SIMD Parallel Computer." Astrophysical Journal Supplement Series, 94:117-125, September 1994.

[Sterling95] T. Sterling, D. Becker, D. Savarese, J. Dorband, U. Ranawake, and C. Packer. "Beowulf: A Parallel Workstation for Scientific Computation." Proceedings of the 24th International Conference on Parallel Processing (ICPP '95), August 1995.

Points of Contact

Tarek El-Ghazawi, Gideon Frieder, and Mike Berry
The George Washington University
Department of Electrical Engineering and Computer Science
tarek@seas.gwu.edu

