ESS Project: FY98 Annual Report 

System Software R&D


Beowulf/Parallel Linux Operating System

Objective

Make Beowulf clusters more accessible by incorporating Beowulf software into standard Linux distributions. Make clustering software robust and efficient by implementing the Beowulf cluster software with appropriate modifications to the standard Linux kernel.

Approach

Development of software for Beowulf clusters now involves research groups at all the NASA centers and a large number of groups in labs and academic departments around the world. Each of these groups seeks commodity-off-the-shelf (COTS) Cluster Computer solutions to its computational requirements. By continuing to share their results through Web sites, mailing lists, and NASA-sponsored workshops, these groups have built up the Beowulf software distribution.

A Cluster Computer is not as tightly coupled as a massively parallel processor (MPP) computer, nor can it support as fine a grain of parallelism; however, a Cluster Computer can provide the same unified system image and a similar programming model. The development activity at CESDIS centers on high-speed networks, tools to maintain large clusters, and enhancements to the Beowulf software, all in order to provide the user with a programming model similar to that of an MPP.

A Cluster Computer is a dedicated resource that can be custom-designed to fit an individual's computation requirements. In addition to avoiding the obvious difficulties with a network of workstations (scheduling, robustness, and availability), there are subtle differences that have a significant impact on performance. A network of workstations is designed to make a large number of inactive users more productive. In a Cluster Computer, the nodes lose their individuality, and the operating system parameters are tuned to make a single parallel program run efficiently across the entire cluster. Many of the enhancements to Linux developed at CESDIS can be thought of as contributing to making Beowulf more MPP-like.

Linux is a POSIX-compliant operating system originally developed for the x86 architecture. It is publicly available and distributed with source for free on the Web. It is also provided at cost on CD-ROM by companies such as Red Hat Software, Inc. and Slackware, Inc., which provide support and configuration services. Linux has been ported to other popular commodity architectures (Alpha, MIPS, PA-RISC) and currently has an installed base of more than 3 million machines. Linux is the ideal choice for this project, since it is the Unix for PC hardware and provides an excellent technology transfer medium.

Much of the recent development work on Beowulf clusters focuses on making bigger clusters or on making small clusters easier to install and maintain. The first-order factor that determines the scalability of a general-purpose cluster is the network technology. Making small clusters easier to operate depends on education, tools, and documentation. CESDIS has been, and continues to be, the center of networking development activity within the Beowulf community. CESDIS is also committed to education, providing the mechanisms for communication within the Beowulf community, and the development of low-level software to make clusters more reliable and easier to operate.

Accomplishments

CESDIS continues to maintain a leadership role in the Cluster Computing community. Don Becker is a member of a team awarded the 1997 Gordon Bell Prize for Price/Performance "in recognition of their superior effort in practical parallel-processing research." The award was announced and presented at SC97. The prize was given for a Beowulf cluster of Pentium Pros assembled a year earlier at SC96; the cluster achieved 2.1 gigaFLOPS on an N-body code, the equivalent of $50,000 per gigaFLOPS. The code simulates gravitational attraction among particles, such as dark matter in cosmology models. The other award recipients are Thomas Sterling/JPL-Caltech, Mike Warren, Patrick Goda/LANL, John Salmon/Caltech, and Grégoire Winckelmans/Catholic University of Louvain, Belgium. The award represents a breakthrough in the HPCC community, which has now come to recognize Beowulf-class systems as an important type of parallel computing. The Gordon Bell Prize winners presented talks on their award-winning work at SC97. The paper "Pentium Pro Inside: I. A Treecode at 430 gigaFLOPS on ASCI Red, II. Price/Performance of $50/megaFLOPS on Loki and Hyglac" is available. The Gordon Bell Prize was established to reward practical use of parallel processors by giving monetary awards for the best performance and best price/performance on an application, and for automatic compiler parallelization. The award is sponsored by the IEEE Computer Society and IEEE Computer magazine.
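As a rough check on the units in the figures above, the short Python sketch below converts the $50/megaFLOPS quoted in the paper title into dollars per gigaFLOPS and multiplies by the 2.1 gigaFLOPS sustained rate. The function names are illustrative only, and the resulting system cost is merely implied by those two quoted numbers; it is not a cost stated in this report.

```python
# Unit check for the price/performance figures quoted in the text.
# Inputs taken from the report: $50 per megaFLOPS and 2.1 gigaFLOPS sustained.
# The "implied cost" is derived arithmetic, not a figure the report states.

MFLOPS_PER_GFLOPS = 1_000  # 1 gigaFLOPS = 1,000 megaFLOPS


def dollars_per_gflops(dollars_per_mflops: float) -> float:
    """Convert a price/performance figure from $/megaFLOPS to $/gigaFLOPS."""
    return dollars_per_mflops * MFLOPS_PER_GFLOPS


def implied_system_cost(sustained_gflops: float, price_per_gflops: float) -> float:
    """System cost implied by a sustained rate and a $/gigaFLOPS figure."""
    return sustained_gflops * price_per_gflops


per_gflops = dollars_per_gflops(50.0)
implied_cost = implied_system_cost(2.1, per_gflops)

print(f"${per_gflops:,.0f} per gigaFLOPS")
print(f"about ${implied_cost:,.0f} implied hardware cost")
```

The conversion confirms that $50/megaFLOPS and $50,000 per gigaFLOPS are the same figure expressed in different units.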

The Beowulf software has reached the point where it can be presented as a package. CESDIS has been the leader in collecting and organizing onto a CD-ROM mirror all of the software and documentation required to construct and operate a Beowulf cluster. This is significant because it provides a complete distribution of the Beowulf package. Moreover, it has been formatted so that one can boot and install a Beowulf cluster directly from this image, a great improvement over the current method of augmenting and patching a Linux distribution. The "Extreme Linux CD," as it is called, is important to the Beowulf community because it provides a focal point for the Beowulf software development effort. The Red Hat version of this material was prepared in the late spring, and the Beowulf "Extreme Linux CD" had its debut at Linux Expo at Duke. The article "Red Hat, NASA Team on Beowulf Tech CD-ROM Price Under $30" was the number-one requested article on HPCwire, 5/15/98.

Because the Beowulf community is loosely connected, communication is vital to its productivity. The first NASA workshop on Beowulf Class Cluster Computing was held in Pasadena, CA. Don Becker (CESDIS) and Clark Mobarry (GSFC) served on the programming committee. This meeting helped develop a sense of unity and direction for the diverse groups across NASA and other agencies that make up the Beowulf community, and it helped identify the areas of expertise among the different centers. This has led to a natural division of labor, which helps to prevent excessive redundant work. On a day-to-day basis, CESDIS maintains several mailing lists for the exchange of ideas on the development of Beowulf software. The traffic on these lists averages about 130 messages a month.

GSFC set a performance record of 10.2 gigaFLOPS on the Piecewise Parabolic Method code on a Beowulf cluster. The two large GSFC Beowulf clusters, theHIVE and ecgtheow, were connected together to form a single cluster for the purposes of the experiment. Obtaining a rate above 10 gigaFLOPS is significant within the ESS community. The second phase of the ESS program is a milestone-driven program with the first milestone being 10 gigaFLOPS. In other words, Beowulf-class cluster computers have reached a performance level that is considered high-performance computing from the ESS perspective.

Don Becker continues to enhance Ethernet drivers for use in Beowulf clusters. We have also met with representatives from leading network vendors. For example, we met with HAL Computer Systems and negotiated an agreement to develop Linux drivers for their interconnection hardware and then evaluate that hardware on the Beowulf Bulk Data Server. In another instance, Packet Engines Gigabit Ethernet adapters have been installed in a pair of Alphas with 64-bit PCI slots. This provided the first opportunity to test the performance of Gigabit Ethernet cards and driver software at their full capability.

Education and technical transfer are important aspects of the Beowulf program. The Beowulf project has been described at numerous conferences and workshops. The Beowulf tutorial at SC97 received the distinction of being the most highly attended tutorial of the conference. This tutorial was given several times across the country this year, including 1-day workshops in Pasadena and at the Florida Institute of Technology. The Beowulf project was presented (by Donald Becker) as a keynote session at IEEE Aerospace '97, and a CESDIS/JPL/Caltech collaboration produced a tutorial for the Cluster Computing Conference in Atlanta. Beowulf was one of the major agenda items at the "Extreme Linux" workshop, a by-invitation-only workshop of the core Linux developers. In addition to the presentations, tutorials, and published articles listed above, the Beowulf project has built and maintains a significant Internet presence, through Web pages and mailing lists, as its primary means of technical transfer.

Beowulf clusters are finding their way into universities as platforms on which to teach parallel computing. Phil Merkey is developing a course on Parallel and Distributed Computing based on the Beowulf technology. This course was given in the fall semester at the University of Maryland, Baltimore County. After discussing parallel computing from an academic point of view, the students were given accounts on the Beowulf cluster called hrothgar. This "lab" component of the course provides hands-on experience with writing and debugging parallel programs and puts the abstract analysis of parallel programs in a more tangible framework.

Significance

The success of the Beowulf workstation and the spontaneous proliferation of Beowulf clusters have demonstrated the potential of exploiting very inexpensive and widely available components for high-performance computing. The current activities, improving the unified system image and incorporating Beowulf software in the Red Hat distribution, will make the results of Beowulf software development more accessible to a broader user community.

Status/Plans

The current plan is to maintain a leadership role in the Beowulf community, encouraging the deployment of Beowulf systems through tutorials and personal collaborations. We plan to continue to be the focal point for networking. By working closely with vendors of high-performance networks, the Beowulf project will continue to ride the bow wave produced by these technologies. We plan to continue to work with Red Hat, Inc. to produce a second version of the Beowulf CD.

Point of Contact

Phillip R. Merkey
Center of Excellence in Space Data and Information Sciences (CESDIS)
NASA Goddard Space Flight Center
merk@cesdis.gsfc.nasa.gov
301-286-3805
http://cesdis.gsfc.nasa.gov/beowulf/