The Beowulf Scalable Mass Storage research project will implement a trivially reproducible system that provides at least a terabyte of secondary storage and a gigabyte-per-second aggregate external bandwidth. The performance characteristics of the system will be evaluated on real applications in the operational environment provided by the Earth and Space Sciences (ESS) Project.
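The capacity and bandwidth targets above can be turned into a rough node count with simple arithmetic. The following is a minimal sketch only: the function name, the per-node disk capacity, and the per-node bandwidth are hypothetical placeholders rather than measurements from this project, and decimal units (1 terabyte = 1000 gigabytes) are assumed.

```python
import math


def nodes_needed(target_capacity_gb, target_bandwidth_mbs,
                 disks_per_node, disk_capacity_gb, node_bandwidth_mbs):
    """Return the smallest node count that meets both targets.

    All arguments are illustrative inputs; none are figures from the project.
    """
    per_node_capacity_gb = disks_per_node * disk_capacity_gb
    for_capacity = math.ceil(target_capacity_gb / per_node_capacity_gb)
    for_bandwidth = math.ceil(target_bandwidth_mbs / node_bandwidth_mbs)
    # Whichever target needs more nodes determines the cluster size.
    return max(for_capacity, for_bandwidth)


# Hypothetical example: 1 TB and 1 GB/s targets, three 6 GB disks per node,
# and 16 MB/s of external bandwidth delivered by each node.
print(nodes_needed(1000, 1000, disks_per_node=3,
                   disk_capacity_gb=6, node_bandwidth_mbs=16))
```

With inputs like these, the bandwidth target rather than the capacity target sets the node count, which is one reason network and disk throughput receive so much attention in a design of this kind.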
The Beowulf Scalable Mass Storage research project builds on the Beowulf Parallel Linux technology and methodology. The availability of the Linux operating system and the performance levels attained by PC-market components, particularly disks and networks, have created the opportunity to provide cost-effective solutions to secondary storage problems. It is now feasible to construct a high-performance system entirely of commodity off-the-shelf (COTS) components. Using an enhanced version of the Linux operating system provides a vendor-independent solution that promotes future robustness and cost-effectiveness. The availability of source code and the strong network support inherent in Linux are important to the mass storage system. We require the ability to modify and enhance kernel functionality in order to construct a system that serves bulk data from component nodes that were designed for a single-user, multitasking environment. In addition, using Linux allows us to easily harvest the software built for networks of workstations.
The Beowulf Bulk Data Server has been upgraded to meet its phase-two goals. The cluster currently has 100 Intel Pentium Pro processors running at 200 MHz and 7.6 gigabytes of memory. The cluster is connected in a fat-tree network topology with Packet Engines Gigabit Ethernet at the root of the tree. Through sponsorship and collaboration with the team at Clemson University headed by Walter Ligon, CESDIS is meeting its milestones on the development and demonstration of a distributed file system for Beowulf-class computers.
The Beowulf Mass Storage System is a large cluster. Much of the system development work being done to configure and maintain a large cluster is directed at this system. Erik Hendriks is responsible for numerous enhancements to the kernel and the system software that address the issues involved in such systems. For example, the BIOS image on the Intel PR440 FX motherboards has been modified to allow netbooting; this allows the nodes in a cluster to be stateless at boot time. A kernel performance counters package has been developed for the Pentium Pro; this hardware information proves very useful for debugging and performance tuning. Disk performance has been greatly enhanced by modifying disk reads and writes to fully exploit three IDE disks on three separate channels. With this bandwidth we can meet the design specifications for the Bulk Data Server. Reliability in a large cluster is an active area of concern and research. A Linux driver for the LM78 hardware monitor (the onboard hardware monitor for the motherboards used in the Beowulf Bulk Data Server) has been released. This driver provides an interface that allows easy reading of current status and easy manipulation of limit registers. The ability to monitor hardware statistics will become increasingly important as clusters grow.
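To illustrate how status readings and limit registers of the kind the LM78 driver exposes might be used for cluster monitoring, the sketch below compares each reading against a low and high limit and reports the sensors that fall outside their range. It does not use the driver's actual interface: the function name, sensor names, readings, and limits are all hypothetical stand-ins.

```python
def out_of_range(readings, limits):
    """Return {sensor: value} for every reading outside its (low, high) limits.

    `readings` maps a sensor name to its current value; `limits` maps the
    same names to a (low, high) pair. Both are hypothetical stand-ins for
    whatever the hardware monitor driver actually reports.
    """
    alarms = {}
    for sensor, value in readings.items():
        low, high = limits[sensor]
        if not (low <= value <= high):
            alarms[sensor] = value
    return alarms


# Hypothetical readings for one node; the values are invented for illustration.
readings = {"temp_c": 61.0, "fan1_rpm": 4800, "vcore_v": 2.05}
limits = {"temp_c": (0.0, 55.0), "fan1_rpm": (3000, 9000), "vcore_v": (1.9, 2.2)}
print(out_of_range(readings, limits))
```

Run periodically on every node, a check of this form would let an administrator spot a failing fan or an overheating processor before the node drops out of the cluster.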
During the second phase of the project, the Bulk Data Server has served in several different capacities. It has served as the development platform for file system research and high-performance network research. It has also attracted attention as a compute server. It contributed to the GSFC 10 gigaFLOPS record on the PPM code. It runs the HTMT simulator and is the focal point for GSFC collaboration on the Hybrid Technology MultiThreaded (HTMT) petaflops project. As one of the large Beowulf clusters, it serves as a testbed for the development of new system software and new system configuration and administration techniques.
The next phase of development will continue as planned. The system will be fully populated with disks. The availability of large network switches was not anticipated. Depending on the performance of these new large switches, we may be able to drop the fat-tree network topology in favor of a uniform, flat switched network. GSFC will continue to develop and evaluate high-speed networks. The parallel file systems used on this project will operate on a variant of a client-server model. In order to exploit the high-speed network within the cluster, the functionality of the IOP daemons (the servers) must be split to match the network's hierarchical structure. The simpler network topology that a large switch affords will make this task easier.
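This report does not describe how file data is distributed among the IOP daemons. One common approach in client-server parallel file systems is round-robin striping, in which consecutive fixed-size stripes of a file are assigned to successive servers. The following is a minimal sketch under that assumption; the function name, stripe size, and server count are hypothetical.

```python
def locate(offset, stripe_size, num_servers):
    """Map a byte offset in a file to (server index, offset within that server).

    Assumes round-robin striping: stripe 0 goes to server 0, stripe 1 to
    server 1, and so on, wrapping around after the last server.
    """
    stripe = offset // stripe_size      # which stripe holds this byte
    server = stripe % num_servers       # which server stores that stripe
    # Stripes stored earlier on the same server, plus position in the stripe.
    local = (stripe // num_servers) * stripe_size + offset % stripe_size
    return server, local


# Hypothetical layout: 64-byte stripes across 4 servers.
for off in (0, 64, 256, 300):
    print(off, locate(off, 64, 4))
```

Because a client can compute the mapping itself, it can contact the relevant servers directly and in parallel, which is what allows aggregate bandwidth to scale with the number of servers.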
The role of the Bulk Data Server as a compute server will be expanded. We will open the cluster to more general applications within GSFC and at academic institutions. By connecting it to theHIVE, we plan to conduct scaling experiments on ESS applications. It will also continue to run the HTMT emulator, which gives GSFC and local interests a means to influence the development of that project.
Phillip R. Merkey
Center of Excellence in Space Data and Information Sciences (CESDIS)
NASA Goddard Space Flight Center
merk@cesdis.gsfc.nasa.gov
301-286-3805
http://beowulf.gsfc.nasa.gov/