
Our objective is to create a design and prototype implementation of a database environment that is particularly suited for handling the image, vision and scientific data associated with NASA's EOS Amazon project. We are focusing on a data model and query facilities that are designed to execute efficiently on parallel computers. A key feature of the environment is an interface that allows a scientist to specify high-level directives about how query execution should occur. Using the interface does not require an understanding of the intricate details of parallel scheduling.
This report summarizes research activities to date. In the first year, we interviewed NASA scientists in order to understand their requirements and formulated an initial design for the database environment. In the second year, we refined the design and implemented a prototype system. In the third year, we refined the prototype, evaluated the environment and documented the work.
Our work was done in conjunction with the NASA Earth Observing System (EOS) Amazon Project, Thomas Dunne, University of Washington, PI. The mission of the EOS Amazon project is to contribute to understanding the dynamics of the Amazon system in a natural state, and how it would evolve under possible change scenarios (from instantaneous deforestation to more subtle longer-term climatic and chemical changes). The overall goal of the project is to determine how extensive land-use changes in the Amazon would modify the routing of water and its chemical load from precipitation, through the drainage system, and back to the atmosphere and ocean. The work is being undertaken by a number of groups here at the University of Washington, including researchers in Hydrology headed by Thomas Dunne, in Biogeochemistry headed by Jeffrey Richey, and in Remote Sensing headed by John Adams.
We interviewed these scientists in order to understand their computing requirements. In summary, the scientists are working with data sizes on the order of hundreds of megabytes and processing algorithms whose completion time is on the order of minutes to hours. The scientists identified the following desirable properties for a computing environment to support their scientific research:
The scientific database environment we created has these desirable properties. The approach we used to create this environment was:
We identified two key areas that were not well supported by existing software. These areas are:
The environment we designed consists of the following components:
Data input to the environment includes resource information, a program graph and the user's scheduling directives. Available processors are specified initially by the system administrator. The program graph is specified using the visual programming environment, cantata/Khoros. The user's scheduling directives are specified using a constraint-based scheduling language based on SQL. The program graph and resources are used by the automatic performance prediction tool to create a cost model of program execution and processor utilization. The scheduler takes as input the resource information, the program graph, the user's scheduling directives and the performance estimates, and outputs a schedule that fulfills the scheduling directives. The program is then executed on a network of workstations using the distributed executor, which is implemented using PVM (Parallel Virtual Machine). During execution, performance data is collected and sent to the performance database for future use by the performance prediction tool.
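The flow from program graph, resources and directives to a schedule can be illustrated with a small sketch. This is not the scheduler built in the project; it is a simple greedy list scheduler written for illustration only. The task names, cost figures, workstation names and the `max_processors` parameter (standing in for a user scheduling directive) are all hypothetical, and the program graph is assumed to be acyclic.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    cost: float                      # estimated run time in seconds (hypothetical)
    deps: list = field(default_factory=list)


def topological_order(tasks):
    """Return task names so that every task follows its dependencies."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in tasks[name].deps:
            visit(dep)
        order.append(name)

    for name in tasks:
        visit(name)
    return order


def schedule(tasks, processors, max_processors=None):
    """Greedy list scheduling: run each task on the processor where it can start earliest.

    max_processors is a hypothetical stand-in for a user directive
    that limits how many processors the schedule may use.
    """
    usable = processors[:max_processors] if max_processors else list(processors)
    free_at = {p: 0.0 for p in usable}   # time at which each processor becomes idle
    finish = {}                          # finish time of each scheduled task
    plan = []
    for name in topological_order(tasks):
        task = tasks[name]
        ready = max((finish[d] for d in task.deps), default=0.0)
        proc = min(usable, key=lambda p: max(free_at[p], ready))
        start = max(free_at[proc], ready)
        finish[name] = start + task.cost
        free_at[proc] = finish[name]
        plan.append((name, proc, start, finish[name]))
    return plan, max(finish.values())


# A hypothetical four-node program graph with estimated costs.
graph = {
    "load": Task("load", 2.0),
    "filter_a": Task("filter_a", 4.0, ["load"]),
    "filter_b": Task("filter_b", 3.0, ["load"]),
    "combine": Task("combine", 1.0, ["filter_a", "filter_b"]),
}
workstations = ["ws1", "ws2"]

plan, makespan = schedule(graph, workstations)
serial_plan, serial_makespan = schedule(graph, workstations, max_processors=1)
```

With two workstations the two filter tasks overlap, so the estimated completion time is shorter than when the directive restricts the schedule to a single processor. In the environment described above, the cost figures would come from the performance prediction tool rather than being written by hand.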
We created an automated algorithm which allows a scientist to create a parameterized experiment and execute it in parallel. Experiments concisely express a collection of interactions a scientist would like to have with the environment.
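As an illustration of what a parameterized experiment might look like, the following minimal sketch expands a set of parameter ranges into individual trials and runs them concurrently. It is not the algorithm developed in the project: the parameter names, the `run_trial` function and its placeholder computation are hypothetical, and a local thread pool stands in for execution on a network of workstations.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_trial(params):
    """Hypothetical stand-in for one execution of a query graph with fixed parameters."""
    score = params["threshold"] * params["window"]   # placeholder computation
    return {"params": params, "score": score}


def expand(parameter_ranges):
    """Turn {name: [values]} into one dict per combination of parameter values."""
    names = sorted(parameter_ranges)
    combos = product(*(parameter_ranges[n] for n in names))
    return [dict(zip(names, values)) for values in combos]


def run_experiment(parameter_ranges, workers=4):
    """Run every trial of the experiment concurrently and return results in trial order."""
    trials = expand(parameter_ranges)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_trial, trials))


results = run_experiment({"threshold": [0.1, 0.2], "window": [3, 5, 7]})
```

The scientist states only the parameter ranges; the expansion into trials and their parallel execution are handled automatically, which is the sense in which an experiment concisely expresses a collection of interactions.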
We collected a repository of 40+ query graphs from vision, image processing and remote-sensing researchers. We created an automated testing process and tested the environment using these graphs. Currently we are testing the environment with a large number of scientific experiments and testing how well the environment responds to user scheduling directives. We are documenting the algorithms and design decisions used to create the environment as part of James Ahrens's doctoral thesis work.
Our database environment provides support for computer-based scientific research work. Its design was based on the requirements of NASA scientists and uses existing packages along with new tools to support the automatic parallel scheduling and execution of scientific experiments.
Linda G. Shapiro, Steven L. Tanimoto, and James P. Ahrens
Department of Computer Science and Engineering
University of Washington
shapiro@cs.washington.edu