A Visual Database System for Image Analysis on Parallel Computers and its Application to the EOS Amazon Project

Final Report (1993-1996)

Task Objective

To create a design and prototype implementation of a database environment that is particularly suited to handling the image, vision and scientific data associated with NASA's EOS Amazon project. We are focusing on a data model and query facilities that are designed to execute efficiently on parallel computers. A key feature of the environment is an interface that allows a scientist to specify high-level directives about how query execution should occur. Using the interface does not require an understanding of the intricate details of parallel scheduling.

Introduction

This report summarizes research activities to date. In the first year, we interviewed NASA scientists in order to understand their requirements and formulated an initial design for the database environment. In the second year, we refined the design and implemented a prototype system. In the third year, we refined the prototype, evaluated the environment and documented the work.

Year 1 - Requirements and Initial Design

Our work was done in conjunction with the NASA Earth Observing System (EOS) Amazon Project, Thomas Dunne, University of Washington, PI. The mission of the EOS Amazon project is to contribute to understanding the dynamics of the Amazon system in a natural state, and how it would evolve under possible change scenarios (from instantaneous deforestation to more subtle longer term climatic/chemical changes). The overall goal of the project is to determine how extensive land-use changes in the Amazon would modify the routing of water and its chemical load from precipitation, through the drainage system, and back to the atmosphere and ocean. The work is being undertaken by a number of groups here at the University of Washington including researchers in Hydrology headed by Thomas Dunne, in Biogeochemistry headed by Jeffrey Richey and in Remote Sensing headed by John Adams.

We interviewed these scientists in order to understand their computing requirements. In summary, the scientists are working with data sizes on the order of hundreds of megabytes and processing algorithms whose completion time is on the order of minutes to hours. The scientists identified the following desirable properties for a computing environment to support their scientific research:

Design approach

The scientific database environment we created has these desirable properties. The approach we used to create this environment was:

  1. We identified how existing software tools fulfill the requirements described above.
  2. We created new algorithms and tools which fill the gap left by existing software tools.
  3. We integrated all these tools into a seamless whole.

We identified two key areas that were not well supported by existing software:

  1. Support for automated parallel program scheduling and execution

     To achieve high performance, programs are scheduled and executed on multiple processors. Parallel scheduling is a complex problem, and automation is a welcome solution for scientists. One disadvantage of traditional tools is that they optimize for a fixed collection of preset scheduling goals. Another is that they do not fully automate the scheduling process. An automated scheduling system that is responsive to the scientists' scheduling needs would improve both their satisfaction with computer systems and their productivity.

  2. Support for scientific experimentation

     An environment needs to provide a computer-based framework for the interactions that scientists have with computers. One typical interaction is parameterized experimentation with their programs. This experimentation helps the scientist understand the effects of changes to input parameters and to code. With automated support, scientists could focus on analyzing their experimental results instead of on the process required to generate them.
Year 2 - Design and Implementation

The environment we designed consists of the following components:

Data input to the environment includes resource information, a program graph and the user's scheduling directives. Available processors are specified initially by the system administrator. The program graph is specified using the visual programming environment, cantata/Khoros. The user's scheduling directives are specified using a constraint-based scheduling language based on SQL. The automatic performance prediction tool uses the program graph and resources to create a cost model of program execution and processor utilization. The scheduler takes as input the resource information, the program graph, the user's scheduling directives and the performance estimates, and outputs a schedule that fulfills the scheduling directives. The program is then executed on a network of workstations using the distributed executor, implemented using PVM. During execution, performance data is collected and sent to the performance database for future use by the performance prediction tool.
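The data flow described above can be illustrated with a small sketch. It is not the project's scheduler: it uses a simple greedy list-scheduling heuristic, and the names Task, schedule and max_processors are hypothetical. The max_processors argument stands in for a single user directive; the report's SQL-based directive language is not reproduced here. Task costs play the role of the estimates supplied by the performance prediction tool.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One node of the program graph (hypothetical representation)."""
    name: str
    cost: float                                # estimated run time, e.g. seconds
    deps: list = field(default_factory=list)   # names of tasks that must finish first


def schedule(tasks, processors, max_processors=None):
    """Greedy list scheduling.

    Repeatedly picks the most expensive ready task and places it on the
    processor where it can start earliest. max_processors is an assumed
    stand-in for a user directive limiting how many processors are used.
    Returns (plan, makespan), where plan holds (task, processor, start, end).
    """
    procs = list(processors[:max_processors]) if max_processors else list(processors)
    free_at = {p: 0.0 for p in procs}          # time at which each processor is idle
    finish = {}                                # task name -> finish time
    plan = []
    remaining = {t.name: t for t in tasks}
    while remaining:
        ready = [t for t in remaining.values()
                 if all(d in finish for d in t.deps)]
        if not ready:
            raise ValueError("program graph has a cycle or a missing dependency")
        task = max(ready, key=lambda t: t.cost)
        # A task cannot start before all of its inputs have been produced.
        earliest = max((finish[d] for d in task.deps), default=0.0)
        proc = min(procs, key=lambda p: max(free_at[p], earliest))
        start = max(free_at[proc], earliest)
        finish[task.name] = start + task.cost
        free_at[proc] = finish[task.name]
        plan.append((task.name, proc, start, finish[task.name]))
        del remaining[task.name]
    return plan, max(finish.values(), default=0.0)


# A four-node program graph: one loader feeding two filters, then a merge.
graph = [
    Task("load", 2.0),
    Task("filter_a", 4.0, ["load"]),
    Task("filter_b", 3.0, ["load"]),
    Task("merge", 1.0, ["filter_a", "filter_b"]),
]

plan, makespan = schedule(graph, ["ws1", "ws2"])
for name, proc, start, end in plan:
    print(f"{name:9s} on {proc}: {start:4.1f} to {end:4.1f}")
print("estimated completion time:", makespan)
```

Calling schedule(graph, ["ws1", "ws2"], max_processors=1) restricts the same graph to one workstation, which shows how a directive changes the resulting schedule without the user having to reason about task placement.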

Year 3 - Evaluation and Documentation

We created an automated algorithm which allows a scientist to create a parameterized experiment and execute it in parallel. Experiments concisely express a collection of interactions a scientist would like to have with the environment.
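As a rough illustration of what a parameterized experiment involves, the sketch below expands a set of parameter ranges into individual runs and executes them concurrently. It is an assumption-laden stand-in, not the algorithm developed in this project: expand_experiment, run_experiment and threshold_count are hypothetical names, and a local thread pool takes the place of the distributed executor.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def expand_experiment(parameters):
    """Turn {name: [values]} into one dict per combination of values."""
    names = sorted(parameters)
    combos = product(*(parameters[n] for n in names))
    return [dict(zip(names, combo)) for combo in combos]


def run_experiment(program, parameters, workers=4):
    """Run program once per parameter combination, several runs at a time.

    Returns a list of (run_parameters, result) pairs in expansion order.
    """
    runs = expand_experiment(parameters)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda run: program(**run), runs))
    return list(zip(runs, results))


def threshold_count(threshold, scale):
    """Toy stand-in for an image-processing program: count bright pixels."""
    pixels = [10, 40, 80, 120, 200]
    return sum(1 for p in pixels if p * scale > threshold)


if __name__ == "__main__":
    experiment = {"threshold": [50, 100], "scale": [1, 2]}
    for run, result in run_experiment(threshold_count, experiment):
        print(run, "->", result)
```

The scientist states only the parameter ranges; generating the individual runs and dispatching them is left to the environment, which is the division of labor the experiment facility is meant to provide.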

We collected a repository of more than 40 query graphs from vision, image processing and remote-sensing researchers. We created an automated testing process and tested the environment using these graphs. We are currently testing the environment with a large number of scientific experiments and measuring how well it responds to user scheduling directives. We are documenting the algorithms and design decisions used to create the environment as part of James Ahrens's doctoral thesis work.

Conclusions

Our database environment provides support for computer-based scientific research work. Its design was based on the requirements of NASA scientists and uses existing packages along with new tools to support the automatic parallel scheduling and execution of scientific experiments.

Points of Contact

Linda G. Shapiro, Steven L. Tanimoto, and James P. Ahrens
Department of Computer Science and Engineering
University of Washington
shapiro@cs.washington.edu

