
The goal of the Paradise project is to prototype a scalable database system for storing, browsing, and reprocessing EOSDIS data sets. Paradise is taking a DB-centric point of view in which both data and metadata is stored in the database system. We think that this approach is superior for a couple of reasons. First, commercial parallel database management systems have been shown to provide excellent scalability. Second, database systems already automatically deal with two levels of the storage hierarchy (primary and secondary storage). When a query is submitted for execution, the database system optimizes it to minimize execution time by selecting an execution plan that minimizes both CPU usage and the movement of data pages between disk and primary memory. Extending the optimizer and execution algorithms to handle a three-level storage hierarchy provides a number of opportunities. For example, consider a scientist who wants to process a years worth of AVHRR images corresponding to a particular region (described by a polygon). With a non-integrated approach, the scientist first must query the database system for the names of the files containing the images of interest. Then he/she must submit a request to the hierarchical storage system to move the appropriate files to disk from tertiary storage. Finally, the user can then execute his program to process the images. With a database-centric approach, the user can simply issue a query for the data of interest and let the database system deal with migrating the data from tertiary to secondary storage. Furthermore, since the user's request specifies retrieval of only a subset of each AVHRR image (the portion clipped by the polygon of interest), the database system may be able to move only a subset of each AVHRR image from tertiary storage to secondary storage and/or primary memory. The primary goal of the Paradise project is to demonstrate that a database centric approach, when combined with integrated support for tertiary storage and scalable parallelism, can provide a superior solution to many of the problems facing EOSDIS.
During year 3 we focused on five major activities:
Over the past year we completed the initial parallel implementation Paradise and got it working both on a cluster of Sparc workstations and a 16 node SP2 that IBM donated to the project. This effort involved a major rewrite of the client-server version of the system. First, each operator process was redesigned (and re-implemented) to take each of its inputs from an input stream and send its output to an output stream. This pipelined approach to query processing makes it simple to connect operators running on the same or different processors in a fashion that is transparent to the operator. Next, the overall software structure of the Paradise was rearchitected. Instead of a single, multithreaded process that performs all functions associated with query execution, the new architecture consists of a master process that is responsible for optimizing and compiling queries plus a slave process on each processor of the cluster or multiprocessor. After a query has been optimized and compiled, the master process walks the execution plan, initiating operators on each of the slave processors. Third, we implemented a communications infrastructure that enables operators executing on different processors to communicate with one another.
We are currently adding support for the parallel manipulation of rasters/arrays, all of Paradise's geospatial types including polygons, polylines, and points, plus video.
In February 1996, Intel donated 20 dual processor Pentium boxes to the project. This cluster is interconnected using 100 Mbit/second Ethernet and a CISCO Catalyst 5000 switch.
To simplify the task of using Paradise for those who do not "speak" SQL, we implemented an HDF-compatible, call-level interface to Paradise. Two major extensions were needed. First, we extended Paradise's type system by adding support for 8 and 24 bit raster images as well as multidimensional arrays. Each of these three types are a standard Paradise base type and thus can be used like any other type (int, float, string, polygon, etc.) when defining a relation. Since HDF's concept of V-data is analogous to the concept of a relation in a relational database, nothing new was necessary to support this construct.
The second extension involved modifying the HDF library to replace that portion that deals with reading and writing HDF files with calls to Paradise instead. To use the Paradise version of HDF, all one has to do is to relink an HDF application with the Paradise version of the HDF library. At run-time when the application makes an HDF call, the call is converted to a database query that gets shipped to the Paradise server for execution. As tuples are returned by the Paradise server to the application process, they are converted by the Paradise HDF Library to the format expected by the HDF API Library. This process is illustrated by the following figure.

The compatibility of this mechanism was tested by taking a copy of NCSA Collage and relinking it with the Paradise HDF library. As we had hoped, this was done without recompiling or modifying Collage. At the February 1996 NASA meeting for MTPE grant awardees, we demonstrated Collage running on top of Paradise. We are in the process of conducting a through benchmark of HDF on top of Paradise, comparing it with the standard version of HDF as distributed by NCSA.
A key benefit of taking a DB-centric approach to EOSDIS is that the database system can manage migration of data from tertiary storage to secondary storage as an extension of its existing mechanisms for migrating data from secondary storage to primary storage. To support tape-based tertiary storage we extended the Shore Storage Manager to support the DLT 4700 drive. Since DLT tapes cannot be updated in place, SHORE builds a log-structured file system on the tape. When a block on tape is first referenced it is initially cached in a memory resident buffer pool. This buffer pool is managed on an LRU basis. Tape blocks that are removed from the buffer pool by the LRU policy are buffered in a disk cache in case they are subsequently re-referenced. When a block is updated, the updated block is appended to the logical 'end' of the tape followed by a new directory block.
While modern tape technology such as the Quantum DLT 4700 is dense and relatively fast, a typical tape seek still takes almost a minute! Our solution is two pronged. First, we employ a novel query execution paradigm that we term query pre-execution. The idea of pre-execution grew from the experimental observation that queries which accessed data on tape were so slow that we could actually afford to execute the query twice! During the pre-execution phase, Paradise executes the query normally except when a reference is made to a block of data residing on tape. When such a reference occurs, Paradise simply collects the reference without fetching the data and proceeds with the execution of the query. Once the entire query has been "pre-executed", Paradise has a very accurate reference string of the tape blocks that the query needs. Then, using a cache-conscious tape scheduling algorithm, which reorders the tape references to minimize the number of seeks performed, the query is executed normally. While the idea of query pre-execution sounds impractical for a disk-based system, we demonstrate that it actually works very effectively when dealing with large raster images on tape.
The second major technique that we employ to make query processing on tape efficient is termed query batching. Query batching is a variant of traditional tape-based batch processing from the 1970s and what Gray refers to as a data pump. The idea of query batching is simple: dynamically collect a set of queries from users, group them into batches such that each batch uses the same set of tapes, pre-execute each query in the batch to obtain its reference string, merge the reference strings, and then execute the queries in the batch together. The processing of a batch is done essentially in a "multiple instruction stream, single data stream" (MISD) mode. The ultimate goal is to scan each tape once sequentially, "pumping" tape blocks through the queries that constitute the batch as the blocks are read from tape.
We have completed an implementation of these mechanisms in Paradise and have conducted a detailed performance evaluation. A copy of the paper is available from the Paradise web site. It will be submitted to the 1997 SIGMOD Conference.
Users of a spatial database system frequently need to combine two inputs based on some spatial relationship - for example, a user might want to find all rivers that overlap with some landuse polygons. This operation, called a spatial join, can be very expensive and efficient algorithms for evaluating it are required. As part of handling spatial data in Paradise, we have developed a new spatial join algorithm called PBSM (Partition Based Spatial-Merge), which partitions large inputs into manageable chunks, and joins them using a computational geometry based plane-sweeping technique. A novel spatial partitioning function has been developed as part of this algorithm. A performance comparison of PBSM with existing spatial join algorithms demonstrates the advantages of PBSM, especially in cases when neither of the inputs to the join has an index on the joining attribute. A paper describing the PBSM was presented at the 1996 SIGMOD conference. A copy of the paper can be found on the Paradise web site as well as in the proceedings of the conference.
We are currently implementing a parallel version of PBSM.
In addition to the activities above we have also been actively porting the client-server version of Paradise to a number of other platforms including Solaris on both Pentium and Sparc processors, SGI, HP, and NT.
Over the past year, ARPA "discovered" Paradise and is planning on using the system for a number of projects including the JTF Metoc Anchor Desk and as part of a new program to disseminate geospatial and satellite data sets via direct-broadcast satellite into battlefield environments. In the future, ARPA will be providing support for the Paradise project.
We completed the following papers this year:
"Partition Based Spatial Merge Join", (Jignesh Patel and D. DeWitt), to appear, Proceedings of the 1996 SIGMOD Conference, Montreal, CA, June, 1996.
"Query Pre-Execution and Batching in Paradise: A Two-Pronged Approach to the Efficient Processing of Queries in Tape-Resident Data Sets", (JieBing Yu and D. DeWitt), to be submitted to the 1997 SIGMOD Conference, Tucson, Arizona.
"Processing Raster Images on Tertiary Storage: A Study of the Impact of Tile Size on Performance," (JieBing Yu and D. DeWitt), to appear at the 1996 NASA Mass Storage Conference.
Copies of all Paradise publications can be found on the Paradise Web site.
Talks on Paradise were given this past year at ARPA (August 1995), Georgia Tech (October 1995), NASA AISRP (February 1996), Intel (January 1996), IBM (March 1996), Oracle (March 1996), and NASA CESDIS (May 1996).
JieBing Yu, a graduate student working on the Paradise project, spent the summer at HAIS working on the EOSDIS prototype.
There are no significant outstanding contracting issues. We expect to expend all funds by the end of the contracting period.
DeWitt continues to serve as a member of the CESDIS Advisory Board. In addition, he served as a member of the panel evaluating NASA's response to the NRC report on EOSDIS.
David DeWitt traveled to CESDIS in May 1996 to participate in the HPCC workshop.
Jignesh Patel traveled to the 1996 SIGMOD conference in Montreal to present his paper on his new spatial join algorithm PBSM.
David J. DeWitt
University of Wisconsin
608-263-5489
dewitt@cs.wisc.edu
http://www.cs.wisc.edu/paradise/
Table of Contents | Section Contents -- Basic Research | Subsection Contents -- CESDIS University Research Program in Parallel Computing