
Atmospheric Data Assimilation

P. M. Lyster
University of Maryland
Department of Meteorology
/DAO_people/lys
Email: lys@dao.gsfc.nasa.gov

16th International Conference on the Numerical Simulation of Plasmas

It has long been understood that knowing the equations that describe physical processes does not by itself mean that one can model "reality". Aside from incompleteness in the equations (due, say, to missing or misunderstood physics) and errors in the computer implementation of the models, a key problem is specifying the initial conditions, model parameters, and boundary conditions for the algorithm. One can prescribe the initial conditions analytically and use boundary conditions that resemble a realistic situation. In that case the study yields understanding of the underlying physical processes, and some knowledge of "climate" may be obtained. Data assimilation takes a different tack: one tries to obtain the best estimate of the true state of a system given a model and a finite number of observations. Typically, these observations are distributed inhomogeneously in both space and time. One may simply be seeking accurate gridded initial conditions, or an ongoing assimilated dataset. Atmospheric data assimilation has matured significantly over the last five decades, driven by the need for accurate weather forecasts such as those produced at weather centers around the world, e.g., NOAA's National Centers for Environmental Prediction (NCEP) and the European Centre for Medium-Range Weather Forecasts (ECMWF). More recently, data assimilation for climate research, such as that performed at the NASA Data Assimilation Office (DAO), has also become important.

Data assimilation in the Earth sciences [Daley, 1992] [Ghil et al., 1997] is rapidly becoming a vast field. It is now clearly recognized that the field sits squarely at the intersection of physics (or physical modeling) and estimation theory. Rather than survey the field, this presentation discusses atmospheric data assimilation with emphasis on the work at NASA's DAO. The goal is to produce accurate gridded datasets of atmospheric fields, called "the analysis", by assimilating a range of observations together with physically consistent model forecasts. These datasets are used by the climate research community. The DAO has a healthy mix of mission requirements and theoretical research, and this talk discusses both.

First, I will discuss the Goddard Earth Observing System Data Assimilation System (GEOS DAS), which is used for scientific analysis and NASA mission support. This system resembles conventional analysis systems in that the basic meteorological variables (temperature T, wind v, and moisture) must be handled correctly; other variables, such as ozone, are in a sense value added for the purpose of climate research. GEOS DAS is described extensively in the Algorithm Theoretical Basis Document [DAO, 1996].

Typically, analyses of meteorological variables are performed for six-hourly (synoptic) sets of observations (radiosondes, earth-surface measurements, and satellite retrievals) that are collected from the Global Telecommunication System (GTS). An atmospheric General Circulation Model (GCM) is used to provide a 6-hour forecast. These forecasts (valid at 0Z, 6Z, 12Z, and 18Z) may be regarded as estimates of the state of the atmosphere that are augmented by observations to form an analysis. Let $\mathbf{w}^o$ be the vector of $p$ observations that passed the quality control tests, and $\mathbf{w}^f$ the vector of $n$ forecast variables produced by the GCM. The objective is to produce an analyzed state vector $\mathbf{w}^a$ (i.e., a vector of physical quantities that directly describe the physical state of the atmosphere). This is accomplished by solving the following equations [DAO Office Notes, 1998]:

$$(\mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R})\,\mathbf{x} = \mathbf{w}^o - \mathbf{H}\mathbf{w}^f, \qquad \mathbf{w}^a = \mathbf{w}^f + \mathbf{P}^f\mathbf{H}^T\mathbf{x}, \qquad (1)$$

where $\mathbf{P}^f$ is the specified forecast error covariance matrix, $\mathbf{R}$ is the specified observation error covariance matrix, $\mathbf{H}$ represents a generalized interpolation from the model grid to the observations, and $\mathbf{x}$ is an intermediate vector of length $p$ in observation space. This expression is quite general: it is a multivariate, high-dimensional expression for the statistical combination of two datasets, each with known error statistics (i.e., error covariances). It arises in the so-called variational, or least-squares, formulation, and it may also be derived from a more complete, estimation-theoretic Bayesian approach. When, as in data assimilation, the dimension of the model space is larger than the number of observations available at a particular time, the equation represents the use of a model to condition an underdetermined problem (i.e., one with too few observations).
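The two-step structure of Eq. 1 (solve a p × p system in observation space, then map the solution back to the n-point grid) can be illustrated on a toy problem. The following is a minimal sketch, not GEOS DAS or PSAS code; it assumes a 1-D grid, a Gaussian forecast error correlation with a hypothetical length scale, an interpolation operator H that simply picks the nearest grid point, and a diagonal observation error covariance. The names `w_f` (forecast), `w_o` (observations), and `w_a` (analysis) mirror the text; all function names are hypothetical.

```python
import math

def gaussian_corr(x1, x2, length_scale):
    """Assumed isotropic Gaussian correlation between two points on a line."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def analyze(grid, w_f, obs_pos, w_o, sigma_f, sigma_o, length_scale):
    """Two-step analysis: solve (H P^f H^T + R) x = w_o - H w_f, then w_a = w_f + P^f H^T x.

    Illustrative assumptions: P^f = sigma_f**2 * Gaussian correlation, H picks the
    nearest grid point to each observation, and R = sigma_o**2 * I.
    """
    idx = [min(range(len(grid)), key=lambda g: abs(grid[g] - xo)) for xo in obs_pos]
    p = len(idx)

    def pf(g1, g2):  # forecast error covariance between grid points g1 and g2
        return sigma_f**2 * gaussian_corr(grid[g1], grid[g2], length_scale)

    # Innovation matrix H P^f H^T + R (p x p) and innovation vector w_o - H w_f.
    s = [[pf(idx[i], idx[j]) + (sigma_o**2 if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    d = [w_o[k] - w_f[idx[k]] for k in range(p)]
    x = solve_dense(s, d)
    # Map the observation-space solution back to the grid: w_a = w_f + P^f H^T x.
    return [w_f[g] + sum(pf(g, idx[k]) * x[k] for k in range(p)) for g in range(len(grid))]

grid = [float(i) for i in range(11)]
w_f = [0.0] * len(grid)                   # flat first guess
w_a = analyze(grid, w_f, obs_pos=[3.0, 7.0], w_o=[1.0, -1.0],
              sigma_f=1.0, sigma_o=1.0, length_scale=1.5)
print([round(v, 3) for v in w_a])
```

With equal forecast and observation error variances, the analysis at each observed point lands roughly halfway between the first guess and the observation, and the increment decays away from the observations at a rate set by the assumed correlation length.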

There are a number of different modes of operation for the GEOS DAS. Briefly, mission support involves real-time data assimilation and sometimes the production of model forecasts of up to 10 days. Currently, data sets are made available directly from the Goddard Distributed Active Archive Center (DAAC); in this mode of operation about 50 megabytes of data per day are ingested into the Core system. In the coming year, satellite-retrieved profiles of atmospheric parameters will be produced as part of the DAS preprocessing system, which will increase the data ingest rate to about 1 gigabyte per day. The output analysis (gridded) datasets amount to about 1 gigabyte per day in real-time mode, while the production of model-forecast fields can increase this quantity by over an order of magnitude. Periodically the DAO conducts reanalysis projects, which involve multi-year analyses whose datasets are then studied and distributed to the climate research community. In this mode of operation, the DAO plans for a production rate of 30 days of assimilation per wall-clock day. The total number of useful archived meteorological observations currently available, which mostly span the past fifty years, is about tex2html_wrap_inline124. A reanalysis of these data into gridded datasets would produce about 30 terabytes of data.
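The reanalysis figures quoted above imply a simple throughput budget. The sketch below only works through that arithmetic; it assumes a 50-year record (from the "past fifty years" above), 365.25 days per year, and decimal units (1 TB = 1000 GB). The function name is hypothetical.

```python
DAYS_PER_YEAR = 365.25   # assumed average year length

def reanalysis_budget(years, days_per_wallclock_day, total_output_tb):
    """Return (assimilated days, wall-clock days, GB of output per assimilated day)."""
    assimilated_days = years * DAYS_PER_YEAR
    wallclock_days = assimilated_days / days_per_wallclock_day
    gb_per_assimilated_day = total_output_tb * 1000.0 / assimilated_days
    return assimilated_days, wallclock_days, gb_per_assimilated_day

days, wall, gb = reanalysis_budget(years=50, days_per_wallclock_day=30, total_output_tb=30)
print(f"{days:.1f} assimilated days, {wall:.1f} wall-clock days, {gb:.2f} GB/day")
```

At 30 assimilated days per wall-clock day, a 50-year reanalysis takes roughly 609 wall-clock days (about 1.7 years), and 30 terabytes spread over about 18,260 assimilated days corresponds to about 1.6 gigabytes per assimilated day.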

The Kalman filter (KF) provides a consistent dynamical method for determining the forecast error covariance matrix $\mathbf{P}^f$; research on this is discussed below. The KF is computationally prohibitive, and its application to large-scale assimilation is still in the research phase. For most current analyses of the three-dimensional atmosphere, the statistics are instead determined self-consistently using the innovation relationship:

$$\left\langle (\mathbf{w}^o - \mathbf{H}\mathbf{w}^f)(\mathbf{w}^o - \mathbf{H}\mathbf{w}^f)^T \right\rangle = \mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R}, \qquad (2)$$

where $\langle\cdot\rangle$ denotes the ensemble average, and the left-hand side is evaluated using sample innovations, $\mathbf{w}^o - \mathbf{H}\mathbf{w}^f$. This relationship holds when the forecast and observation errors are unbiased and mutually uncorrelated. $\mathbf{P}^f$ and part of $\mathbf{R}$ are typically modeled using parameterized correlation functions [Daley, 1992] and computed variances (the variances are the diagonal elements of the error covariance matrices). A key component is the use of balance conditions to generate a multivariate forecast error covariance matrix, which helps to provide information about unobserved variables and to generate a balanced, low-noise analysis. The most important relationship used is geostrophic balance:


$$\mathbf{v} = \frac{g}{f}\,\mathbf{k}\times\nabla_p h, \qquad (3)$$

where $\mathbf{v}$ is the horizontal wind velocity, $g$ is the gravitational acceleration, $f$ is the Coriolis parameter, $\mathbf{k}$ is the unit vector normal to the surface (pointing upward), and $\nabla_p h$ is the gradient of the height $h$ on a pressure surface. This relationship is used in formulating the forecast error covariances, and it helps provide balanced wind-height analyses.
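To make the geostrophic relation concrete, the sketch below evaluates it with centred finite differences on a small regular grid. It is a minimal illustration, assuming a local Cartesian grid with a single Coriolis parameter f = 2Ω sin φ (an f-plane approximation) and standard values for Ω and g; the function names are hypothetical. Expanding the cross product with k pointing upward gives u = -(g/f) ∂h/∂y and v = (g/f) ∂h/∂x.

```python
import math

OMEGA = 7.292e-5   # Earth's rotation rate (rad/s)
G = 9.81           # gravitational acceleration (m/s^2)

def coriolis(lat_deg):
    """Coriolis parameter f = 2 * Omega * sin(latitude)."""
    return 2.0 * OMEGA * math.sin(math.radians(lat_deg))

def geostrophic_wind(h, dx, dy, lat_deg):
    """Geostrophic wind (u, v) at interior points of a regular grid.

    h[j][i] is height (m) on a pressure surface, i increasing eastward and
    j northward; dx, dy are grid spacings (m). Boundary points are left at zero.
    """
    f = coriolis(lat_deg)                 # one f for the whole grid (f-plane)
    ny, nx = len(h), len(h[0])
    u = [[0.0] * nx for _ in range(ny)]
    v = [[0.0] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            dh_dx = (h[j][i + 1] - h[j][i - 1]) / (2.0 * dx)
            dh_dy = (h[j + 1][i] - h[j - 1][i]) / (2.0 * dy)
            u[j][i] = -(G / f) * dh_dy    # zonal component of (g/f) k x grad h
            v[j][i] = (G / f) * dh_dx     # meridional component
    return u, v

dx = dy = 100e3                                                   # 100 km spacing
h = [[5500.0 + 10.0 * j for i in range(5)] for j in range(5)]     # +10 m per 100 km northward
u, v = geostrophic_wind(h, dx, dy, lat_deg=45.0)
print(round(u[2][2], 2), round(v[2][2], 2))
```

Heights rising by 10 m per 100 km toward the north at 45°N yield an easterly (negative u) geostrophic wind of roughly 9.5 m/s and no meridional component, consistent with the Northern Hemisphere rule that lower heights lie to the left of the flow.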
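Returning to the innovation relationship: because the true state cancels in each innovation, the relationship can be checked, and exploited, with sample statistics. If observation errors at different stations are uncorrelated, the covariance of innovations at two stations estimates the forecast error covariance between them, while the innovation variance at one station estimates the sum of the forecast and observation error variances. The sketch below verifies this on synthetic data under an assumed unbiased Gaussian error model; the two-station setup and the parameters `sigma_f`, `sigma_o`, and `rho_f` are hypothetical, and this is not the DAO's statistics code.

```python
import math
import random

def simulate_innovations(n_samples, sigma_f, sigma_o, rho_f, seed=0):
    """Synthetic innovations d = w_o - H w_f at two stations (assumed error model).

    Forecast errors: Gaussian, variance sigma_f**2, inter-station correlation rho_f.
    Observation errors: Gaussian, variance sigma_o**2, uncorrelated between stations
    and independent of forecast errors. Both are unbiased.
    """
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho_f**2)   # from the 2x2 Cholesky factor of [[1, rho], [rho, 1]]
    samples = []
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        ef = (sigma_f * z1, sigma_f * (rho_f * z1 + c * z2))
        eo = (rng.gauss(0.0, sigma_o), rng.gauss(0.0, sigma_o))
        # The true state cancels: d = (truth + e_o) - (truth + e_f) = e_o - e_f.
        samples.append((eo[0] - ef[0], eo[1] - ef[1]))
    return samples

def sample_stats(samples):
    """Sample variance at station 1 and sample covariance between stations 1 and 2."""
    n = len(samples)
    m1 = sum(a for a, _ in samples) / n
    m2 = sum(b for _, b in samples) / n
    var1 = sum((a - m1) ** 2 for a, _ in samples) / (n - 1)
    cov12 = sum((a - m1) * (b - m2) for a, b in samples) / (n - 1)
    return var1, cov12

sigma_f, sigma_o, rho_f = 2.0, 1.0, 0.6
var1, cov12 = sample_stats(simulate_innovations(100_000, sigma_f, sigma_o, rho_f))
print(f"innovation variance {var1:.2f} (theory {sigma_f**2 + sigma_o**2:.2f})")
print(f"innovation covariance {cov12:.2f} (theory {rho_f * sigma_f**2:.2f})")
```

In practice one works in the opposite direction: the parameters of the modeled correlation functions and variances are adjusted so that the modeled right-hand side matches innovation statistics gathered from real station pairs.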

To solve Eq. 1, GEOS DAS uses the Physical-space Statistical Analysis System (PSAS). For a typical 6-hour synoptic (i.e., analysis) period the number of observations accumulated is tex2html_wrap_inline144. A 6,000 km cutoff is applied to the forecast error correlation function, so the innovation matrix ($\mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R}$) is of size $p \times p$ and approximately 26% full. In PSAS, Eq. 1 is solved using a nested preconditioned conjugate gradient (CG) algorithm. For a single analysis the complexity of this is tex2html_wrap_inline154, where tex2html_wrap_inline156 is the number of iterations of the CG algorithm; typically tex2html_wrap_inline158. For GCMs with tex2html_wrap_inline160 horizontal resolution and 70 levels, tex2html_wrap_inline164. The DAO has developed both shared-memory and distributed-memory parallel versions of PSAS and the GCM [DAO Office Notes, 1998].
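The idea of solving the observation-space system with a cutoff correlation and conjugate gradients can be sketched on a toy 1-D problem. This is not the nested preconditioned CG used in PSAS: it substitutes a plain Jacobi-preconditioned CG, and as the assumed correlation model it uses the fifth-order compactly supported function of Gaspari and Cohn, which vanishes beyond a finite distance while keeping the matrix positive definite (an abrupt truncation of an arbitrary correlation function does not guarantee this). The observation layout, error variances, and function names are hypothetical, and the resulting fill fraction is not meant to reproduce the 26% quoted above.

```python
import math
import random

def gaspari_cohn(r, c):
    """Gaspari-Cohn fifth-order compactly supported correlation; zero for |r| >= 2c."""
    z = abs(r) / c
    if z <= 1.0:
        return (((-0.25 * z + 0.5) * z + 0.625) * z - 5.0 / 3.0) * z * z + 1.0
    if z < 2.0:
        return ((((z / 12.0 - 0.5) * z + 0.625) * z + 5.0 / 3.0) * z - 5.0) * z + 4.0 - 2.0 / (3.0 * z)
    return 0.0

def build_innovation_matrix(pos_km, sigma_f, sigma_o, cutoff_km):
    """Sparse rows of S = H P^f H^T + R; entries beyond the cutoff are zero and not stored."""
    c = cutoff_km / 2.0   # the correlation vanishes for separations >= 2c = cutoff_km
    rows = []
    for i, xi in enumerate(pos_km):
        row = []
        for j, xj in enumerate(pos_km):
            value = sigma_f**2 * gaspari_cohn(xi - xj, c)
            if i == j:
                value += sigma_o[i] ** 2      # diagonal R (uncorrelated obs errors)
            if value != 0.0:
                row.append((j, value))
        rows.append(row)
    return rows

def matvec(rows, x):
    return [sum(v * x[j] for j, v in row) for row in rows]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pcg(rows, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for the SPD system S x = b."""
    diag = [next(v for j, v in row if j == i) for i, row in enumerate(rows)]
    x = [0.0] * len(b)
    r = b[:]                                   # residual b - S x for x = 0
    z = [ri / di for ri, di in zip(r, diag)]
    p = z[:]
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        sp = matvec(rows, p)
        alpha = rz / dot(p, sp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * si for ri, si in zip(r, sp)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

rng = random.Random(1)
n_obs = 300
pos = sorted(rng.uniform(0.0, 40000.0) for _ in range(n_obs))   # positions on a line (km)
sigma_o = [rng.uniform(0.5, 2.0) for _ in range(n_obs)]
rows = build_innovation_matrix(pos, 1.5, sigma_o, cutoff_km=6000.0)
fill = sum(len(row) for row in rows) / n_obs**2
d = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]                   # synthetic innovations
x, iters = pcg(rows, d)
print(f"fill = {fill:.0%}, CG iterations = {iters}")
```

Because the cutoff makes most entries exactly zero, storing only the nonzero entries reduces both memory and the cost of each matrix-vector product, which dominates the work per CG iteration.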

The Kalman filter is one of the advanced methodologies for DAS (others include, e.g., 4DVAR and ensemble filtering) that are being studied worldwide. The main advantage of the KF is its ability to calculate the forecast error covariance matrix $\mathbf{P}^f$ more accurately, based on a dynamical approach [Jazwinski, 1970] [Lyster et al., 1997]. The equation for the evolution of $\mathbf{P}^f$ is:

$$\mathbf{P}^f_k = \mathbf{M}_{k-1}\mathbf{P}^a_{k-1}\mathbf{M}_{k-1}^T + \mathbf{Q}_{k-1}, \qquad (4)$$

where the subscript $k$ denotes the time level, $\mathbf{P}^a$ is the analysis error covariance matrix, which is evaluated according to an optimizing principle (such as minimum variance), $\mathbf{M}$ represents the linearized model (e.g., transport model) operator, and $\mathbf{Q}$ is the model error covariance matrix. Both the forecast and analysis covariance matrices are of size $n \times n$, where $n$ is the size of the state vector $\mathbf{w}^f$; tex2html_wrap_inline164 for three-dimensional models, and tex2html_wrap_inline188 for two-dimensional (horizontal) models. Even for sparse models this problem has memory and complexity that scale as tex2html_wrap_inline190. Full three-dimensional implementations await tera- and petaflop/s computing capacity. The DAO has developed a two-dimensional Kalman filter to study trace chemicals in the stratosphere, where the model is relatively simple (passive advection by prescribed, analyzed winds) and the motion is known to be constrained to isentropic surfaces. The key component of the algorithm is the distributed-memory (MPI) parallel implementation of Eq. 4.
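The structure of Eq. 4 can be illustrated on a toy problem small enough to store the covariance matrices densely. The sketch assumes, purely for illustration, a linear model M given by first-order upwind advection on a periodic 1-D grid, a diagonal model error covariance Q, and one observation of a single grid value per cycle, whose minimum-variance update is P^a = P^f - P^f H^T (H P^f H^T + r)^(-1) H P^f. It is not the DAO's parallel Kalman filter; all names are hypothetical. Even here, each n × n covariance matrix holds n² numbers, and the dense products in this sketch cost O(n³) operations.

```python
def advection_matrix(n, courant):
    """Assumed linear model M: upwind advection on a periodic grid,
    q_i <- (1 - C) q_i + C q_{i-1}; C = 1 is an exact shift by one cell."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] += 1.0 - courant
        m[i][(i - 1) % n] += courant
    return m

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def forecast_covariance(m, p_a, q):
    """Forecast step of Eq. 4: P^f = M P^a M^T + Q, with dense n x n matrices."""
    mpmt = matmul(matmul(m, p_a), transpose(m))
    n = len(q)
    return [[mpmt[i][j] + q[i][j] for j in range(n)] for i in range(n)]

def analysis_covariance_single_obs(p_f, k, r):
    """Minimum-variance update for one observation of grid value k with error variance r:
    P^a = P^f - P^f[:, k] P^f[k, :] / (P^f[k][k] + r)."""
    n = len(p_f)
    denom = p_f[k][k] + r
    return [[p_f[i][j] - p_f[i][k] * p_f[k][j] / denom for j in range(n)] for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

n = 8
m = advection_matrix(n, courant=1.0)                              # exact one-cell shift
q = [[0.01 if i == j else 0.0 for j in range(n)] for i in range(n)]
p_a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
for step in range(20):                                            # forecast/analysis cycles
    p_f = forecast_covariance(m, p_a, q)
    p_a = analysis_covariance_single_obs(p_f, k=0, r=0.5)
print(round(trace(p_f), 4), round(trace(p_a), 4))
```

With C = 1 the model is a pure shift (an orthogonal matrix), so the forecast step preserves the trace of P^a and only Q adds variance; the analysis step then removes variance at and around the observed point, and advection carries that reduced variance downstream.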

There are a number of relevant articles on data assimilation on the home page of Peter Lyster at: /DAO_people/lys





Peter Lyster
Fri Jan 16 18:04:12 EST 1998