P. M. Lyster
University of Maryland
Department of Meteorology
/DAO_people/lys
Email: lys@dao.gsfc.nasa.gov
16th International Conference on the Numerical Simulation of Plasmas
It has long been understood that knowing the equations that describe physical processes does not necessarily mean that one can model ``reality''. Aside from incompleteness in the equations due to, say, missing or misunderstood physics, or errors in a computer implementation of the models, a key problem is the specification of initial conditions, model parameters, and boundary conditions for the algorithm. One can prescribe the initial conditions analytically, and use boundary conditions that resemble a realistic situation. In that case the scientific study provides understanding of the underlying physical processes, and some knowledge of ``climate'' may be determined.
Data assimilation takes a different tack. One tries to obtain a best estimate of the true state of a system given a model and a finite number of observations. Typically, these observations are distributed inhomogeneously in both space and time. One may be searching simply for accurate gridded initial conditions, or one may be searching for an ongoing assimilated dataset. Atmospheric data assimilation has matured significantly over the last five decades. This was driven by the obvious need for accurate weather forecasts, such as those produced at weather centers around the world, e.g., at NOAA's National Centers for Environmental Prediction (NCEP) or the European Centre for Medium-Range Weather Forecasts (ECMWF). Recently, data assimilation for climate research, such as at the NASA Data Assimilation Office (DAO), has also become important.
Data assimilation in the Earth Sciences[Daley, 1992][Ghil et al., 1997] is rapidly becoming a vast field. It is now clearly recognized that the field sits squarely on the intersection between physics (or physical modeling) and estimation theory. Rather than survey the field, this presentation discusses atmospheric data assimilation with emphasis on the work at NASA's DAO. The goal is to produce accurate gridded datasets of atmospheric fields, called ``the analysis'', by assimilating a range of observations along with physically consistent model forecasts. This work produces datasets that are used by the climate research community. The DAO has a healthy mix of mission requirements and theoretical research. This talk will discuss both aspects.
First, I will discuss the Goddard Earth Observing System Data Assimilation System (GEOS DAS), which is used for scientific analysis and NASA mission support. This system is similar to regular analysis systems since the basic meteorological variables (T, v, moisture) must be handled correctly; other variables such as ozone are, in a sense, value added for the purpose of climate research. GEOS DAS is described extensively in the Algorithm Theoretical Basis Document[DAO, 1996].
Typically, analyses of meteorological variables are performed for
six-hourly (synoptic) sets of observations (radiosondes, earth surface
measurements, and satellite retrievals) that are collected from the Global
Telecommunication System (GTS). An atmospheric General Circulation Model (GCM)
is used to provide a 6-hour forecast. These forecasts
(at 0Z, 6Z, 12Z, and 18Z) may be regarded as estimates of the state
of the atmosphere that are augmented by observations to form an analysis.
Let $w^o$
be a vector of $p$ observations that passed the quality control
tests, and $w^f$
a vector of $n$ forecast variables produced by the GCM.
The objective is to produce an analyzed
state vector $w^a$
(i.e., a vector of physical
quantities that directly describe the physical state of the atmosphere).
This is accomplished by solving
the following equation[DAO Office Notes, 1998]:
$$ w^a = w^f + P^f H^T \left( H P^f H^T + R \right)^{-1} \left( w^o - H w^f \right) \qquad (1) $$
where $P^f$ is the specified
forecast error covariance matrix,
$R$ is the specified observation error covariance matrix, and
$H$ represents a generalized
interpolation from the model grid to the observations.
This expression is quite general; it is a multivariate, high-dimensional
expression for the statistical combination of two
datasets, each with known errors, i.e., error covariances.
It arises in the so-called variational, or least-squares,
formulation, or it may be determined from a more complete estimation
theoretic Bayesian approach. In data assimilation where the
dimension of the model space is larger than the number of observations
available at a particular time, the equation represents
the use of a model to condition an underdetermined problem
(i.e., too few observations).
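The statistical combination described above can be illustrated with a small NumPy sketch. It is a toy under stated assumptions, not the GEOS DAS implementation: the names `analysis_update`, `w_f`, `w_o`, `H`, `P_f`, and `R` are chosen here for the forecast, the observations, the interpolation operator, and the two error covariance matrices, and the three-point grid with a single observation is invented for illustration.

```python
import numpy as np

def analysis_update(w_f, w_o, H, P_f, R):
    """Return w_f + P_f H^T (H P_f H^T + R)^{-1} (w_o - H w_f)."""
    innovation = w_o - H @ w_f            # observation-minus-forecast residual
    S = H @ P_f @ H.T + R                 # innovation covariance, p x p
    x = np.linalg.solve(S, innovation)    # solve the linear system, do not invert S
    return w_f + P_f @ H.T @ x

# Toy problem (assumed values): n = 3 grid points, p = 1 observation
# of the middle grid point, so the problem is underdetermined.
w_f = np.array([1.0, 2.0, 3.0])
H = np.array([[0.0, 1.0, 0.0]])
w_o = np.array([4.0])
# Forecast errors with unit variance and correlation 0.5 between neighbours.
P_f = np.array([[1.0, 0.5, 0.0],
                [0.5, 1.0, 0.5],
                [0.0, 0.5, 1.0]])
R = np.array([[1.0]])

w_a = analysis_update(w_f, w_o, H, P_f, R)
print(w_a)  # [1.5 3.  3.5]
```

Only the middle point is observed, yet all three values change: the off-diagonal elements of the forecast error covariance spread the observational information to the unobserved neighbours.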
There are a number of
different modes of operation for the GEOS DAS. Briefly, mission support
involves real-time data assimilation and sometimes
the production of up to 10-day model forecasts. Currently, data sets
are made available directly from the Goddard DAAC; this mode of operation
ingests about 50 megabytes of data per day into the Core system. In the
coming year, satellite-retrieved profiles of atmospheric parameters will be
produced as part of the DAS preprocessing system, which will increase the data
ingest rate to about 1 gigabyte per day. The output analysis
(gridded) datasets are about 1 gigabyte per day in real-time mode, while
the production of model-forecast fields can increase this quantity by over
an order of magnitude. Periodically the DAO conducts reanalysis projects that
involve multi-year analysis whose data sets are then studied and distributed to
the climate research community. In this mode of operation, the DAO plans for
a production rate of 30 days of assimilation per wall-clock day.
The currently available total number of useful archive meteorological
observations, which mostly spans the past fifty years, is about
. A reanalysis of that data into gridded datasets would
produce about 30 terabytes of data.
The Kalman filter (KF) provides a consistent dynamical method for
determining the forecast error covariance matrix $P^f$.
Research on this will be discussed below. The KF is
computationally prohibitive, and its application to large-scale
assimilation is still in the research phase. For most current
analyses of the three-dimensional atmosphere,
the statistics are determined self consistently using the innovation
relationship:
$$ \left\langle (w^o - H w^f)(w^o - H w^f)^T \right\rangle = H P^f H^T + R \qquad (2) $$
where $\langle \cdot \rangle$ represents the ensemble average, and the left-hand side is
evaluated using sample innovations, $w^o - H w^f$.
$P^f$ and part of $R$ are typically modeled using parameterized correlation
functions[Daley, 1992] and computed variances
(variances are the diagonal elements of the error covariance matrices).
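The innovation relationship can be checked numerically on synthetic data. The sketch below assumes Gaussian forecast and observation errors that are unbiased and mutually uncorrelated; the matrices `P_f`, `R`, and `H`, the sample count, and the random seed are all invented for illustration and do not come from GEOS DAS.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, samples = 4, 2, 200_000

# Assumed error statistics: correlation 0.5**|i-j| between grid points,
# and uncorrelated observation errors with variance 0.25.
P_f = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
R = 0.25 * np.eye(p)
H = np.array([[1.0, 0.0, 0.0, 0.0],     # observe grid point 0 directly
              [0.0, 0.0, 0.5, 0.5]])    # average of grid points 2 and 3

eps_f = rng.multivariate_normal(np.zeros(n), P_f, size=samples)
eps_o = rng.multivariate_normal(np.zeros(p), R, size=samples)

# With observations = H(truth) + eps_o and forecast = truth + eps_f,
# the innovation (observation minus interpolated forecast) is eps_o - H eps_f.
innovations = eps_o - eps_f @ H.T

sample_cov = innovations.T @ innovations / samples   # ensemble average
expected = H @ P_f @ H.T + R
max_abs_diff = np.max(np.abs(sample_cov - expected))
print(max_abs_diff)
```

In practice the direction is reversed: the sample innovation covariance is what is available, and the parameters of the modeled covariances are tuned so that the right-hand side matches it.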
A key component is the use of balance conditions to generate
a multivariate forecast error covariance matrix which helps to provide
information about unobserved variables and to generate
a balanced, low noise, analysis. The most important relationship
that is used is geostrophic balance:
$$ \mathbf{v} = \frac{g}{f} \, \mathbf{k} \times \nabla h \qquad (3) $$
where $\mathbf{v}$ is the horizontal wind velocity, $g$ is the gravitational
acceleration, $f$ is the Coriolis parameter, $\mathbf{k}$ is the unit vector
normal to the surface, and $\nabla h$ is the
gradient of height on a pressure surface. This is used in the
formulation of the forecast error covariances, and it helps
provide balanced wind-height analyses.
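As a concrete instance of geostrophic balance, the following sketch computes the balanced wind from a gridded height field using finite differences. It assumes a local Cartesian grid and a constant Coriolis parameter (`F0`, a typical mid-latitude value), neither of which is taken from the text; the function name `geostrophic_wind` and the test field are likewise hypothetical.

```python
import numpy as np

G = 9.81       # gravitational acceleration, m s^-2
F0 = 1.0e-4    # Coriolis parameter, s^-1 (assumed constant mid-latitude value)

def geostrophic_wind(h, dx, dy, f=F0, g=G):
    """Return (u, v) = (g/f) k x grad(h) for h[j, i], j along y and i along x."""
    dh_dy, dh_dx = np.gradient(h, dy, dx)   # derivatives along axis 0 (y), axis 1 (x)
    u = -(g / f) * dh_dy                    # eastward component
    v = (g / f) * dh_dx                     # northward component
    return u, v

# Toy height field on a pressure surface: rises 100 m per 1000 km eastward.
dx = dy = 100.0e3                           # grid spacing, m
x = np.arange(10) * dx
y = np.arange(8) * dy
X, Y = np.meshgrid(x, y)                    # arrays of shape (8, 10)
h = 5500.0 + 1.0e-4 * X

u, v = geostrophic_wind(h, dx, dy)
print(u.max(), v.mean())
```

For this linear height field the balanced wind is purely northward, with speed g/f times the height gradient, about 9.81 m/s. The same linear relation, applied to forecast errors rather than to the fields themselves, is what couples wind and height in a multivariate covariance model.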
To solve Eq. 1, GEOS DAS uses the Physical-space Statistical Analysis
System (PSAS).
For a typical 6 hour synoptic (i.e., analysis) period the number of
observations accumulated is
.
A 6,000 kilometer cutoff is applied
to the forecast error correlation function; hence the innovation
matrix ($H P^f H^T + R$), which is of size $p \times p$,
is only approximately 26% full. For PSAS,
Eq. 1 is solved using a nested preconditioned conjugate gradient
algorithm. For a single analysis the complexity
of this is
, where
is the number
of iterations of the CG algorithm; typically
.
For GCMs with
horizontal resolution and 70 levels,
.
The DAO has developed both shared-memory parallel and
distributed-memory parallel versions of PSAS
and the GCM[DAO Office Notes, 1998].
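The observation-space solve can be sketched as follows. This is a simplified illustration, not PSAS itself: it uses a single-level conjugate gradient iteration with a diagonal (Jacobi) preconditioner rather than the nested preconditioner, dense matrices rather than a cutoff-sparse one, and a one-dimensional toy grid. The names `pcg`, `P_f`, `H`, `R`, and all numerical values are assumptions made for the example.

```python
import numpy as np

def pcg(A, b, tol=1e-10, max_iter=200):
    """Jacobi-preconditioned conjugate gradient for symmetric positive definite A."""
    m_inv = 1.0 / np.diag(A)            # inverse of the diagonal preconditioner
    x = np.zeros_like(b)
    r = b - A @ x                       # residual
    z = m_inv * r                       # preconditioned residual
    d = z.copy()                        # search direction
    rz = r @ z
    for it in range(1, max_iter + 1):
        Ad = A @ d
        alpha = rz / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = m_inv * r
        rz_new = r @ z
        d = z + (rz_new / rz) * d
        rz = rz_new
    return x, max_iter

rng = np.random.default_rng(1)
n, p = 200, 50                          # toy sizes: grid points, observations
grid = np.linspace(0.0, 1.0, n)
# Assumed Gaussian forecast error correlation with length scale 0.05.
P_f = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / 0.05) ** 2)
obs_idx = rng.choice(n, size=p, replace=False)
H = np.zeros((p, n))
H[np.arange(p), obs_idx] = 1.0          # each observation samples one grid point
obs_var = rng.uniform(0.25, 1.0, size=p)
R = np.diag(obs_var)

w_f = np.sin(2.0 * np.pi * grid)
w_o = H @ w_f + rng.normal(0.0, np.sqrt(obs_var))

S = H @ P_f @ H.T + R                   # innovation matrix, p x p
b = w_o - H @ w_f                       # innovations
x, iters = pcg(S, b)                    # step 1: solve in observation space
w_a = w_f + P_f @ H.T @ x               # step 2: map back to the model grid
print(iters, np.linalg.norm(S @ x - b))
```

Each iteration is dominated by one product of the innovation matrix with a vector, so the cost of the solve grows with the number of iterations times the number of nonzero matrix elements; this is why the correlation cutoff matters.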
The Kalman filter is
one of the advanced methodologies for DAS (others include, e.g., 4DVAR
and Ensemble Filtering) that are being studied worldwide. The main
advantage of the KF is the ability to calculate the
forecast error covariance matrix $P^f$ more accurately,
based on a dynamical approach [Jazwinski, 1970] [Lyster et al., 1997].
The equation for the evolution of $P^f$ is:
$$ P^f_{k+1} = M P^a_k M^T + Q \qquad (4) $$
where $P^a_k$ is the analysis error covariance matrix at time step $k$, which is
evaluated based on an optimizing principle (such as
minimum variance), $M$ represents the linearized model (e.g., transport
model) operator,
and $Q$ is the model error covariance matrix.
Both the forecast and analysis covariance
matrices are of size $n \times n$, where $n$ is the
size of the state vector $w^f$;
for three-dimensional models, and
for two-dimensional (horizontal) models.
Even for sparse models this problem has memory and complexity
that scale as $n^2$. Full three-dimensional
implementations await tera- and petaflop/s capacity.
The DAO has developed a two-dimensional Kalman filter
to study trace chemicals in the stratosphere, where the
model is relatively simple (passive advection with
prescribed, analyzed winds), and motion is known to
be constrained to isentropic surfaces.
The key component of the algorithm is the distributed-memory (MPI) parallel
implementation of Eq. 4.
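The covariance propagation step of the Kalman filter can be sketched serially for a toy transport model. This is not the DAO's MPI implementation: it assumes a one-dimensional periodic grid, a first-order upwind advection scheme, and invented values for the grid size, Courant number, and model error. The function names are hypothetical, and the state size passed to `covariance_bytes` is an illustrative round number rather than a figure from the text.

```python
import numpy as np

def upwind_advection_matrix(n, courant):
    """One step of first-order upwind advection on a periodic 1-D grid, as a matrix."""
    shift = np.roll(np.eye(n), 1, axis=0)    # (shift @ w)[i] == w[i - 1]
    return (1.0 - courant) * np.eye(n) + courant * shift

def propagate_covariance(M, P_a, Q):
    """Forecast error covariance from analysis error covariance: M P_a M^T + Q."""
    return M @ P_a @ M.T + Q

def covariance_bytes(n, bytes_per_value=8):
    """Memory for one dense n x n covariance matrix in double precision."""
    return bytes_per_value * n * n

n = 64
idx = np.arange(n)
# Assumed analysis error: a localized bump of variance centred on grid point 20.
P_a = np.diag(np.exp(-0.5 * ((idx - 20) / 3.0) ** 2))
Q = 1.0e-3 * np.eye(n)                       # assumed model error covariance
M = upwind_advection_matrix(n, courant=0.5)

P_f = P_a
for _ in range(10):
    # No observations in this toy, so each forecast covariance is carried
    # forward unchanged as the next step's analysis covariance.
    P_f = propagate_covariance(M, P_f, Q)

print(np.argmax(np.diag(P_f)))               # the variance bump moves downstream
print(covariance_bytes(10**6) / 1e12, "TB")  # illustrative state size only
```

The error variance is advected with the flow, which is the dynamical information a prescribed covariance model lacks. The two matrix products also show where the cost lies: the model operator must be applied to every column of an n by n matrix.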
There are a number of relevant articles on data assimilation on the home page of Peter Lyster at: /DAO_people/lys