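Suppose we have found the maximum likelihood solutions for each parameter,
$\theta_0$. The likelihood function can then be approximated by
another multivariate Gaussian about this point,

$$ L(\theta) \propto \exp\left(-\frac{1}{2}\,\Delta\theta^{T} F\,\Delta\theta\right), $$

where $\Delta\theta = \theta - \theta_0$ is the distance to the maximum in
parameter space and the parameter covariance matrix is given by the inverse of
$F$, the Fisher Information matrix,

$$ F_{ij} = -\left\langle \frac{\partial^2 \ln L}{\partial\theta_i\,\partial\theta_j} \right\rangle = \frac{1}{2}\,\mathrm{Tr}\left[A_i A_j\right] $$

(if the means of the data are dependent on the parameters, this is modified --
see Tegmark, Taylor & Heavens (1997)). The far right-hand-side expression can be calculated for
Gaussian-distributed data sets (i.e. equation 1), where
$A_i = C^{-1}\,\partial C/\partial\theta_i = \partial \ln C/\partial\theta_i$ is the slope of the log of the data covariance
matrix, $C$, in parameter space.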
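As a rough numerical illustration, the sketch below evaluates the standard trace expression for zero-mean Gaussian data, $F_{ij} = \frac{1}{2}\mathrm{Tr}[C^{-1}C_{,i}\,C^{-1}C_{,j}]$, with NumPy. The function name `fisher_matrix` and the toy covariance model (a random "signal" matrix scaled by an amplitude, plus white noise) are assumptions made for this example and are not taken from the text.

```python
import numpy as np

def fisher_matrix(C, dC):
    """Fisher matrix F_ij = 0.5 * Tr[A_i A_j], with A_i = C^{-1} dC/dtheta_i.

    C  : (n, n) data covariance evaluated at the fiducial parameters
    dC : (m, n, n) derivatives of C with respect to each of the m parameters
    """
    # Solve C A_i = dC_i instead of forming the inverse of C explicitly.
    A = np.stack([np.linalg.solve(C, dCi) for dCi in dC])
    # Tr[A_i A_j] = sum over a, b of A_i[a, b] * A_j[b, a]
    return 0.5 * np.einsum("iab,jba->ij", A, A)

# Toy model (hypothetical): C = amp * S + sigma2 * I
rng = np.random.default_rng(0)
n = 50
G = rng.normal(size=(n, n))
S = G @ G.T / n                      # positive semi-definite "signal" matrix
amp, sigma2 = 2.0, 0.5
C = amp * S + sigma2 * np.eye(n)
dC = np.stack([S, np.eye(n)])        # dC/d(amp) and dC/d(sigma2)

F = fisher_matrix(C, dC)
param_cov = np.linalg.inv(F)         # approximate parameter covariance
errors = np.sqrt(np.diag(param_cov)) # marginal 1-sigma errors
```

The inverse of `F` gives the Gaussian approximation to the parameter covariance, so `errors` holds the marginal uncertainties on the two toy parameters.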
By considering the Fisher matrix as the information content of the data set about each parameter, we see that the solution to our problem is to reduce the data set without changing the parameter information content. Hence, to solve the problem of efficiency, we need to make a linear transformation of the data set,

$$ x' = B x, $$

where $B$ is an $n' \times n$ matrix with $n' \le n$, and so $x'$ may be a
smaller data set than $x$. If $n' < n$
the transformation is not invertible
and some information about the data has been lost. To ensure that the lost
information does not affect the parameter estimation (requirement (a)), we also
require

$$ F' = F, $$

where $F'$ is the transformed Fisher matrix. In order to avoid
learning the unhelpful fact that no data is an optimal solution, we add in the
constraint that data exist. Since we have the freedom to transform the data
covariance matrix, $C$, we add the constraint
$\lambda\,(B C B^{T} - I)$, where $I$ is
the unit matrix and $\lambda$ is a Lagrangian multiplier.
It can be shown (Tegmark, Taylor & Heavens (1997)) that this is equivalent to a generalised
Karhunen-Loève eigenvalue problem, which has a unique solution, $B$,
for each parameter. These solutions have the property that

$$ F_{ii} = \frac{1}{2} \sum_k \lambda_k^2, $$

where $\lambda_k$ are the eigenvalues of the transformed data set
and the inverse errors associated with each eigenmode of the new data set.
The new, compressed data set, $x' = B x$, can now be ordered by decreasing
eigenvalue, so that the first eigenmode contains the most information about the
desired parameter, the second slightly less information, and so on. The total
error on the parameter is then simply given by the inverse of the
Fisher matrix,

$$ \sigma_i^2 = \frac{1}{F_{ii}}. $$

We are now free to choose how many eigenmodes to include in the likelihood
analysis. A compression of 10 will lead to a time saving of a factor of $10^3$.
However, this is only exact if we know the true values of the parameters used to
calculate $B$. If we are near the maximum likelihood solution, though, we can
iterate towards the exact solution.
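The compression and truncation steps can be sketched numerically as follows. This is a minimal illustration for a single parameter, assuming the generalised eigenproblem takes the form $C_{,i}\,b = \lambda\,C\,b$ with the normalisation $B C B^{T} = I$; under those assumptions each mode contributes $\lambda^2/2$ to the Fisher information for zero-mean Gaussian data. The function names `kl_modes` and `error_vs_modes`, the toy covariance model and the 1% truncation threshold are all hypothetical choices made for this example. Modes are ordered by $|\lambda|$, since the information depends on $\lambda^2$.

```python
import numpy as np

def kl_modes(C, dC):
    """Solve dC b = lam * C b with the normalisation B C B^T = I.

    Returns the eigenvalues sorted by decreasing |lam| and the matrix B
    whose rows are the corresponding eigenvectors b_k.
    """
    L = np.linalg.cholesky(C)            # C = L L^T
    Linv = np.linalg.inv(L)
    M = Linv @ dC @ Linv.T               # whitened derivative matrix
    M = 0.5 * (M + M.T)                  # remove rounding asymmetry
    lam, V = np.linalg.eigh(M)           # ordinary symmetric eigenproblem
    order = np.argsort(-np.abs(lam))     # most informative modes first
    lam, V = lam[order], V[:, order]
    B = V.T @ Linv                       # rows are b_k^T = v_k^T L^{-1}
    return lam, B

def error_vs_modes(lam):
    """Error on the parameter when only the first k modes are kept, k = 1..n."""
    fisher_cumulative = 0.5 * np.cumsum(lam**2)
    return 1.0 / np.sqrt(fisher_cumulative)

# Toy model (hypothetical): C = amp * S + sigma2 * I, parameter of interest = amp.
rng = np.random.default_rng(1)
n = 60
G = rng.normal(size=(n, n))
S = G @ G.T / n
amp, sigma2 = 1.0, 0.5
C = amp * S + sigma2 * np.eye(n)

lam, B = kl_modes(C, S)                  # dC/d(amp) = S
sigma = error_vs_modes(lam)

# Smallest number of modes whose error is within 1% of the full-data error.
n_keep = int(np.argmax(sigma <= 1.01 * sigma[-1])) + 1

x = rng.multivariate_normal(np.zeros(n), C)   # one simulated data vector
x_compressed = B[:n_keep] @ x                 # compressed data set
```

Because `B` is computed at fiducial parameter values, a practical analysis would recompute it near the maximum likelihood point and repeat, in the spirit of the iteration described above.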
This procedure is optimal for all parameters -- linear and nonlinear -- in the
model. In the special case of linear parameters that are just proportional to
the signal part of the data covariance matrix (for example the amplitude of
$C_\ell$, if the data are the $a_{\ell m}$), the eigenmodes reduce to
signal-to-noise eigenmodes (Bond (1994)). Hence our eigenmodes are more general
than signal-to-noise eigenmodes. Furthermore, as our eigenmodes satisfy the
condition that the Fisher matrix is a maximum, they are the optimal ones for
data compression. Any other choice, including signal-to-noise eigenmodes, would
give a higher variance.
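The reduction to signal-to-noise eigenmodes can be checked in a few lines. The derivation below is a sketch under stated assumptions: the data covariance is written as $C = A S + N$, with $S$ the signal covariance, $N$ the noise covariance and $A$ the amplitude parameter, and the generalised eigenproblem is taken to have the form $C_{,A}\,b = \lambda\,C\,b$. The symbols $A$, $S$, $N$ and $\mu$ are introduced here for illustration only.

```latex
% Assumed model: C = A S + N, parameter \theta = A, so \partial C / \partial A = S.
\begin{align}
  S\,b &= \lambda\,(A S + N)\,b
    && \text{(assumed eigenproblem, } C_{,A}\,b = \lambda\,C\,b\text{)} \\
  S\,b &= \mu\,N\,b
    && \text{(signal-to-noise eigenproblem)} \\
  (A S + N)\,b &= (1 + A\mu)\,N\,b
    && \Longrightarrow\quad S\,b = \frac{\mu}{1 + A\mu}\,(A S + N)\,b \\
  \lambda &= \frac{\mu}{1 + A\mu},
    && \frac{d\lambda}{d\mu} = \frac{1}{(1 + A\mu)^{2}} > 0
\end{align}
```

Every signal-to-noise eigenvector is therefore also a solution of the assumed eigenproblem, and because $\lambda$ increases monotonically with $\mu$ for $\mu \ge 0$, ranking modes by $\lambda$ gives the same order as ranking them by signal-to-noise.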
In Figure 1 we plot the uncertainty on 3 parameters for COBE-type data:
the quadrupole, $Q$, the spectral index of scalar perturbations, $n$,
and the re-ionization optical depth, $\tau$.
Figure 1: The 3 heavy lines show the error bars on 3 CMB
parameters as a function of the number of modes used. Each set of modes has
been optimised for the parameter in question. Note that approximately 400
modes are all that is required to get virtually all the information from the
entire 4016 cut COBE dataset. The thin lines show the conditional errors from
the SVD procedure outlined in section 5: virtually all the (conditional)
information on all 3 parameters is obtained from the best 500 SVD modes.