Non-rigid registration (NRR) of both pairs and groups of images is widely used as a basis for medical image analysis. Applications include structural analysis, atlas matching and change analysis [#!Crum_BJR!#]. The problem is highly under-constrained and many different algorithms have been proposed.
The aim of non-rigid registration is to automatically find a meaningful, dense correspondence across a pair of images (hence pairwise registration) or a group of images (hence groupwise registration). A typical algorithm consists of a representation of the deformation fields that encode the spatial variation between images, an objective function that quantifies the degree of mis-registration, and a method of optimising the objective function. Because different algorithms tend to produce slightly different results when applied to the same set of images [#!Zitova_2003!#], there is a need for methods to evaluate the results of such registrations.
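As a rough illustration of these three components, the following Python sketch registers a pair of 1D signals. It is a toy under stated assumptions, not any of the algorithms cited here: the piecewise-linear displacement model, the sum-of-squared-differences objective, the greedy coordinate search, and all function names are chosen purely for illustration.

```python
import numpy as np


def dense_displacement(control_values, n_samples):
    """Representation (assumed): a few control-point offsets,
    linearly interpolated to a dense displacement field."""
    knots = np.linspace(0, n_samples - 1, control_values.size)
    return np.interp(np.arange(n_samples), knots, control_values)


def warp(image, displacement):
    """Resample a 1D image at x + displacement(x) with linear interpolation.
    Positions outside the image are clamped to the end values."""
    x = np.arange(image.size, dtype=float)
    return np.interp(x + displacement, x, image)


def ssd(reference, moving, control_values):
    """Objective (assumed): sum of squared intensity differences
    between the reference and the warped moving image."""
    field = dense_displacement(control_values, reference.size)
    return float(np.sum((reference - warp(moving, field)) ** 2))


def register(reference, moving, n_controls=5, step=0.5, n_sweeps=50):
    """Optimiser (assumed): greedy coordinate search that perturbs one
    control point at a time and keeps only changes that lower the cost."""
    params = np.zeros(n_controls)
    best = ssd(reference, moving, params)
    for _ in range(n_sweeps):
        improved = False
        for i in range(n_controls):
            for delta in (step, -step):
                trial = params.copy()
                trial[i] += delta
                cost = ssd(reference, moving, trial)
                if cost < best:
                    params, best = trial, cost
                    improved = True
        if not improved:
            step /= 2  # refine the search once no perturbation helps
    return params, best
```

Swapping any one of the three functions (a different deformation model, a different similarity measure, or a different optimiser) yields a different registration algorithm, which is one reason results differ between methods on the same images.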
Various methods have been proposed for assessing the results of NRR [#!Fitzpatrick_TMI_2001!#,#!Hellier!#,#!Validation-NRR!#,#!Schnabel!#]. One obvious approach is to compare the results of the registration to anatomical ground truth. However, such ground truth is often difficult to obtain: expert annotation, for instance, is time-consuming, subjective, and very difficult in 3D. Other evaluation approaches involve the construction of artificial test data, which limits their application to `off-line' evaluation. Furthermore, such artificially generated and manipulated correspondence does not necessarily capture the type of deformation seen in real data. These problems motivate the search for a method of evaluation that depends neither on the existence of ground-truth data nor on possibly unrealistic assumptions about the nature of the actual correspondence.
The method we present here is based on the idea of constructing statistical models of sets of images: models that capture both the shape and the texture variation of the imaged objects (appearance models). Such models have been used extensively as the basis for image interpretation by synthesis. The link between registration and modelling is that the output of registration is a dense correspondence across the set of images, and such a set of correspondences is exactly what is required to construct the shape and texture models [#!Cootes_ECCV_1998!#,#!Edwards!#]. Varying the correspondence across a set varies the appearance model built upon this correspondence. If the NRR results in a poor estimate of correspondence across a set, then the appearance model constructed from the data will be unsatisfactory. A better correspondence ought to produce a better appearance model. This allows us to map the problem of evaluating a registration to that of evaluating the model generated from the output of the registration.
The structure of this paper is as follows. We first give a brief description of the background to both the assessment of registration and the construction of appearance models, and explain in more detail the link between the two. We then present quantitative measures that can be used to assess the quality of such models, and hence of the registration on which the models are built. The behaviour of these measures is investigated, in particular how they compare to an assessment based on overlaps of manually annotated ground-truth data. The results demonstrate that our method gives measures closely correlated with such ground-truth overlap estimates, and that our measures are more sensitive to mis-registration than the overlap measures.
Finally, we use the measures we have developed to compare various registration algorithms when applied to the registration of sets of 2D MR images of human brains. In particular, we are able to show the quantitative superiority of groupwise registration over a pairwise method.