This paper presents a generic method for assessing the quality of
non-rigid registration (NRR) algorithms that does not depend on the
existence of any ground truth, but relies solely on the data itself.
The data is taken to be a set of images. The
output of any non-rigid registration of such a set of images is a
dense correspondence across the whole set. Given such a dense
correspondence, it is possible to build a generative statistical
model of appearance variation across the set. Accurate correspondences
yield a good model, whereas poor correspondences degrade it. Evaluating
the quality of the resulting generative model therefore provides a
measure of the quality of the correspondences.
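To illustrate why poor correspondences degrade the model, the sketch below uses 1D "images" containing a single step edge and a deliberately simplified generative model: an independent Gaussian per pixel, summarised by the total per-pixel variance. The signal shape, the per-pixel model and the function names are assumptions made for illustration only; practical appearance models (for example PCA-based ones) are richer. The point it shows is that misregistration spreads the same structure over different pixel positions, which inflates the variation the model has to explain.

```python
from statistics import mean, pvariance


def step_image(edge: int, length: int = 8) -> list[float]:
    """Hypothetical 1D 'image': 0 before the edge position, 1 from it onwards."""
    return [0.0 if i < edge else 1.0 for i in range(length)]


def fit_pixelwise_model(images: list[list[float]]) -> tuple[list[float], list[float]]:
    """Fit an independent Gaussian per pixel (a simplified stand-in model).

    Returns the per-pixel mean and the per-pixel population variance.
    """
    columns = list(zip(*images))  # column k holds pixel k across all images
    means = [mean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances


def total_variance(images: list[list[float]]) -> float:
    """Sum of per-pixel variances: how much variation the model must explain."""
    _, variances = fit_pixelwise_model(images)
    return sum(variances)


# Good correspondence: the edge lies at the same pixel in every image.
aligned = [step_image(4) for _ in range(4)]
# Poor correspondence: the same edge is left at different pixels.
misaligned = [step_image(e) for e in (2, 3, 5, 6)]

print(total_variance(aligned))     # 0.0
print(total_variance(misaligned))  # 0.875
```

The aligned set is explained perfectly by its mean image, while the misaligned set forces the model to represent spurious variation that is purely an artefact of the correspondence.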
We derive measures of model specificity and generalisation that can
be used to assess the quality of such models, and thus the quality
of the original correspondences.
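One common way to make these two measures concrete, assumed here rather than taken from this section, is through nearest-neighbour distances between the training images and a set of synthetic images sampled from the model. Specificity is the mean distance from each synthetic image to its closest training image (low values mean the model only generates training-like images), and generalisation is the mean distance from each training image to its closest synthetic image (low values mean the model can reproduce the data). The sketch uses plain Euclidean distance on flattened images; the choice of image distance and the function names are assumptions.

```python
from math import dist
from statistics import mean

Image = list[float]  # a flattened image (hypothetical representation)


def specificity(synthetic: list[Image], training: list[Image]) -> float:
    """Mean distance from each model-generated image to its nearest training image."""
    return mean(min(dist(s, t) for t in training) for s in synthetic)


def generalisation(training: list[Image], synthetic: list[Image]) -> float:
    """Mean distance from each training image to its nearest model-generated image."""
    return mean(min(dist(t, s) for s in synthetic) for t in training)


training = [[0.0, 0.0], [3.0, 4.0]]
synthetic = [[0.0, 0.0]]

print(specificity(synthetic, training))     # 0.0
print(generalisation(training, synthetic))  # 2.5
```

In this toy case the single synthetic image is perfectly specific (it coincides with a training image) but generalises poorly, because nothing close to the second training image is ever generated. In practice the synthetic set would be drawn by sampling the fitted model rather than fixed by hand.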
This approach is independent of the specific registration algorithm and
of the specific modelling method used.
The method is validated by comparing our assessment of registration quality with
that derived from overlap measures using ground-truth anatomical labelling. We
show that not only is our approach capable of reliably assessing NRR
without ground truth, but it is also more sensitive than the
ground-truth-dependent approach. Finally, to demonstrate the
practicality of our method, we compare several NRR algorithms, both
pairwise and groupwise, in terms of their performance on MR brain data.
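The ground-truth validation mentioned above relies on overlap between anatomical labels propagated by the registration and the true labels. A minimal sketch of two standard per-label overlap measures, the Tanimoto (Jaccard) coefficient and the Dice coefficient, is given below. Which overlap measure the validation uses is not stated in this section, so this choice is an assumption, as are the function names, the flat label-array representation, and the convention of returning 1.0 when a label is absent from both images.

```python
def _label_sets(a: list[int], b: list[int], label: int) -> tuple[set[int], set[int]]:
    """Voxel indices carrying `label` in each flattened label image."""
    if len(a) != len(b):
        raise ValueError("label images must have the same number of voxels")
    return ({i for i, v in enumerate(a) if v == label},
            {i for i, v in enumerate(b) if v == label})


def tanimoto(a: list[int], b: list[int], label: int) -> float:
    """Tanimoto/Jaccard overlap |A & B| / |A | B| for one label."""
    sa, sb = _label_sets(a, b, label)
    union = sa | sb
    # Convention assumed here: a label absent from both images overlaps perfectly.
    return len(sa & sb) / len(union) if union else 1.0


def dice(a: list[int], b: list[int], label: int) -> float:
    """Dice overlap 2|A & B| / (|A| + |B|) for one label."""
    sa, sb = _label_sets(a, b, label)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 1.0


propagated = [1, 1, 0, 0]  # labels carried over by a registration (toy data)
truth = [1, 0, 1, 0]       # ground-truth labelling (toy data)

print(tanimoto(propagated, truth, 1))
print(dice(propagated, truth, 1))  # 0.5
```

Both measures reward the same agreement but on different scales (Dice is never smaller than Tanimoto for the same pair of label sets), so comparisons between registration algorithms should use one measure consistently.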