Discussion

The results of the validation experiments reported in Chapter 7 are the most important outcome of the work presented here. They demonstrate a causal relationship between a known (up to an additive constant) mean pixel displacement and our Specificity and Generalisation measures. A strong correlation between these model-based measures and a Generalised Overlap measure, based on ground truth, adds further weight to this interpretation. The fact that the relationship with mean pixel displacement held good over many different instantiations of a very general class of perturbing warps makes it unlikely (though not impossible) that there is any significant dependence on the particular variation patterns.
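
As a reminder of what the two model-based measures compute, the following is a minimal sketch, assuming the usual nearest-neighbour definitions: Specificity as the mean distance from each synthetic (model-generated) image to its nearest training image, and Generalisation as the mean distance from each training image to its nearest synthetic image. The function names are hypothetical, and plain Euclidean distance is used here only as a stand-in for the image distance used in the experiments.

```python
import numpy as np

def pairwise_distances(a, b):
    """Euclidean distance between every image in a and every image in b.

    Assumption: Euclidean distance is a placeholder for the image
    distance actually used in the experiments.
    """
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def specificity(training, synthetic):
    """Mean distance from each synthetic image to its nearest training image."""
    return pairwise_distances(synthetic, training).min(axis=1).mean()

def generalisation(training, synthetic):
    """Mean distance from each training image to its nearest synthetic image."""
    return pairwise_distances(training, synthetic).min(axis=1).mean()

# Toy "images" of two pixels each, purely for illustration.
training = np.array([[0.0, 0.0], [10.0, 0.0]])
synthetic = np.array([[0.0, 3.0], [10.0, 4.0], [0.0, 0.0]])
spec = specificity(training, synthetic)
gen = generalisation(training, synthetic)
```

Both values are averages of nearest-neighbour distances, so lower values indicate that the real and synthetic image sets lie closer together under the chosen distance.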

The results obtained with added noise are also encouraging, since it is a reasonable concern that the use of an intensity-based distance measure might make the model-based measures sensitive to noise. In the event, the approach seems robust to quite significant levels of noise. The fact that the absolute values of Specificity and Generalisation change when noise is added means that they would not be useful for comparing registration results across different image sets. Their ability to compare the performance of different registration algorithms applied to the same set of images, which is the main intended use, is, however, unaffected.

Our results comparing the performance of different registration algorithms demonstrate that the model-based measures, and Specificity in particular, are sufficiently sensitive to misregistration to provide useful discrimination in a practical setting. There is, however, a potential concern that it is important to address. It might be argued that using a model-based approach to assess registration favours methods which use a model-based objective function for registration (as in the experiments reported here). In practice, there are two reasons to believe that this is not a problem.

First, as argued above, the validation results show that there is a causal relationship between the mean pixel displacement and Specificity or Generalisation. It is thus irrelevant how a registration (or misregistration) has been obtained. Second, the MDL objective function optimised in the model-based registration method measures a quite different property of the model from those used in evaluation, so there is no element of 'self-fulfilling prophecy'. In an ideal world it would, of course, be preferable to avoid even the possibility of bias, though it seems unlikely that one could devise a strategy for evaluation that had no relevance to achieving a good registration in the first place. In due course, other ground-truth-free methods of evaluation may be developed, allowing a multi-perspective assessment of performance.

One obvious limitation of this approach to evaluation is that it can only be applied to groups of images. This could be considered an important restriction, since many practical applications involve registration of pairs or very short temporal sequences of images. We would argue that this is, in fact, a necessary restriction, because it is only possible to arrive at a meaningful assessment of registration in the context of a population of images.

There are a number of issues that merit further investigation. A particular method of measuring image separation has been studied, but others, such as local correlation, would be worth exploring. Another interesting issue is whether it is possible within this framework to localise registration errors. Some initial experiments have been performed, summing the shuffle difference maps between all pairs of images in the registered set. This gives some interesting results, highlighting areas of common misregistration, but it is not clear what quantitative interpretation could be placed on such maps. Finally, it is clear that the current measures of Specificity and Generalisation are not normalised: their values depend on the size of the set of registered images, the number of synthetic images generated and so on []. It is clearly worth exploring the possibility of measuring more fundamental properties of the relationship between the real and synthetic image distributions, with a view to achieving a 'natural' normalisation as discussed in Chapter 9.
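
The idea of summing shuffle difference maps over all pairs of registered images can be sketched as follows. This is an illustration only, under stated assumptions: the shuffle difference at a pixel is taken to be the minimum absolute intensity difference between that pixel in one image and the pixels in a square neighbourhood around the corresponding position in the other image. The function names, the square window, the edge padding and the default radius are all hypothetical choices, not details taken from the experiments.

```python
import numpy as np

def shuffle_difference_map(a, b, radius=1):
    """Per-pixel minimum |a - b| over a (2*radius+1)^2 window in b.

    Assumption: a square window with edge-replicated borders.
    """
    a = np.asarray(a, dtype=float)
    padded = np.pad(np.asarray(b, dtype=float), radius, mode="edge")
    h, w = a.shape
    best = np.full(a.shape, np.inf)
    # Slide b over every offset in the window and keep the smallest difference.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = padded[dy:dy + h, dx:dx + w]
            best = np.minimum(best, np.abs(a - shifted))
    return best

def summed_shuffle_map(images, radius=1):
    """Sum of shuffle difference maps over all ordered pairs of distinct images."""
    total = np.zeros(np.asarray(images[0]).shape)
    for i, a in enumerate(images):
        for j, b in enumerate(images):
            if i != j:
                total += shuffle_difference_map(a, b, radius)
    return total

# Two toy images whose single bright pixel is misplaced by two columns.
a = np.zeros((5, 5)); a[2, 2] = 1.0
c = np.zeros((5, 5)); c[2, 4] = 1.0
error_map = summed_shuffle_map([a, c], radius=1)
```

In this toy case a radius of 1 is too small to absorb the two-pixel offset, so the summed map is non-zero exactly at the two misplaced pixels; a radius of 2 would absorb it. This illustrates how such maps can highlight where misregistration occurs, without suggesting any quantitative interpretation of their values.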

Roy Schestowitz 2010-04-05