To illustrate the application of model-based evaluation in practice, NRR results obtained using three different methods for registering a group of images were compared, as described below. The intent was to establish whether it was possible, in a practical setting, to detect significant differences in performance between different NRR algorithms. All three registration methods used the same piecewise affine representation of image warps [] and the same multi-resolution optimisation framework. The same number of iterations (function evaluations) was used in each case.
The three registration algorithms were applied to two datasets. The first dataset, which is referred to as ``MGH'', was described in the previous chapter. The second set of images, which will be referred to as the ``Dementia Dataset'', consisted of 2-D transaxial mid-brain slices, extracted at an equivalent level from each of a set of affinely-aligned T1-weighted 3-D MR scans of subjects entered into a clinical study of dementia.
The MGH Dataset was used because it allowed the evaluation results obtained using Specificity and Generalisation to be compared with an evaluation based on the Generalised Overlap measure (using ground truth). For these experiments, 500 synthetic images were used to estimate Specificity and Generalisation. The Dementia Dataset was used because it was more representative of a typical clinical study, and it was important to demonstrate that the results were not dataset-dependent. For these experiments, 1000 synthetic images were used.
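As a reminder of how the number of synthetic images enters the evaluation, the following is a minimal sketch of estimating Specificity and Generalisation from a training set and a set of images sampled from the model. It assumes the usual nearest-neighbour formulation: Specificity is the mean distance from each synthetic image to its closest training image, and Generalisation is the mean distance from each training image to its closest synthetic image. The function names, the flattened-vector image representation and the use of plain Euclidean distance are assumptions made for illustration only; the image distance defined in the earlier chapter should be substituted in practice.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distance between every row of a and every row of b.

    Assumption: each image is flattened to one row vector. Euclidean
    distance is only a stand-in for the image distance used in the text.
    """
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def specificity_generalisation(training, synthetic):
    """Return (specificity, generalisation) for two sets of images."""
    d = pairwise_distances(training, synthetic)  # shape (n_training, n_synthetic)
    # Specificity: each synthetic image to its nearest training image.
    specificity = d.min(axis=0).mean()
    # Generalisation: each training image to its nearest synthetic image.
    generalisation = d.min(axis=1).mean()
    return specificity, generalisation


# Illustrative data only: random vectors standing in for real images.
rng = np.random.default_rng(0)
training = rng.normal(size=(20, 64))
synthetic = rng.normal(size=(500, 64))  # e.g. 500 synthetic images
spec, gen = specificity_generalisation(training, synthetic)
print(f"Specificity = {spec:.3f}, Generalisation = {gen:.3f}")
```

Under this formulation, lower values of both measures indicate a better model and hence, by the argument of the previous chapter, a better registration. Increasing the number of synthetic images reduces the sampling error of both estimates at the cost of more distance computations.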
The three registration methods used were as follows.