**Tanimoto overlap** for the ground-truth data labels, for varying values of the label weighting. **Specificity & Generalisation**, for varying definitions of image distance (Euclidean and shuffle distances), and for varying values of the shuffle neighbourhood radius.

The figure shows the results for the Tanimoto overlap-based measure, which is computed from the ground truth, namely the overlap of the annotated labels. All overlap measures decay monotonically as a function of misregistration, showing that our perturbed dataset does indeed have the systematic behaviour we require.
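To make the ground-truth measure concrete, the following is a minimal sketch of a label-weighted Tanimoto overlap between two label images. It assumes hard (integer) labels stored as flat sequences, and the function name `tanimoto_overlap` and the `weights` dictionary are hypothetical stand-ins for the label weighting mentioned above; the exact form used in this work may differ.

```python
def tanimoto_overlap(labels_a, labels_b, weights=None):
    """Weighted Tanimoto overlap between two label images.

    labels_a, labels_b: flat sequences of integer labels, one per voxel.
    weights: optional dict mapping label -> weight (an assumed stand-in for
             the label weighting); labels not listed get weight 1.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label images must have the same number of voxels")

    numerator = 0.0
    denominator = 0.0
    for label in set(labels_a) | set(labels_b):
        w = 1.0 if weights is None else weights.get(label, 1.0)
        # Voxels carrying this label in both images (intersection) ...
        inter = sum(1 for a, b in zip(labels_a, labels_b)
                    if a == label and b == label)
        # ... and in at least one of the images (union).
        union = sum(1 for a, b in zip(labels_a, labels_b)
                    if a == label or b == label)
        numerator += w * inter
        denominator += w * union

    # Convention assumed here: two empty label sets overlap perfectly.
    return numerator / denominator if denominator > 0 else 1.0


reference = [0, 0, 1, 1, 2, 2]
shifted = [0, 0, 0, 1, 1, 2]   # the same labels, displaced by one voxel
print(tanimoto_overlap(reference, reference))
print(tanimoto_overlap(reference, shifted))
```

Identical label images give an overlap of 1, and the value falls as the labels are displaced, which is the behaviour the perturbed dataset is expected to show.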

Results for the measures of specificity and generalisation as a function of the magnitude of the displacement are shown in the accompanying figures. Note that the values for generalisation and specificity are in error form, i.e. they increase with decreasing performance. The various graphs correspond to different choices of the distance on image space: Euclidean distance, and shuffle distance for varying values of the shuffle neighbourhood radius.
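As an illustration of how the neighbourhood radius enters the image distance, here is a minimal sketch of one common form of shuffle distance for 2D images. It assumes images stored as nested lists of intensities, an integer radius, a circular neighbourhood, and averaging over pixels; the name `shuffle_distance` and these conventions are assumptions, not necessarily the definition used in this work.

```python
def shuffle_distance(img_a, img_b, radius):
    """Mean, over pixels of img_a, of the smallest absolute intensity
    difference to any pixel of img_b within `radius` of the same position.

    img_a, img_b: 2D images as lists of rows, with identical shape.
    radius: non-negative integer neighbourhood radius (assumed convention).
    """
    rows, cols = len(img_a), len(img_a[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            best = None
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    # Keep only offsets inside a disc of the given radius.
                    if di * di + dj * dj > radius * radius:
                        continue
                    ii, jj = i + di, j + dj
                    # Skip neighbours that fall outside the image.
                    if 0 <= ii < rows and 0 <= jj < cols:
                        diff = abs(img_a[i][j] - img_b[ii][jj])
                        if best is None or diff < best:
                            best = diff
            total += best  # the zero offset is always valid, so best is set
    return total / (rows * cols)


edge = [[0, 0, 9, 0]]
edge_shifted = [[0, 9, 0, 0]]   # the same feature, displaced by one pixel
print(shuffle_distance(edge, edge_shifted, radius=0))
print(shuffle_distance(edge, edge_shifted, radius=1))
```

With radius 0 this sketch reduces to a mean absolute pixel-wise difference, so a one-pixel shift is penalised fully; with radius 1 the shifted feature is matched within the neighbourhood and the distance drops to zero. Note that, as written, the distance is not symmetric in its two arguments.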

Both measures show a monotonic decrease in performance with respect to the size of the registration degradation, for all choices of image distance. Since the overlap measure also shows such a monotonic decrease, this validates the model-based metrics inasmuch as they vary monotonically with respect to the ground-truth measure.
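The validation argument above amounts to a simple monotonicity check on each curve. A minimal sketch follows, assuming each measure is sampled at increasing displacement magnitudes; the helper `is_monotonic` is hypothetical and the numbers are made up purely for illustration, not results from this study.

```python
def is_monotonic(values, increasing=True):
    """Return True if `values` never moves against the stated direction."""
    pairs = list(zip(values, values[1:]))
    if increasing:
        return all(later >= earlier for earlier, later in pairs)
    return all(later <= earlier for earlier, later in pairs)


# Made-up placeholder curves, ordered by increasing displacement magnitude.
overlap = [0.95, 0.80, 0.62, 0.41]             # ground truth: should decrease
specificity_error = [0.10, 0.14, 0.21, 0.33]   # error form: should increase

# The model-based measure is consistent with the ground truth if the overlap
# falls while the error-form measure rises over the same displacements.
consistent = (is_monotonic(overlap, increasing=False)
              and is_monotonic(specificity_error, increasing=True))
print(consistent)
```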

What remains to be investigated is the effect of varying the parameters in the definitions of the model-based measures. For the shuffle distance, the parameter is the neighbourhood radius, whose effect is studied in the next section. We also investigate the various forms of the Tanimoto overlap.

We note here that various other overlap measures are possible. For instance, we also considered the Dice overlap, but found it to be inferior to the Tanimoto overlap, and so do not consider it further.
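For reference, the two overlap definitions can be compared on a single binary label with the minimal sketch below. The function name `dice_and_tanimoto` and the flat-mask representation are assumptions; the sketch only illustrates the definitions and does not reproduce the comparison reported here.

```python
def dice_and_tanimoto(mask_a, mask_b):
    """Return (Dice, Tanimoto) overlap for two binary masks of equal length."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    union = size_a + size_b - inter
    if union == 0:
        # Convention assumed here: two empty masks overlap perfectly.
        return 1.0, 1.0
    dice = 2.0 * inter / (size_a + size_b)
    tanimoto = inter / union
    return dice, tanimoto


dice, tanimoto = dice_and_tanimoto([1, 1, 1, 0], [0, 1, 1, 1])
print(dice, tanimoto)
```

For a single binary label the two values are related by D = 2T / (1 + T), where D is the Dice overlap and T the Tanimoto overlap, so Dice is never smaller than Tanimoto on the same pair of masks.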