Validation

The purpose of the validation experiment was to establish whether our measures of Specificity and Generalisation were able to detect a known model degradation. We also intended to investigate the effect of varying the shuffle radius. Experiments were performed using two very different data sets. The first consisted of corresponding 2D mid-brain T1-weighted slices obtained from 3D MR scans of 36 subjects. In each of the images, a fixed number (167) of landmark points were positioned manually on key anatomical structures (cortical surface, ventricles, caudate nucleus and lentiform nucleus) and used to establish a ground-truth dense correspondence over the entire set of images, using locally affine interpolation. The second consisted of 68 frontal face images with blacked-out backgrounds (to avoid biasing the distance measurements), with ground-truth correspondence defined using 68 landmark points positioned consistently on the facial features in each image.

**Figure 4:** Model constructed from the ground-truth annotation (left) and models constructed with increasingly degraded registration (centre and right). Each panel shows a variation of $\pm 2.5 \sigma_{0}$ in the first three modes.

The first 3 modes of variation of the face model built using the ground-truth correspondence are shown in Figure 4(left). Keeping the shape vectors defined by the landmark locations fixed, smooth pseudo-random spatial warps, based on bi-harmonic Clamped Plate Splines (CPS), were then applied to the training images. The warps comprised 25 knot-points and the extent of these warps was carefully studied. By increasing the warp magnitude, successively increasing mis-registration was achieved. The mis-registered training images were used to construct degraded versions of the original model. Figure 4(centre and right) shows the models obtained using progressively degraded training data.
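The degradation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: Gaussian radial basis functions stand in for the clamped-plate splines, and the names `random_smooth_warp`, `apply_warp`, `magnitude` and `sigma` are hypothetical. The sketch draws random displacements at 25 knot-points, spreads them smoothly over the image, and rescales the field so that its mean pixel displacement equals a chosen value.

```python
import numpy as np

def random_smooth_warp(shape, n_knots=25, magnitude=2.0, sigma=None, rng=None):
    """Return a dense displacement field of shape (2, h, w) holding (dy, dx).

    Assumption: Gaussian radial basis functions replace the clamped-plate
    splines of the text; this is not a CPS implementation.
    """
    rng = np.random.default_rng(rng)
    h, w = shape
    sigma = sigma if sigma is not None else 0.2 * min(h, w)
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    knots = rng.uniform([0, 0], [h, w], size=(n_knots, 2))  # (y, x) positions
    offsets = rng.normal(size=(n_knots, 2))                 # random (dy, dx)
    # Gaussian weight of every knot at every pixel: shape (n_knots, h, w)
    d2 = (ys[None] - knots[:, 0, None, None]) ** 2 \
       + (xs[None] - knots[:, 1, None, None]) ** 2
    weights = np.exp(-d2 / (2 * sigma ** 2))
    field = np.tensordot(offsets.T, weights, axes=1)        # (2, h, w)
    # Rescale so that the mean pixel displacement equals `magnitude`
    mean_disp = np.hypot(field[0], field[1]).mean()
    return field * (magnitude / mean_disp)

def apply_warp(image, field):
    """Resample a 2D image at displaced positions (bilinear, edge-clamped)."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    y = np.clip(ys + field[0], 0, h - 1)
    x = np.clip(xs + field[1], 0, w - 1)
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = image[y0, x0] * (1 - fx) + image[y0, x1] * fx
    bottom = image[y1, x0] * (1 - fx) + image[y1, x1] * fx
    return top * (1 - fy) + bottom * fy

# Usage: progressively degrade one (random placeholder) training image.
image = np.random.default_rng(0).random((64, 64))
for magnitude in (0.5, 1.0, 2.0):
    field = random_smooth_warp(image.shape, magnitude=magnitude, rng=1)
    warped = apply_warp(image, field)
    print(magnitude, np.hypot(field[0], field[1]).mean())
```

Only the images are warped here; the landmark-based shape vectors are left untouched, as in the text, so the model is rebuilt from images that no longer match their annotations.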

Models degraded using a range of mean pixel displacements (from the correct registration) were evaluated using the method described in Section 3, using Euclidean distance ($r = 0$) and three different values of shuffle radius, $r = 1.5$, $2.9$ and $3.7$. In each case, $m = 1000$ images were synthesised using the first 10 modes of the model, and Specificity and Generalisation were estimated.
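The evaluation loop described above can be sketched in a few lines. Section 3 is not reproduced here, so the definitions below are assumptions:

- Shuffle distance is taken as the mean, over pixels, of the smallest absolute intensity difference within radius $r$, which reduces to a plain pixel-wise difference at $r = 0$.
- Specificity is the mean distance from each synthetic image to its nearest training image.
- Generalisation is the mean distance from each training image to its nearest synthetic image.

The function names are hypothetical, and the sampler is a plain linear intensity PCA standing in for the appearance model.

```python
import numpy as np

def shuffle_distance(a, b, r):
    """Mean over pixels x of min |a(x) - b(y)| for y within radius r of x."""
    h, w = a.shape
    k = int(np.floor(r))
    padded = np.pad(b, k, mode="edge")
    best = np.full(a.shape, np.inf)
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            if dy * dy + dx * dx > r * r:
                continue  # offset lies outside the circular neighbourhood
            shifted = padded[k + dy:k + dy + h, k + dx:k + dx + w]
            best = np.minimum(best, np.abs(a - shifted))
    return best.mean()

def synthesise(training, n_modes, m, rng=None):
    """Sample m images from a linear PCA model (stand-in for the real model)."""
    rng = np.random.default_rng(rng)
    X = np.stack([t.ravel() for t in training])
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    n_modes = min(n_modes, len(s))
    std = s[:n_modes] / np.sqrt(max(len(X) - 1, 1))  # per-mode std deviation
    b = rng.normal(size=(m, n_modes)) * std          # Gaussian mode parameters
    return (mean + b @ vt[:n_modes]).reshape((m,) + training[0].shape)

def specificity_generalisation(training, synthetic, r):
    # d[i, j]: shuffle distance from synthetic image i to training image j.
    # Simplification: the same (asymmetric) distances are reused for both measures.
    d = np.array([[shuffle_distance(s, t, r) for t in training]
                  for s in synthetic])
    specificity = d.min(axis=1).mean()     # synthetic -> nearest training image
    generalisation = d.min(axis=0).mean()  # training -> nearest synthetic image
    return specificity, generalisation

# Usage on random placeholder data (small m to keep the run short).
training = np.random.default_rng(0).random((6, 16, 16))
synthetic = synthesise(training, n_modes=10, m=20, rng=1)
for r in (0, 1.5, 2.9, 3.7):
    print(r, specificity_generalisation(training, synthetic, r))
```

Lower values indicate a better model for both measures, which is why degradation appears as an increase in the curves.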

Results for the brain data are shown in Figure 5. The results for the face data, shown in Figure 6, are similar, but they are based on a single instantiation rather than 10, which makes the curves less reliable beyond a mean displacement of about 3 pixels. As expected, Specificity and Generalisation both degrade (increase in value) as the mis-registration is progressively increased. In most cases there is a monotonic relationship between Specificity/Generalisation and model degradation, but this is not the case when Euclidean distance is used. Note that there is a measurable difference in both metrics even for fairly small registration perturbations (e.g. the model of Figure 4(centre)). The steepness of the curve in the results for Euclidean distance already suggests that the use of shuffle distance gives better results.

**Figure 5:** Generalisation and Specificity of degraded brain models.

**Figure 6:** Generalisation and Specificity (with error bars) of degraded face models.