

Validation

The purpose of the validation experiment was to establish whether our measures of Specificity and Generalisation were able to detect a known model degradation, and to investigate the effect of varying the shuffle radius. Experiments were performed using two very different data sets. The first consisted of corresponding 2D mid-brain T1-weighted slices obtained from 3D MR scans of 36 subjects. In each of the images, a fixed number (167) of landmark points were positioned manually on key anatomical structures (cortical surface, ventricles, caudate nucleus and lentiform nucleus), and used to establish a ground-truth dense correspondence over the entire set of images, using locally affine interpolation. The second consisted of 68 frontal face images with blacked-out backgrounds (to avoid biasing the distance measurements), with ground-truth correspondence defined using 68 landmark points positioned consistently on the facial features in each image.

Figure 4: Model constructed from the ground-truth annotation (left), and models constructed with increasingly degraded registration (centre and right); variation of $\pm 2.5\sigma_{0}$ in the first three modes.
[Image: ../Graphics/face_models_with_varying_deformation_modified.png]

The first three modes of variation of the face model built using the ground-truth correspondence are shown in Figure 4 (left). Keeping the shape vectors defined by the landmark locations fixed, a series of smooth pseudo-random spatial warps, based on biharmonic Clamped Plate Splines (CPS), was then applied to the training images, resulting in successively increasing mis-registration - see Figure [*]. The average pixel displacement from the original is shown below each image. Visually, the warps range from barely noticeable (1 or 3 warps) to gross distortion (20 warps). The mis-registered training images were used to construct degraded versions of the original model. Figure 4 (centre and right) shows the models obtained using 1 and 11 concatenated CPS warps respectively. A sketch of this degradation procedure follows.
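The sketch below is a minimal illustration of applying k concatenated smooth pseudo-random warps to a greyscale image held as a 2D NumPy array. The paper uses biharmonic Clamped Plate Spline warps; as a stand-in, this sketch substitutes a single Gaussian-bump displacement field per warp, which is smooth and pseudo-random but is not a CPS. All function names and parameter values are illustrative, not the authors' implementation.

    import numpy as np
    from scipy.ndimage import map_coordinates

    def random_smooth_warp(img, rng, amplitude=1.0, sigma=20.0):
        # Gaussian-bump displacement field: a stand-in for a CPS warp.
        h, w = img.shape
        yy, xx = np.mgrid[0:h, 0:w].astype(float)
        cy, cx = rng.uniform(0, h), rng.uniform(0, w)    # random bump centre
        bump = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
        dy, dx = rng.uniform(-1, 1, size=2) * amplitude  # random direction
        # Resample the image along the displaced coordinates (bilinear).
        return map_coordinates(img, [yy + dy * bump, xx + dx * bump], order=1)

    def degrade(img, k, seed=0):
        # Concatenate k pseudo-random warps, as in the degradation experiment.
        rng = np.random.default_rng(seed)
        for _ in range(k):
            img = random_smooth_warp(img, rng)
        return img

The average pixel displacement reported under each image in the figure would correspond, in this sketch, to the mean magnitude of the accumulated displacement field.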

Models degraded using 1, 2, 3, 5, 8, 11 and 15 concatenated CPS warps were evaluated using the method described in section 3, using Euclidean distance ($r = 0$) and three different values of the shuffle radius, $r$ = 1.5, 2.9 and 3.7. In each case, $m = 1000$ images were synthesised using the first 10 modes of the model, and Specificity and Generalisation were estimated.
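A minimal sketch of this evaluation follows, assuming the images are equal-sized 2D NumPy arrays. The function names, the wrap-around treatment of image borders, and the use of the synthetic-to-training direction of the (asymmetric) shuffle distance throughout are simplifying assumptions, not the authors' implementation.

    import numpy as np

    def shuffle_distance(a, b, r):
        # Mean shuffle distance from image a to image b: each pixel of a is
        # matched with the closest-valued pixel of b within radius r.
        # r = 0 reduces to the mean absolute (Euclidean) difference.
        a, b = np.asarray(a, float), np.asarray(b, float)
        rr = int(np.ceil(r))
        best = np.abs(a - b)                       # offset (0, 0)
        for dy in range(-rr, rr + 1):
            for dx in range(-rr, rr + 1):
                if dy * dy + dx * dx > r * r or (dy, dx) == (0, 0):
                    continue                       # circular neighbourhood
                shifted = np.roll(np.roll(b, dy, axis=0), dx, axis=1)
                best = np.minimum(best, np.abs(a - shifted))
        return best.mean()                         # note: np.roll wraps at edges

    def specificity_generalisation(synthetic, training, r):
        # Specificity: mean distance from each synthetic image to its nearest
        # training image. Generalisation: mean distance from each training
        # image to its nearest synthetic image.
        d = np.array([[shuffle_distance(s, t, r) for t in training]
                      for s in synthetic])         # m x n distance matrix
        return d.min(axis=1).mean(), d.min(axis=0).mean()

Here the synthetic images would be drawn by sampling the first 10 model modes, as described above; lower values of both measures indicate a better model.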

Results for the brain data are shown in Figure 6; the results for the face data (Figure 5) were similar. As expected, Specificity and Generalisation both degrade (increase in value) as the mis-registration is progressively increased. In most cases there is a monotonic relationship between Specificity/Generalisation and model degradation, but this is not the case when Euclidean distance is used. Note that there is a measurable difference in both metrics even for very small registration perturbations (e.g. the model of Figure 4 (centre)).

The lack of monotonicity in the results for Euclidean distance already suggests that the use of shuffle distance gives better results. To investigate this further, we calculated the sensitivity of each method - defined as (average slope of Specificity/Generalisation)/(average standard error) - for each value of image warp. The results for the face data are shown in Figure 5 (bottom); the results for the brain data were similar. This shows a very significant advantage with increasing shuffle radius, suggesting that better discrimination between models is possible if shuffle distance is used.
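The sensitivity figure admits a one-function sketch, assuming hypothetical arrays holding the measured Specificity (or Generalisation) values and their standard errors at each warp level; the exact averaging used in the paper is not specified here.

    import numpy as np

    def sensitivity(warp_levels, values, stderrs):
        # Average slope of the measure across degradation levels,
        # divided by the average standard error of the estimates.
        slopes = np.diff(values) / np.diff(warp_levels)
        return slopes.mean() / np.mean(stderrs)

    # e.g. sensitivity(np.array([1, 2, 3, 5, 8, 11, 15]), spec_vals, spec_errs)

A larger sensitivity means the measure changes more, relative to its estimation noise, per unit of degradation, and so discriminates better between models.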

Figure 5: Specificity and Generalisation (with error bars) of degraded face models.
[Images: ../Graphics/faces_generalisation_complete.png, ../Graphics/faces_specificity_complete.png]

Figure 6: Specificity and Generalisation of degraded brain models.
[Images: ../Graphics/BW_MIAS_generalisation.png, ../Graphics/BW_MIAS_specificity.png]


Roy Schestowitz 2005-11-17