Validation

The purpose of the validation experiment was to establish whether our measures of Specificity and Generalisation were able to detect a known model degradation. We also wished to investigate the effect of varying the shuffle radius. Experiments were performed using two very different data sets. The first consisted of equivalent 2D mid-brain T1-weighted slices obtained from 3D MR scans of 36 subjects. In each image, a fixed number (167) of landmark points were positioned manually on key anatomical structures (cortical surface, ventricles, caudate nucleus and lentiform nucleus) and used to establish a ground-truth dense correspondence over the entire set of images, using locally affine interpolation. The second consisted of 68 frontal face images with blacked-out backgrounds (to avoid biasing the distance measurements), with ground-truth correspondence defined using 68 landmark points positioned consistently on the facial features in each image.
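As an illustration of how sparse landmarks can induce a dense correspondence, the sketch below maps a point from one image to another through a pair of corresponding landmark triangles. It assumes that "locally affine interpolation" can be read as piecewise-affine interpolation over a triangulation of the landmarks; the text does not specify the scheme, and the function names `barycentric` and `map_point` are hypothetical.

```python
def barycentric(p, tri):
    """Barycentric coordinates of point p with respect to triangle tri.

    tri is a sequence of three (x, y) vertices.
    """
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    a = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    b = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return a, b, 1.0 - a - b


def map_point(p, src_tri, dst_tri):
    """Map p from src_tri to dst_tri with the affine map fixed by the vertices.

    Assumption: the landmarks have already been triangulated and p lies in
    (or near) src_tri; finding the enclosing triangle is not shown here.
    """
    a, b, c = barycentric(p, src_tri)
    x = a * dst_tri[0][0] + b * dst_tri[1][0] + c * dst_tri[2][0]
    y = a * dst_tri[0][1] + b * dst_tri[1][1] + c * dst_tri[2][1]
    return x, y


# Three landmarks in one image and their counterparts in another.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(10.0, 10.0), (12.0, 10.0), (10.0, 14.0)]
mapped = map_point((0.25, 0.25), src, dst)
```

Applying such a map to every pixel, triangle by triangle, would give a correspondence that is exact at the landmarks and affine in between.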

**Figure 5:** **Left:** Model constructed from ground-truth annotation. **Centre and right:** Models constructed with increasingly degraded registration. Variation of $\pm 2.5 \sigma _{0}$ about the mean in the first three modes.

The first 3 modes of variation of the face model built using the ground-truth correspondence are shown in Figure 5 (left). Keeping the shape vectors defined by the landmark locations fixed, smooth pseudo-random spatial warps, based on biharmonic Clamped Plate Splines (CPS), were then applied to the training images. The warps were controlled by sets of 25 randomly placed knot-points, each displaced in a random direction by a distance drawn from a Gaussian distribution. The relationship between the mean of the displacement distribution and the mean pixel displacement for the whole image was carefully calibrated. This allowed a controlled mis-registration to be introduced by changing the parameters of the displacement distribution.
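The perturbation and calibration procedure can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the clamped-plate spline is replaced by a sum of Gaussian bumps centred on the knot-points, which is a simpler smooth stand-in, and the names `random_knot_warp`, `mean_pixel_displacement` and `calibrate`, together with the parameters `sd_disp` and `kernel_width`, are hypothetical.

```python
import math
import random


def random_knot_warp(width, height, n_knots=25, mean_disp=2.0, sd_disp=0.5,
                     kernel_width=8.0, rng=None):
    """Return a dense displacement field as rows of (dx, dy) tuples.

    Each knot is placed uniformly at random and displaced in a random
    direction by a Gaussian-distributed distance. A negative draw simply
    reverses the direction. Assumption: Gaussian bumps stand in for the CPS.
    """
    rng = rng or random.Random()
    knots = []
    for _ in range(n_knots):
        px = rng.uniform(0, width - 1)
        py = rng.uniform(0, height - 1)
        angle = rng.uniform(0.0, 2.0 * math.pi)
        dist = rng.gauss(mean_disp, sd_disp)
        knots.append((px, py, dist * math.cos(angle), dist * math.sin(angle)))

    field = []
    for y in range(height):
        row = []
        for x in range(width):
            dx = dy = 0.0
            for px, py, kx, ky in knots:
                d2 = (x - px) ** 2 + (y - py) ** 2
                w = math.exp(-d2 / (2.0 * kernel_width ** 2))
                dx += w * kx
                dy += w * ky
            row.append((dx, dy))
        field.append(row)
    return field


def mean_pixel_displacement(field):
    """Average displacement magnitude over all pixels of the field."""
    total = sum(math.hypot(dx, dy) for row in field for dx, dy in row)
    return total / (len(field) * len(field[0]))


def calibrate(means, width=32, height=32, n_trials=5, seed=0):
    """Tabulate the empirical mean pixel displacement for each knot mean."""
    rng = random.Random(seed)
    table = []
    for m in means:
        trials = [
            mean_pixel_displacement(
                random_knot_warp(width, height, mean_disp=m,
                                 sd_disp=0.25 * m, rng=rng))
            for _ in range(n_trials)
        ]
        table.append((m, sum(trials) / n_trials))
    return table
```

A table produced by `calibrate` can be inverted (for example by interpolation) to choose the knot-displacement mean that yields a desired mean pixel displacement.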

By increasing the warp magnitude, successively greater mis-registration was achieved. The mis-registered training images were used to construct degraded versions of the original model. Figure 5 (centre and right) shows the models obtained using progressively degraded training data. Models degraded using a range of values of the mean pixel displacement (from the correct registration) were evaluated using the method described in section 3. The image distances used were Euclidean distance ($r = 1$) and three different values of shuffle radius, among them $r = 1.5$ and $r = 2.1$. In each case, images were synthesised using the first 10 modes of the model, and Specificity and Generalisation were then estimated.
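The evaluation step can be sketched as below. The definitions are assumptions, since the method itself is given in section 3 rather than here: the shuffle distance is taken to be the mean, over pixels, of the smallest absolute intensity difference within a disc of radius $r$ in the other image; Specificity is taken as the mean distance from each synthesised image to its nearest training image, and Generalisation as the mean distance from each training image to its nearest synthesised image. Under the strict-inequality convention used here, a radius of 1 compares each pixel only with its direct counterpart. All function names are hypothetical.

```python
import math


def neighbourhood(radius):
    """Integer offsets (dx, dy) lying strictly inside a disc of this radius."""
    reach = int(math.ceil(radius))
    return [(dx, dy)
            for dy in range(-reach, reach + 1)
            for dx in range(-reach, reach + 1)
            if dx * dx + dy * dy < radius * radius]


def shuffle_distance(a, b, radius):
    """Mean over pixels of a of the best match within `radius` in b.

    a and b are equal-sized images given as lists of rows. Note that this
    form is not symmetric in a and b; a symmetrised variant is possible.
    """
    h, w = len(a), len(a[0])
    offsets = neighbourhood(radius)
    total = 0.0
    for y in range(h):
        for x in range(w):
            total += min(abs(a[y][x] - b[y + dy][x + dx])
                         for dx, dy in offsets
                         if 0 <= y + dy < h and 0 <= x + dx < w)
    return total / (h * w)


def specificity(synthetic, training, dist):
    """Mean distance from each synthesised image to its nearest training image."""
    return sum(min(dist(s, t) for t in training)
               for s in synthetic) / len(synthetic)


def generalisation(synthetic, training, dist):
    """Mean distance from each training image to its nearest synthesised image."""
    return sum(min(dist(t, s) for s in synthetic)
               for t in training) / len(training)


# Two toy images that differ by a one-pixel shift of a bright spot.
img_a = [[0, 0, 0], [0, 10, 0], [0, 0, 0]]
img_b = [[0, 0, 0], [0, 0, 10], [0, 0, 0]]
d_plain = shuffle_distance(img_a, img_b, 1.0)    # pixel-wise comparison only
d_shuffle = shuffle_distance(img_a, img_b, 1.5)  # tolerates the one-pixel shift
```

In the toy example the one-pixel shift is penalised at radius 1 but absorbed at radius 1.5, which is the kind of tolerance to small mis-alignments that a larger shuffle radius provides. Both measures increase as the model degrades, so lower values are better.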

Results for the brain data are shown in Figure 6. Each point represents the average of 10 random instantiations of the perturbing warps. The results for the face data are similar and are shown in Figure 7, but they are based on a single instantiation of each warp, which results in noisier data. As expected, Specificity and Generalisation both degrade (increase in value) as the mis-registration is progressively increased. In most cases there is a monotonic relationship between Specificity/Generalisation and model degradation, but this is not the case when Euclidean distance is used. Note that there is a measurable difference in both measures, even for fairly small perturbations to the initial registration (Figure 5 (centre)).

**Figure 6:** Specificity and Generalisation of degraded brain models.

**Figure 7:** Specificity and Generalisation (with error bars) of degraded face models.