Our approach to model evaluation is to measure key properties of the model directly. It follows the work of Davies et al., who defined specificity and generalisation ability for shape models. To be effective, a model must be able to generate a broad range of examples of the class of images that has been modelled. We refer to this as Generalisation ability. Although this property is necessary, it is not sufficient. We also require that the model generates only examples that are consistent with the class of images modelled. We refer to this as Specificity. We define both measures by comparing the distribution of training images with the distribution of images generated using the model. An overview of the approach is given in Figure 2.

Any image can be considered as a point in a high-dimensional space whose coordinates are its intensity values. The training set forms a cloud of points in this space. If we sample from the model, we generate a second cloud of points in the same space. For an ideal model, the two clouds are coincident. We define Generalisation in terms of the distance from each training image to the nearest model-generated image, and Specificity in terms of the distance from each model-generated image to the nearest training image. We discuss the choice of an appropriate distance metric in section 3.3.
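The two nearest-neighbour measures can be illustrated with a minimal sketch. It rests on several assumptions that are not fixed by the text above: images are flattened into intensity vectors, the distance is plain Euclidean (a stand-in only, since the choice of metric is discussed separately), and each measure is summarised as the mean of the nearest-neighbour distances. The function names are hypothetical.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distances between the rows of a (n, d) and the rows of b (m, d).

    Assumption: Euclidean distance is used purely for illustration.
    """
    diff = a[:, None, :] - b[None, :, :]          # shape (n, m, d)
    return np.sqrt((diff ** 2).sum(axis=-1))      # shape (n, m)


def generalisation_and_specificity(training, generated):
    """Return (generalisation, specificity) for two clouds of image vectors.

    training:  array of shape (n_train, d), one flattened image per row
    generated: array of shape (n_gen, d), images sampled from the model

    Assumption: each measure is the mean nearest-neighbour distance.
    """
    d = pairwise_distances(training, generated)   # shape (n_train, n_gen)
    # Each training image to its nearest model-generated image.
    generalisation = d.min(axis=1).mean()
    # Each model-generated image to its nearest training image.
    specificity = d.min(axis=0).mean()
    return generalisation, specificity


# Toy example: the model reproduces both training points but also
# generates one point far from the training cloud.
training = np.array([[0.0, 0.0], [1.0, 0.0]])
generated = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
g, s = generalisation_and_specificity(training, generated)
print(g, s)
```

In this toy case every training point has a generated point at distance zero, so the Generalisation value is zero, while the outlying generated point raises the Specificity value. Under this distance-based sketch, smaller values of both measures correspond to clouds that are closer to coincident.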