We have introduced an objective method of assessing appearance models that depends only on the model to be tested and the training data from which it was generated. Validation experiments, based on perturbing correspondences obtained using ground truth, show that we are able to detect increasing model degradation reliably. The results obtained for different sizes of shuffle neighbourhood show that the use of shuffle distance rather than Euclidean distance ensures monotonicity and increases the sensitivity of the method. We have also shown that the approach is capable of detecting statistically significant differences between models based on different approaches to automated model building. We believe that this work makes a valuable contribution by providing an objective basis for comparing different methods of constructing generative models of appearance.