We performed two sets of experiments: one designed to validate our model-based approach to evaluating NRR, the other to demonstrate its use in a practical application. In the first set, our aim was to show that Specificity and Generalisation are valid measures of the degree of misregistration of a group of images. We took a set of registered images for which ground-truth labels were available and applied a series of deformations that introduced progressively increasing misregistration. This allowed us to investigate how Specificity and Generalisation varied as a function of the known misregistration. We also measured generalised overlap, using the ground-truth labels, to provide a comparison with an existing method. In the second set, our aim was to demonstrate that we could usefully discriminate between different NRR algorithms by comparing their results on the same dataset.