An alternative approach is based on measuring the alignment, or overlap [4,6], of anatomical structures annotated by an expert or obtained as a result of (semi-)automated segmentation. This has the disadvantage that manual annotation is expensive to obtain and prone to subjective error, whilst reliable automated or semi-automated segmentation is extremely difficult to achieve - indeed, if it were available it would often obviate the need for NRR.
We have used an overlap-based approach to provide a 'gold standard' method of assessment. The method requires manual annotation of each image - providing an anatomical/tissue label for each voxel - and measures the overlap of corresponding labels following registration, using a generalisation of Tanimoto's overlap coefficient. Each label for a given image is represented using a binary image but, after warping and interpolation into a common reference frame, based on the results of NRR, we obtain a set of fuzzy label images. These are combined in a generalised overlap score which provides a single figure of merit aggregated over all labels and all images in the set:
\[
O = \frac{\sum_{k} \sum_{l} \alpha_l \sum_{i} \mathrm{MIN}(A_{kli}, B_{kli})}{\sum_{k} \sum_{l} \alpha_l \sum_{i} \mathrm{MAX}(A_{kli}, B_{kli})}
\]
where $i$ indexes voxels in the registered images, $l$ indexes the labels and $k$ indexes image pairs (all permutations are considered). $A_{kli}$ and $B_{kli}$ represent voxel label values for a pair of registered images and are in the range $[0, 1]$. The $\mathrm{MIN}$ and $\mathrm{MAX}$ operators are standard results for the intersection and union of fuzzy sets. This generalised overlap measures the consistency with which each set of labels partitions the image volume. The standard error in $O$ can be estimated in the normal way from the standard deviation of the pairwise overlaps.
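The overlap score described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `generalised_overlap`, the array layout `(n_images, n_labels, n_voxels)` and the optional `alpha` argument are assumptions made for the example. The standard error is taken as the sample standard deviation of the pairwise overlaps divided by the square root of the number of pairs, which is one reading of "the normal way"; because every ordered pair is visited, each unordered pair is counted twice.

```python
import numpy as np
from itertools import permutations


def generalised_overlap(labels, alpha=None):
    """Generalised Tanimoto overlap for a set of registered fuzzy label images.

    labels: array of shape (n_images, n_labels, n_voxels), values in [0, 1]
            (hypothetical layout chosen for this sketch).
    alpha:  optional per-label weights; defaults to 1 for every label.
    Returns (overlap, standard_error).
    """
    labels = np.asarray(labels, dtype=float)
    n_images, n_labels, _ = labels.shape
    if alpha is None:
        alpha = np.ones(n_labels)
    alpha = np.asarray(alpha, dtype=float)

    numerator = 0.0
    denominator = 0.0
    pairwise = []
    # All ordered image pairs (permutations), as stated in the text.
    for a, b in permutations(range(n_images), 2):
        # Fuzzy intersection and union, summed over voxels for each label.
        inter = np.minimum(labels[a], labels[b]).sum(axis=1)
        union = np.maximum(labels[a], labels[b]).sum(axis=1)
        n = float(alpha @ inter)
        d = float(alpha @ union)
        numerator += n
        denominator += d
        pairwise.append(n / d)

    pairwise = np.asarray(pairwise)
    # Assumed estimator: sample standard deviation over sqrt(number of pairs).
    std_err = pairwise.std(ddof=1) / np.sqrt(len(pairwise))
    return numerator / denominator, std_err
```

For example, two images with a single label whose voxel values are `[1, 1, 0, 0]` and `[1, 0, 0, 0]` share one voxel out of a union of two, so the score is 0.5.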
The parameter $\alpha_l$ affects the relative weighting of different labels. With $\alpha_l = 1$, label contributions are implicitly volume-weighted with respect to one another. This means that large structures contribute more to the overall measure. We have also considered the cases where $\alpha_l$ weights labels by the inverse of their volume (which makes the relative weighting of different labels equal), where $\alpha_l$ weights labels by the inverse of their volume squared (which gives regions of smaller volume higher weighting), and where $\alpha_l$ weights labels by their complexity, which we define as the mean absolute voxel intensity gradient over the labelled region.
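The four weighting schemes can be illustrated as follows. This is a sketch under stated assumptions, not the authors' code: the names `label_weights` and `complexity_weights` and the scheme strings are hypothetical, label volume is taken to be the mean over images of the summed label values, and "mean absolute voxel intensity gradient" is interpreted as the mean gradient magnitude of the image intensity over the labelled region.

```python
import numpy as np


def label_weights(labels, scheme="volume"):
    """Per-label weights for labels of shape (n_images, n_labels, n_voxels).

    Assumption: a label's volume is its summed value, averaged over images.
    """
    labels = np.asarray(labels, dtype=float)
    volume = labels.sum(axis=2).mean(axis=0)
    if scheme == "volume":            # weight 1: implicit volume weighting
        return np.ones(labels.shape[1])
    if scheme == "equal":             # inverse volume: labels weighted equally
        return 1.0 / volume
    if scheme == "inverse_squared":   # inverse volume squared: favours small labels
        return 1.0 / volume ** 2
    raise ValueError(f"unknown scheme: {scheme}")


def complexity_weights(image, masks):
    """Complexity weights for one intensity image.

    image: N-dimensional intensity array.
    masks: array of shape (n_labels, *image.shape) with values in [0, 1].
    Assumption: complexity is the mask-weighted mean gradient magnitude.
    """
    image = np.asarray(image, dtype=float)
    masks = np.asarray(masks, dtype=float)
    grads = np.gradient(image)
    if image.ndim == 1:               # np.gradient returns a bare array in 1-D
        grads = [grads]
    magnitude = np.sqrt(sum(g ** 2 for g in grads))
    flat = masks.reshape(masks.shape[0], -1)
    return (flat * magnitude.ravel()).sum(axis=1) / flat.sum(axis=1)
```

Either function returns one weight per label; a label with zero volume would need special handling, which this sketch omits.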
An overlap score based on a generalisation of the popular Dice Similarity Coefficient (DSC) would also be possible but, since DSC is related monotonically to the Tanimoto Coefficient (TC) by DSC = 2TC/(TC+1), we have not considered this further.
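The relationship between the two coefficients is easy to check numerically. The short sketch below uses hypothetical helper names (`tanimoto_to_dice`, `dice_to_tanimoto`); the inverse mapping TC = DSC/(2 - DSC) follows from rearranging the formula in the text.

```python
def tanimoto_to_dice(tc):
    """DSC = 2*TC / (TC + 1), as given in the text."""
    return 2.0 * tc / (tc + 1.0)


def dice_to_tanimoto(dsc):
    """Inverse mapping, obtained by solving the relation above for TC."""
    return dsc / (2.0 - dsc)


# The mapping is strictly increasing on [0, 1], so ranking registrations
# by TC gives the same order as ranking them by DSC.
tc_values = [i / 10.0 for i in range(11)]
dsc_values = [tanimoto_to_dice(tc) for tc in tc_values]
is_monotonic = all(x < y for x, y in zip(dsc_values, dsc_values[1:]))
```

Because the derivative 2/(TC+1)^2 is positive everywhere on this interval, `is_monotonic` holds for any increasing sequence of TC values, not just the grid used here.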