An alternative approach is based on measuring the alignment [3,4] or overlap [4,6] of anatomical structures annotated by an expert or obtained by (semi-)automated segmentation. Manual annotation is expensive to obtain and prone to subjective error. Reliable automated or semi-automated segmentation is extremely difficult to achieve; indeed, if it were available it would often obviate the need for NRR.
We have used an overlap-based approach to provide a 'gold standard' method of assessment. The method requires manual annotation of each image - providing an anatomical/tissue label for each voxel - and measures the overlap of corresponding labels following registration, using a generalisation of Tanimoto's overlap coefficient. Each label for a given image is represented using a binary image but, after warping and interpolation into a common reference frame based on the results of NRR, we obtain a set of fuzzy label images. These are combined in a generalised overlap score [8] which provides a single figure of merit aggregated over all labels and all images in the set:
$$\mathcal{O} = \frac{\sum_{k} \sum_{l} \alpha_l \sum_{i} \mathrm{MIN}(A_{kli}, B_{kli})}{\sum_{k} \sum_{l} \alpha_l \sum_{i} \mathrm{MAX}(A_{kli}, B_{kli})}$$

where $i$ indexes voxels in the registered images, $l$ indexes the labels and $k$ indexes image pairs (all permutations are considered). $A_{kli}$ and $B_{kli}$ represent voxel label values for a pair of registered images and are in the range $[0, 1]$. The $\mathrm{MIN}$ and $\mathrm{MAX}$ operators are standard results for the intersection and union of fuzzy sets. This generalised overlap measures the consistency with which each set of labels partitions the image volume.
The parameter $\alpha_l$ affects the relative weighting of different labels. With $\alpha_l = 1$, label contributions are implicitly volume-weighted with respect to one another. This means that large structures contribute more to the overall measure. We have also considered the cases where $\alpha_l$ weights labels by the inverse of their volume (which makes the relative weighting of different labels equal), where $\alpha_l$ weights labels by the inverse of their volume squared (which gives regions of smaller volume higher weighting), and where $\alpha_l$ weights labels by their complexity, which we define as the mean absolute voxel intensity gradient over the labelled region.
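As an illustration only, the generalised overlap and the volume-based weightings described above could be computed along the following lines. This is a minimal sketch, not the implementation used here: the function names, the array layout (one fuzzy label stack of shape `(n_labels, ...)` per image) and the choice to measure a label's volume as its mean fuzzy volume over all supplied images are assumptions made for the example. Complexity weighting is omitted because it also requires the intensity images.

```python
import numpy as np

def generalised_overlap(pairs, weights=None):
    """Weighted fuzzy Tanimoto overlap aggregated over labels and image pairs.

    pairs   : list of (A, B) tuples; A and B are arrays of shape
              (n_labels, ...) holding fuzzy label values in [0, 1].
    weights : optional per-label weights of shape (n_labels,);
              None means a weight of 1 for every label (volume-weighted).
    """
    num = 0.0
    den = 0.0
    for A, B in pairs:
        A = np.asarray(A, dtype=float)
        B = np.asarray(B, dtype=float)
        n_labels = A.shape[0]
        a = A.reshape(n_labels, -1)  # flatten voxels per label
        b = B.reshape(n_labels, -1)
        if weights is None:
            alpha = np.ones(n_labels)
        else:
            alpha = np.asarray(weights, dtype=float)
        # Fuzzy intersection = element-wise minimum, fuzzy union = maximum.
        num += np.sum(alpha * np.minimum(a, b).sum(axis=1))
        den += np.sum(alpha * np.maximum(a, b).sum(axis=1))
    return num / den

def inverse_volume_weights(pairs, power=1):
    """Hypothetical helper: 1 / (mean label volume) ** power.

    Assumes every label has non-zero volume. power=1 gives inverse-volume
    weighting, power=2 gives inverse-volume-squared weighting.
    """
    volumes = np.mean(
        [np.asarray(X, dtype=float).reshape(len(X), -1).sum(axis=1)
         for A, B in pairs for X in (A, B)],
        axis=0,
    )
    return 1.0 / volumes ** power

# Toy example: two labels on a four-voxel "volume".
A = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
B = np.array([[1, 0, 0, 0], [0, 1, 1, 1]], dtype=float)
pairs = [(A, B)]
print(generalised_overlap(pairs))                                  # unit weights
print(generalised_overlap(pairs, inverse_volume_weights(pairs)))   # inverse volume
print(generalised_overlap(pairs, inverse_volume_weights(pairs, 2)))
```

Because minimum and maximum are symmetric in their arguments, including both orderings of a pair changes the numerator and denominator by the same factor and leaves the score unchanged.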
An overlap score based on a generalisation of the popular Dice Similarity Coefficient (DSC) would also be possible but, since DSC is related monotonically to the Tanimoto Coefficient (TC) by DSC = 2TC/(TC+1) [5], we have not considered this further.
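The monotonic relationship can be checked numerically. The short sketch below uses hypothetical helper names and applies the stated identity to a single score. It converts between the two coefficients and confirms that a higher TC always maps to a higher DSC, so the two measures rank registrations identically.

```python
def tanimoto_to_dice(tc):
    """DSC = 2 TC / (TC + 1), for TC in [0, 1]."""
    return 2.0 * tc / (tc + 1.0)

def dice_to_tanimoto(dsc):
    """Inverse of the relation above: TC = DSC / (2 - DSC)."""
    return dsc / (2.0 - dsc)

# Evaluate the mapping on an evenly spaced grid of TC values in [0, 1].
tc_values = [n / 100.0 for n in range(101)]
dsc_values = [tanimoto_to_dice(tc) for tc in tc_values]

# Strictly increasing: the ordering of scores is preserved.
is_monotonic = all(x < y for x, y in zip(dsc_values, dsc_values[1:]))
print(is_monotonic)
print(tanimoto_to_dice(0.6), dice_to_tanimoto(0.75))
```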