One approach to assessing the results of NRR is to create a set of test images by taking original images and applying known spatial deformations. Evaluation involves comparing the deformation fields recovered by NRR to those known to have been applied [15,16]. This approach can be used to test a given NRR method 'off-line', but cannot be used to evaluate the results when the method is applied to real data as part of a registration-based analysis.
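As an illustration of the comparison described above, the following minimal sketch computes the mean Euclidean distance between an applied and a recovered displacement field. The function name `mean_displacement_error` and the representation of a field as a list of per-voxel displacement tuples are assumptions made for this example; it also assumes both fields are sampled at the same voxels and expressed in the same direction. It is not the evaluation protocol of [15,16].

```python
import math


def mean_displacement_error(applied, recovered):
    """Mean Euclidean distance between two displacement fields.

    Each field is a list of displacement vectors (one tuple per voxel),
    assumed to be sampled at the same voxel positions.
    """
    if len(applied) != len(recovered):
        raise ValueError("fields must be sampled at the same voxels")
    if not applied:
        raise ValueError("fields must not be empty")
    total = 0.0
    for known, estimate in zip(applied, recovered):
        # Per-voxel error: length of the difference between the two vectors.
        total += math.dist(known, estimate)
    return total / len(applied)


# Hypothetical two-voxel example: the second voxel's displacement is missed.
applied = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
recovered = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(mean_displacement_error(applied, recovered))
```

A value of zero would indicate that the known deformation was recovered exactly at every sampled voxel.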
An alternative approach involves measuring the coincidence of anatomical annotations following registration. Variants of this approach include measuring the mis-registration of anatomical landmarks [8,10], and the overlap between anatomically equivalent regions obtained using manual or semi-automatic segmentation [10,15]. These methods are generally applicable, but are labour-intensive and error-prone.
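Both annotation-based variants can be sketched in a few lines. The example below is illustrative only: the function names, the use of mean Euclidean distance for landmark mis-registration, and the choice of the Tanimoto (Jaccard) ratio for region overlap are assumptions of this sketch, not the specific measures used in [8,10,15]. Regions are represented as sets of voxel indices.

```python
import math


def mean_landmark_error(landmarks_a, landmarks_b):
    """Mean distance between corresponding landmarks after registration."""
    if len(landmarks_a) != len(landmarks_b) or not landmarks_a:
        raise ValueError("need equal, non-empty lists of landmarks")
    distances = [math.dist(p, q) for p, q in zip(landmarks_a, landmarks_b)]
    return sum(distances) / len(distances)


def region_overlap(region_a, region_b):
    """Tanimoto (Jaccard) overlap of two regions given as sets of voxel indices."""
    union = region_a | region_b
    if not union:
        raise ValueError("both regions are empty")
    return len(region_a & region_b) / len(union)


# Hypothetical annotations in a common reference frame.
print(mean_landmark_error([(0, 0, 0), (1, 1, 1)], [(3, 4, 0), (1, 1, 1)]))
print(region_overlap({1, 2, 3}, {2, 3, 4}))
```

Lower landmark error and higher region overlap both indicate better agreement of the annotations after registration.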
This paper uses a generalised overlap-based approach to provide a 'gold standard' method of assessment. The method requires manual annotation of each image (providing an anatomical/tissue label for each voxel) and measures the overlap of corresponding labels following registration, using a generalisation of Tanimoto's overlap coefficient. Each label for a given image is initially represented as a binary image; after warping and interpolation into a common reference frame based on the results of NRR, we obtain a set of fuzzy label images. These are combined in a generalised overlap score [4], which provides a single figure of merit aggregated over all labels and all images in the set:
$$
O = \frac{\sum_{k} \sum_{l} \alpha_l \sum_{i} \mathrm{MIN}(A_{kli}, B_{kli})}{\sum_{k} \sum_{l} \alpha_l \sum_{i} \mathrm{MAX}(A_{kli}, B_{kli})}
$$

where $i$ indexes voxels in the registered images, $l$ indexes the label and $k$ indexes the pairs of images under consideration. $A_{kli}$ and $B_{kli}$ represent voxel label values in a pair of registered images and are in the range [0, 1]. The $\mathrm{MIN}()$ and $\mathrm{MAX}()$ operators are the standard definitions of the intersection and union of fuzzy sets. This generalised overlap measures the consistency with which each set of labels partitions the image volume. The parameter $\alpha_l$ affects the relative weighting of different labels. With $\alpha_l = 1$, label contributions are implicitly volume weighted with respect to one another. We have also considered the cases where $\alpha_l$ weights for the inverse label volume (which makes the relative weighting of different labels equal), where $\alpha_l$ weights for the inverse label volume squared (which gives labels of smaller volume higher weighting) and where $\alpha_l$ weights for a measure of label complexity (which we define as the mean absolute voxel intensity gradient in the label).
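To make the score concrete, the sketch below computes a weighted ratio of summed fuzzy intersections (voxel-wise minimum) to summed fuzzy unions (voxel-wise maximum), accumulated over image pairs, labels and voxels. The function name `generalised_overlap`, the dictionary-of-lists data layout and the `weights` argument are assumptions made for this illustration; plain Python lists stand in for flattened fuzzy label images, and how the per-label weights are derived (for example from label volume or complexity) is left to the caller.

```python
def generalised_overlap(pairs, weights=None):
    """Weighted fuzzy overlap aggregated over image pairs and labels.

    pairs:   list of (labels_a, labels_b); each maps a label name to a
             flattened fuzzy label image with values in [0, 1].
    weights: optional mapping from label name to its weight; when omitted,
             every label has weight 1 (implicit volume weighting).
    """
    numerator = 0.0
    denominator = 0.0
    for labels_a, labels_b in pairs:
        for label, values_a in labels_a.items():
            values_b = labels_b[label]
            if len(values_a) != len(values_b):
                raise ValueError("label images must have the same size")
            w = 1.0 if weights is None else weights[label]
            # Fuzzy intersection and union are the voxel-wise min and max.
            numerator += w * sum(min(a, b) for a, b in zip(values_a, values_b))
            denominator += w * sum(max(a, b) for a, b in zip(values_a, values_b))
    if denominator == 0.0:
        raise ValueError("all label images are empty")
    return numerator / denominator


# Hypothetical pair of registered images with two labels and three voxels.
image_a = {"gm": [1.0, 0.5, 0.0], "wm": [0.0, 0.5, 1.0]}
image_b = {"gm": [1.0, 0.0, 0.0], "wm": [0.0, 1.0, 1.0]}
print(generalised_overlap([(image_a, image_b)]))
print(generalised_overlap([(image_a, image_b)], weights={"gm": 2.0, "wm": 1.0}))
```

Because the weights are applied inside both sums, a label with a larger weight contributes proportionally more to the numerator and the denominator, and the score remains in the range [0, 1].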