Recovery of Deformation Fields: One obvious way to test the performance of a registration algorithm is to apply it to artificial data for which the true correspondence is known. Such test data is typically constructed by applying sets of known deformations (either spatial or textural) to real images. This artificially-deformed data is then registered, and evaluation is based on comparing the deformation fields recovered by the registration with those originally applied [#!Validation-NRR!#,#!Schnabel!#]. This type of approach can be used to test NRR methods 'off-line'. However, its validity presumes the ability to construct artificial deformations sufficiently close to the types of deformation seen in real-world situations. Furthermore, there are situations where such artificial data sets are a poor representation of the actual variation between images. For example, images taken from different subjects may display a much more complicated and extensive variation than can be simulated by such simple deformations.
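The comparison between applied and recovered deformation fields can be quantified by the mean endpoint error, i.e. the mean Euclidean distance between corresponding displacement vectors. The following is a minimal illustrative sketch (the function name and the toy 2-D vectors are our own, not taken from any particular registration package):

```python
# Illustrative sketch: evaluating registration by recovery of a known
# deformation field. The data below are toy values, one 2-D displacement
# vector per voxel; a real evaluation would use full volumetric fields.

def mean_endpoint_error(applied, recovered):
    """Mean Euclidean distance between the applied (ground-truth)
    and recovered displacement vectors."""
    assert len(applied) == len(recovered)
    total = 0.0
    for (ax, ay), (rx, ry) in zip(applied, recovered):
        total += ((ax - rx) ** 2 + (ay - ry) ** 2) ** 0.5
    return total / len(applied)

# Known deformation applied to construct the test image
applied = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
# Field recovered by the registration algorithm under test
recovered = [(0.9, 0.1), (0.5, 0.4), (0.1, 1.0)]

print(mean_endpoint_error(applied, recovered))
```

A perfect registration would recover the applied field exactly, giving an error of zero; larger values indicate residual mis-registration.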
Overlap-based Assessment: The overlap-based approach measures the overlap of anatomical annotations before and after registration. A good NRR algorithm will be capable of aligning similar image intensities, in particular those which indicate the location of anatomical structures. Since alignment of image intensities leads to better overlap between anatomical structures, the two are closely correlated.
Similar approaches involve measuring the mis-registration of anatomically significant regions [#!Fitzpatrick_TMI_2001!#,#!Hellier!#], and the overlap between anatomically equivalent regions obtained by segmentation, performed either manually or semi-automatically [#!Hellier!#,#!Validation-NRR!#]. Although these methods cover a broad range of applications, they are labour-intensive and prone to error. They also rely on the ability to faithfully extract anatomical structures from the image intensities alone.
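For binary (crisp) labels, the Tanimoto overlap used below reduces to the ratio of intersection to union of the two labelled regions. A minimal sketch, with toy masks of our own invention:

```python
# Illustrative sketch of label-overlap assessment: Tanimoto overlap
# |A ∩ B| / |A ∪ B| between two binary masks of the same anatomical
# structure, before and after registration. Masks are toy data.

def tanimoto(a, b):
    """Tanimoto (Jaccard) overlap of two binary voxel masks."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0

mask_ref    = [1, 1, 1, 0, 0, 0]   # structure in the reference image
mask_before = [0, 0, 1, 1, 1, 0]   # poorly aligned: overlap 1/5
mask_after  = [1, 1, 0, 0, 0, 0]   # better aligned: overlap 2/3

print(tanimoto(mask_ref, mask_before))  # 0.2
print(tanimoto(mask_ref, mask_after))
```

A well-registered pair drives this ratio towards 1, while non-corresponding structures yield values near 0.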
This paper explores one such method, which assesses registration using spatial overlap. The overlap is defined using Tanimoto's formulation over corresponding regions in the registered images. The correspondence is defined by labels of distinct image regions (in this case brain tissue classes), produced by manual mark-up of the original images (ground-truth labels). A correctly registered image set will exhibit high relative overlap between corresponding brain structures in different images and, conversely, low overlap between non-corresponding structures. A generalised overlap measure [#!Crum_MICCAI_2005!#] is used to compute a single figure of merit for the overall overlap of all labels over all subjects:

\begin{displaymath}
\mathcal{O} = \frac{\sum_{k}\sum_{l}\alpha_{l}\sum_{i} \mathrm{MIN}\left(A_{kli}, B_{kli}\right)}{\sum_{k}\sum_{l}\alpha_{l}\sum_{i} \mathrm{MAX}\left(A_{kli}, B_{kli}\right)}
\end{displaymath}

where $i$ indexes voxels in the registered images, $l$ indexes the label and $k$ indexes the two images under consideration. $A_{kli}$ and $B_{kli}$ represent voxel label values in a pair of registered images and lie in the range $[0,1]$. The $\mathrm{MIN}$ and $\mathrm{MAX}$ operators are standard results for the intersection and union of fuzzy sets. This generalised overlap measures the consistency with which each set of labels partitions the image volume.
The parameter $\alpha_{l}$ affects the relative weighting of different labels. With $\alpha_{l} = 1$, label contributions are implicitly volume-weighted with respect to one another, so larger labels contribute more to the overall measure. We have also considered the cases where $\alpha_{l}$ weights for the inverse labelled region volume (which makes the relative weighting of different labels equal), where $\alpha_{l}$ weights for the inverse label volume squared (which gives regions of smaller volume higher weighting), and where $\alpha_{l}$ weights for a measure of label complexity. We define label complexity, somewhat arbitrarily, as the mean absolute voxel intensity gradient in the labelled region.
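The generalised measure above can be sketched directly: fuzzy label values in $[0,1]$, MIN/MAX as fuzzy intersection and union, and a per-label weight $\alpha_{l}$. The function name and toy data below are our own; the sketch contrasts the volume-weighted case ($\alpha_{l}=1$) with inverse-volume weighting:

```python
# Illustrative sketch of the generalised overlap measure: fuzzy voxel
# label values in [0, 1], with MIN/MAX as fuzzy intersection/union and
# per-label weights alpha_l. Data are toy values for two labels.

def generalised_overlap(pairs, alpha):
    """pairs: one (A, B) tuple per label, where A and B are lists of
    fuzzy voxel label values for the two registered images.
    alpha: weight alpha_l per label. Returns a single figure of merit."""
    num = den = 0.0
    for (A, B), a in zip(pairs, alpha):
        num += a * sum(min(x, y) for x, y in zip(A, B))  # fuzzy intersection
        den += a * sum(max(x, y) for x, y in zip(A, B))  # fuzzy union
    return num / den

# Two labels: one large, one small (fuzzy values per voxel)
label_big   = ([1.0, 1.0, 0.8, 0.0], [1.0, 0.9, 0.0, 0.0])
label_small = ([0.0, 0.0, 0.2, 1.0], [0.0, 0.1, 1.0, 1.0])
pairs = [label_big, label_small]

# alpha_l = 1 (volume-weighted) vs. inverse labelled-region volume
uniform = [1.0, 1.0]
inv_vol = [1.0 / sum(A) for A, _ in pairs]

print(generalised_overlap(pairs, uniform))
print(generalised_overlap(pairs, inv_vol))
```

With uniform weights the large label dominates the score; inverse-volume weighting equalises the labels' contributions, so the poorly-overlapping small label pulls the figure of merit down.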
Formulations of overlap other than Tanimoto's have also been investigated. Their results proved less accurate and are omitted in the interest of brevity. While our main focus remains assessment that requires no ground truth, the approach above provides a useful reference against which to validate with respect to ground-truth annotation.