Overlap-Based Assessment

An alternative approach measures the alignment [], or overlap [,], of anatomical structures that have been annotated by an expert or obtained by (semi-)automated segmentation. Its disadvantage is that manual annotation is expensive to obtain and prone to subjective error, whilst reliable automated or semi-automated segmentation is extremely difficult to achieve; indeed, if it were available it would often obviate the need for NRR.

Suppose that such annotation is available, so that pixel-by-pixel binary label information is provided for each image. For example, for an image of a brain, the labels could be tissue types such as CSF, white matter and grey matter or, at a finer level of detail, labels corresponding to individual structures. The Tanimoto overlap between a pair of binary label images A and B is considered here (see the figure below). Tanimoto overlap [] is the ratio between the size of the intersection of A and B and the size of their union. This can be written $O_{p}=\frac{N(A\cap B)}{N(A\cup B)}$, where $N(\cdot)$ is just the size of the region in this binary case. The same ratio deals naturally with cases where applying the deformation field to a label image results in label values between 0 and 1, provided that the intersection and union are evaluated as the voxel-wise minimum and maximum of the two label images.
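The pairwise overlap can be illustrated with a minimal Python sketch. It assumes that each label image has been flattened into a plain sequence of voxel values in the range 0 to 1, and that intersection and union are evaluated as the voxel-wise minimum and maximum; the function name `tanimoto_overlap` is hypothetical and is not taken from any particular library.

```python
def tanimoto_overlap(a, b):
    """Tanimoto overlap of two flattened label images with values in [0, 1]."""
    if len(a) != len(b):
        raise ValueError("label images must contain the same number of voxels")
    # For binary labels, min/max reduce to set intersection/union.
    intersection = sum(min(x, y) for x, y in zip(a, b))
    union = sum(max(x, y) for x, y in zip(a, b))
    if union == 0:
        raise ValueError("overlap is undefined when both labels are empty")
    return intersection / union


# Binary case: 2 voxels are shared, 6 voxels are labelled in either image.
A = [1, 1, 0, 0, 1, 1, 0, 0]
B = [0, 1, 1, 0, 0, 1, 1, 0]
binary_overlap = tanimoto_overlap(A, B)  # 2 / 6

# Fuzzy case, as might arise after interpolating a warped label image.
fuzzy_overlap = tanimoto_overlap([0.5, 1.0, 0.0], [1.0, 0.5, 0.0])  # 1 / 2
```

For binary inputs the two sums simply count the voxels in the intersection and the union, so the same code covers both the binary and the fuzzy case.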

**Figure:** Two binary labels, A and B, which partially overlap one another.

This idea can be generalised to the case of a group of images with multivalued tissue labels for each voxel. Each label for a given image is represented using a binary image but, after warping and interpolation into a common reference frame, based on the results of NRR, a set of fuzzy label images is obtained. These are then combined in a generalised overlap score [] which provides a single figure of merit aggregated over all labels and all images in the set:

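$\begin{displaymath}\mathcal{O} = \frac{ \sum\limits_{\mbox{\small pairs},k}\: \sum\limits_{\mbox{\small labels},l}\: \alpha_{l} \sum\limits_{\mbox{\small voxels},i} MIN(A_{kli},B_{kli})} { \sum\limits_{\mbox{\small pairs},k}\: \sum\limits_{\mbox{\small labels},l}\: \alpha_{l} \sum\limits_{\mbox{\small voxels},i} MAX(A_{kli},B_{kli})} \end{displaymath}$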

(2.1)

where $i$ indexes voxels in the registered images, $l$ indexes the labels and $k$ indexes image pairs (all permutations are considered). $A_{kli}$ and $B_{kli}$ represent voxel label values for a pair of registered images and are in the range $[0,1]$. This generalised overlap measures the consistency with which each set of labels partitions the image volume. The standard error in $\mathcal{O}$ can be estimated in the normal way from the standard deviation of the pairwise overlaps.
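The aggregation over pairs, labels and voxels can be sketched as follows. This is an illustration under stated assumptions, not the implementation used for the results: `generalised_overlap` is a hypothetical name, `images[n][l]` is assumed to hold the flattened fuzzy label map for label `l` of registered image `n`, and unordered pairs are used because MIN and MAX are symmetric, so counting every permutation would give the same ratio. The standard error is taken to be the standard error of the mean of the pairwise overlaps, which is one plausible reading of "the normal way".

```python
from itertools import combinations
from math import sqrt
from statistics import stdev


def generalised_overlap(images, alpha):
    """Return (overlap, standard error) for a set of registered label images.

    images[n][l] is the flattened label map l of image n (values in [0, 1]);
    alpha[l] is the weight of label l. At least three images are needed so
    that a standard deviation of the pairwise overlaps can be computed.
    """
    numerator = denominator = 0.0
    pairwise = []
    for img_a, img_b in combinations(images, 2):
        pair_num = pair_den = 0.0
        for weight, lab_a, lab_b in zip(alpha, img_a, img_b):
            pair_num += weight * sum(min(x, y) for x, y in zip(lab_a, lab_b))
            pair_den += weight * sum(max(x, y) for x, y in zip(lab_a, lab_b))
        numerator += pair_num
        denominator += pair_den
        pairwise.append(pair_num / pair_den)
    # Assumption: standard error of the mean of the pairwise overlaps.
    std_err = stdev(pairwise) / sqrt(len(pairwise))
    return numerator / denominator, std_err


# Three toy images, two labels each, four voxels per label map.
images = [
    [[1, 1, 0, 0], [0, 0, 1, 1]],
    [[1, 0, 0, 0], [0, 1, 1, 1]],
    [[1, 1, 1, 0], [0, 0, 0, 1]],
]
overlap, std_err = generalised_overlap(images, alpha=[1.0, 1.0])
# Pairwise sums are 3/5, 3/5 and 2/6, so the aggregate is 8 / 16 = 0.5.
```

Note that the aggregate score is a ratio of sums rather than a mean of the pairwise ratios, so pairs with larger labelled volumes carry more weight in it.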

The parameter $\alpha_{l}$ controls the relative weighting of different labels. With $\alpha_{l}=1$, label contributions are implicitly volume-weighted with respect to one another, which means that large structures contribute more to the overall measure. Three other choices can also be considered: $\alpha_{l}$ may weight labels by the inverse of their volume (which makes the relative weighting of different labels equal), by the inverse of their volume squared (which gives regions of smaller volume a higher weighting), or by their complexity, defined as the mean absolute voxel intensity gradient over the labelled region.
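The volume-based weighting schemes can be sketched in Python as below. The function name `label_weights` and the scheme names are hypothetical, and the volume of a label is assumed here to be its mean sum of voxel values over the images in the set, which the text does not specify. Complexity weighting is omitted because it additionally requires the intensity images.

```python
def label_weights(images, scheme="implicit"):
    """Return one weight alpha_l per label.

    images[n][l] is the flattened label map l of image n. The label volume is
    assumed to be the mean label sum over all images in the set.
    """
    n_labels = len(images[0])
    volumes = [
        sum(sum(img[label]) for img in images) / len(images)
        for label in range(n_labels)
    ]
    if scheme == "implicit":                # alpha_l = 1: large labels dominate
        return [1.0] * n_labels
    if scheme == "inverse_volume":          # every label contributes equally
        return [1.0 / v for v in volumes]
    if scheme == "inverse_volume_squared":  # small labels are favoured
        return [1.0 / v ** 2 for v in volumes]
    raise ValueError(f"unknown weighting scheme: {scheme}")


# Two toy images: label 0 covers 8 voxels, label 1 covers 2 voxels.
toy_images = [
    [[1] * 8 + [0] * 2, [0] * 8 + [1] * 2],
    [[1] * 8 + [0] * 2, [0] * 8 + [1] * 2],
]
implicit = label_weights(toy_images)
inverse = label_weights(toy_images, "inverse_volume")
inverse_sq = label_weights(toy_images, "inverse_volume_squared")
```

In this toy case the inverse-volume weights are 1/8 and 1/2, so the small label is weighted four times more heavily than the large one, and sixteen times more heavily under the inverse-volume-squared scheme.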

An overlap score based on a generalisation of the popular Dice Similarity Coefficient (DSC) would also be possible, but, since DSC is related monotonically to the Tanimoto Coefficient (TC) by DSC = 2TC/(TC+1) [], it need not be considered further.
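The stated relation follows directly from the standard definition of the DSC for binary labels. The short derivation below is included for completeness; it uses $N(\cdot)$ for region size and is restricted to the binary case.

```latex
\begin{align*}
\mathrm{DSC} &= \frac{2\,N(A\cap B)}{N(A)+N(B)},
\qquad N(A)+N(B) = N(A\cup B)+N(A\cap B), \\
\mathrm{DSC} &= \frac{2\,N(A\cap B)}{N(A\cup B)+N(A\cap B)}
             = \frac{2\,\mathrm{TC}}{1+\mathrm{TC}},
\qquad \mathrm{TC} = \frac{N(A\cap B)}{N(A\cup B)}, \\
\frac{d\,\mathrm{DSC}}{d\,\mathrm{TC}} &= \frac{2}{(1+\mathrm{TC})^{2}} > 0 .
\end{align*}
```

Because the derivative is strictly positive, the two coefficients always rank a set of registrations in the same order.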