
Statistical Models of Appearance

Our approach to ground-truth-free evaluation of NRR depends on the ability, given a set of registered images, to construct a generative statistical model of appearance. We have adopted the approach of Cootes et al. [9,10], who introduced models that capture variation in both shape and texture (in the graphics sense). These models have been used extensively in medical image analysis, for example in brain morphometry and cardiac time-series analysis [11,12,13]. Other approaches to appearance modelling could also be considered; we rely only on the generative property in this application.

Figure 1: The effect of varying the first (top row), second, and third model parameters of a brain appearance model by $ \pm 2.5$ standard deviations.

The key requirement in building an appearance model from a set of images is the existence of a dense correspondence across the set. This is often defined by interpolating between the correspondences of a limited number of user-defined landmarks. Shape variation is then represented in terms of the motions of these sets of landmark points. Using the notation of Cootes et al. [9], the shape (configuration of landmark points) of a single example can be represented as a vector $ \mathbf{x}$ formed by concatenating the coordinates of the positions of all the landmark points for that example. The texture is represented by a vector $ \mathbf{g}$, formed by concatenating the image values of the shape-free texture sampled from the image.
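The vector construction described above can be sketched as follows; the landmark coordinates here are hypothetical values chosen purely for illustration:

```python
import numpy as np

# Hypothetical example: three 2D landmark points (x, y) for one training shape
landmarks = np.array([[10.0, 20.0],
                      [15.0, 25.0],
                      [12.0, 30.0]])

# The shape vector x is formed by concatenating all landmark coordinates
x = landmarks.ravel()  # -> array([10., 20., 15., 25., 12., 30.])

# The texture vector g would be formed analogously, by concatenating the
# image intensities sampled from the shape-normalised region
```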

In the simplest case, we model the variation of shape and texture in terms of multivariate Gaussian distributions, using Principal Component Analysis (PCA) [15], obtaining linear statistical models of the form:

$\displaystyle \mathbf{x} = \mathbf{\overline{x}}+\mathbf{P}_{s}\mathbf{b}_{s}$ (2)

$\displaystyle \mathbf{g} = \overline{\mathbf{g}}+\mathbf{P}_{g}\mathbf{b}_{g}$ (3)

where $ \mathbf{b}_{s}$ are shape parameters, $ \mathbf{b}_{g}$ are texture parameters, $ \mathbf{\overline{x}}$ and $ \overline{\mathbf{g}}$ are the mean shape and texture, and $ \mathbf{P}_{s}$ and $ \mathbf{P}_{g}$ are the principal modes of shape and texture variation respectively.

In generative mode, the input shape ( $ \mathbf{b}_{s}$) and texture ( $ \mathbf{b}_{g}$) parameters can be varied continuously, allowing the generation of sets of images whose statistical distribution matches that of the training set.
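The linear model of equations (2) and (3) can be sketched in a few lines of NumPy. This is an illustrative implementation under assumed conventions (row-stacked sample vectors, modes retained by explained variance), not the authors' code:

```python
import numpy as np

def fit_linear_model(X, var_kept=0.98):
    """Fit a PCA model  x ~ mean + P b  to row-stacked sample vectors X.

    Works identically for shape vectors (giving P_s) or texture
    vectors (giving P_g)."""
    mean = X.mean(axis=0)
    # Principal modes via SVD of the mean-centred data
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    var = s**2 / (len(X) - 1)                  # variance per mode
    k = np.searchsorted(np.cumsum(var) / var.sum(), var_kept) + 1
    P = Vt[:k].T                               # columns = principal modes
    return mean, P

def generate(mean, P, b):
    """Generative mode: vary the parameters b to synthesise new instances."""
    return mean + P @ b

# Toy example: 20 'shapes', each a vector of 6 concatenated coordinates
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
mean, P = fit_linear_model(X)
x_new = generate(mean, P, np.zeros(P.shape[1]))  # b = 0 reproduces the mean
```

Sampling `b` from the model's Gaussian (e.g. scaled by the per-mode standard deviations) yields synthetic examples whose distribution approximates that of the training set.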

In many cases, the variations of shape and texture are correlated. If this correlation is taken into account, we then obtain a combined statistical model of the more general form:

$\displaystyle \mathbf{x} = \bar{\mathbf{x}}+\mathbf{Q}_{s}\mathbf{c}$ (4)

$\displaystyle \mathbf{g} = \mathbf{\bar{g}}+\mathbf{Q}_{g}\mathbf{c}$ (5)

where the model parameters $ \mathbf{c}$ control both shape and texture, and $ \mathbf{Q}_{s}$, $ \mathbf{Q}_{g}$ are matrices describing the general modes of variation derived from the training set. The effect of varying different elements of $ \mathbf{c}$ for a model built from a set of 2D MR brain images is shown in Figure 1.
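The combined model is conventionally obtained by concatenating each example's shape and texture parameters and applying a second PCA, which couples the two through the single parameter vector $ \mathbf{c}$. A minimal sketch, with an assumed scalar weight `w` balancing shape against texture units (the function name and shapes are illustrative):

```python
import numpy as np

def combined_modes(Bs, Bg, w=1.0):
    """Second-stage PCA coupling shape and texture.

    Bs, Bg: per-example shape / texture parameter vectors (one row each).
    w: weight commensurating shape units with texture units."""
    B = np.hstack([w * Bs, Bg])                 # concatenated parameters
    mean = B.mean(axis=0)                       # ~0 if first PCAs were centred
    U, s, Vt = np.linalg.svd(B - mean, full_matrices=False)
    C = Vt.T                                    # columns = combined modes
    n_s = Bs.shape[1]
    Cs, Cg = C[:n_s], C[n_s:]                   # split back into the two blocks
    # In the notation above, Q_s and Q_g follow by composing Cs, Cg with the
    # first-stage modes P_s, P_g (and undoing the weight w on the shape part).
    return Cs, Cg

rng = np.random.default_rng(1)
Bs = rng.normal(size=(20, 3))
Bg = rng.normal(size=(20, 4))
Cs, Cg = combined_modes(Bs, Bg)
```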

Generally, we wish to distinguish between the meaningful shape variation of the objects under consideration, and the apparent variation in shape that is due to the positioning of the object within the image (the pose of the imaged object). In this case, the appearance model is generated from an (affinely) aligned set of images. Point positions $ \mathbf{x}_{im}$ in the original image frame are then obtained by applying the relevant pose transformation $ T_{\mathbf{t}}(\cdot)$:

$\displaystyle \mathbf{x}_{im}=T_{\mathbf{t}}(\mathbf{x}_{model})$ (6)

where $ \mathbf{x}_{model}$ are the points in the model frame, and $ \mathbf{t}$ are the pose parameters. For example, in 2D, $ T_{\mathbf{t}}$ could be a similarity transform with four parameters describing the translation, rotation and scale of the object.
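The four-parameter 2D similarity transform mentioned above can be written out explicitly; this is a standard construction, with an assumed parameterisation $ \mathbf{t} = (t_x, t_y, \theta, s)$:

```python
import numpy as np

def similarity_transform(points, t):
    """Apply a 2D similarity transform T_t to model-frame points.

    t = (tx, ty, theta, s): translation, rotation angle, isotropic scale
    -- the four pose parameters of the 2D case."""
    tx, ty, theta, s = t
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return points @ (s * R).T + np.array([tx, ty])

# The identity pose (no translation, no rotation, unit scale) leaves
# the model points unchanged
pts = np.array([[1.0, 0.0], [0.0, 2.0]])
out = similarity_transform(pts, (0.0, 0.0, 0.0, 1.0))
```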

In an analogous manner, we can also normalise the image set with respect to the mean image intensities and image variance,

$\displaystyle \mathbf{g}_{im}=T_{gtrans}(\mathbf{g}_{model}),$ (7)

where $ T_{gtrans}$ consists of a shift and scaling of the image intensities. For further implementation details see [9,10].
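A common choice for the shift-and-scale normalisation $ T_{gtrans}$ is to map each texture vector to zero mean and unit variance; the following sketch assumes that convention, which is one standard option rather than necessarily the exact one used in [9,10]:

```python
import numpy as np

def normalise_texture(g):
    """Shift and scale image intensities to zero mean, unit variance."""
    return (g - g.mean()) / g.std()

g = np.array([10.0, 12.0, 14.0, 16.0])   # hypothetical sampled intensities
gn = normalise_texture(g)                 # mean ~0, std ~1 after normalisation
```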

As noted above, a meaningful dense groupwise correspondence is required before an appearance model can be built. NRR provides a natural method of obtaining such a correspondence, as noted by Frangi and Rueckert [11,12]. It is this link that forms the basis of our new approach to NRR evaluation.

The link between registration and modelling is further exploited in the Minimum Description Length (MDL) [16] approach to groupwise NRR, where modelling becomes an integral part of the registration process. This is one of the registration strategies evaluated later in the paper.


Roy Schestowitz 2007-03-11