Specificity and Generalisation

Our approach to model evaluation is based on directly measuring key properties of a given model. An effective model is one that can generate a broad range of examples of the class of modelled images; this property is referred to as Generalisation ability. Generalisation alone is not sufficient, since the model must also generate only examples that are consistent with the class of modelled images; this property is referred to as Specificity. As will be shown later, the two properties are related to Shannon's entropy [22], which can even be substituted for their estimates.

The approach to the assessment of NRR relies on the close relationship between registration and statistical model building, and extends the work of Davies et al. on evaluating shape models [8]. We note that NRR of a set of images establishes the dense correspondence that is required to build a combined appearance model. Given the correct correspondence, the model provides a concise description of the training set. As the correspondence degrades, the model also degrades, both in its ability to reconstruct images of the same class that are not in the training set (Generalisation) and in its ability to synthesise only new images similar to those in the training set (Specificity). If we represent training images and those synthesised by the model as points in a high-dimensional space, the clouds formed by the training and synthetic images ideally overlap fully (see Fig. 2). Given a measure of the distance between images (as described in the next subsection), Specificity, $S$, Generalisation, $G$, and their standard errors $\sigma_{\mathcal{S}}$ and $\sigma_{\mathcal{G}}$ can be defined as follows:

**Figure 2:** The model evaluation framework. A model is constructed from the training set and images are generated from the model. Each image is vectorised and embedded in hyperspace, where many such points can be visualised as a cloud.

Let $\{ I_{a}(X_{0})\,:\, a=1,\ldots,m\}$ be a large image set which has been sampled from the model and has the same distribution as the model. The distance between two images is denoted by $\vert\cdot\vert$, which enables us to define:

$$G=\frac{1}{n}\sum_{i=1}^{n}\min_{j}\,\vert I_{i}-I_{j}\vert, \tag{5}$$

$$S=\frac{1}{m}\sum_{j=1}^{m}\min_{i}\,\vert I_{i}-I_{j}\vert, \tag{6}$$

$$\sigma_{\mathcal{G}}=\frac{SD(\min_{j}\,\vert I_{i}-I_{j}\vert)}{\sqrt{n-1}}, \tag{7}$$

$$\sigma_{\mathcal{S}}=\frac{SD(\min_{i}\,\vert I_{i}-I_{j}\vert)}{\sqrt{m-1}}, \tag{8}$$

where $\{I_{i}:i=1,\ldots,n\}$ is the training set, $\{I_{j}:j=1,\ldots,m\}$ is a large set of images sampled from the model, $\vert\cdot\vert$ is the distance between two images and SD is the standard deviation.

Both values are low for a good model as short distances imply image proximity. Specificity measures the mean distance between images generated by the model and their closest neighbours in the training set, whilst Generalisation measures the mean distance between images in the training set and their closest neighbours in the synthesised set. The approach is illustrated diagrammatically in Fig. 3.
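As a minimal sketch of how these indices could be computed, the following Python code evaluates Generalisation, Specificity and their standard errors for images supplied as flat vectors. Several choices here are assumptions rather than part of the method described above: the Euclidean distance stands in for the image distance $\vert\cdot\vert$ (the actual measure is described in the next subsection), SD is taken to be the population standard deviation, each standard error is computed from the same nearest-neighbour distances that are averaged in the corresponding index, and the function names `nearest_distances` and `model_indices` are hypothetical.

```python
import math
from statistics import mean, pstdev


def nearest_distances(queries, references):
    """Distance from each query image to its closest reference image.

    Assumption: images are flat vectors compared with the Euclidean distance.
    """
    return [min(math.dist(q, r) for r in references) for q in queries]


def model_indices(training, synthetic):
    """Return (G, sigma_G, S, sigma_S); needs at least two images per set."""
    n, m = len(training), len(synthetic)

    # Generalisation: each training image against its nearest synthetic image.
    g_dists = nearest_distances(training, synthetic)
    # Specificity: each synthetic image against its nearest training image.
    s_dists = nearest_distances(synthetic, training)

    G = mean(g_dists)
    S = mean(s_dists)
    # Assumption: SD is the population standard deviation (pstdev).
    sigma_G = pstdev(g_dists) / math.sqrt(n - 1)
    sigma_S = pstdev(s_dists) / math.sqrt(m - 1)
    return G, sigma_G, S, sigma_S


# Toy example: two "training images" and three "synthetic images" in 2-D.
training = [[0, 0], [2, 0]]
synthetic = [[0, 1], [2, 0], [5, 0]]
G, sigma_G, S, sigma_S = model_indices(training, synthetic)
print(f"G = {G:.3f} +/- {sigma_G:.3f}, S = {S:.3f} +/- {sigma_S:.3f}")
```

In the toy example the synthetic point at (5, 0) lies far from every training point, so it raises Specificity without affecting Generalisation. The brute-force search costs on the order of $n \times m$ distance evaluations, which may call for a faster nearest-neighbour scheme when the sets are large.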

**Figure 3:** Simplified hyperspace representation of the model indices calculation. Specificity and Generalisation are derived from the distances between images.

It can be observed that Specificity and Generalisation fail to account for many of the image distances, since each image contributes only the distance to its nearest neighbour. This might lead to poor or incomplete results. While these measures provide good approximations, we strive to use a less simplistic method and exploit work that is related to the MDL principle, which was shown to be valuable when dealing with shapes alone.