Specificity and Generalisation
A good model of a set of training data
should possess several properties. Firstly, the model should be
able to extrapolate and interpolate effectively from the training
data, to produce a range of images from the same general class as
those seen in the training set. We will call this generalisation ability. Conversely, the model should not produce
images which cannot be considered as valid examples of the class
of image modelled. That is, a model built from brain images should
only generate images which could be considered as valid images of
possible brains. We will call this the specificity of the
model. In previous work, quantitative measures of specificity and generalisation were used to evaluate shape
models [17]. We present here the extension of these ideas
to images (as opposed to shapes). The figure that follows provides
an overview of the approach.
Consider first the training data for the model, that is, the set
of images which were the input to NRR. Without loss
of generality, each training image can be considered as a single
point in an $n$-dimensional image space. A statistical model is
then a probability density function (pdf) $p(z)$ defined on this
space.
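The idea of an image as a point, and of a model as a pdf on image space, can be sketched in a few lines. This is only an illustration: the text does not specify the form of the model, so the independent per-pixel Gaussian below is an assumption made for the example, and the helper names `flatten`, `fit_independent_gaussian` and `sample` are hypothetical.

```python
import random
import statistics

def flatten(image):
    """Turn a 2-D image (a list of rows) into one point in n-dimensional space."""
    return [pixel for row in image for pixel in row]

def fit_independent_gaussian(points):
    """Assumed toy model: one Gaussian per pixel, fitted to the training points."""
    columns = list(zip(*points))
    means = [statistics.fmean(column) for column in columns]
    stds = [statistics.pstdev(column) for column in columns]
    return means, stds

def sample(model, rng):
    """Draw one synthetic image, as a point in image space, from the model pdf."""
    means, stds = model
    return [rng.gauss(mean, std) for mean, std in zip(means, stds)]

# Three tiny 2x2 "images" stand in for a training set.
training_images = [
    [[0.0, 1.0], [2.0, 3.0]],
    [[1.0, 1.0], [2.0, 4.0]],
    [[2.0, 1.0], [2.0, 5.0]],
]
training_points = [flatten(image) for image in training_images]
n = len(training_points[0])  # dimensionality of image space: one axis per pixel

model = fit_independent_gaussian(training_points)
rng = random.Random(0)       # fixed seed so the sketch is repeatable
synthetic = sample(model, rng)
```

Each training image becomes one list of `n` numbers, and `synthetic` is a further point drawn from the fitted density, which is all that the evaluation measures in this section require of a model.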
Figure:
The model evaluation framework. A model is constructed from the
training set and used to generate synthetic images. The training set and
the set generated by the model can be viewed as clouds
of points in image space (training images represented by stars,
generated images represented by dots).
To be specific, let $\{I_i : i = 1, \ldots, N\}$ denote the $N$ images
of the training set when considered as points in image space. Let
$p(z)$ be the probability density function of the model. We define a
quantitative measure of the specificity $S_\lambda$ of the model
with respect to the training set $\{I_i\}$ as follows:
\[ S_\lambda(\{I_i\}; p) = \int p(z) \, \min_i \left( |z - I_i| \right)^\lambda \, dz , \]
where $|\cdot|$ is a distance on image space, raised to some
positive power $\lambda$ (for the remainder of this paper we will
consider only the case $\lambda = 1$). That is, for each point
$z$ in image space, we find the nearest neighbour to this
point in the training set, and sum the powers of the
nearest-neighbour distances, weighted by the pdf $p(z)$.
Greater specificity is indicated by smaller values of the specificity
measure $S_\lambda$, and vice versa. The figure of the training set and
model pdf below gives diagrammatic examples
of models with differing specificity.
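The integral in the definition of specificity can be approximated using a
Monte-Carlo method. A large random set of images
$\{I_\mu : \mu = 1, \ldots, M\}$ is generated,
having the same distribution as the model pdf $p(z)$. The
estimate of the specificity is:
\[ S_\lambda(\{I_i\}; p) \approx \frac{1}{M} \sum_{\mu=1}^{M} \min_i \left( |I_i - I_\mu| \right)^\lambda , \]
with standard error:
\[ \sigma_S = \frac{\sigma}{\sqrt{M - 1}} , \]
where $\sigma$ is the standard deviation of the set of $M$
measurements, that is, of the nearest-neighbour distances from each
generated image $I_\mu$ to the training images $I_i$. Note that this
definition of $S_\lambda$ does not require
that we construct the space of images; we simply need to be able
to define distances between images. This is discussed below.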
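The Monte-Carlo estimate described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the Euclidean distance is an assumed stand-in for whichever image distance is chosen, the standard error is assumed to be the standard deviation of the measurements divided by the square root of one less than their number, and the function names are hypothetical.

```python
import math
import statistics

def euclidean(a, b):
    """Assumed stand-in distance between two images held as flat lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def specificity(training, samples, distance=euclidean, power=1.0):
    """Monte-Carlo estimate of specificity and its standard error.

    For every generated sample, take the distance to its nearest
    neighbour in the training set, raised to `power`, then average.
    """
    nearest = [
        min(distance(sample, image) for image in training) ** power
        for sample in samples
    ]
    count = len(nearest)
    estimate = sum(nearest) / count
    # Assumed form: population standard deviation over sqrt(count - 1).
    std_error = statistics.pstdev(nearest) / math.sqrt(count - 1)
    return estimate, std_error

# Toy data: two training "images" and four generated samples, each 2 pixels.
training = [[0.0, 0.0], [10.0, 0.0]]
samples = [[1.0, 0.0], [0.0, 2.0], [10.0, 3.0], [7.0, 0.0]]
estimate, std_error = specificity(training, samples)
```

Only pairwise distances are used, so any distance function of two images can be passed as `distance` without building the image space explicitly.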
Figure:
Training set (points) and model pdf
(shading) in image space. Left: A model
which is specific, but not general. Right:
A model which is general, but not specific.
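We define a measure of generalisation similarly, simply reversing
the direction of the nearest-neighbour distance measure:
\[ G_\lambda(\{I_i\}; p) = \frac{1}{N} \sum_{i=1}^{N} \min_\mu \left( |I_i - I_\mu| \right)^\lambda , \]
with standard error:
\[ \sigma_G = \frac{\sigma}{\sqrt{N - 1}} , \]
where $\sigma$ is now the standard deviation of the $N$
nearest-neighbour distances, one per training image.
That is, for each member $I_i$ of the training set, we compute
the distance to the nearest neighbour in the generated sample set
$\{I_\mu\}$.
Large values of $G_\lambda$ correspond to model
distributions which do not cover the training set
and have poor generalisation ability, whereas
small values of $G_\lambda$ indicate models
with better generalisation ability.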
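The reversed direction can be sketched in the same way. As before this is an illustration under stated assumptions: the Euclidean distance stands in for the chosen image distance, the standard error is assumed to be the standard deviation divided by the square root of one less than the number of training images, and the function names are hypothetical.

```python
import math
import statistics

def euclidean(a, b):
    """Assumed stand-in distance between two images held as flat lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def generalisation(training, samples, distance=euclidean, power=1.0):
    """Generalisation estimate and its standard error.

    For every training image, take the distance to its nearest
    neighbour among the generated samples, raised to `power`, then average.
    """
    nearest = [
        min(distance(image, sample) for sample in samples) ** power
        for image in training
    ]
    count = len(nearest)
    estimate = sum(nearest) / count
    # Assumed form: population standard deviation over sqrt(count - 1).
    std_error = statistics.pstdev(nearest) / math.sqrt(count - 1)
    return estimate, std_error

# Toy data: three training "images" and four generated samples, each 2 pixels.
training = [[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]]
samples = [[1.0, 0.0], [0.0, 2.0], [10.0, 3.0], [7.0, 0.0]]
estimate, std_error = generalisation(training, samples)
```

In this toy data the training image at (5, 5) has no generated sample nearby, so it contributes the largest term to the average: a model that fails to cover part of the training set is penalised.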
We note here that both measures can be further extended by
considering the sum of distances to the $k$ nearest neighbours, rather
than just to the single nearest neighbour. However, the choice of
$k$ would require careful consideration and, in what follows, we
restrict ourselves to the single nearest-neighbour case.
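For illustration only, the extension to several nearest neighbours might look like the sketch below. It is not used in what follows. The Euclidean distance, the helper names and the choice to average the per-point sums are all assumptions made for the example.

```python
import heapq
import math

def euclidean(a, b):
    """Assumed stand-in distance between two images held as flat lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_distance_sum(point, others, k, distance=euclidean, power=1.0):
    """Sum of the k smallest distances, each raised to `power`, from point to others."""
    distances = (distance(point, other) ** power for other in others)
    return sum(heapq.nsmallest(k, distances))

def knn_specificity(training, samples, k):
    """Average, over generated samples, of the summed distances to the k nearest training images."""
    return sum(knn_distance_sum(s, training, k) for s in samples) / len(samples)

def knn_generalisation(training, samples, k):
    """Average, over training images, of the summed distances to the k nearest generated samples."""
    return sum(knn_distance_sum(t, samples, k) for t in training) / len(training)

# Toy data: three training "images" and two generated samples, each 2 pixels.
training = [[0.0, 0.0], [10.0, 0.0], [0.0, 4.0]]
samples = [[0.0, 1.0], [9.0, 0.0]]

single = knn_specificity(training, samples, k=1)  # reduces to the single nearest-neighbour case
double = knn_specificity(training, samples, k=2)
```

With `k=1` both functions reduce to the single nearest-neighbour measures, which makes the effect of a larger `k` easy to compare on the same data.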
Figure:
A comparison between shuffle difference images
evaluated using various size neighbourhoods (radius $r$).
Left: original image, right: warped
image, centre, from the left: shuffle
distance images for three neighbourhood radii, the first of which
corresponds to the Euclidean distance.
Roy Schestowitz
2007-03-11