Appearance Model Evaluation

February 2005

The Problem

Given:
- Appearance model
- Training set that generated the model
Sought:
- Measure of model quality
- Possibilities: specificity, generalisability, etc.
- These methods are based on distance
- The question: How can distance be measured?

Distance

Can articulate distance in terms of parameters
Intensity differences are problematic
Wish to account for shape and intensity variation
- It is not clear how to consider both
- They are incommensurate

Requirements

The measure needs to be:
- Easily/quickly computable
  - The value will need to be calculated for entire training set
  - Complexity is proportional to set size
- Robust to:
  - 'Folding' of mappings, e.g. in shuffling (search for match within a fixed window)
  - ...
- Other properties:
  - Triangle inequality: the distance from A to C is less than or equal to the sum of the distances from A to B and from B to C.
  - ...

Motivation/Prospects

The approach will allow faithfulness w.r.t. the model to be measured
It is known what the model encapsulates: shape and intensity
Therefore, the behaviour is known
It is based on synthesis/instantiation
Model-building procedure is well-understood

Visual Illustrations

Data lie in a space of, e.g., parameters or intensities

Visual Illustrations

Each model synthesis is looked at in turn

Visual Illustrations

Distance measured to training set

Visual Illustrations

For specificity, the distance to the nearest training example is of interest

Visual Illustrations

Generalisability reverses the roles of model syntheses and training set

Visual Illustrations

Does so to ensure the model does not span a large volume in space
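The two measures illustrated above can be sketched in a few lines. This is only a minimal sketch: it assumes each instance has already been reduced to a vector and uses Euclidean distance as a placeholder, whereas the choice of distance is exactly the open question of these slides. The function names are hypothetical.

```python
import numpy as np

def pairwise_distances(a, b):
    """Euclidean distance between every row of a and every row of b.

    Euclidean distance is an assumption made for illustration only.
    """
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def specificity(syntheses, training):
    """Mean distance from each model synthesis to its nearest training example."""
    return pairwise_distances(syntheses, training).min(axis=1).mean()

def generalisability(syntheses, training):
    """Mean distance from each training example to its nearest model synthesis."""
    return pairwise_distances(training, syntheses).min(axis=1).mean()

# Toy data: two training examples and three syntheses, one of them far away.
training = np.array([[0.0, 0.0], [1.0, 0.0]])
syntheses = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])

print(specificity(syntheses, training))       # penalised by the stray synthesis
print(generalisability(syntheses, training))  # unaffected: training set is covered
```

In this toy case the stray synthesis raises the specificity value while the generalisability value stays at zero, since every training example still has a synthesis on top of it. Lower values are better for both measures in this formulation.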

Returning to Questions

What distance should be measured?
How to treat finite yet large sets efficiently?
Some ideas follow...

Visual Illustrations Again

Let us take a brain and a distorted version of it (whirling filter)

Visual Illustrations Again

Let us assume one of them is model synthesis

Visual Illustrations Again

The other one is arbitrarily taken from the training data

Visual Illustrations Again

Let us remember that we have a set of training data

Visual Illustrations Again

The aim is to show that the model is not far from the training data (or at least from some instances)

Visual Illustrations Again

The measure cannot be solely intensity-based

Visual Illustrations Again

Same brain, different position in space

Explanation

This is an example of translation inconsistency
Shape has similar properties
Example: Brain is wider/narrower
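The translation problem can be reproduced with a toy image. The sketch below uses a bright square as a hypothetical stand-in for the brain and mean absolute intensity difference as the purely intensity-based measure; both are assumptions made for illustration.

```python
import numpy as np

def mean_abs_difference(a, b):
    """Purely intensity-based distance: mean absolute pixel difference."""
    return np.abs(a.astype(float) - b.astype(float)).mean()

# A 16x16 bright square on a 32x32 dark background.
image = np.zeros((32, 32))
image[8:24, 8:24] = 1.0

# The same object, moved 8 columns to the right (no wrap-around occurs here).
shifted = np.roll(image, 8, axis=1)

# An empty image for comparison.
void = np.zeros_like(image)

print(mean_abs_difference(image, shifted))
print(mean_abs_difference(image, void))
```

Under this measure the shifted copy is exactly as far from the original as an empty image is, although it contains the same object.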

Visual Illustrations of the Example

The shape change causes great difference in intensities

Visual Illustrations of the Example

Same brain, stretched, so the measure must account for shape

Alternative Way of Measuring Difference

Try to match each point in one image to a point in the other within a bounded window

Alternative Way of Measuring Difference

But this can produce awkward mappings
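A windowed match of this kind (the 'shuffling' mentioned under Requirements) might look as follows. This is a sketch under stated assumptions: 2-D images, absolute intensity difference, a square window and edge padding at the borders. The function name and the `radius` parameter are hypothetical.

```python
import numpy as np

def shuffle_distance(a, b, radius=1):
    """Mean, over pixels p of a, of the smallest |a[p] - b[q]|
    where q lies within `radius` pixels of p (square window)."""
    a = a.astype(float)
    # Edge padding is an assumption; it lets border pixels have a full window.
    padded = np.pad(b.astype(float), radius, mode="edge")
    h, w = a.shape
    best = np.full(a.shape, np.inf)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            # b displaced by (dy - radius, dx - radius) relative to a.
            window = padded[dy:dy + h, dx:dx + w]
            best = np.minimum(best, np.abs(a - window))
    return best.mean()

image = np.zeros((32, 32))
image[8:24, 8:24] = 1.0
shifted = np.roll(image, 2, axis=1)  # small displacement of 2 columns

print(shuffle_distance(image, shifted, radius=0))  # plain intensity difference
print(shuffle_distance(image, shifted, radius=2))  # displacement absorbed by the window
```

Each pixel picks its best match independently, so several pixels may map to the same target and neighbouring pixels may map in opposite directions. Nothing in this sketch prevents such folded mappings, and the result is in general not symmetric in its two arguments.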

An Idea

Can measures like mutual information (MI) be of help here?

Discussion

Using a 'general-purpose' similarity measure
Good for registration
Will not take advantage of all knowledge
Does not have proper notion of shape and intensity
However, quick to compute
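For reference, a histogram-based estimate of mutual information between two images can be sketched as below. The bin count and the function name are assumptions, and this simple estimator is only one of several possible ones.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Estimate mutual information (in nats) from the joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()                 # joint probability
    px = pxy.sum(axis=1, keepdims=True)       # marginal of a
    py = pxy.sum(axis=0, keepdims=True)       # marginal of b
    nonzero = pxy > 0                         # skip empty cells to avoid log(0)
    ratio = pxy[nonzero] / (px * py)[nonzero]
    return float((pxy[nonzero] * np.log(ratio)).sum())

rng = np.random.default_rng(0)
a = rng.random((64, 64))
unrelated = rng.random((64, 64))

print(mutual_information(a, a))          # high: an image fully predicts itself
print(mutual_information(a, unrelated))  # low: only estimation bias remains
```

The estimate depends only on how intensities co-occur at identical pixel positions, which is why it is cheap to compute but carries no explicit notion of shape.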

More Ideas

Let us look at the set again

More Ideas

The distinction between model synthesis and training data instance can be neglected

Discussion Again

All that is needed is a metric of distance
Takes only 2 images (or volumes) at a given time
Distance relates to intensity and shape
Efficiency must be taken into account

Simplified View

Look at only two instances

Simplified View

Model synthesis holds extra information

What is Required

Showing that the model describes instances fairly thoroughly
Showing that the model does not describe illegal instances
Specificity and generalisability do this
Generalisability 'handles' the former condition
Specificity 'handles' the latter

Several Constraints

The metric needs to be robust to awkward instances
Example #1: Void image
Example #2: Reversed (flipped/mirrored) image
Example #3: Strange shape variation in uniform areas like background

Returning to Simplified View

What if subsets of the training set are taken?

Returning to Simplified View

Construct a model from each subset and compare the models?

Another Thought

What if the set is taken

Another Thought

What if the images are taken

Another Thought

Points of correspondence to be used

Another Thought

Triangulate and treat as features

Another Thought

Model image segments

Explanation of the Ideas

Taking intensity of points of correspondence (control nodes) is unreliable
Not enough nodes in practice
Sampling along them might not give good matching locally, pixel-to-pixel
By modelling, there is a more tolerant measure

Other Ideas to Ponder

Once similarity is obtained...
...how does one use it to measure the quality of the entire model?
Most reasonable to look at the problem in terms of representation in space
Reliability and efficiency have trade-offs