Differences

This shows you the differences between two versions of the page.

@@ Line 1: / Line 1: @@
+!!!Assessing the Accuracy of Non-Rigid Registration With and Without Ground Truth
+R. S. Schestowitz^{1}, W. R. Crum^{2}, V. S. Petrovic^{1}, C. J. Twining^{1}, T. F. Cootes^{1} and C. J. Taylor^{1}
+^{1}Imaging Science and Biomedical Engineering, University
+of Manchester
+Stopford Building, Oxford Road, Manchester M13 9PT,
+United Kingdom
+^{2}Centre for Medical Image Computing, Department of
+Computer Science, University
+College London, Gower Street, London WC1E 6BT, United Kingdom
+A diverse collection of methods exists for the problem
+of non-rigid registration, whereby a set of images is
+to be aligned. We perceive a deficiency, however, in
+the ways such registrations are validated or even
+evaluated. Here we present two methods for evaluating
+non-rigid registration. One of the methods requires
+ground-truth solutions to be provided a priori, whereas
+the other does not. We present results which confirm
+that both methods are valid, and proceed to calculate
+their sensitivities. We find that the method which
+requires ground-truth solutions is not as sensitive as
+the method which needs nothing but the raw images and
+the corresponding deformation fields.
+The aim of registration is to transform images until
+corresponding structures across them overlap.
+Registration is an optimisation problem wherein the
+degree of overlap, as measured by some metric, needs to
+be increased. Overlap is established by transformation
+of the images. Transformations and measures of
+similarity fall under a framework that we call the
+"objective function", which fully describes the approach
+a registration algorithm takes.
+There are further factors that distinguish one
+registration approach from another. Most notably, there
+is a divide over whether images should be handled in
+pairs (pair-wise) or as a whole group simultaneously
+(group-wise). With such a variety of approaches, there
+needs to be an unbiased method for assessing the
+performance of registration algorithms. Such a method
+must first be validated using careful experimentation,
+which incorporates the notion of correct solutions.
+The first of the methods to be described relies on the
+existence of ground-truth data such as boundaries of
+image elements or the location of distinguishable
+points. Having registered an image set, the method can
+measure overlap between elements that have been
+annotated, thus indicating how good the registration was.
+Our second method is able to assess registration
+without ground truth of any form. The approach involves
+automatic construction of appearance models from the
+registered data, followed by evaluation of the quality
+of that model using images synthesised from it. The
+quality of a registration is tightly related to the
+quality of its resulting model, and the two tasks, model
+construction and image registration, are in essence the
+same. Both involve the identification of corresponding
+points, also known as landmarks in the context of
+model-building. Expressed differently, a registration
+produces a dense set of corresponding points and models
+of appearance require nothing but the images and the
+correspondences in order to be built.
+To put the validity of both methods to the test, we
+assembled a set of 38 2-D MR images of the brain. Each
+of these images was carefully annotated to identify
+different compartments within the brain. These
+anatomical compartments can be perceived as simplified
+labels that faithfully define brain structure. Our
+first method of assessment uses the Tanimoto overlap
+measure to calculate the degree to which labels across
+the image set overlap. In that respect, it exploits
+ground truth, which has been identified by an expert,
+to reason about registration quality.
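+A minimal sketch of a label-overlap computation is
+given below. It assumes the pairwise, binary form of the
+Tanimoto measure (intersection over union) averaged
+over labels; the toy label images, the label values and
+the way per-label values are pooled are illustrative
+assumptions, not the exact formulation used in the
+experiments.

```python
import numpy as np


def tanimoto_overlap(mask_a, mask_b):
    """Tanimoto overlap |A intersect B| / |A union B| of two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return float(np.logical_and(a, b).sum()) / float(union)


def mean_label_overlap(seg_a, seg_b, labels):
    """Average the per-label Tanimoto overlap over the given label values."""
    return float(np.mean([tanimoto_overlap(seg_a == l, seg_b == l)
                          for l in labels]))


# Two toy label images (0 = background, 1 and 2 = hypothetical compartments).
seg_a = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 2, 2],
                  [0, 0, 2, 2]])
seg_b = np.array([[1, 1, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 2, 2],
                  [0, 2, 2, 2]])

overlap = mean_label_overlap(seg_a, seg_b, labels=[1, 2])
```

+Identical label images give an overlap of 1, and the
+value falls towards 0 as the annotated regions drift
+apart.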
+The second method takes an entirely different approach.
+It feeds on the results of a registration algorithm,
+where correspondences have been highlighted, and builds
+an appearance model given the images and their
+correspondences. From that model, many synthetic brain
+images are derived. Vectorisation of these images
+allows us to embed (or mentally visualise) them in a
+high-dimensional space. We can then compare the spatial
+cloud that these synthetic images form with the cloud
+that is composed from the original image set -- the set
+from which the model has been built. Computing the
+overlap between these clouds gives insight into the
+quality of the registration. Simply put, it is a model
+fit evaluation paradigm. The better the registration,
+the greater the overlap between those clouds will be.
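+The idea of building a model from vectorised images and
+sampling synthetic images from it can be sketched as
+follows. This is a deliberately simplified,
+intensity-only linear (PCA) model on images that are
+assumed to be already aligned; a full appearance model
+also captures shape variation. The function names, the
+number of modes and the random toy images are all
+assumptions made for illustration.

```python
import numpy as np


def build_linear_model(images, n_modes):
    """Fit a PCA model to vectorised images (intensity only, no shape)."""
    X = np.stack([np.ravel(im) for im in images]).astype(float)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centred data.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    modes = Vt[:n_modes]
    stdev = s[:n_modes] / np.sqrt(max(len(images) - 1, 1))
    return mean, modes, stdev


def synthesise(mean, modes, stdev, n_samples, rng):
    """Draw synthetic image vectors from a Gaussian over the model modes."""
    b = rng.standard_normal((n_samples, len(stdev))) * stdev
    return mean + b @ modes


rng = np.random.default_rng(0)
# Toy stand-ins for registered images: ten random 8x8 arrays.
images = [rng.random((8, 8)) for _ in range(10)]

mean, modes, stdev = build_linear_model(images, n_modes=3)
synth = synthesise(mean, modes, stdev, n_samples=5, rng=rng)
```

+Each row of synth is one synthetic image in vectorised
+form, so the synthetic set and the original set are two
+point clouds in the same space.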
+To compute overlap between two clouds of data, we have
+devised measures that we refer to as Specificity and
+Generalisability. The former tells how well the model
+fits the data it was built from, whereas the latter
+tells how well the data fits its derived model. It is a
+reciprocal relationship that 'locks' the data to its
+model and vice versa. We calculate Specificity and
+Generalisability by measuring distances in space. As we
+seek a measure that is tolerant to slight differences,
+we use the shuffle distance, and also compare it
+against Euclidean distance.
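+A hedged sketch of these distance-based measures
+follows. It assumes that the shuffle distance is the
+mean, over pixels, of the smallest absolute intensity
+difference found within a square neighbourhood of the
+corresponding pixel, and that each measure is a mean
+nearest-neighbour distance between the two image sets
+(synthetic to training for Specificity, training to
+synthetic for Generalisability). The neighbourhood
+shape, the direction assigned to each measure and all
+function names are assumptions.

```python
import numpy as np


def shuffle_distance(a, b, radius=1):
    """Mean over pixels of min |a(x) - b(y)| for y within `radius` of x."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    padded = np.pad(b, radius, mode="edge")
    h, w = a.shape
    best = np.full(a.shape, np.inf)
    # Slide b over every offset in the (2*radius+1)^2 square neighbourhood.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = padded[dy:dy + h, dx:dx + w]
            best = np.minimum(best, np.abs(a - shifted))
    return float(best.mean())


def mean_nearest_distance(from_set, to_set, dist):
    """Average distance from each image in from_set to its nearest in to_set."""
    return float(np.mean([min(dist(x, y) for y in to_set) for x in from_set]))


# Toy images: a single bright pixel, shifted by one position in the second.
img_a = np.zeros((3, 3))
img_a[1, 1] = 1.0
img_b = np.zeros((3, 3))
img_b[1, 2] = 1.0

training = [img_a]
synthetic = [img_b, img_a]


def pixelwise(x, y):
    # radius=0 reduces to a plain mean absolute pixel difference.
    return shuffle_distance(x, y, radius=0)


specificity = mean_nearest_distance(synthetic, training, pixelwise)
generalisability = mean_nearest_distance(training, synthetic, pixelwise)
```

+With radius=0 the one-pixel shift is penalised, whereas
+with radius=1 the shuffle distance between img_a and
+img_b is zero, which illustrates the tolerance to slight
+differences. Both measures here are distances, so
+smaller values indicate a better fit.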
+Our assessment framework, by which we test both
+methods, uses non-rigid registration, whereby many
+degrees of freedom are involved in image
+transformations. To systematically generate data over
+which our hypotheses can be tested, we perturb the
+brain data using clamped-plate splines. In the original
+brain data, correspondences among images are taken to
+be perfect, so perturbation can only degrade them. We
+then wish to show that, as the degree of perturbation
+increases, so does the degree of misregistration
+reported by our assessment methods.
+In an extensive batch of experiments we perturbed the
+datasets at progressively increasing levels, which led
+to well-understood misregistration of the data. We
+repeated these experiments 10 times to demonstrate that
+both approaches to assessment are consistent and that
+the results are unbiased. Having plotted the
+measures of overlap for each perturbation extent, we
+see an approximately linear decrease in the amount of
+overlap (Figure X). This means that, as the
+ground-truth registration is eroded, the overlap-based
+measure is able to detect that, and its response is
+well-behaved and therefore meaningful and reliable.
+<Graphics file: ./Graphics/1.eps>
+<Graphics file: ./Graphics/2.eps>
+Figures X and Y. a, b, c.
+We then undertake another assessment task, this time
+using the method which does not use ground truth.
+We observe very similar behaviour (Figure Y), which is
+evidence that this method is also a powerful and
+reliable way of assessing the degree of misregistration,
+or conversely the quality of registration.
+As a last step, we compare the
+two methods, identifying sensitivity as the factor
+which is most important. Sensitivity reflects our
+ability to confidently tell apart a good registration
+from a worse one. The slighter the difference which can
+be correctly detected, the more sensitive the method.
+To calculate sensitivity, we compute the amount of
+change in terms of mean pixel deformation, that is,
+deformation from the correct solution. We then
+look at differences in our assessor's value, be it
+overlap, Specificity, or Generalisability. We also
+stress the need to take account of the error bars, as
+there is both an inter-experiment error and a
+measure-specific error; the two must be combined
+carefully. The derivation of sensitivity can be
+expressed as follows:
+placeholder
+where X is something... (TODO)
+<Graphics file: ./Graphics/3.eps>
+Figure Z. a, b, c.
+Figure Z suggests that for almost any choice of
+shuffle distance neighbourhood, the method which does
+not require ground truth is more sensitive than the
+method which depends on it. On close inspection, the
+trends of these curves approximately overlap, which
+implies that the two methods are very closely
+correlated.
+In summary, we have shown two valid methods for
+assessing non-rigid registration. The methods are
+correlated in practice, but the principles they build
+upon are quite different, as are their prerequisites,
+if any. Registration can be evaluated with or
+without ground-truth annotation, and the measures
+behave consistently across distinct data, are
+well-behaved, and are sensitive. Both methods have been
+successfully applied to assessment of non-rigid
+registration algorithms and both methods led to the
+expected conclusions. That aspect of the work,
+nonetheless, is beyond the scope of this paper.
