— |
mias-irc-2005 [2014/05/31 17:37] (current) admin created |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | !!!Assessing the Accuracy of Non-Rigid Registration With and Without Ground Truth | ||
+ | R. S. Schestowitz^{1}, W. R. Crum^{2}, V. S. Petrovic^{1}, C. J. Twining^{1}, T. F. Cootes^{1} and C. J. Taylor^{1} | ||
+ | |||
+ | |||
+ | ^{1}Imaging Science and Biomedical Engineering, University of Manchester, | ||
+ | Stopford Building, Oxford Road, Manchester M13 9PT, United Kingdom | ||
+ | |||
+ | |||
+ | ^{2}Centre for Medical Image Computing, Department of Computer Science, | ||
+ | University College London, Gower Street, London WC1E 6BT, United Kingdom | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | A diverse collection of methods exists for the problem | ||
+ | of non-rigid registration, whereby a set of images is | ||
+ | to be aligned. We perceive a deficiency, however, in | ||
+ | the ways such registrations are validated or even | ||
+ | evaluated. Here we present two methods for evaluating | ||
+ | non-rigid registration. One of the methods requires | ||
+ | ground-truth solutions to be provided a priori, whereas | ||
+ | the other does not. We present results which confirm | ||
+ | that both methods are valid, and then proceed to | ||
+ | calculate their sensitivities. We find that the method | ||
+ | which requires ground-truth solutions is not as | ||
+ | sensitive as the method which needs nothing but the raw | ||
+ | images and the corresponding deformation fields. | ||
+ | |||
+ | The aim of registration is to transform images until | ||
+ | corresponding structures across them overlap. | ||
+ | Registration is an optimisation problem wherein the | ||
+ | degree of overlap, as measured by some metric, is to be | ||
+ | maximised by transforming the images. The | ||
+ | transformations and the measure of similarity fall | ||
+ | under a framework that we call the "objective | ||
+ | function", which fully describes the approach a | ||
+ | registration algorithm takes. | ||
+ | |||
+ | There are further factors that distinguish one | ||
+ | registration approach from another. Most notably, there | ||
+ | is a divide over whether images should be handled in | ||
+ | pairs (pair-wise) or as a whole group simultaneously | ||
+ | (group-wise). Given such differences, there needs to be | ||
+ | an unbiased method for assessing the performance of | ||
+ | registration algorithms. Such a method must first be | ||
+ | validated by careful experimentation which incorporates | ||
+ | the notion of correct solutions. | ||
+ | |||
+ | The first of the methods to be described relies on the | ||
+ | existence of ground-truth data such as boundaries of | ||
+ | image elements or the location of distinguishable | ||
+ | points. Once an image set has been registered, the | ||
+ | method measures the overlap between the annotated | ||
+ | elements, which indicates how good the registration was. | ||
+ | |||
+ | Our second method is able to assess registration | ||
+ | without ground truth of any form. The approach involves | ||
+ | automatic construction of an appearance model from the | ||
+ | registered data, followed by an evaluation of the | ||
+ | quality of that model using images synthesised from it. | ||
+ | The quality of the registration is tightly related to | ||
+ | the quality of the resulting model, and the two tasks, | ||
+ | namely model construction and image registration, are | ||
+ | innately the same. Both involve the identification of | ||
+ | corresponding points, also known as landmarks in the | ||
+ | context of model-building. Expressed differently, a | ||
+ | registration produces a dense set of corresponding | ||
+ | points, and models of appearance require nothing but | ||
+ | the images and the correspondences in order to be built. | ||
+ | |||
+ | To put the validity of both methods to the test, we | ||
+ | assembled a set of 38 2-D MR images of the brain. Each | ||
+ | of these images was carefully annotated to identify | ||
+ | different compartments within the brain. These | ||
+ | anatomical compartments can be perceived as simplified | ||
+ | labels that faithfully define brain structure. Our | ||
+ | first method of assessment uses the Tanimoto overlap | ||
+ | measure to calculate the degree to which labels across | ||
+ | the image set overlap. In that respect, it exploits | ||
+ | ground truth, which has been identified by an expert, | ||
+ | to reason about registration quality. | ||
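
As an illustration of the overlap computation described above, the following is a minimal sketch of the Tanimoto overlap between binary label masks. The function names, the integer-coded label images and the averaging over all image pairs and labels are assumptions made for this example; the text does not specify how overlaps are aggregated across the image set.

```python
import numpy as np

def tanimoto_overlap(mask_a, mask_b):
    """Tanimoto overlap |A & B| / |A | B| of two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return float(np.logical_and(a, b).sum() / union)

def mean_label_overlap(label_images, label_ids):
    """Average the Tanimoto overlap over all image pairs and all labels.

    Assumption: each label image is an integer array in which pixel
    values identify the annotated compartment.
    """
    scores = []
    n = len(label_images)
    for i in range(n):
        for j in range(i + 1, n):
            for lab in label_ids:
                scores.append(
                    tanimoto_overlap(label_images[i] == lab,
                                     label_images[j] == lab)
                )
    return float(np.mean(scores))
```

A value of 1 indicates that the labels coincide exactly, and the value falls towards 0 as the labelled regions drift apart.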
+ | |||
+ | The second method takes an entirely different approach. | ||
+ | It takes the output of a registration algorithm, in | ||
+ | which correspondences have been established, and builds | ||
+ | an appearance model from the images and their | ||
+ | correspondences. From that model, many synthetic brain | ||
+ | images are derived. Vectorisation of these images | ||
+ | allows us to embed (or mentally visualise) them in a | ||
+ | high-dimensional space. We can then compare the cloud | ||
+ | that these synthetic images form with the cloud formed | ||
+ | by the original image set, that is, the set from which | ||
+ | the model has been built. Computing the overlap between | ||
+ | these clouds gives insight into the quality of the | ||
+ | registration. Simply put, it is a model-fit evaluation | ||
+ | paradigm. The better the registration, the greater the | ||
+ | overlap between those clouds will be. | ||
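
The idea of vectorising images and drawing synthetic examples from a model can be sketched as follows. This is a deliberately simplified, intensity-only linear (PCA) model and not the full appearance model of the text, which also encodes the correspondences. The function names and the Gaussian sampling of mode coefficients are assumptions made for illustration.

```python
import numpy as np

def build_linear_model(images, n_modes):
    """Fit a PCA model to vectorised images.

    Returns the mean vector, the first n_modes principal directions
    (one per row) and the standard deviation along each direction.
    """
    X = np.stack([np.asarray(im, dtype=float).ravel() for im in images])
    mean = X.mean(axis=0)
    # The rows of vt are orthonormal principal directions of the centred data.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    modes = vt[:n_modes]
    std = s[:n_modes] / np.sqrt(max(len(X) - 1, 1))
    return mean, modes, std

def synthesise(mean, modes, std, n_samples, rng):
    """Draw synthetic image vectors, assuming Gaussian mode coefficients."""
    coeffs = rng.standard_normal((n_samples, len(std))) * std
    return mean + coeffs @ modes
```

Each row returned by `synthesise` is one point of the synthetic cloud; the rows of the stacked training matrix form the cloud it is compared with.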
+ | |||
+ | To compute the overlap between two clouds of data, we | ||
+ | have devised measures that we refer to as Specificity | ||
+ | and Generalisability. The former tells how well the | ||
+ | model fits its seminal data, whereas the latter tells | ||
+ | how well the data fits its derived model. It is a | ||
+ | reciprocal relationship that 'locks' the data to its | ||
+ | model and vice versa. We calculate Specificity and | ||
+ | Generalisability by measuring distances between images | ||
+ | in this space. As we seek a measure that is tolerant of | ||
+ | slight differences, we use the shuffle distance, and we | ||
+ | also compare it against the Euclidean distance. | ||
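
A minimal sketch of how such distance-based measures could be computed is given below. Several details are assumptions made for illustration and are not stated in the text: the shuffle distance is taken as the mean, over pixels, of the smallest absolute intensity difference within a square neighbourhood of the other image, and each measure is taken as a mean nearest-neighbour distance from one cloud to the other. Which direction corresponds to which measure is also an assumption.

```python
import numpy as np

def shuffle_distance(image_a, image_b, radius=1):
    """Mean over pixels of the minimum absolute difference between a pixel
    of image_a and any pixel of image_b within a square neighbourhood.
    With radius=0 this is the mean absolute difference."""
    a = np.asarray(image_a, dtype=float)
    b = np.asarray(image_b, dtype=float)
    h, w = a.shape
    padded = np.pad(b, radius, mode="edge")
    best = np.full(a.shape, np.inf)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = padded[dy:dy + h, dx:dx + w]
            best = np.minimum(best, np.abs(a - shifted))
    return float(best.mean())

def mean_nearest_distance(from_set, to_set, distance):
    """Average distance from each item of from_set to its nearest item of to_set."""
    return float(np.mean([min(distance(x, y) for y in to_set) for x in from_set]))

def specificity(synthetic, training, distance):
    # Assumed direction: each synthetic image to its nearest training image.
    return mean_nearest_distance(synthetic, training, distance)

def generalisability(synthetic, training, distance):
    # Assumed direction: each training image to its nearest synthetic image.
    return mean_nearest_distance(training, synthetic, distance)
```

Under these assumptions both values are distances, so smaller values correspond to greater overlap between the two clouds. Passing a different `distance` callable, for example a Euclidean one, allows the comparison mentioned above.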
+ | |||
+ | Our assessment framework, by which we test both | ||
+ | methods, uses non-rigid registration, whereby many | ||
+ | degrees of freedom are involved in image | ||
+ | transformations. To systematically generate data over | ||
+ | which our hypotheses can be tested, we perturb the | ||
+ | brain data using clamped-plate splines. In this brain | ||
+ | data, correspondences among images are taken to be | ||
+ | perfect, so they can only ever be degraded. We then wish | ||
+ | to show that, as the degree of perturbation increases, | ||
+ | the measures produced by our registration assessment | ||
+ | methods change accordingly. | ||
+ | |||
+ | In our extensive batch of experiments we perturbed the | ||
+ | datasets at progressively increasing levels, which led | ||
+ | to well-understood misregistration of the data. We | ||
+ | repeated these experiments 10 times to demonstrate that | ||
+ | both approaches to assessment are consistent and that | ||
+ | the results are unbiased. Having plotted the measures | ||
+ | of overlap for each extent of perturbation, we see an | ||
+ | approximately linear decrease in the amount of overlap | ||
+ | (Figure X). This means that, as the ground-truth | ||
+ | registration is eroded, the overlap-based measure is | ||
+ | able to detect it, and its response is well-behaved | ||
+ | and therefore meaningful and reliable. | ||
+ | |||
+ | <Graphics file: ./Graphics/1.eps> | ||
+ | <Graphics file: ./Graphics/2.eps> | ||
+ | |||
+ | |||
+ | Figures X and Y. a, b, c. | ||
+ | |||
+ | We then undertook another assessment task, this time | ||
+ | using the method which does not require ground truth. | ||
+ | We observed very similar behaviour (Figure Y), which is | ||
+ | evidence that this method, too, is a powerful and | ||
+ | reliable means of assessing the degree of | ||
+ | misregistration or, conversely, the quality of | ||
+ | registration. | ||
+ | |||
+ | As a last step, we compare the two methods, identifying | ||
+ | sensitivity as the most important factor. Sensitivity | ||
+ | reflects our ability to confidently tell a good | ||
+ | registration apart from a worse one. The slighter the | ||
+ | difference which can be correctly detected, the more | ||
+ | sensitive the method. To calculate sensitivity, we | ||
+ | compute the amount of change in terms of mean pixel | ||
+ | deformation, that is, deformation from the correct | ||
+ | solution. We then look at the corresponding differences | ||
+ | in the assessor's value, be it overlap, Specificity or | ||
+ | Generalisability. We also stress the need to take | ||
+ | account of the error bars, as there is both an | ||
+ | inter-experiment error and a measure-specific error; | ||
+ | the two must be combined carefully. The derivation of | ||
+ | sensitivity can be expressed as follows: | ||
+ | |||
+ | placeholder | ||
+ | |||
+ | where X is something... (TODO) | ||
+ | |||
+ | <Graphics file: ./Graphics/3.eps> | ||
+ | |||
+ | |||
+ | Figure Z. a, b, c. | ||
+ | |||
+ | Figure Z suggests that, for almost any choice of | ||
+ | shuffle-distance neighbourhood, the method which does | ||
+ | not require ground truth is more sensitive than the | ||
+ | method which depends on it. On close inspection, the | ||
+ | trends of these curves approximately overlap, which | ||
+ | implies that the two methods are very closely | ||
+ | correlated. | ||
+ | |||
+ | In summary, we have presented two valid methods for | ||
+ | assessing non-rigid registration. The methods are | ||
+ | correlated in practice, but the principles they build | ||
+ | upon are quite different, as are their prerequisites, | ||
+ | if any. Registration can be evaluated with or without | ||
+ | ground-truth annotation, and the measures behave | ||
+ | consistently across distinct data, are well-behaved, | ||
+ | and are sensitive. Both methods have been successfully | ||
+ | applied to the assessment of non-rigid registration | ||
+ | algorithms, and both led to the expected conclusions. | ||
+ | That aspect of the work, however, is beyond the scope | ||
+ | of this paper. | ||