Progress Report
March 13th, 2006
Recently-Listed Aims
- Annul the effect of various free parameters in the calculation of graph entropy
- The primary goal is, at present, normalisation for NRR (and model) assessment
- Devise methods for improving the efficiency of NRR assessment in 3-D
- Await 3-D data for evaluation (38 training brain volumes, 1000+ syntheses)
Recently Listed Aims - Ctd.
- Re-consider the image similarity method proposed by Liwei Wang (namely IMED) and revisit its implementation issues
- Calculate the pseudo-entropy properly for brain data at hand
- Make the calculation independent of the number of examples under consideration
- Calculate the constants in entropy and use a K nearest neighbours (KNN) approach to get the results
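The KNN idea above can be illustrated with a minimal Kozachenko-Leonenko-style differential entropy estimator. This is a sketch in Python using SciPy, not the implementation used in this work; the function name `knn_entropy` and the choice of k are ours.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gamma

def knn_entropy(samples, k=3):
    """Kozachenko-Leonenko differential entropy estimate (in nats)
    from the distance of each sample to its k-th nearest neighbour."""
    n, d = samples.shape
    tree = cKDTree(samples)
    # k+1 because the nearest "neighbour" of each point is itself
    eps, _ = tree.query(samples, k=k + 1)
    eps = eps[:, -1]                                  # k-th NN distance
    c_d = np.pi ** (d / 2) / gamma(d / 2 + 1)         # unit-ball volume
    return digamma(n) - digamma(k) + np.log(c_d) + d * np.mean(np.log(eps))
```

For a 1-D standard normal sample the estimate should approach the analytic value 0.5·ln(2πe) ≈ 1.42 nats as the sample grows.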
Accounting and Correcting for Set Sizes
- Investigate the entropy term which is related to graph length
- The first term as a function of the number of synthetic examples (100-1000)
- The first term as a function of the number of training examples, with the number of synthetic images fixed at 1000
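The graph-length term in question is the total edge length of a Euclidean minimum spanning tree over the example set, viewed as points in a high-dimensional space. A hedged sketch of how such a length can be computed with SciPy (the helper `mst_length` is ours, not from the codebase; a dense pairwise-distance matrix is assumed affordable at these set sizes):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_length(samples):
    """Total edge length of the Euclidean minimum spanning tree
    over the rows of `samples`."""
    dists = squareform(pdist(samples))      # dense pairwise distances
    mst = minimum_spanning_tree(dists)      # sparse (n x n) MST
    return float(mst.sum())
```

The length (and hence the entropy term built from it) grows with the number of points, which is exactly the set-size dependence charted in the slides that follow.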
The Effect of Changing Set Size - Slide #1
![graph-length-entropy-number-of-syntheses-varies.png](graph-length-entropy-number-of-syntheses-varies.png)
Entropy as a function of the synthetic image set size
The Effect of Changing Set Size - Slide #2
![graph-length-entropy-number-of-training-images-varies.png](graph-length-entropy-number-of-training-images-varies.png)
Entropy as a function of the size of the training image set
Further Investigation
- In both experiments, one could average the results from multiple random samples
- To generate better curves of training set size versus entropy, the following experiment was engineered:
- Subset of the training images drawn stochastically
- Covering the range from 1 training image to 37 training images
- Repeating the experiments 10 times to make the curves more consistent
- Safe to assume that a similar curve with variation of the synthetic set size would be less interesting
- For the training set, a single sample gave an unstable curve
- For synthetic images, we have a smooth curve already; no need to repeat the experiments either
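The subsample-and-average scheme described above can be sketched generically. The helper below is illustrative only (names and defaults are ours): it draws `repeats` random subsets at each set size and averages whatever entropy-style estimator is plugged in.

```python
import numpy as np

def mean_entropy_curve(data, estimator, sizes, repeats=10, seed=0):
    """Average an entropy-style estimate over `repeats` random
    subsets of `data` (rows = examples) at each requested set size."""
    rng = np.random.default_rng(seed)
    curve = []
    for m in sizes:
        vals = [estimator(data[rng.choice(len(data), size=m, replace=False)])
                for _ in range(repeats)]
        curve.append(float(np.mean(vals)))
    return curve
```

Averaging over 10 draws per size is what smooths the training-set curve shown on the next slide.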
Repeated Experiments - Set Size Variation
![graph-length-entropy-number-of-training-images-varies-repeated-10-randomised.png](graph-length-entropy-number-of-training-images-varies-repeated-10-randomised.png)
Repeated experiment, same as in previous slides. Results shown as the mean over several set choices
Resolving the Imbalance
- Reading through and following some citations in papers
- Describing what functional form we should expect the variation to take
- Still trying to discover a formulation which annuls this effect of instability, by predicting the point of balance in the curve, perhaps by fitting
- Judging by the paper, the behaviour we get is not out of the ordinary
Resolving the Imbalance - Ctd.
- Values vary as a function of the:
- number of dimensions
- number of synthetic images
- number of images in the training set
- ...
Resolving the Imbalance - Ctd.
- Attempt to flatten the curves where the numbers vary
- So far this has been successful only to a limited degree, no true principles involved
- We probably ought to run with 10,000 synthetic images overnight, then use curve fitting to predict the value attained when many images are included in the evaluation
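The fit-and-extrapolate step could look like the sketch below. The saturating functional form, the synthetic stand-in data, and the parameter names are all assumptions for illustration; the real curves would supply `n` and `y`.

```python
import numpy as np
from scipy.optimize import curve_fit

def saturating(n, a, b, c):
    """Assumed form: approaches the asymptote `a` as n grows."""
    return a - b * n ** (-c)

# Synthetic stand-in for measured entropy values at each set size
n = np.array([100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0])
y = saturating(n, 5.0, 3.0, 0.5) \
    + 0.01 * np.random.default_rng(1).normal(size=n.size)

params, _ = curve_fit(saturating, n, y, p0=(y[-1], 1.0, 0.5))
asymptote = params[0]   # predicted value when "many" images are included
```

If such a fit holds, the asymptote is the normalised value we would quote, independent of how many synthetic images were actually generated.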
Similar Experiments Involving Sample Size
![distribution-stability-of-graph-length.png](distribution-stability-of-graph-length.png)
Different yet related experiments
Subsequent Steps
- A working normalisation is needed to justify a paper submission
- After a lot of work on code (curve fitting, scaling) and accompanying reading, the task at hand seemed too daunting (time-consuming too)
- Several free parameters need to be accounted for. It is hard to handle arbitrary data while keeping the entropy values (of graph length) uniform. The references are not targeted at this particular aspect of the problem.
3-D NRR Assessment - Optimisations
- Multi-resolution approach
- Handle the data at coarser levels first
- Pros: As much as a cubic efficiency improvement, trivial to implement
- Cons: Misses out small detail in data, limits shuffle neighbourhood size
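A multi-resolution pyramid of the kind proposed above can be sketched in a few lines with SciPy (the helper `pyramid` is ours; linear interpolation and a halving factor per level are assumed choices). Each halving cuts the voxel count by 8x in 3-D, which is where the up-to-cubic saving comes from.

```python
import numpy as np
from scipy.ndimage import zoom

def pyramid(volume, levels=3):
    """Coarse-to-fine pyramid over a 3-D volume: each level halves
    the resolution, so work per level shrinks roughly 8-fold."""
    out = [volume]
    for _ in range(levels - 1):
        out.append(zoom(out[-1], 0.5, order=1))   # linear downsampling
    return out[::-1]                              # coarsest first
```

Processing would start at `pyramid(vol)[0]` and refine towards the full-resolution volume, accepting the stated loss of small detail at coarse levels.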
3-D NRR Assessment - Optimisations - Ctd.
- Reducing intensity depth, rearranging in memory for parallel processing of several voxels
- Pros: Can double, triple, or even quadruple the speed of Euclidean distance calculation
- Cons: Need to consider architecture, tough to implement, may need 64-bit processors
- Predicting entropy (or Specificity)
- Pros: Work in progress, reduces the need for many synthetic images/volumes
- Cons: Not accurate; may make it difficult to distinguish a good NRR from a worse one
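The intensity-depth reduction can be illustrated as below. True register-level packing is architecture-specific, so this sketch (helper names ours) uses NumPy vectorisation, which realises the same idea of operating on many voxels per instruction.

```python
import numpy as np

def quantize(volume, bits=4):
    """Reduce intensity depth from 8 bits to `bits` bits by
    discarding the low-order bits of each voxel."""
    shift = 8 - bits
    return (volume.astype(np.uint8) >> shift).astype(np.float32)

def ssd(a, b):
    """Sum of squared differences over whole volumes at once:
    one vectorised pass instead of a voxel-by-voxel loop."""
    d = (a - b).ravel()
    return float(np.dot(d, d))
```

With fewer bits per voxel, more voxels fit per cache line and per wide register, which is the source of the hoped-for 2-4x speed-up.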
3-D NRR Assessment - Optimisations - Ctd.
- Work on all images laterally, rather than dealing with just a pair at a time, pixel by pixel or voxel by voxel
- Pros: Position of pixels or voxels under consideration remains fixed (possible flexibility, groupwise algorithms made easier)
- Cons: High memory consumption (probably several images have to reside in memory simultaneously)
- Idea #1: Normalise/weight voxel differences based on inter-group variance
- Idea #2: Create a mean image from the training set and the synthetic set, then assess it w.r.t. both sets, e.g. smoothness, relationship to the sets
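Idea #1 might be realised as follows. This is a sketch under our own assumptions (helper names ours; the image stack is a `(n_images, h, w)` array, and we down-weight high-variance voxels, with a small epsilon guarding against division by zero):

```python
import numpy as np

def variance_weights(stack, eps=1e-6):
    """Per-voxel weights from inter-image variance of a stack shaped
    (n_images, h, w): voxels that vary a lot across the group count less."""
    return 1.0 / (stack.var(axis=0) + eps)

def weighted_ssd(a, b, w):
    """Variance-weighted sum of squared differences between two images."""
    return float((w * (a - b) ** 2).sum())
```

This only becomes possible once all images are held laterally, since the variance is taken across the group at each fixed voxel position.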
Image Distances: A Greedy Approach
- We previously wanted to test the entropic measure with the Euclidean/shuffle distance possibly replaced by the approach of Wang et al.
I was following the general approach named IMage Euclidean Distance (IMED), albeit with a crude implementation. It takes the locations of pixels, and the angles between them, into account when inferring image distances. Instead of assuming a hyperspace with orthogonal axes, where each axis corresponds to one image position (x, y), we take the spatial relationships between the pixels into consideration. Think of Euclidean distance within the image (between pixels), rather than in the space the images are embedded in.
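In the form given by Wang et al., IMED replaces the identity metric with a matrix G whose entries decay with the spatial distance between pixel positions. A minimal sketch (Gaussian fall-off with an assumed width `sigma`; helper names ours, and this is not the crude implementation referred to above):

```python
import numpy as np

def imed_matrix(h, w, sigma=1.0):
    """Metric matrix G for IMED: G[i, j] decays with the spatial
    distance between pixel positions i and j."""
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.column_stack([ys.ravel(), xs.ravel()]).astype(float)
    sq = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)

def imed(img_a, img_b, G):
    """Squared IMED: (a - b)^T G (a - b) over flattened images."""
    d = (img_a - img_b).ravel().astype(float)
    return float(d @ G @ d)
```

With sigma small, G approaches a scaled identity and the measure collapses back to ordinary (pixelwise) Euclidean distance, which is the sanity check worth running first.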
Image Distances Greedy Approach - Ctd.
- Shown in the next slides are:
- Binary(*) images which show, for a given grey level, the positions where that grey level is believed to lie. This is based on analysis along the horizontal and vertical axes, as well as the diagonals (8 cases in total, not 4).
(*) not fuzzy at the moment, but in principle, this would be easy to extend.
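The basic ingredient of these maps, a binary mask of where a grey level occurs, can be sketched as follows. The 8-direction analysis itself is not reproduced here; the helper name and the tolerance parameter are ours, and a fuzzy extension would return a soft weight rather than a hard 0/1 value.

```python
import numpy as np

def greylevel_mask(img, level, tol=0):
    """Binary map: 1 where the pixel intensity matches `level`
    (within `tol`), 0 elsewhere."""
    return (np.abs(img.astype(int) - level) <= tol).astype(np.uint8)
```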
Image Distances Greedy Approach - Ctd.
- Shown in the next slides are also:
- An image that shows, for every pixel, its distance from a given grey level (in this case 20). Efficiency is poor at the moment, so it takes over 15 minutes to generate one such image. We need to compute 255x1037 such images, but we haven't got 66,108 hours of computer power to spare (let us not think about 3-D yet). I'll work on efficiency, as the complexity is currently O(N⁴), just because I wanted to get previews quickly.
It will be very interesting to see how this compares with the shuffle distance. There are some free parameters to tweak, such as how to treat infinite distances. I haven't got around to thinking about them yet.
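One standard way to escape the O(N⁴) brute force is a Euclidean distance transform, which computes the whole per-pixel distance map in roughly linear time in the pixel count. A sketch using SciPy (helper name ours; returning infinity when the grey level is absent is one possible treatment of that free parameter, flagged as such):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_to_greylevel(img, level, tol=0):
    """Distance from every pixel to the nearest pixel holding
    `level` (within `tol`)."""
    match = np.abs(img.astype(int) - level) <= tol
    if not match.any():
        # Free parameter: what to do when the grey level never occurs
        return np.full(img.shape, np.inf)
    # EDT measures distance to the nearest zero, so invert the mask
    return distance_transform_edt(~match)
```

At this cost per grey level, the full 255x1037 sweep becomes minutes rather than tens of thousands of hours, and a 3-D version is the same call on a volume.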
The Nearest Pixel Match Approach
![Brain image unwarped](../RSPRS087/brain_in_original_state.jpg)
A brain in its original (unwarped) state
Nearest Pixel Map - 20/255
![Intensity 20 location](../RSPRS087/greyscale_position_20.jpg)
Left: arbitrary brain image; Right: predicted location of intensity with value 20
Nearest Pixel Map - 57/255
![Intensity 57 location](../RSPRS087/greyscale_position_57.jpg)
Left: arbitrary brain image; Right: predicted location of intensity with value 57
Nearest Pixel Map - 70/255
![Intensity 70 location](../RSPRS087/greyscale_position_70.jpg)
Left: arbitrary brain image; Right: predicted location of intensity with value 70
Nearest Pixel Map - 100/255
![Intensity 100 location](../RSPRS087/greyscale_position_100.jpg)
Left: arbitrary brain image; Right: predicted location of intensity with value 100
Nearest Pixel Map - 150/255
![Intensity 150 location](../RSPRS087/greyscale_position_150.jpg)
Left: arbitrary brain image; Right: predicted location of intensity with value 150
Distance Map to Nearest Intensity Match - 20/255
![Intensity 20 map](../RSPRS087/greyscale_distances_20.jpg)
Left: first brain image from the set; Right: distance for each pixel from a pixel of value 20
Distance Map to Nearest Intensity Match - 60/255
![Intensity 60 map](../RSPRS087/greyscale_distances_60.jpg)
Left: first brain image from the set; Right: distance for each pixel from a pixel of value 60
Distance Map to Nearest Intensity Match - 100/255
![Intensity 100 map](../RSPRS087/greyscale_distances_100.jpg)
Left: first brain image from the set; Right: distance for each pixel from a pixel of value 100