
Debugging ICP

It is worth clarifying that the better performance shown before was achieved by applying the algorithm to a different set, one which was too easy to deal with. Further improvements are still needed to avoid the rare cases where the face is mislocated (edge cases) and where ICP steps out of line. It is only in the interests of speed that we still work with coarse images such as the one in Figure [*]; this limits performance but makes tweaking and debugging considerably simpler, even if only as an interim phase.

Figure: An 8x8 separation between points in the image (shown from two angles), with downsampling done for debugging purposes
Image mode-pca-8-apart
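
For clarity, the coarse sampling amounts to keeping every 8th point in each direction of the range image. A minimal sketch of that step (in Python, with a synthetic array standing in for a real range image; the function name is only illustrative):

    import numpy as np

    def subsample_depth(depth, step=8):
        """Keep every step-th row and column of a 2-D depth map."""
        return depth[::step, ::step]

    depth = np.random.rand(480, 640)        # stand-in for a real range image
    coarse = subsample_depth(depth, step=8)
    print(coarse.shape)                     # (60, 80)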

While various improvements were being introduced, some bugs emerged as a side effect; an annoying one affects ICP and leads to failures that are difficult to explain by regression.

I found some bugs, but the main culprit, which affects all 4 families of ICP currently in use, remains elusive. Figures [*], [*], and [*] help shed some light on the debugging process.
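
For reference, the kind of step being debugged is a point-to-point ICP iteration: match each source point to its nearest target point, then solve for the rigid transform by SVD. The sketch below (Python, using scipy's k-d tree) is only illustrative; the 4 implementations in use differ from it in their details.

    import numpy as np
    from scipy.spatial import cKDTree

    def icp(source, target, iterations=20):
        """Align source (N x 3) to target (M x 3); returns aligned points, R, t."""
        src = source.copy()
        tree = cKDTree(target)
        R_total, t_total = np.eye(3), np.zeros(3)
        for _ in range(iterations):
            _, idx = tree.query(src)                # closest target point per source point
            matched = target[idx]
            mu_s, mu_m = src.mean(axis=0), matched.mean(axis=0)
            H = (src - mu_s).T @ (matched - mu_m)   # cross-covariance
            U, _, Vt = np.linalg.svd(H)
            R = Vt.T @ U.T
            if np.linalg.det(R) < 0:                # guard against reflections
                Vt[-1, :] *= -1
                R = Vt.T @ U.T
            t = mu_m - R @ mu_s
            src = src @ R.T + t
            R_total, t_total = R @ R_total, R @ t_total + t
        return src, R_total, t_total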

Figure: A slice or subset of the data being used for ICP (on the left) and the masked face from which it is extracted (right)
Image mask
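
The subset itself is simply the masked pixels lifted into a point cloud. A hypothetical sketch of that extraction (the mask, depth map and grid spacing are stand-ins, not the project's actual variables):

    import numpy as np

    def masked_points(depth, mask, spacing=1.0):
        """Turn the masked pixels of a depth map into an N x 3 point cloud."""
        ys, xs = np.nonzero(mask)
        return np.column_stack((xs * spacing, ys * spacing, depth[ys, xs]))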

Figure: Top: Two images taken from the same individual being compared when there is insufficient compensation for noise. Bottom: another set of such images, but with smoothing applied to reduce noise-induced anomalies
Image spikes

Image spikes-smoothed-out
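
The smoothing in the bottom row is of the spike-suppressing kind; a small median filter is one common way of achieving it, sketched below for illustration (whether a median filter or another smoother is used here is an implementation detail, so treat this as an assumption):

    from scipy.ndimage import median_filter

    def despike(depth, size=3):
        """Replace each depth value with the median of its size x size neighbourhood."""
        return median_filter(depth, size=size)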

Figure: Left: the result of poor or buggy ICP (difference image); right: the kind of image we expect, and do get, when ICP performs well
Image bad-icp Image decent-icp

Following further regressions, the bug which some previous changes had introduced along the way was found, as per Figure [*], and then removed (resolved by reverting back to correct code), leading to the sorts of image differences pre- and post-ICP that are seen in Figure [*]. The new distribution is shown in Figure [*]. In order to start showing competitive performance, however, many hours are being spent going through the thousands of images - including those which are easy to handle - and sorting them into the intra-subject sets that are necessary for model training and later for easier assessment (we mostly dealt with difficult cases so far). Rather than train a model using just dozens of people with various facial expressions, we can use many hundreds of them with and without expressions (mostly with none), then show high performance as before. This has required a massive time investment so far, but it is likely to pay off. Two universities which appear to have data of this kind were contacted months ago, but that engagement was unfruitful. Organising the remainder of the images accurately can take many hours. It is also cumulative in the sense that faces already sorted can be merged into the newly-organised sets.
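
The pre-/post-ICP comparison itself is a simple per-pixel difference of the depth maps once the recovered rotation and translation have been applied and the aligned cloud has been resampled back onto the pixel grid (the resampling is omitted from this illustrative sketch):

    import numpy as np

    def difference_image(depth_a, depth_b):
        """Absolute per-pixel difference between two registered depth maps."""
        return np.abs(depth_a - depth_b)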

Figure: Difference between the first 4 images before and after ICP (rotation and translation), with two of the first reference images shown at the bottom just for a sense of what the images at the top are derived from
Image icp-differences

Figure: The effect of the bug, demonstrated by the misalignment along the X axis (and, to a lesser degree, in Y too)
Image x-y-misaligned-3d-plot

Figure: The new distribution of modes following the bugfix
Image model-with-improved-GIP-ICP

Preparing many more images for experiments that should yield superior results was an important next step.

Modest improvements are arrived at by taking a Spring Semester set and building an ICP-free model from it (not complete, about 250 pairs) with a sampling separation of 8 so as to avoid running out of memory at the PCA stage (dense sampling makes the model unhelpfully vast). This was tested on a separate Spring Semester set of real pairs versus random pairs from NIST/FRGC, where the examples tested on are unseen, i.e. none were used to train the model (if some of the probes are used for training, performance comes near 98% because the model is already familiar with the probe). With a lot more data at hand it should be possible to produce much smoother curves. There is still debugging and fine-tuning around ICP (as shown in Figure [*] and Figure [*]), with 4 different implementations that give different results. Clearly these have a lot of impact on the results, provided they work correctly. In many of the experiments so far ICP rotation gets switched off. This enables the modelling of rotation although, ideally, we should try to remove head rotation in the probes as well. To put it differently, the model already incorporates rotation as part of the variation, whereas alignment around the centre of the face can (and probably should) be assured.
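
For concreteness, the ICP-free model building amounts to subsampling each depth map at a separation of 8, flattening it, and fitting PCA to the stacked vectors; taking an SVD of the centred data avoids forming the full covariance matrix, which is where the memory pressure comes from. The sketch below is only illustrative (the list of depth maps and the number of retained components are placeholders):

    import numpy as np

    def build_pca_model(depth_maps, step=8, n_components=20):
        """Fit a PCA model to coarsely sampled, flattened depth maps."""
        X = np.array([d[::step, ::step].ravel() for d in depth_maps])
        mean = X.mean(axis=0)
        # SVD of the centred data matrix yields the principal modes directly.
        _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
        modes = Vt[:n_components]
        variances = (S[:n_components] ** 2) / (len(X) - 1)
        return mean, modes, variances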

What makes this whole process enormously time-consuming is the proper division into sets, which keeps the reference arbitrary and the process rather autonomous. The goal is not to cheat with statistics by biasing the results with a training set drawn from the targets; that seems to be what some others are doing in order to prepare the matcher for particular observations. In any event, much bigger sets (with almost 1000 images to cycle through) are now generally available for the next experiments, which will compare ICP algorithms and yield results with less human intervention. The computational server has been under a lot less load recently, so getting results like those shown in Figures [*] and [*] takes about 4 hours.
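
The division into sets is what guarantees that no probe subject contributes to the model. A sketch of such a subject-disjoint split (the pair structure used here is hypothetical):

    import random

    def split_by_subject(pairs, train_fraction=0.5, seed=0):
        """pairs: list of (subject_id, image_a, image_b) tuples."""
        subjects = sorted({sid for sid, _, _ in pairs})
        random.Random(seed).shuffle(subjects)
        cut = int(len(subjects) * train_fraction)
        train_subjects = set(subjects[:cut])
        train = [p for p in pairs if p[0] in train_subjects]
        test = [p for p in pairs if p[0] not in train_subjects]
        return train, test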

My flight arrives in Israel next month (booked now). Looking forward to it! Cold and rainy here...

Figure: The effect of perturbing the points on ICP
Image icp-tests
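
The perturbation test in the figure above can be reproduced, in spirit, by adding Gaussian noise to one point set and re-running ICP against its unperturbed copy; the icp() helper below refers to the earlier illustrative sketch rather than any of the 4 implementations under test.

    import numpy as np

    def perturbation_error(points, sigma=0.5, seed=0):
        """Mean residual after aligning a noisy copy of a point set back to the original."""
        rng = np.random.default_rng(seed)
        noisy = points + rng.normal(scale=sigma, size=points.shape)
        aligned, _, _ = icp(noisy, points)   # icp() as sketched earlier
        return np.linalg.norm(aligned - points, axis=1).mean()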

Figure: The effect of noise on ICP studied by aligning images 1-5 at the top to images 6-10 at the bottom
Image icp-tests-unaligned-x-y

Figure: The purely median-based performance on the Spring Semester set, without ICP
Image Spring-set-preliminary-median-based-classifier

Figure: The purely model-based (determinant) performance on the Spring Semester set, without ICP
Image Spring-set-preliminary

Roy Schestowitz 2012-01-08