Exploratory GMDS Integration

Code was customised and integrated into the main framework with the aim of putting it in a dimensionality reduction algorithm of another type, alongside signal of nature other than geometric (and geometry-invariant). If done improperly or applied to faces of different people (as the figures below show), it can be demonstrably shown that the resultant correspondence is rather poor. The data dealt with in this case is illustrated in Figure $[*]$ . Figure $[*]$ shows this with

and Figure $[*]$ shows the same for

. Conversely, as seen in Figure $[*]$ , even with

the found correspondence is considerably better when handling images acquired of the same person.

**Figure:** Transformation from 3-D face (left) to a subset of rigid parts and then GMDS handling of the underlying surface (right)

**Figure:** Nose and eye regions from different people (FRGC 2.0) as treated by GMDS ()

**Figure:** Nose and eye regions from different people (FRGC 2.0) as treated by GMDS when

**Figure:** Nose and eye regions of the same person (FRGC 2.0) as treated by GMDS

**Figure:** The first pair in the set of real matches (same person in different poses)

**Figure:** An example of a problematic pair with a false signal spike (left)

By resolving issues associated with fatal exceptions in the pipeline it should be trivial to utilise the generalised MDS, which by far simplifies experiments performed with MDS (still part of the program, at least as an option to be explored or compared to later).

Further debugging has facilitated a rather reliable algorithm that is able to assemble GMDS-related metrics (not strictly a metric per se) from a large group of images, with or without smoothing and some other parameters that help make the process more robust (e.g. in case of misalignment). While it is possible to derive a similarity measure from raw values without a training process (involving a model), for localised information to bear meaning there ought to be a template or a more high-level abstraction/model that deforms itself to targets or specifies a quality of match. The order of points needs to be consistent with the anatomy and also consistent across examples however, otherwise no consistent markup can be worked on and the discriminant is accordingly weak. Examples of matching between dissimilar faces from different people can be seen in figures $[*]$ , $[*]$ , $[*]$ , and $[*]$ .

**Figure:** A view of the program's front end (framework wrapper)

**Figure:** A view of the handling of image pairs and their comparison using GMDS

**Figure:** A simple visualisation of the algorithm's processing of images, by numbers

**Figure:** The correspondence problem in GMDS and an abstraction of the data by consideration of a top-down representation

Using GMDS, the recognition performance reached at this stage is around 90% (see Figure $[*]$ ), but there are many improvements left to be made, either in pre-processing or in the suiting of GMDS to the task at hand. The main barrier was removal of some bugs relating to triangulation, as summarised in very few words in Figure $[*]$ , which does not delve into pertinent details as it is uninteresting.

**Figure:** Performance tests on very basic GMDS algorithm applied to rigid face parts

What GMDS does right now is basic and is not yet incorporated with (G)PCA, which would require consistent ordering of points. This is just a set of baseline results to serve as a sanity check.

**Figure:** Examples of some of the bugs encountered and overcome while working on GMDS implementation for faces

Regarding (G)MDS versus (G)PCA, it would be reasonable to say that the right mix should probably be some hybrid, where some sort of GMDS is used for alignment (as we do right now) and then PCA for efficient recognition. We are not so sure yet where the line between the two should be, but it is obvious that the truth is there. Figure $[*]$ shows the results from a still-buggy algorithm.

**Figure:** Set of results for 10x10 grid sampling (GMDS)

We changed sampling density, changing it from 10x10 to 5x5 grids. Preliminary results on 30 images are as follows:

'Predictivity' of negative test (probability that a subject is identical when it is not): 92.9%

As work continues on refinement, it may be possible to find new ways of further improving the sampling, e.g. by selecting particular features.

By disabling ICP we can possibly justify the use as GMDS as its replacement, essentially by taking a template image and performing GMDS on it wrt to each image of the current pair. However, ICP should get us a good initialisation for the GMDS phase.

By shrinking the data sampling rate further the recognition performance is further improved to the point where the ROC curve reaches 95%.

Following some further low-level refinements, there is considerably less attention paid to minor details around shady areas formerly occupied with voids/holes (a bug with a MATLAB toolbox was also found but not reported after it had wasted hours in vain). This was the result of tedious debugging and tweaking by observation.

This leads to very good detection rates, however nose detection is still short of perfect and provided this can be overcome ~99% of the time¹⁹, matching can exceed 95% detection rate. The FRVT FRGC documents on the Web²⁰ provide a more formal set of steps to follow, but until the pre-processing stages can be coupled to form a robust enough process, there is no point to adding PCA variants to the pipeline and then performing benchmarks. The pieces are already in place, but it is the failure to accurately and consistently carve out faces (despite hair occlusion) that merits increased attention and effort. In the latest small test involving 30 correct pairs (same person) and 30 incorrect pairs, the only misdetections were due to arbitrary face parts being assumed to be nose, incorrectly. The reasons vary and solution has been found and implemented many times before, encouraging reuse now rather than a reinvention of the wheel. See figures $[*]$ and $[*]$ .

**Figure:** Early performance measures for GMDS more properly done

**Figure:** Larger scale examples of early performance measures

Following some preliminary overnight experiments, it is possible to show the practicality of a PCA-GMDS hybrid framework, wherein the values on which dimensionality reduction is invoked are the geodesic distances between salient points. The idea is, by studying the variation of distances between analogous facial landmarks - almost as though there are strings between every pair - one can know which ones are expected to vary not across people but within them (intra-person/intrinsic), in which case these variations are very much expected and predictable. The model which is built only from correct pairs (8 pairs in an initial toy example, 76 in the coming tests) is supposed to penalise for variation in areas of the face that do not exhibit much variation in the training phase. Results are shown in figures $[*]$ and $[*]$ .

**Figure:** Results from poor PCA model, obtained using GMDS

**Figure:** Model modes distribution, corresponding to Figure $[*]$

The subsequent steps delved into ways of improving the data and its preparation for classification for an accurate determination of match/no match status. While in principle the method works quite reliably, a lot of room remains both for improvement in the ordering of points and in the quality of the pre-processing, as most of the false positives and false negatives are a result of the latter. Additionally, removal or conversely proper sampling of points around the cheeks should be considered.

The charts in Figure $[*]$ show the distribution of mode weights based on the building of two models, one of 10 people (around 80 pairs), and one of 76 people (around 400 pairs).

**Figure:** Model modes distributions (of 10 people and 76 people), built with the proper weight, albeit with very heavy and sometimes excessive smoothing

We then prepared a short report for a decision to be made regarding how long we give this face recognition project, which could otherwise be morphed to measure distances on a surface where corresponding points can less effectively be identified, e.g. anatomical parts inside the body where there is no easily identified part such as the nose, mouth, and eyes, let alone any photometric data to take advantage of. The strength of GMDS is that it autonomously finds points that are otherwise difficult for humans to mark up.

How the current results compare to the scores reported in FRGC FRVT etc. is still an important question and we can we combine mine with Bar's code for improved performance based on prior work. We can work effectively from a distance because there are fewer distractions. In general, the bottleneck is pace of work (about 2 hours per day), but the intervals allow for more results to be processed and delivered in-between. Since a lot of the work is done on computational servers anyway, locality has access to informed people as its main advantage. The weakness of work for long periods of time is that time taken for results to arrive must be dedicated to observation or further coding, which would still depend on the observation of results that had not arrived.

There have been no known attempts to apply GMDS methodology for diagnosis based on deformable atlases (training from patients with atrophies compared to normals). Half a decade ago, Davies, Cootes, and Taylor used reparameterisation on the sphere (Cauchy kernels) in order to classify the 3-D shape (surface, not volumetric) of the hippocampus with the aim is diagnosing disease characteristics of this interesting structure (with known correlation to illnesses), based upon fully automatic training from datasets we may have access to. The work done by Aflalo et al. is reminiscent the above, at least from an analytical angle.

The first to use conformal maps for computational anatomy is probably Eric Schwartz in the 80s. The more recent examples that immediately crop up come from ``http://picsl.upenn.edu/caph08/MICCAI 2008 WORKSHOP ON THE COMPUTATIONAL ANATOMY AND PHYSIOLOGY OF THE HIPPOCAMPUS''. Xie et al. [37], for instance, use shape analysis for Alzheimer's Disease detection.

A. Elad used MDS to map surfaces to spheres. It was around 2002 as far as I recall, but it was definitely not conformal. The mapping to the sphere in Davies' case (his work is still ongoing, but he too only spends only about 50 hours per week on research) is one that warps correspondences onto a sphere (or circle, at least in 2-D) and then applies particular functions to space up the correspondences and make reasonable candidates over which to optimise a groups shape concurrently []. The overall goal is to automatically identify and choose points that represent shapes. My own work extended these ideas to full intensity (texture), seeking points that take both grey-level and spatial values into account at the same time (using a combined shape and appearance mode, or AAM). I published papers on the subject over half a decade ago.

If it is true, as claimed by several people whom we spoke to, that face recognition is best handled by carving out few features that never vary in their relative geometry, then GMDS seems a little unnatural as the only absolute points on which to measure distances are easy to identify either by hand or by template (colour can help too). The continuous mapping that depends not on interpolation but on surface characteristics like curvature or distances on surface may be inadequate (an overkill) unless only few fiducial points whose location can be determined accurately get used. This point is worth getting across when GMDS is criticised for utility in face analysis, wherein simpler algorithms can outdo it.

One would completely agree with the observation about GMDS if indeed faces had been rigid. They are not. This is especially valid if you take the face as a whole and just crop out the mouth. Still, cropping only the upper mushroom part and considering close to neutral expressions, then, ICP alone could be enough. one would guess that GMDS could enhance it by a small notch, but this may be wrong.

ICP appears to be essential for improved initialisation of GMDS. It is important to be clear about whether we wish to model/sample entire faces with GMDS or not. The common facial expressions can lead to degradation in the results, but then again, with PCA these ought to be weighted accordingly, e.g. with the expectation of large variation (an already-seen variation, owing to the training set) in particular regions, whereas other regions remain stable, i.e. distances within those regions hardly vary or alternatively vary only along particular dimensions (in hyperspace of $M^{2}$ dimensions, where

is the number of points, not in 3-D). I will prepare an experiment which broadens the scope to entire faces. It oughtn't yield good results (on a comparable scale), but at least from an academic/scholarly perspective it ought to validate the inclusion and contrariwise exclusion of particular parts, e.g. those that accommodate mustaches and caused detection problems in previously-run large-scale experiments. Likewise, a Euclidean versus geodesic benchmark (Gaussian fitting for instance) can be produced to provide validation, similarly to the preparatory work from the 2006 BBK paper in IEEE TPAMI. If it can be proven - empirically - that geodesic distances always trump Euclidean equivalents, then at least in the case of 3-D it can be argued that all those leading algorithms (claiming 99.9% accuracy) can be further improved with FMM. Bar Shalem's work partly applied some of the same principles but fell short performance-wise. It is therefore unclear what paths should and should not be explored. By applying GMDS with just 5 points (classically the eye corners and the nose) we might be able to attain good performance but also merely replicate previous attempts by Bar Shalem, thus studying too little. This is why, upon the inquiry about code fusion, I remained a tad reluctant. To what extent, for example, were the algorithms tested and then refined? Was the newer version of FRGC tested on as well? Since we have got access to code from BBK papers on face recognition (2005), which route would be better explored? How many parts are merely reimplemented. Anastasia has argued that GMDS, as a black box, has not really changed since 2009, so the other building blocks are probably the only candidates for swapping.

Our job is to prove or disprove this issue which involves feasibility. Starting point should be state of the art ROC curves. This is hopefully a reachable goal. If state of the art is now an error of 1 in a thousand or thereabouts, then it seems like a monumental task.

ICP could be interpreted as a Gromov-Hausdorff distance when the inter-points distance is Euclidean and points are allowed to move in 3D. It would be interesting if coordinate-wise descent could work as well as ICP (one may doubt it, though using multi-grid it could actually work). So, GMDS could in-fact be used like ICP. Therein lies a possible micro-study which compares the R and T matrices that our 4 (currently) ICP methods output, perhaps rationalising the use of GMDS for alignment. Alternatively, it ought to be possible to compare recognition results with and without ICP as a peripheral/separate part from GMDS.

Regarding the comment about existing GMDS implementation and its age, it is likely that Carmi Grushko introduced some changes to the GMDS, and in fact he is currently working on further refinements (of the geodesic distance computation).

A different measure to try is using diffusion distances rather than Euclidean or geodesic. One could also consider diffusion on the surface, diffusion inside the surface, as well as geodesics in the interior of the face, etc. One distance should provide the best discriminative power among all possible ones. We must check it.

The current experiment deals with the performance reached by adding and removing parts of the face, using binary masks that make very basic sense. In all cases, depth values from X and Y (averaged over each grid) are used to scale the binary mark, such that consistent cropping is assured regardless of distance from the camera's aperture. This is one of the crucial areas of improvement, one of about 6 areas that need further improvement.

It is agreeable that 1/1000 is a challenging goal, but one may strongly feel we could get there, and then just play with building blocks to check which metric gives the best results. The hunch is that geodesics should play a leading role there. Either as dense or sparse matching of surfaces.

How would geodesics deal with eye sockets? The problem is, with the eyes being filled the signal is too noisy and without any filling there is a difference in distance/s which depends on how open the eye is. Euclidean distances do not suffer from this apparent drawback. One solution devised so far is almost excessive smoothing, whereby just the very basic geometry is preserved and a lot of the rest vanished out of signal. The fine details are unlikely to be present in different acquisition sites/times.