Multiple Initialisations or Multi-Scale Approach

GMDS is a non-convex problem, and indeed it would give the right solution only if we start at the basin of attraction of the global optimum. Multi-resolution and any other good initialisation should get us there. We then wondered. Have there been any attempts to apply an approach which attempts multiple initialisations and then selects the best match among those? It means something along the lines of simulated annealing? Could be interesting and some people did try something related for symmetry detection (i.e. used GMDS to refine several initial conditions determined by feature points). See Can Raviv's work as well as Anastasia Dubrovina's main efforts in her masters (again linking feature matching to GMDS relaxation). But it was not explored much beyond that.

We could plot the accumulated stress per point so that we see where the mapping fails. One might still suspect the cheeks are to blame. We have been looking at different segments of the face and how dealing with each of these in isolation leads to similar problems in cases where GMDS struggles. A look at the stress map reveals nothing too unusual and almost the entire time cheeks are included as well (only the earliest experiments excluded them because these were the default settings).

After further tests on several smaller areas of the face (assuming that a piecewise approach of GMDS-PCA with LDA and maybe fiducials for different distinct features might work), it seems like the issue is fundamentally related to the way correspondences are found, or in about 5-10% of the cases simply not found. The success of recognition in all 3 databases is hinged on the ability to always do this correctly.

A group of small-scale experiments were run with the aim of investigating how selecting different regions of the face would affect the success of GMDS. A systematic scale increase of about 10% at a time (in terms of the relative size of the region in question) was used to show, although not on a statistical basis, that for very small regions with very uninteresting structure the variation is too large, an order or magnitude apart sometimes, which made GMDS inadequate for the task. Thus, a an investigation of the problem domain at macro-scale was undertaken. There was also some experimentation with binary masks and smoothing, under the presumption that if more date and less pertinent details are available, then performance will improve. Part of the problem is still be explored. The stress maps show no evident problem around the cheeks, which - even when included as shown in the attachment - are not exclusively where extreme stress is found.

In the chart, stress score for the pairs is (from left to right): 3.4866, 2.4497, 2.4718, 10.9726, and 173.6779 (see Figure $[*]$ ). The images were flipped upside down where the correspondence found was asymmetric (left being right and vice versa). Since all are pairings belonging to the same person, it is expected GMDS that should succeed most of the time, rather than in about 90% of the cases. The main pitfall here is that there is no guarantee that correct correspondences will be found; the option which seems reasonable is reseeding the stochastic process and retrying, although that would be wasteful and quite computationally expensive. Perhaps some fiducial point can instead be used to improve the initialisations.

**Figure:** Pairs of facial expressions from the same person, cut in half beneath the nose and tilted sideways, then shown with GMDS applied. The score (from left to right): 3.4866, 2.4497, 2.4718, 10.9726, and 173.6779

It also depends which kind of smoothing gets used, if any. Note that it should be intrinsic. I.e. convolution with a Gaussian which is coordinates dependent (w.r.t. to plane) would not do the trick for between-expressions matching.

Smoothing was used at high levels of granularity when handling expression files that were GIP-formatted and exceptionally noisy. Without a fair deal of smoothing or averaging (or median) the data was hard to work with as any raw image was too noisy. This was part of an image sequence taken in the form of video. For FRGC data, smoothing was a lot more limited and most of the time it was not necessary because the images were generally of good quality. The same goes for the data from Texas, which probably offers the best image quality. Smoothing was only applied to it experimentally, although based on intuition it was not necessary. As smoothing was applied to range images, it was perpendicular to a fixed plane rather than tangential w.r.t. the shape's surface. It is abundantly clear that smoothing along the surface and not along the viewing angle would be the defensible filter to apply. In case smoothing is revisited, it will be implemented properly, but as it stands - given the high fidelity of images - smoothing only degrades signal and removes no apparent noise. It also does not appear to impact performance for the better, although more systematic experiments would be needed to validate this type of contention. These are probably not the right experiments to pursue at this stage.

It is obvious that the smaller the region the less discriminative it would be. This is why one could be hoping that at the end of the day we will add as much region as possible. Having spent at least an hour or two viewing the results of ICP-aligned, full-face GMDS, it seemed evident that the same problems persist. This was after a set of systematic debugging/scoring round on two desktops and two 8-core servers. At first it appeared to be achieving perfect detection, but later on some false negatives and false positives could be spotted. This was due to GMDS failing to find the correct correspondences. In these experiments, areas around the hair, neck and ears had been culled out. The way this was implemented did not make a geodesic criterion for culling though. GMDS discrimination per scale would require that underlying issues in GMDS are first resolved. Using geodesic circles we can make the surfaces more consistent. We could slice out a union of geodesic circles about features rather than the mushroom on the projection plane? One might suspect wrong support could also lead to distortions. And the support should be intrinsic rather than projection plane dependent.

Also the graph I was referring to is a texture mapping of the integration over the distortion between a point and the rest of the points. It would give you an indications of where the largest distortion happens...

looking again at the examples you sent, the support is a problem in the last example. Wrong support leads to wrong alignment.

We shall work on each of the suggested areas of improvement in turn. The correct fix seems to be within arm's reach and the addition of a new menu option for geodesic circles sure seems necessary. The results at the moment are not as negative as that last illustrative set of images may suggest; this was supposed to accentuate the differences between good matches (single digit) and those which are 1 order of magnitude or even 2 above the rest.

As means of showing that GMDS on the full face does not work reliably, shown are a bunch of results (including stress maps with overall scores) obtained from pairs of the same person. Noteworthy are the observations where left and right need to be flipped and in one case, the one where the score is extremely high, the matching is found to be upside down (chin matching forehead). See Figure $[*]$ and Figure $[*]$ . Although the author's experience with GMDS is somewhat limited, it does not appear as though carving out a on a geodetically-defined boundary will magically resolve the issue, so we will explore some other tricks until the missing code can be fetched (or alternatively be re-implemented).

**Figure:** Some of the raw images (full face) after GMDS with 50 Voronoi cells displayed

**Figure:** A top-down view showing the matching and the corresponding score. with flipping manually corrected. The third example from the top got the topology completely upside down.

The images show how stress is distributed over the shapes/surfaces. But stress mapping is something else. For each point on the surface we plot a gray level or red level that is equal to the sum of its distortion from the rest of the points. The total stress of each point on the surface, $p_{i}$ , w.r.t. to all points in

. If one views the matrix of geodesic distances between each point to another, then the desired outcome would be a figure, e.g. with Voronoi tessellation, where for each point we know how much distortion (or total distance) it has. This would help debug GMDS as applied to faces, but such functionality might not exist yet. To clarify, we have two shapes, X and Y, with sets of N points on them, { $x_{i}$ } and { $y_{i}$ } accordingly. Say these points were found by GMDS, that is pairs ( $x_{i}$ , $y_{i}$ ) are corresponding points. Now, one can take these points on one shape, say $x_{i}$ on X, and use Fast Marching to find distances from them to the rest of the points on X. The same you can do for Y. Basically, all that's required is that for each such point, { $x_{i}$ }, the total of all those distances from { $x_{i}$ } to all points on X (or Y) should somehow be shown graphically. I wonder if the visualisation tools for GMDS - not necessarily those with Voronoi cells - can be used to show us the stress per point, { $x_{i}$ }. What we try to achieve is, basically, something that tells us where the fitting between two surfaces (faces) fails. At the moment we can plot the matrix of distances between disparate points, but we do not have a way of visualising where each point number exists on the actual surface. We hope to find a simple way to find or visualise which point shown in surf/mesh/plot3 corresponds to point

. This is needed for debugging or tweaking.

The next step will properly look at areas of difficulties. In the mean time, we will explore the effect of the algorithms on some synthetic data, at the very least in order to improve our understanding of how and why things fail. A blob that varies in terms of its size, location, and aspect ratio will be used very briefly as a form of validation data, a fourth dataset even. Later on it might help reason about the validity of this approach, showing its ability to recover the correct solution and the ground-truth correspondence.

Synthesis is shown in Figure $[*]$ . The short set of experiments used data consisting of three types of consistent patterns with different scales (width), location in Y and rotational orientation (small variation). This was designed to test using well-behaved data how GMDS copes. The results are shown in increasing order of difficulty and they seem to suggest that the appearance of new features around edges causes confusion, so a stress analysis and geodetically-defined cutoff will be the next logical step.

**Figure:** 5 examples of experiments with synthetic data, where the top part shows the pair of images in their classic form, the middle shows a top-down view, and the bottom part is the range image

It may not be understand what is going on in the examples. The synthetic data should ideally be richer, i.e. the surface should contain effective Gaussian curvature, else the mapping is ambiguous.

The examples in Figure $[*]$ show the general results one gets when running the algorithm on true pairs (same person) and false pairs (different people). There is one case (shown by the score) where there is a misdetection, meaning that for a correct/real pair the correct correspondence is not being found. By looking at a geodesic slicing mechanism (ring/circle) and inspecting the spatial distribution of stress we can hopefully overcome this issue for all datasets at once, hitting several birds with one stone. The value of the method is very much hinged on our ability to resolve this ambiguity, so it is worth the extra effort and continued research. Having looked at other possible selections of a sub-surface, they don't seem to offer a noticeable advantage, so for the time being the top half of the face will be used, centred around the nose.

**Figure:** The scores in black show the pairings between different people and in green are the scores of matches between the same person

We've finished implementing a variant of the Voronoi code that displays stress as colours, where the brighter the colour, the greater the stress (see Figure $[*]$ ). As rightly argued at the start, if cheeks are culled out, then the stress around them is very high.

Wort noting is the inability to access more GMDS code (we use the TOSCA demo code only).

**Figure:** A new visualisation form where the dots signify stress at the given point

The colours of dots have been improved so as to take full advantage of the whole range of greyscale levels and shown as an example in Figure $[*]$ is the difference - stress-wide - between inclusion and exclusion of the cheeks.

**Figure:** Top: A mapping of GMDS stress when cheeks are included in match-finding. Bottom: same as above but cheeks excluded. One can assume dark means low stress and white is high stress.

The next thing to implement is a geodesic mask. Geodesic masks are indeed necessary. This would be sensitive to the source location and, still, with partial matching it could do the job. While trying to avert some mysterious problems with unknown program crashes (hardware exception, not consistently reproducible) the results are approaching what needs to be achieved (several features with union of geodesic circles around them).

**Figure:** Original images, erroneous cropping effects (still in the process of debugging) and retriangulation of the points after omission.

The next step concentrates on finding an adequate cropping methodology which preserves surface area based on the anatomy and not Euclidean measures (the former is invariant, whereas the latter changes with expression and pose). Having found out that Fast Marching was causing the crashes as deprecated interfaces had been used, this step ought to be a simpler one, but for qualitative results it might be necessary to run many different experiments like those tested for Figure $[*]$ and Figure $[*]$ . Previously it was found out that just taking the top half of the face and removing clutter along the sides worked better than taking the entire face, which sometimes led to flipping (matching upside-down). Taking a smaller portion than the 'half face' led to lack of signal (lacking gradient).

**Figure:** A toy example of a very small couple of surfaces cropped from the centre of a face of the same person, where the pairs shown correspond to top-down view and GMDS' results