GMDS is a non-convex problem, and indeed it would give the right solution only if we start at the basin of attraction of the global optimum. Multi-resolution and any other good initialisation should get us there. We then wondered. Have there been any attempts to apply an approach which attempts multiple initialisations and then selects the best match among those? It means something along the lines of simulated annealing? Could be interesting and some people did try something related for symmetry detection (i.e. used GMDS to refine several initial conditions determined by feature points). See Can Raviv's work as well as Anastasia Dubrovina's main efforts in her masters (again linking feature matching to GMDS relaxation). But it was not explored much beyond that.
We could plot the accumulated stress per point so that we see where the mapping fails. One might still suspect the cheeks are to blame. We have been looking at different segments of the face and how dealing with each of these in isolation leads to similar problems in cases where GMDS struggles. A look at the stress map reveals nothing too unusual and almost the entire time cheeks are included as well (only the earliest experiments excluded them because these were the default settings).
After further tests on several smaller areas of the face (assuming that a piecewise approach of GMDS-PCA with LDA and maybe fiducials for different distinct features might work), it seems like the issue is fundamentally related to the way correspondences are found, or in about 5-10% of the cases simply not found. The success of recognition in all 3 databases is hinged on the ability to always do this correctly.
A group of small-scale experiments were run with the aim of investigating how selecting different regions of the face would affect the success of GMDS. A systematic scale increase of about 10% at a time (in terms of the relative size of the region in question) was used to show, although not on a statistical basis, that for very small regions with very uninteresting structure the variation is too large, an order or magnitude apart sometimes, which made GMDS inadequate for the task. Thus, a an investigation of the problem domain at macro-scale was undertaken. There was also some experimentation with binary masks and smoothing, under the presumption that if more date and less pertinent details are available, then performance will improve. Part of the problem is still be explored. The stress maps show no evident problem around the cheeks, which - even when included as shown in the attachment - are not exclusively where extreme stress is found.
In the chart, stress score for the pairs is (from left to right):
3.4866, 2.4497, 2.4718, 10.9726, and 173.6779 (see Figure
).
The images were flipped upside down where the correspondence found
was asymmetric (left being right and vice versa). Since all are pairings
belonging to the same person, it is expected GMDS that should succeed
most of the time, rather than in about 90% of the cases. The main
pitfall here is that there is no guarantee that correct correspondences
will be found; the option which seems reasonable is reseeding the
stochastic process and retrying, although that would be wasteful and
quite computationally expensive. Perhaps some fiducial point can instead
be used to improve the initialisations.
|
It also depends which kind of smoothing gets used, if any. Note that it should be intrinsic. I.e. convolution with a Gaussian which is coordinates dependent (w.r.t. to plane) would not do the trick for between-expressions matching.
Smoothing was used at high levels of granularity when handling expression files that were GIP-formatted and exceptionally noisy. Without a fair deal of smoothing or averaging (or median) the data was hard to work with as any raw image was too noisy. This was part of an image sequence taken in the form of video. For FRGC data, smoothing was a lot more limited and most of the time it was not necessary because the images were generally of good quality. The same goes for the data from Texas, which probably offers the best image quality. Smoothing was only applied to it experimentally, although based on intuition it was not necessary. As smoothing was applied to range images, it was perpendicular to a fixed plane rather than tangential w.r.t. the shape's surface. It is abundantly clear that smoothing along the surface and not along the viewing angle would be the defensible filter to apply. In case smoothing is revisited, it will be implemented properly, but as it stands - given the high fidelity of images - smoothing only degrades signal and removes no apparent noise. It also does not appear to impact performance for the better, although more systematic experiments would be needed to validate this type of contention. These are probably not the right experiments to pursue at this stage.
It is obvious that the smaller the region the less discriminative it would be. This is why one could be hoping that at the end of the day we will add as much region as possible. Having spent at least an hour or two viewing the results of ICP-aligned, full-face GMDS, it seemed evident that the same problems persist. This was after a set of systematic debugging/scoring round on two desktops and two 8-core servers. At first it appeared to be achieving perfect detection, but later on some false negatives and false positives could be spotted. This was due to GMDS failing to find the correct correspondences. In these experiments, areas around the hair, neck and ears had been culled out. The way this was implemented did not make a geodesic criterion for culling though. GMDS discrimination per scale would require that underlying issues in GMDS are first resolved. Using geodesic circles we can make the surfaces more consistent. We could slice out a union of geodesic circles about features rather than the mushroom on the projection plane? One might suspect wrong support could also lead to distortions. And the support should be intrinsic rather than projection plane dependent.
Also the graph I was referring to is a texture mapping of the integration over the distortion between a point and the rest of the points. It would give you an indications of where the largest distortion happens...
looking again at the examples you sent, the support is a problem in the last example. Wrong support leads to wrong alignment.
We shall work on each of the suggested areas of improvement in turn. The correct fix seems to be within arm's reach and the addition of a new menu option for geodesic circles sure seems necessary. The results at the moment are not as negative as that last illustrative set of images may suggest; this was supposed to accentuate the differences between good matches (single digit) and those which are 1 order of magnitude or even 2 above the rest.
As means of showing that GMDS on the full face does not work reliably,
shown are a bunch of results (including stress maps with overall scores)
obtained from pairs of the same person. Noteworthy are the observations
where left and right need to be flipped and in one case, the one where
the score is extremely high, the matching is found to be upside down
(chin matching forehead). See Figure
and Figure
. Although the author's experience with
GMDS is somewhat limited, it does not appear as though carving out
a on a geodetically-defined boundary will magically resolve the issue,
so we will explore some other tricks until the missing code can be
fetched (or alternatively be re-implemented).
|
The images show how stress is distributed over the shapes/surfaces.
But stress mapping is something else. For each point on the surface
we plot a gray level or red level that is equal to the sum of its
distortion from the rest of the points. The total stress of each point
on the surface,
, w.r.t. to all points in
. If one views
the matrix of geodesic distances between each point to another, then
the desired outcome would be a figure, e.g. with Voronoi tessellation,
where for each point we know how much distortion (or total distance)
it has. This would help debug GMDS as applied to faces, but such functionality
might not exist yet. To clarify, we have two shapes, X and Y, with
sets of N points on them, {
} and {
} accordingly.
Say these points were found by GMDS, that is pairs (
,
)
are corresponding points. Now, one can take these points on one shape,
say
on X, and use Fast Marching to find distances from them
to the rest of the points on X. The same you can do for Y. Basically,
all that's required is that for each such point, {
}, the
total of all those distances from {
} to all points on X
(or Y) should somehow be shown graphically. I wonder if the visualisation
tools for GMDS - not necessarily those with Voronoi cells - can
be used to show us the stress per point, {
}. What we try
to achieve is, basically, something that tells us where the fitting
between two surfaces (faces) fails. At the moment we can plot the
matrix of distances between disparate points, but we do not have a
way of visualising where each point number exists on the actual surface.
We hope to find a simple way to find or visualise which point shown
in surf/mesh/plot3 corresponds to point
. This is needed for
debugging or tweaking.
The next step will properly look at areas of difficulties. In the mean time, we will explore the effect of the algorithms on some synthetic data, at the very least in order to improve our understanding of how and why things fail. A blob that varies in terms of its size, location, and aspect ratio will be used very briefly as a form of validation data, a fourth dataset even. Later on it might help reason about the validity of this approach, showing its ability to recover the correct solution and the ground-truth correspondence.
Synthesis is shown in Figure
. The short set
of experiments used data consisting of three types of consistent patterns
with different scales (width), location in Y and rotational orientation
(small variation). This was designed to test using well-behaved data
how GMDS copes. The results are shown in increasing order of difficulty
and they seem to suggest that the appearance of new features around
edges causes confusion, so a stress analysis and geodetically-defined
cutoff will be the next logical step.
|
It may not be understand what is going on in the examples. The synthetic data should ideally be richer, i.e. the surface should contain effective Gaussian curvature, else the mapping is ambiguous.
The examples in Figure
show the general results
one gets when running the algorithm on true pairs
(same person) and false pairs (different
people). There is one case (shown by the score) where there is a misdetection,
meaning that for a correct/real pair the correct correspondence is
not being found. By looking at a geodesic slicing mechanism (ring/circle)
and inspecting the spatial distribution of stress we can hopefully
overcome this issue for all datasets at once, hitting several birds
with one stone. The value of the method is very much hinged on our
ability to resolve this ambiguity, so it is worth the extra effort
and continued research. Having looked at other possible selections
of a sub-surface, they don't seem to offer a noticeable advantage,
so for the time being the top half of the face will be used, centred
around the nose.
|
We've finished implementing a variant of the Voronoi code that displays
stress as colours, where the brighter the colour, the greater the
stress (see Figure
). As rightly argued
at the start, if cheeks are culled out, then the stress around them
is very high.
Wort noting is the inability to access more GMDS code (we use the TOSCA demo code only).
The colours of dots have been improved so as to take full advantage
of the whole range of greyscale levels and shown as an example in
Figure
is the difference - stress-wide
- between inclusion and exclusion of the cheeks.
|
The next thing to implement is a geodesic mask. Geodesic masks are indeed necessary. This would be sensitive to the source location and, still, with partial matching it could do the job. While trying to avert some mysterious problems with unknown program crashes (hardware exception, not consistently reproducible) the results are approaching what needs to be achieved (several features with union of geodesic circles around them).
In Figure
are some of the debugging
artefacts.
|
The next step concentrates on finding an adequate cropping methodology
which preserves surface area based on the anatomy and not Euclidean
measures (the former is invariant, whereas the latter changes with
expression and pose). Having found out that Fast Marching was causing
the crashes as deprecated interfaces had been used, this step ought
to be a simpler one, but for qualitative results it might be necessary
to run many different experiments like those tested for Figure
and Figure
. Previously it was found out
that just taking the top half of the face and removing clutter along
the sides worked better than taking the entire face, which sometimes
led to flipping (matching upside-down). Taking a smaller portion than
the 'half face' led to lack of signal (lacking gradient).
|
Roy Schestowitz 2012-01-08