At this stage we can produce several kinds of benchmarks (ROC curves) on some of the data sets. We already get some numbers, but to get good numbers and organise them into ROC curves we need to finalise the protocol for dividing up the sets. To get ROC curves tested as soon as possible, only X data was used, which is not terribly useful because X generally contains little signal (and has low entropy). All the preliminary results should therefore be read as a proof of concept.
Careful arrangement of the sets will be necessary to ensure that many
tests, without too much overlap or repetition, can be enrolled and used
as our standard. The set of 86 distinct
people should be partitioned sensibly. To test this and show that
the results are reasonable for a test set of just 13 faces, Figure
displays the ROC curves obtained from
the mean of differences (one of several similarity measures). We will
need better ones, preferably with comparators too (overlaid curves
for human judgment).
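The partitioning concern raised above can be sketched as follows. This is a minimal illustration only, assuming that images are labelled by person and that the split should be made over identities rather than images, so that no person appears on both sides. The function names (`split_by_person`, `make_pairs`), the 50/50 fraction, and the two-images-per-person layout are hypothetical and are not the protocol used in this work.

```python
import random
from itertools import combinations

def split_by_person(person_ids, test_fraction=0.5, seed=0):
    """Split identities (not images) into two disjoint sets: (development, test)."""
    ids = sorted(set(person_ids))
    rng = random.Random(seed)          # fixed seed so the partition is repeatable
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return set(ids[n_test:]), set(ids[:n_test])

def make_pairs(images):
    """images: list of (image_name, person_id) tuples.

    Returns (genuine, impostor): pairs from the same person and pairs from
    different people. Each unordered pair appears exactly once.
    """
    genuine, impostor = [], []
    for (a, pa), (b, pb) in combinations(images, 2):
        (genuine if pa == pb else impostor).append((a, b))
    return genuine, impostor

# Hypothetical layout: 86 people, two scans each in the test half.
dev_ids, test_ids = split_by_person(range(86), test_fraction=0.5, seed=1)
test_images = [(f"person{p}_scan{k}", p) for p in sorted(test_ids) for k in range(2)]
genuine_pairs, impostor_pairs = make_pairs(test_images)
```

Splitting over identities keeps the development and test halves free of shared people, and enumerating unordered pairs avoids counting the same comparison twice.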
It is usually worth checking whether other groups, or even people who work in the same lab, have data that is already partitioned and classified (per individual). That would save the researcher the hassle of doing it manually; just picking out expressions took nearly 5 hours. The larger the dataset, the smoother the ROC curves will be.
Classification by hand takes a while, but it is crucial for results. The work done for NIST should have it for the training. The training partition contains nothing with expression variation, however, so we classify the image already isolated for the task of expression removal, comparing an approach that does not annul expression against a similar one that does.
Basic experiments soon followed. In this very preliminary test
we are dealing with a rather difficult set, using different acquisition
conditions and different expressions from many people. We focus on
dealing with just rigid registration (GIP latest ICP implementation)
and simple metrics. ROC curves are plotted in accordance with the
data gathered from 50 examples (see Figure ).
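As an illustration of how ROC points can be derived from a simple metric, here is a minimal sketch. It assumes that "mean of differences" means the mean absolute difference between two registered images stored as flat lists, and that a lower score indicates a better match; both are assumptions, and the function names are hypothetical rather than taken from the actual code.

```python
def mean_abs_difference(a, b):
    """Dissimilarity between two equally sized images given as flat lists."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of samples")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def roc_points(genuine_scores, impostor_scores):
    """Return a list of (false_accept_rate, true_accept_rate) points.

    A pair is accepted when its dissimilarity is at or below the threshold,
    and every observed score is used as a threshold in turn.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    points = [(0.0, 0.0)]              # threshold below every score
    for t in thresholds:
        tar = sum(s <= t for s in genuine_scores) / len(genuine_scores)
        far = sum(s <= t for s in impostor_scores) / len(impostor_scores)
        points.append((far, tar))
    return points

# Toy scores, for illustration only (not measured data).
curve = roc_points([0.1, 0.2, 0.4], [0.3, 0.5, 0.6])
```

With only a few dozen pairs the curve is a coarse staircase, which is consistent with the remark that larger data sets give smoother curves.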
Next, we intend to improve the results with more cunning registration, annulment of facial expressions (e.g. the EDM approach), and most importantly improved algorithms for masking and aligning image parts, then measuring more meaningful properties in them.
With a much larger sample set, which includes all the neutral-to-non-neutral
pairings, I ran the same experiment, this time using an older ICP,
which uses PCA, to plot the ROC curves (see Figure ).
ICP is used only for translation in this case. There is plenty of
room for improvement, and it should not be hard to get that improvement
shortly. This has been an exercise in just testing the foundations
of the framework, which is now a lot more streamlined.
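To make "ICP used only for translation" concrete, the following is a minimal sketch of a translation-only ICP loop. It is not the GIP implementation nor the older PCA-based one mentioned above; the function names are hypothetical, and the brute-force nearest-neighbour search is assumed to be acceptable only for small point sets.

```python
def nearest(point, cloud):
    """Brute-force nearest neighbour of `point` in `cloud` (squared Euclidean)."""
    return min(cloud, key=lambda q: sum((pi - qi) ** 2 for pi, qi in zip(point, q)))

def icp_translation(source, target, iterations=20, tol=1e-9):
    """Estimate the 3D translation that aligns `source` to `target`.

    Only a shift is estimated; rotation and scale are left untouched.
    Both clouds are sequences of (x, y, z) tuples.
    """
    shift = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        moved = [tuple(p[k] + shift[k] for k in range(3)) for p in source]
        matches = [nearest(p, target) for p in moved]
        # For fixed correspondences, the least-squares translation is the
        # mean of the residual vectors.
        step = [sum(m[k] - p[k] for p, m in zip(moved, matches)) / len(moved)
                for k in range(3)]
        shift = [shift[k] + step[k] for k in range(3)]
        if sum(s * s for s in step) < tol:   # converged: update is negligible
            break
    return tuple(shift)

# Toy example: a small grid shifted by a known offset.
target = [(float(x), float(y), float((x * y) % 3)) for x in range(4) for y in range(4)]
source = [(x - 0.2, y + 0.1, z - 0.3) for x, y, z in target]
estimated = icp_translation(source, target)
```

Restricting the transform to a shift keeps each iteration trivial, but any rotation between scans remains uncorrected, which is one of the places where improvement can be expected.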
Figure shows the same type of curve for a
method that was made more robust to noise and more sensitive to differences.
Comparisons have so far involved just the X/horizontal axis data (see
Figure for X, Y, and Z data overlaid),
which was not particularly useful for telling people apart. It was
intended to test and explore some new code. A median-based method
with squared differences taken into account is now put in place and
it uses actual depth (Z alone used as signal/data) to perform tests
on neutral and non-neutral images, as before. The results are, as
expected, far better than before. Figure shows the first 5 matches
that are correct and Figure shows the first 12 that are not correct
(belonging to different people). Figure shows the classification of those
17 images, which are simply the first ones in the test set (no selection
bias). The small scale of this experiment is intended to help track,
on an image-by-image basis, what is going on. Larger experiments will
follow.
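A median-based measure of the kind described above might look like the sketch below. It assumes the measure is the median of per-pixel squared depth (Z) differences between two registered range images stored as flat lists, with `None` marking pixels that have no depth value; the exact formulation used in the experiments may differ, and the function name is hypothetical.

```python
from statistics import median

def median_squared_difference(depth_a, depth_b):
    """Dissimilarity of two registered depth (Z) maps given as flat lists.

    Pixels where either map has no depth value (None) are skipped.
    """
    if len(depth_a) != len(depth_b):
        raise ValueError("depth maps must have the same number of samples")
    squared = [(a - b) ** 2
               for a, b in zip(depth_a, depth_b)
               if a is not None and b is not None]
    if not squared:
        raise ValueError("no overlapping valid depth samples")
    return median(squared)

# A single outlier (for example a hole or a spike in the scan) does not
# move the median, whereas it would dominate a mean of squared differences.
clean = median_squared_difference([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
spiky = median_squared_difference([1, 2, 3, 4, 5], [1, 2, 3, 4, 105])
```

Here both `clean` and `spiky` evaluate to 0, which illustrates why a median is less sensitive to acquisition noise than a mean.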
Next, model-based approaches will be incorporated and then benchmarked against others, notably counterparts that do not take advantage of statistical expression annulment.
Figure shows the same ROC curve extended
to account for a lot more image pairs (for which there is no accompanying
matrix representing the contribution of each, as before). Comparative
curves should be trivial to produce.
Roy Schestowitz 2012-01-08