ROC Curves

We shall be looking at ways of designing experiments whose final step produces error charts and then ROC curves. The comparative results will hopefully demonstrate an advantage when GMDS is used. The plan is to produce initial ROCs/EERs that support Mian's results by trying to reproduce their experiments. Failing that, we may contact the group for help or for missing data. It is defensible to suspect that things are more complicated than they seem on the surface (pun intended).

The types of experiments that would make sense to run are:

  1. Neutral to neutral identification comparison (easiest case)
  2. Training set versus neutral (including non-neutrals)
  3. Everything available versus neutral
  4. Arbitrary non-neutrals versus neutrals (hardest case)

In order to evaluate the expression comparators in low-dimensional space, experiments need to be designed such that, as a final step, everything we have is compared against all of the available expressions (i.e. use all training data, including expressions). Alternatively, we could generalise from neutral expressions to all expressions. The above four steps should come first in any case.
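To make the error charts and ROC/EER step concrete, here is a minimal sketch of how genuine and impostor match scores could be turned into ROC points and an equal error rate. The function names (`roc_points`, `equal_error_rate`) and the higher-is-better similarity convention are illustrative assumptions, not anything from Mian's or the GIP code.

```python
import numpy as np

def roc_points(genuine, impostor):
    """Sweep a threshold over all observed scores and record
    (false-accept rate, true-accept rate) pairs.
    Scores are similarities: higher means a better match (assumed)."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far, tar = [], []
    for t in thresholds:
        far.append(np.mean(impostor >= t))  # impostors wrongly accepted
        tar.append(np.mean(genuine >= t))   # genuine pairs correctly accepted
    return np.array(far), np.array(tar)

def equal_error_rate(genuine, impostor):
    """EER: the operating point where false-accept rate
    equals false-reject rate (1 - true-accept rate)."""
    far, tar = roc_points(genuine, impostor)
    frr = 1.0 - tar
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0
```

With well-separated score distributions the EER is zero; the interesting comparisons come from running the same two score sets through identical code with only the matcher swapped.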

What one hopes to show is that by aligning a fully-corresponded set and then building an expression deformation model, one obtains a particular level of results (with the GIP dataset), whereas with GMDS one can attain better results, perhaps even under difficult conditions to which one method is more resistant than its counterpart. There appears to be bias (through data-fitting) in the original paper, and by reproducing some of the experiments we can hopefully show that the opposite of what was claimed there is true.
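The expression deformation model mentioned above can be sketched as PCA over vertex displacements, assuming all meshes are already in full correspondence. Everything here (the function names, the residual-based fit measure, the synthetic shapes) is an illustrative assumption about how such a model might be built, not the actual GIP implementation.

```python
import numpy as np

def build_deformation_model(neutral, expressions, n_modes=3):
    """Fit a linear (PCA) deformation model to vertex displacements.
    neutral:      (V, 3) neutral shape; all meshes in correspondence
    expressions:  (N, V, 3) expression scans with the same topology
    Returns the mean displacement and the top principal modes."""
    N = expressions.shape[0]
    D = (expressions - neutral).reshape(N, -1)   # per-scan displacement vectors
    mean = D.mean(axis=0)
    _, _, Vt = np.linalg.svd(D - mean, full_matrices=False)
    return mean, Vt[:n_modes]                    # principal deformation modes

def project_residual(scan, neutral, mean, modes):
    """Distance from a new scan to the deformation subspace:
    a small residual means the expression is well explained by the model."""
    d = (scan - neutral).reshape(-1) - mean
    coeffs = modes @ d                           # project onto the modes
    recon = modes.T @ coeffs                     # reconstruct in the subspace
    return np.linalg.norm(d - recon)
```

A residual of this kind is one candidate for the missing "comparators of model fit to examples": scans whose deformation lies inside the learned subspace score low, unfamiliar deformations score high.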

In order to compare ROC curves (corresponding to what Mian's group attained) it seemed reasonable to construct similar tests. There are still missing bits of code: for example, visualising the resultant eigenvectors, getting proper alignment every time (very crucial), reliable and consistent cropping (a black art of trial and error), and, most importantly, comparators of model fit to examples. The problem is, rushing towards getting results - any results - without improving the dependent parts won't yield anything workable. If the best one can do is show a reproduced algorithm performing at, say, a 70% rather than 90% detection rate, then no strong claims can be made about one method being inferior. In such cases there are two possibilities: 1) the said paper was cheating, or 2) those who mimic their counterpart's algorithm implemented it poorly, perhaps intentionally, so as to get the desired performance gap. Suffice to say, it is harder to cheat when there are standard tests one must conform to, but in any event the plan is to build a single framework for an apples-to-apples comparison, where both methods are implemented the same way and the only distinguishable difference is GPCA [38,20] and GMDS swapped. Then, rather than comparing methods on an absolute basis, a relative comparison can be made, with a paper demonstrating that an MDS- or GMDS-based approach works better than PCA for human faces and, more specifically, for the anatomy of expression.
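For reference, the embedding step that GMDS generalises can be illustrated with classical MDS, which flattens a pairwise (e.g. geodesic) distance matrix into a low-dimensional Euclidean space. This is only the simple flat-target relative of GMDS, shown here as a sketch; the function name and dimensions are illustrative.

```python
import numpy as np

def classical_mds(D, dim=3):
    """Embed an (n, n) pairwise distance matrix D into
    `dim`-dimensional Euclidean space via classical MDS."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    B = -0.5 * J @ (D ** 2) @ J            # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]        # keep the largest eigenvalues
    L = np.sqrt(np.clip(w[idx], 0, None))  # guard against tiny negatives
    return V[:, idx] * L                   # (n, dim) embedding
```

Feeding geodesic distances between surface points into such an embedding is what makes the representation insensitive to (approximately isometric) expression deformations, which is the intuition behind preferring it to plain PCA on coordinates.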

We finished building a GIP EDM and can now see what experiments based on model criteria can be run. It will be interesting to get a sense of TP/FP rates and then beat them by changing similarity measures and cleaning up the data a little further (automatically, not manually). Ron said, ``Regarding the 77% figure they are right, but this was a very pessimistic test aimed as proof of concept rather than a face recognition tailored one. There was not much intelligent pruning, no alignment, no special treatment of missing parts, etc. I am in fact surprised we got the 77% altogether.

``Moving from PCA/MDS to GPCA/GMDS would introduce (I expect) some flexibility to the modeling that would enable capturing small variations about the given expressions and thereby better recognize the identity of a person...''

Roy Schestowitz 2012-01-08