Recognition Results

A couple more overnight experiments were run with the penalty term raised somewhat, so as to better account for cases of mismatch. As hoped from the outset, this improved the recognition results.

Increasing the number of vertices raises technical issues: it either exceeds the memory caps in MATLAB (large matrices) or causes MATLAB to get stuck with no debugging information, requiring a restart each time. It is reasonable to assume that a lot of useful information is lost to this sampling limit, and the smoothing, which in some sense does aid performance, does not always help much either. The approach has inherent advantages and disadvantages; the experiments help identify them and assess the performance attainable once all the drawbacks are taken into account. It is generally understood, for instance, that both PCA and GMDS rely on computations over large matrices that only ever hold a subsample of the original data. They can only be as accurate as the quantisation, unless of course more sophisticated approaches are devised.
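To give a feel for why the vertex count runs into memory caps so quickly, here is a minimal sketch in Python (not the MATLAB code used for the experiments). It assumes the large matrix is a dense, square, vertices-by-vertices array of double-precision values, such as a table of pairwise distances; that layout is an assumption, and the vertex counts are illustrative rather than taken from the experiments.

```python
def dense_matrix_bytes(n_vertices, bytes_per_entry=8):
    """Bytes needed for a dense n x n matrix (8 bytes per double-precision entry)."""
    return n_vertices * n_vertices * bytes_per_entry


if __name__ == "__main__":
    # Illustrative vertex counts only; the experiments' actual values are not stated.
    for n in (1000, 2000, 5000, 10000):
        mib = dense_matrix_bytes(n) / 2**20
        print(f"{n:>6} vertices -> {mib:10.1f} MiB")
```

Under this assumption the storage grows with the square of the vertex count, so doubling the number of vertices quadruples the memory needed for each such matrix.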

The next experiment will add further changes that, based on empirical evidence, ought to improve the results again. For 50% of the data (where there is greater certainty) it is possible to classify correctly at a rate of about 99%. Examples from the remaining 1% or so can help show what still needs to be 'hacked' around in a way that generalises to the entire dataset.
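Figures of this kind (a high rate on the half of the data where certainty is greater) can be read as a classifier that abstains on uncertain pairs. The following is a minimal sketch in Python of how coverage and accuracy on the decided pairs might be computed. The two-threshold rule, the use of a dissimilarity score where lower means more alike, and all names are assumptions made for illustration, not the method used in the experiments.

```python
def evaluate_with_rejection(scores, same_person, accept_below, reject_above):
    """Classify pairs by dissimilarity score, abstaining inside the uncertain band.

    scores:        dissimilarity per pair (assumed: lower means more alike)
    same_person:   ground-truth booleans, one per pair
    accept_below:  scores at or below this are called a match
    reject_above:  scores at or above this are called a non-match
    Returns (coverage, accuracy_on_decided_pairs).
    """
    decided = 0
    correct = 0
    for score, truth in zip(scores, same_person):
        if score <= accept_below:
            prediction = True
        elif score >= reject_above:
            prediction = False
        else:
            continue  # undecided: the score falls between the two thresholds
        decided += 1
        correct += prediction == truth
    coverage = decided / len(scores)
    accuracy = correct / decided if decided else float("nan")
    return coverage, accuracy


if __name__ == "__main__":
    # Made-up scores and labels, purely to exercise the function.
    scores = [0.1, 0.2, 0.45, 0.55, 0.8, 0.9]
    truth = [True, True, False, True, False, False]
    print(evaluate_with_rejection(scores, truth, 0.3, 0.7))  # abstain in the middle
    print(evaluate_with_rejection(scores, truth, 0.5, 0.5))  # decide on every pair
```

Setting both thresholds to the same value forces a decision on every pair, which gives the all-pairs rate; widening the gap trades coverage for accuracy on the pairs that remain.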

It is worth noting how large the database behind these figures actually is. The data pool consists of 1000 images, and the aforementioned experiments used about half of those, in no particular order. It is possible to add more, but this will require further manual work.

The remaining 50% of the data, on which 99% is not attained, is simply left undecided. Taking every single pair, including those where the decision lies somewhere at the boundary, gives a recognition rate of about 93% (in the last experiment). The goal now is to improve on that, so that rather than returning an inconclusive classification (a pass) a correct classification is returned almost every time. To understand the effect of resolution on performance, a systematic experiment will be run and the results plotted, overlaid for comparison. One serious limitation right now is that the surfaces are shrunk by a factor of about 5 along each dimension, i.e. about 25 times fewer samples in XY.
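The arithmetic behind the factor of 25 can be checked with a short Python sketch. It uses plain decimation (keeping every fifth row and column) on a hypothetical 100 x 100 grid; the real image size and the actual resampling and smoothing used in the experiments are not stated here, so both are assumptions.

```python
def subsample(grid, factor):
    """Keep every `factor`-th row and column of a 2D grid (a list of lists)."""
    return [row[::factor] for row in grid[::factor]]


# Hypothetical 100 x 100 range image; the real image size is not given in the text.
full = [[0.0] * 100 for _ in range(100)]
small = subsample(full, 5)

full_count = len(full) * len(full[0])     # 100 * 100 samples
small_count = len(small) * len(small[0])  # 20 * 20 samples
print(full_count // small_count)          # ratio of sample counts
```

A resolution experiment along these lines would repeat the recognition run for several values of `factor` and plot the resulting rates on shared axes.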

Roy Schestowitz 2012-01-08