Viola-Jones

A later paper from the same group [15] deals with a similar problem, except that expressions are not a factor and it analyses ears rather than (frontal) faces. This strand of work demonstrates impressive results, but they are comparable to prior work from other groups, which reports similarly good results in the region of 99% and above. The claim being made is that only one image is mis-detected (shown in a table), which accounts for the sub-100% figures. When the gap becomes this minuscule, the detection rate stops being a useful discriminant: it cannot really separate methods whose ranking hinges on a single image. Where there is occlusion by hair, for example, performance drops to about 50%. The robustness of such methods varies with the assumptions they make (e.g. an expectation of structural completeness).
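To make the point about single-image gaps concrete, here is a small arithmetic sketch. The test-set size of 200 is a hypothetical figure chosen for illustration, not one taken from [15], and the function names are ours.

```python
# Hypothetical illustration: how much one image moves a detection rate.
# The set size below is an assumption, not a figure from the paper.

def detection_rate(n_images, n_missed):
    """Detection rate in percent for a test set of n_images."""
    return 100.0 * (n_images - n_missed) / n_images


def rate_step(n_images):
    """Smallest possible change in the rate (percent), i.e. one image."""
    return 100.0 / n_images


n = 200                          # assumed test-set size
one_miss = detection_rate(n, 1)  # a method that misses a single image
no_miss = detection_rate(n, 0)   # a method that misses none
gap = no_miss - one_miss         # equals rate_step(n)
```

With a set of this assumed size the two methods differ by exactly one step of the scale, so the ranking between them rests on a single image.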

The method depends on algorithms similar to those used for face detection, although the ear does not require as high a resolution. It manages to detect ears within about 6 milliseconds, which in some cases enables real-time detection at a frame rate high enough for video sequences. Performance depends considerably on the size of a given dataset, either because of gallery size or because of the complexity of the set, whose scale affects recognition rates too (there is down-sampling). Training took days on a cluster of about 30 PCs, so this performance carries a hidden cost.

The authors use templates and coarser rectangular frames, with Haar features and AdaBoost (Freund and Schapire). We already have this in the program for the purpose of nose-finding, even though its potential is not being explored further at this stage (it trains on and detects faces under different conditions, but it needs a transformation to 2-D).
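As a reminder of what a Haar feature computes, the following is a minimal sketch of an integral image and a two-rectangle feature in plain Python. It is not the authors' code or the code in our program; the function names and the toy 4x4 image are hypothetical, and the AdaBoost stage that selects and weights such features is left out.

```python
# Illustrative sketch only: names and the toy image are hypothetical,
# not taken from the paper or from our program.

def integral_image(img):
    """Return a summed-area table with an extra zero row and column.

    ii[y][x] holds the sum of all pixels above and to the left of (x, y),
    so any rectangle sum needs only four lookups.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y) and size w x h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half.

    The width w is assumed to be even.
    """
    half = w // 2
    left = rect_sum(ii, x, y, half, h)
    right = rect_sum(ii, x + half, y, half, h)
    return left - right


# Toy 4x4 image: bright left half, dark right half.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(image)
feature = haar_two_rect_horizontal(ii, 0, 0, 4, 4)  # 72 - 8 = 64
```

The integral image is what makes rectangular features cheap: once the table is built, every rectangle sum costs four lookups regardless of its size, which is consistent with the detection times quoted above.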

Roy Schestowitz 2012-01-08