As for the limitation on the number of vertices: one can safely increase
the limit in the program, but only to some degree. When it is too high,
FMM crashes for some reason. The C++ version allocates a /contiguous/
chunk of doubles for the distance matrices, which allows
SSE but is a serious disadvantage for high values of N because
of memory issues (N is the number of vertices on the mesh). The
largest mesh previously used had ~14,000 vertices,
and allocating the memory alone took at least 10 minutes (the operating
system had to move stuff out of the way). That is obviously not the
right way to do it (distances should be computed on demand and cached), but
it is what the coder wrote at the time.
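A back-of-the-envelope sketch makes the memory problem concrete: a contiguous all-pairs matrix needs N*N doubles at 8 bytes each, so it grows quadratically with the vertex count. (The function name below is illustrative, not from the actual C++ code.)

```python
# Why a dense all-pairs geodesic distance matrix blows up:
# it stores N*N doubles, 8 bytes each.

def distance_matrix_bytes(n_vertices: int) -> int:
    """Bytes needed for a dense n-by-n matrix of doubles."""
    return n_vertices * n_vertices * 8

for n in (4_000, 14_000, 100_000):
    gib = distance_matrix_bytes(n) / 2**30
    print(f"{n:>7} vertices -> {gib:8.2f} GiB")
```

At ~14,000 vertices this already comes to roughly 1.5 GiB in one contiguous block, which is consistent with the slow allocation described above; at 100,000 vertices it would be around 75 GiB, far beyond the 8 GB machine mentioned later.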
14,000 vertices would give us roughly 120x120 points. It will be good to see whether this can improve our results and, if so, to what extent. The artificial cap on the number of vertices has been raised, and memory allocation does take a long time. From a purely research-oriented point of view, it will be interesting to at least check the feasibility and suitability of this approach, regardless of efficiency, at least for the time being. In some previous experiments we needed to essentially down-sample the surface into a 30x30 or so grid (not a grid per se, but vertices), which makes a high-resolution image almost 'iconised' (32x32 by some conventions), and performance was therefore poor. This is similar to the limitation encountered when applying PCA in image space, where the covariance matrix must be reduced to something the available memory can manage, e.g. 1,000 observations or 30x30 pixels/points. The PCA approach was also applied to the gradient; we tried a hybrid of signals, e.g. an intensity/depth signal with its derivative, and ideally we foresee a combination of GMDS and PCA, enabling dimensionality reduction along particular dimensions to be carried out the classic PCA way. This is still a subject of debate.
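The kind of down-sampling described above (reducing a high-resolution depth image to a 30x30 set of samples with no interpolation) can be sketched as a simple striding operation; this is an illustrative reconstruction, not the actual experiment code:

```python
import numpy as np

def subsample_grid(depth: np.ndarray, target: int = 30) -> np.ndarray:
    """Pick roughly target x target samples from a depth image by
    striding over existing pixels; no interpolation is performed."""
    h, w = depth.shape
    rows = np.linspace(0, h - 1, target).round().astype(int)
    cols = np.linspace(0, w - 1, target).round().astype(int)
    return depth[np.ix_(rows, cols)]

# A 480x640 range image collapses to a 30x30 'iconised' version.
coarse = subsample_grid(np.zeros((480, 640)))
print(coarse.shape)  # (30, 30)
```

The point of the sketch is how aggressive the reduction is: almost all of the original pixels are simply discarded, which matches the 'iconised' characterisation and the poor performance that followed.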
One might guess that a modern machine with 8 GB of RAM cannot run GMDS on a mesh with more than ~100,000 vertices. One possible solution would be a multi-resolution or coarse-to-fine approach. We might, for instance, first view the problem from above, examining the general topology, then pick particular sub-parts segmented by GMDS and apply GMDS to those for a finer similarity assessment and score (ensemble). FMM is already being used to carve surfaces out of the whole, based on the location of easily-identifiable points and geodesic circle/s around them.
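The carving step mentioned above amounts to keeping only the vertices inside a geodesic circle around a landmark. A minimal sketch, assuming the geodesic distance map from the landmark has already been produced by a single FMM run (the function and landmark names here are hypothetical):

```python
import numpy as np

def carve_patch(vertices: np.ndarray, geodesic_dist: np.ndarray,
                radius: float) -> np.ndarray:
    """Keep only the vertices inside a geodesic circle of the given
    radius around a landmark; geodesic_dist[i] is assumed to come
    from one fast-marching run seeded at that landmark."""
    mask = geodesic_dist <= radius
    return vertices[mask]

# Toy data: 5 vertices and their distances from a hypothetical nose tip.
verts = np.arange(15, dtype=float).reshape(5, 3)
dists = np.array([0.0, 1.2, 3.5, 0.8, 5.0])
patch = carve_patch(verts, dists, radius=2.0)
print(len(patch))  # 3 vertices survive
```

Running GMDS only on such a patch, rather than the whole mesh, is what keeps the coarse-to-fine idea within the memory budget.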
Having spent several hours studying the behaviour scalability-wise, it seems safe to say that: 1) stress minimisation is faster by a factor of almost 10; 2) stability issues caused by the remainder of the GMDS process limit one's ability to run unsupervised experiments, especially when the number of vertices is high; 3) increasing the number of vertices often leads to a sort of program freeze, from which the only apparent way out is killing the whole MATLAB process (PID) or disconnecting. This is not necessarily related to the aforementioned stability issues. Going above 4,000 or so vertices almost always triggers this problem, which is not trivial to debug. It is likely correlated with RAM being almost fully exhausted (it is a shared server) and swap being used excessively.
Our performance depends heavily on our ability to handle full-resolution images, as opposed to a sub-sampling of them (without interpolation). GMDS almost always gets the general topology right (PCA performs at a similar level), but for inter-person distinction we must start thinking of a better approach that scales gracefully. A lot of information is left unaccounted for.
The distance computation should be performed on demand, not as a pre-process that computes all-to-all distances on the given mesh. This would immediately make the whole procedure run at a complexity related to the number of points actually queried, rather than the mesh resolution.
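The on-demand scheme proposed above can be sketched as a lazy, cached wrapper around a single-source solver; `fmm_solve` here is a stand-in for one fast-marching run from a given source vertex, not an actual API of the C++ code:

```python
import numpy as np

class LazyGeodesics:
    """Compute single-source geodesic distance maps on demand and
    cache them, instead of precomputing the full N x N matrix."""

    def __init__(self, fmm_solve):
        self._solve = fmm_solve  # source index -> distances to all vertices
        self._cache = {}

    def distances_from(self, source: int) -> np.ndarray:
        if source not in self._cache:
            self._cache[source] = self._solve(source)
        return self._cache[source]

# Demonstrate that repeated queries hit the cache, not the solver.
solver_calls = []
def fake_fmm(src):
    solver_calls.append(src)
    return np.full(1000, float(src))

geo = LazyGeodesics(fake_fmm)
geo.distances_from(7)
geo.distances_from(7)      # served from the cache
print(len(solver_calls))   # 1
```

Memory then scales with the number of distinct sources actually queried (one length-N vector each), rather than with N squared, which is exactly the complexity shift the paragraph above argues for.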
Roy Schestowitz 2012-01-08