Archive for January, 2012

Join Diaspora… But Maybe Not Just Yet

[Image: roundabout, captioned "Up and down all day long"]

SEVERAL months ago I joined Diaspora and enjoyed the service’s good uptime. The community was thriving, everyone was friendly, and the site responded to input as one would expect. But then, just like Identi.ca, the site began having performance and uptime issues; at one point it was down for a whole week. People soon got over the withdrawal symptoms and perhaps just moved on; some returned only to find sporadic operation of a site which, fairly enough, still runs alpha-stage software. The bottom line is that in the early days people reviewed the site harshly for technical shortcomings; now the complaint is simply the terrible uptime and low reliability. Unless this gets fixed, the site is likely to lose its most ardent supporters and participants.

When Diaspora becomes “stable” it may all be resolved, but by that point, how many people will be on the JoinDiaspora pod?

Humans Are Technically Animals, But Some People Treat Them as Such

[Image: unicorn]

HUMAN BEINGS are a special kind of animal because we, humans, are the only ones capable of writing about animals. The inclination to distinguish between human and animal is an artificial one, a bit like saying “pork” rather than “pig” and “beef” instead of “cow” (so as not to personify something we eat). But as humans we do have special responsibilities towards our own kind; it is an implicit contract we share, because no-one wants to be seen as potential prey by one of their own.

It takes great disturbance and a troubled mind to drive humans into abusing other humans (or even into cannibalism). Like most animals, as part of our survival instincts we choose to bond with our own kind, sometimes bonding against other species (tribalism is the cause of many wars). But the general commonality is that it takes some sort of bad fuse for people to happily mistreat fellow people. Steven Weinberg once said that “[r]eligion is an insult to human dignity. With or without it, you’d have good people doing good things and evil people doing bad things, but for good people to do bad things, it takes religion.”

Likewise, people typically choose to treat fellow humans well, but for someone to talk about peers as though they were animals it takes extreme capitalism and the devaluation of life it can lead to. When will we start talking about the ills of this kind of globalisation?

GMDS vs Other FMM-based Measures

IN the previous post on this subject we looked at the masks used in a GMDS pipeline tailored for recognition purposes. The question now is, what would be a constructive way to progress from the conclusion?

Well, the goal is to beat the competition and do so with methods of a particular kind — the kind we advocate — which seems achievable but requires a lot of tinkering, seeing where and how mistakes can be resolved/avoided.

An additional experiment, taking about a day to complete, does not show much promise. Its goal is simply to compare the performance of GMDS with the new (and faster) Fast Marching Methods-based measures when all parameters are kept consistent across runs. With many rings, many points, and many vertices, recognition performance is relatively poor because of the hard dataset, as demonstrated by the ROC curve. The point to note, though, is that in hard cases GMDS is outperformed by the other approach. One question is, are there any measurable quantities (other than stress) resulting from GMDS that are capable of assessing similarity?
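For context, the ROC curves here are produced in the standard way (the post does not spell this out, so read this as an assumed construction): each pair of surfaces gets a dissimilarity score, stress in the GMDS case and the distance-based measure in the FMM case, and a decision threshold $\tau$ is swept over the scores:

$$\mathrm{TPR}(\tau)=\frac{\#\{\text{same-person pairs with score}\le\tau\}}{\#\{\text{same-person pairs}\}},\qquad \mathrm{FPR}(\tau)=\frac{\#\{\text{different-person pairs with score}\le\tau\}}{\#\{\text{different-person pairs}\}}.$$

Plotting TPR against FPR over all values of $\tau$ gives the curves below; the closer a curve hugs the top-left corner, the better the measure separates the two kinds of pairs.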

[Figure: ROC curve, GMDS over multiple runs]

[Figure: ROC curve, FMM-based measure]

There are some additional results from the last set of shallow, comparative tests. By applying the same experimental parameters to an antiquated triangle-counting approach and to a best-fit GMDS approach (taking the best run rather than averaging over multiple runs) we get two more ROC curves.

[Figure: ROC curve, triangle counting]

[Figure: ROC curve, best-fit GMDS]

Finally, using this same difficult set (where problematic cases are included) we get a ROC curve for the standard GMDS approach.

[Figure: ROC curve, simple GMDS]

In GMDS one could work with either an L2 norm (which is what we do right now) or L∞. In fact, if one takes the log of the distances and applies GMDS, one performs, in a sense, a Lipschitz embedding. Diffusion distances could also be used within the GMDS framework. In fact, with better interpolation properties, one can interpolate the eigenfunction before integrating the distance itself…
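To make the norm choice concrete, here is the generalised stress in rough notation (a sketch in my own notation, not a quotation from the GMDS papers): given geodesic distances $d_X$ on one surface and $d_Y$ on the other, GMDS places sample points $u_1,\dots,u_m$ on the second surface so as to minimise

$$\sigma_p(u_1,\dots,u_m)=\Big(\sum_{i>j}\big|\,d_X(x_i,x_j)-d_Y(u_i,u_j)\,\big|^{p}\Big)^{1/p},$$

where $p=2$ gives the least-squares (L2) stress used at the moment and $p\to\infty$ gives the worst-pair (L∞) variant. Replacing the distances $d$ with $\log d$ is what yields the Lipschitz-style embedding mentioned above, and $d$ itself can just as well be a diffusion distance rather than a geodesic one.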

I spoke to a colleague about L∞ and am now attempting to compile it on GNU/Linux (something I had not done before). It would be interesting to know whether GMDS has been tested where D is computed from spectral properties (rather than as geodesics). There is an IJCV paper with Sapiro in which this is done; in it, the authors also find symmetries that way.

Coarse Correspondence in Riemannian Manifolds: Masks and Multi-Resolution Approach

THIS post is part of a series that explores the potential of comparing surfaces using GMDS, or generalised multidimensional scaling.

A 15-level multi-resolution approach has led to no classification errors being made, at least thus far. This may have helped prevent convergence to local minima, but it is painfully slow, especially when the C++ implementation does not get used. I have looked into more ways to widen the separation between correct pairs (same person) and incorrect pairs, and have begun looking at the impact of two factors: one is the size of the mask used prior to GMDS (or dilation through iteration) and the other is the number of multi-resolution levels. Based on an ongoing experiment, a very coarse correspondence/initialisation leads to something quite reasonable when the pairs belong to the same subject (and everything is a lot quicker), but to a bit of a mess otherwise (see the first two images).

With 3 cumulative levels in the multi-resolution approach, a false classification does not take long to occur, so I increased that to 15 and ran it with 3 levels of dilation from the three centres for half a day. In spite of the optimisation process taking a lot longer, performance was not good, peaking well below a 90% recognition rate. Although the tested dataset is not large enough to draw conclusions from, the recent experiments seem to suggest that a multi-scale approach on its own cannot resolve the frequent recurrence of misclassifications.

In order to better understand what weakens these measures I have taken a closer look at visual GMDS output. It seems as though the scores are inflated when genuine (corresponding) pairs of surfaces fail to yield the correct correspondence, even after a 15-level optimisation in which the data is exactly the same except for the mask size (as shown in the images).

In the past, taking the best fit among all matches was tried (and it has in fact been tracked as a secondary/surrogate measure in all of the recent experiments), but it does not perform well as a discriminant on its own. If GMDS finds the accurate correspondence only 95% of the time in these circumstances, then we would need to rerun GMDS several times to exceed that in terms of recognition rates. The other FMM-based method (the one I created) achieved better recognition rates than that.
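As a back-of-the-envelope reading of the “rerun several times” remark (assuming, perhaps optimistically, that runs are independent): if a single run finds the accurate correspondence with probability 0.95, then the chance that at least one of $k$ runs does is

$$1-(1-0.95)^{k}=1-0.05^{k},$$

so two runs already give 0.9975 and three give 0.999875. Taking the best fit over several runs should therefore recover most of the correspondences a single run misses, which is what would make the reruns worthwhile despite their cost.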

In order to test the mask type and its effect on performance I ended up setting all parameters to fixed values and running very coarse-scale experiments, first on the entire face and later on a masked subset of limited size, particularly focusing on rigid parts alone.

Results were interesting in the sense that they showed that, with GMDS as the assessment criterion, smaller masks around the fixed points do not clearly and unambiguously produce better results, at least not in the case of this dataset. The assumption we had about removing the non-rigid area, inherited from other work, may have been misplaced.

In the next stage I will return to finer levels in the hope of getting much better results.

Ideally, we should initialise these high-resolution triangulations with the result obtained at the lower resolution. Currently, by default, there are 9 levels, adjusted according to m, the sample size (300 in the latest few experiments); this is the number of levels in the multi-resolution hierarchy. At the finest level there are typically 15,999 faces and 8,101 vertices (older experiments used only about 2,000 and recent ones about 4,000). A low-to-high resolution sweep is performed by default, but it is not sufficient for evading local minima. This should explain the decrease in performance; initialising each level with the lower-resolution result (iteratively) should solve part of the problem.

Computer Vision Versus Big Brother

[Image: handycam]

LIKE nuclear physics, good science can be exploited for bad causes. Access to powerful methods does not necessarily mean that this power will be benign; in fact, a lot of funding flows towards science for malicious reasons, such as war. Just look at how much money gets funnelled by the military-industrial complex into aviation and other such research faculties/industries. There is a danger, however, which has a lot to do with how Computer Vision gets tied up with Big Brother connotations, even though Computer Vision is used a great deal to save people’s lives, e.g. in computer-guided surgery. It would be a travesty if people extrapolated from these ideas only to highlight their negative uses while ignoring the underlying science. Computer graphics is a generative science, whereas Computer Vision is more analytical. Both use models to understand nature, but one synthesises it whereas the other converts it into information. If we are going to assume that information collection is always a bad idea, then the World Wide Web, too, can be considered harmful.

Running Experiments Over SSH

BAD habits die hard. Good habits stay, so over time we do things more effectively. Today’s post may be relevant to post-doctoral folks whose work involves a lot of computation, including overnight or multi-day experiments.

Six or seven years ago I wrote detailed posts about how I was using KDE to run experiments over SSH on entire clusters of Fedora boxes. It has been a long time since then, and nowadays, rather than using many dual-core boxes, I mostly use a pair of 8-core computational servers. The tricks are mostly the same and the scripts are largely reused from those I wrote around 2004 (they are publicly available on this Web site). The procedures have hardly changed, but technically a few things have, and this post will detail them (aside from that, back then I was doing my Ph.D. and now I get paid to run these experiments).

In case of network issues, it is important to run everything but the computation locally. This means that code editing, for example, should be done locally; it reduces cursor/menu lag and prevents loss of work if the connection drops. It also ensures that files get written to more than one place (local plus remote). Kate, through KIO slaves, enables editing of files over SFTP or SCP/SSH, so this functionality ought to be exploited for controlling experiments remotely. For other tasks, separate terminal windows (e.g. Konsole) should be opened, preferably with an endless scrolling buffer, one for each remote machine. GUIs can be created to enable quick adjustment and running of experiments. A good terminal will stay active and visible even when a remote connection gets closed, in which case results can still be observed. In addition, it may help to have another terminal window connected to each remote machine in order to track load throughout runtime, as well as other things (one busy session may permit nothing else to be done on the same terminal). Here it is illustrated graphically, based on screenshots just taken.
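As a minimal sketch of that kind of setup (the host names, paths, and script names below are placeholders rather than details from this post): program files are opened through a KIO slave, so they live on the server yet edit like local files, while the long run itself is detached on the server so that a dropped connection does not kill it.

# Edit remote program files in a local Kate via a KIO slave (sftp://);
# saving writes straight back to the server.
kate sftp://user@server1/home/user/experiments/run_experiment.sh &

# Launch the long experiment detached on the server (nohup used here;
# screen or tmux would work just as well), so it survives a lost SSH session.
ssh user@server1 'cd ~/experiments && nohup ./run_all.sh > run.log 2>&1 < /dev/null &'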

Full screen view on workspace 8 (research):

[Screenshot: full view of workspace 8]

Let’s break it down into the left and right monitors:

[Screenshot: left monitor]

This left side of the dual-head display contains the tabbed view of the program files that need editing. All those files use KIO, so they are essentially seen as local even though editing them modifies them on the remote servers that I connect to (see the terminals on the right). In addition, I used a Java framework to create a GUI front end for the experiments; there is a single instance of it for each server.

On the other screen I have this:

[Screenshot: right monitor]

Shown at the top left is a window that is invoked by the program when results are ready. By using KDE’s window-specific settings I can force all such windows to always open in workspace 8 (research), so even if I am busy with other tasks the windows will quietly show up where they belong, sometimes along with a dozen other windows that need attention later. On the right side there are terminals connected to the computational servers; one is currently being used to rsync the code across servers and the other is tracking server loads.
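For completeness, a rough sketch of what those two terminals run (the hosts and paths are placeholders, not taken from this post):

# Mirror the local working copy of the code to the second server.
rsync -avz --delete ~/experiments/ user@server2:~/experiments/

# Keep an eye on a computational server's load throughout runtime.
ssh -t user@server1 "watch -n 60 uptime"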

So, this is pretty much how I use workspace 8. I previously explained how I use workspace 2 (there are 12 in total). KDE makes it easy to multi-task without getting distracted.

How to Quickly Produce HTML-Formatted IRC Logs

[Image: trunks]

ON a daily basis I must produce logs for 4 IRC channels. Over time I have found more efficient ways of doing so, and this post summarises some of the shortcuts and tools. It doesn’t go into specifics where these are not generalisable.

The first stage is opening a template for the post/article linking to the log/s. The post has X’s where the data goes, so those parts must be completed by hand and then copied and pasted into the right fields. The way this is done depends on the site and its layout, but the important thing is to use templates: if this gets done daily, a lot of work and many errors can be prevented.

Then comes the point where the logs themselves get produced. Logging is typically done by IRC clients, but only XChat works well for me with the script I have used since 2008, irclog2html. The first thing to do is open all the files containing the relevant logs. In my case it goes like this:

roy@roy:~$ cat ./irc-files.sh 
kate .xchat2/xchatlogs/FreeNode-#boycottnovell.log \
     .xchat2/xchatlogs/FreeNode-#boycottnovell-social.log \
     .xchat2/xchatlogs/FreeNode-#techbytes.log \
     .xchat2/xchatlogs/FreeNode-#techrights.log

The directory and relative paths may vary, and one’s favourite text editor may vary as well (I like Kate). Once all the files are open (about 200 MB in my case), open all the previous logs (if any) and scroll down to the bottom. Then, copy a portion of the last line and search for it in the full log to quickly jump to the latest exported line, then extend the selection (highlight with keyboard/mouse) to grab the new log text to be converted into HTML. Once this is done, open a new tab/window, paste the text, and save it as a file whose name corresponds to the channel at hand. In my case, that would be irc-log-social, irc-log-techbytes, irc-log-techrights, and irc-log.
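The same extraction can also be scripted for those who prefer not to do it in the editor; this is only a hypothetical sketch (the snippet, channel, and paths are placeholders), grabbing everything after the last line that was already exported:

# LAST is a unique snippet copied from the end of the previous export.
LOG=".xchat2/xchatlogs/FreeNode-#techrights.log"
LAST="some unique snippet from the previous export"
N=$(grep -nF "$LAST" "$LOG" | tail -n 1 | cut -d: -f1)
tail -n +"$((N + 1))" "$LOG" > irc-log-techrights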

Once all the text to be converted is put in the correctly and consistently named files, batch-run the conversion and open the resultant files in a text editor. For example:

python ./Main/Programs/irclog2html.py irc-log-techrights --title='IRC: #techrights @ FreeNode: January 22nd, 2012'
python ./Main/Programs/irclog2html.py irc-log --title='IRC: #boycottnovell @ FreeNode: January 22nd, 2012'
python ./Main/Programs/irclog2html.py irc-log-social --title='IRC: #boycottnovell-social @ FreeNode: January 22nd, 2012'
python ./Main/Programs/irclog2html.py irc-log-techbytes --title='IRC: #techbytes @ FreeNode: January 22nd, 2012'
kate irc-log.html irc-log-social.html irc-log-techrights.html irc-log-techbytes.html

This is basically one simple command run about 4 times, then 4 files opened. Python is needed to run irclog2html.py and there is room for some parameters, such as page titles. Opening the resultant files then makes it possible to save all the logs separately under date-stamped file names, upload them, and link to them from the template post. That’s about it. If there are any questions about these very basic efficiency tricks, drop me a line. Having published thousands of logs, it is an area I’m quite familiar with.
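Since the four invocations differ only in the input file and channel name, they can be folded into a small wrapper script; here is a sketch along those lines (the file-to-channel mapping and the date are taken from the example above, but the wrapper itself is hypothetical):

#!/bin/sh
# Hypothetical wrapper: convert each channel's extracted text to HTML.
DATE='January 22nd, 2012'
for pair in 'irc-log-techrights:#techrights' \
            'irc-log:#boycottnovell' \
            'irc-log-social:#boycottnovell-social' \
            'irc-log-techbytes:#techbytes'; do
    file=${pair%%:*}
    chan=${pair#*:}
    python ./Main/Programs/irclog2html.py "$file" --title="IRC: $chan @ FreeNode: $DATE"
done
kate irc-log.html irc-log-social.html irc-log-techrights.html irc-log-techbytes.html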
