Sunday, March 5th, 2006, 4:26 am

Collaborative E-mail Filtering


The notion of community-driven decisions is not a foreign one. Voting is just one among many examples. The power of a broad audience lies in its handling of large samples of data, which lets it make cohesive (sometimes conflicting) and yet reasonable choices.

The ideas were embraced by Google (I am aware of no earlier precedent) for the filtering of E-mail, although other anti-spam initiatives and services like spamcop.net had begun to accumulate and share knowledge about the origins of spam. In Google Mail, it is the users who help one another in discerning ham from spam. In the case of services like spamcop.net, sharing of data (primarily blacklists) forms the core of the methodology.

The technique which involves blacklists can be rather powerful. SpamAssassin, for instance, which I happily use as a second gate for my E-mail accounts, has never yet marked a genuine message as spam (a false positive). This comes as an astounding statistic. SpamAssassin processes close to 200 messages per day and has been activated for about a year. Worth mentioning is the fact that SpamAssassin is Free and Open Source software.

The more interesting technique involves identification and flagging of spam by many individuals. Algorithms can be further extended to take advantage of machine learning approaches such as boosting and support vector machines, which are excellent classifiers capable of generalising while retaining good specificity. Let us consider this a case where science meets industry (somewhat of a rarity).

More recently, such ideas were considered a possible cure for a Web 2.0 epidemic: comment spam. To recap, the key idea involves collaborative spam flagging and the assembly of newly-acquired knowledge in a centralised database. The back end of this service remains obscure, for security purposes. The approach has faced barriers and various problems, such as poisoning of the knowledge base (or learner) with bogus training data. This led to some doubts and skepticism in the past.
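To make the idea of collaborative flagging a little more concrete, here is a minimal sketch in Python. It is not how Akismet or Google Mail work (their back ends are not public); the class name `SharedFlagStore`, the `min_reporters` threshold and the fingerprinting scheme are all assumptions made for illustration. The store treats a message as spam only once several distinct reporters have flagged it, which is one naive way to blunt poisoning by a single bad actor.

```python
import hashlib
from collections import defaultdict


class SharedFlagStore:
    """Toy in-memory stand-in for a centralised spam-flag database (hypothetical)."""

    def __init__(self, min_reporters=3):
        # Number of distinct reporters required before a message counts as spam.
        self.min_reporters = min_reporters
        # Maps a message fingerprint to the set of reporter ids that flagged it.
        self._reports = defaultdict(set)

    @staticmethod
    def fingerprint(text):
        # Normalise case and whitespace so trivially altered copies collide.
        normalised = " ".join(text.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def flag(self, reporter_id, text):
        # A set is used, so repeated flags from one reporter count only once.
        self._reports[self.fingerprint(text)].add(reporter_id)

    def is_spam(self, text):
        reporters = self._reports.get(self.fingerprint(text), ())
        return len(reporters) >= self.min_reporters


store = SharedFlagStore(min_reporters=2)
store.flag("alice", "Buy  CHEAP pills")
store.flag("alice", "buy cheap pills")   # same reporter, still one vote
store.flag("bob", "buy cheap pills")     # second distinct reporter
print(store.is_spam("Buy cheap pills"))
```

Counting distinct reporters only guards against the crudest form of poisoning; anyone able to register many accounts could still skew the database, which is the kind of problem mentioned above.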

All in all, I am left wondering: what if Google Mail filters were extended to the benefit of everyone? People already use Google Mail as a spam filter, peripherally, but they cannot contribute. The relationship is not truly reciprocal. Moreover, should such services ideally be handled and governed by one particular company? It boggles the mind. A proposal to adopt these methods and then make them publicly available is definitely worthwhile. The Akismet spam stopper, for instance, provides hooks for third-party software, which I think is a wonderful demonstration of communal spirit and openness.

A shot in the dark perhaps, but it would be nice to have the Akismet approach applied to E-mail as well (a la Google Mail). The APIs would be largely analogous, but with subject lines and headers included, which helps further. Maybe it can outperform SpamAssassin some day? We shall wait and see how spam can be combatted; spam in its entirety, that is, as opposed to just E-mail.
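As a rough illustration of what "subject lines and headers included" might mean in practice, the sketch below turns a raw message into a payload that a collaborative filtering service could accept. No such E-mail service exists as far as this post is concerned, and the field names (`sender`, `subject`, `headers`, `content`) are invented here; they are not Akismet's actual parameters. Only Python's standard `email` module is used.

```python
from email import message_from_string


def build_check_payload(raw_message):
    """Build a hypothetical spam-check payload from a raw RFC 822 message."""
    msg = message_from_string(raw_message)
    # Only the simple single-part case is handled in this sketch.
    body = "" if msg.is_multipart() else msg.get_payload()
    return {
        "sender": msg.get("From", ""),
        "subject": msg.get("Subject", ""),
        # Note: dict() keeps only the last value of any repeated header.
        "headers": dict(msg.items()),
        "content": body,
    }


raw = "From: a@example.com\nSubject: Hello\n\nBody text\n"
payload = build_check_payload(raw)
print(payload["subject"], payload["sender"])
```

A comment-oriented service would see only the author and the comment text; here the subject and the full header set travel along too, which gives a classifier more to work with.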

Original styles created by Ian Main (all acknowledgements) • PHP scripts and styles later modified by Roy Schestowitz • Help yourself to a GPL'd copy