Archive for the ‘Spam’ Category

Collaborative E-mail Filtering

THE notion of community-driven decisions is not a foreign one. Voting is just one among many examples. The power of a broad audience can be ascribed to its handling of large samples of data, which leads to choices that are cohesive, sometimes conflicting, and yet reasonable.

The idea was embraced by Google (I am aware of no earlier precedent) for the filtering of E-mail, although other anti-spam initiatives and services had already begun to accumulate and share knowledge about the origins of spam. In Google Mail, it is the users who help one another in discerning ham from spam. In those other initiatives, the sharing of data (primarily blacklists) forms the core of the methodology.

The technique which involves blacklists can be rather powerful. SpamAssassin, for instance, which I happily use as a second gate for my E-mail accounts, has never yet marked a genuine message as spam (a false positive). This is an astounding statistic, given that SpamAssassin processes close to 200 messages per day and has been active for about a year. It is also worth mentioning that SpamAssassin is free, open source software.
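To make the blacklist idea concrete, here is a minimal Python sketch. It assumes the shared list is published in the common DNS blacklist style, where an IPv4 address is looked up by reversing its octets and appending the list's zone. The zone `dnsbl.example.org` and both function names are hypothetical, and no claim is made that this mirrors how SpamAssassin itself is wired up.

```python
import ipaddress
from typing import Set


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a blacklist lookup would query for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, local_blacklist: Set[str]) -> bool:
    """Check an address against a locally cached copy of a shared blacklist."""
    return str(ipaddress.IPv4Address(ip)) in local_blacklist


if __name__ == "__main__":
    # Hypothetical zone name; a real deployment would use an actual list.
    print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

The sketch deliberately stops short of performing the DNS query, so it can be tried without network access.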

The more interesting technique involves identification and flagging of spam by many individuals. Algorithms can be further extended to take advantage of machine learning approaches such as boosting and support vector machines, which are excellent classifiers capable of generalising while retaining good specificity. Let us consider this a case where science meets industry (somewhat of a rarity).
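The simplest form of flagging by many individuals is a vote. The sketch below is only an illustration under stated assumptions: the class name `FlagTally`, the message fingerprint, and the thresholds are all invented here, and a real service would presumably feed such reports into a trained classifier instead of a plain ratio.

```python
from collections import defaultdict


class FlagTally:
    """Collect spam/ham reports per message and decide by majority."""

    def __init__(self, min_reports=3, spam_ratio=0.8):
        self.min_reports = min_reports  # reports needed before any verdict
        self.spam_ratio = spam_ratio    # fraction of spam votes required
        self._votes = defaultdict(dict)  # fingerprint -> {user_id: is_spam}

    def report(self, fingerprint, user_id, is_spam):
        # One vote per user and message; a later report replaces the earlier one.
        self._votes[fingerprint][user_id] = is_spam

    def is_spam(self, fingerprint):
        votes = self._votes.get(fingerprint, {})
        if len(votes) < self.min_reports:
            return False  # too little evidence, so let the message through
        spam_votes = sum(1 for vote in votes.values() if vote)
        return spam_votes / len(votes) >= self.spam_ratio
```

Requiring a minimum number of reports keeps a single mistaken click from condemning a message.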

More recently, such ideas were considered a possible cure to a Web 2.0 epidemic: comment spam. To recap, the key idea involves collaborative spam flagging and the assemblage of newly-acquired knowledge in a centralised database. The back end of this service remains obscure, for security purposes. The approach has been facing barriers and various problems, such as poisoning of the knowledge base (the learner) with bogus training data. This led to some doubts and skepticism in the past.

All in all, I am left wondering: what if Google Mail filters were extended to the benefit of everyone? People already use Google Mail as a spam filter, peripherally, but they cannot contribute. The relationship is not truly reciprocal. Moreover, should such services ideally be handled and governed by one particular company? It boggles the mind. A proposal to adopt these methods and then make them publicly available is definitely worthwhile. The Akismet spam stopper, for instance, provides hooks for third-party software, which I think is a wonderful demonstration of communal spirit and openness.

A shot in the dark perhaps, but it would be nice to have the Akismet approach applied to E-mail as well (a la Google Mail). The APIs would be largely analogous, but with subject lines and headers included, which helps further. Maybe it can outperform SpamAssassin some day? We shall wait and see how spam can be combatted, spam in its entirety, that is, as opposed to just E-mail.
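To picture what such an analogous interface might receive, here is a rough Python sketch that turns a raw E-mail message into a submission carrying the subject line and headers alongside the body. The function name and the payload fields are entirely hypothetical; this is not the Akismet API, only a guess at the kind of data an E-mail variant would want.

```python
from email import message_from_string


def build_check_payload(raw_message: str) -> dict:
    """Extract the fields a hypothetical collaborative E-mail filter might use."""
    msg = message_from_string(raw_message)
    # Keep the sketch simple: only single-part messages contribute body text.
    body = "" if msg.is_multipart() else msg.get_payload()
    return {
        "sender": msg.get("From", ""),
        "subject": msg.get("Subject", ""),
        # A list of pairs preserves repeated headers such as Received.
        "headers": list(msg.items()),
        "content": body,
    }


if __name__ == "__main__":
    raw = "From: alice@example.com\nSubject: Cheap pills\n\nBuy now\n"
    print(build_check_payload(raw)["subject"])
```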

Spam Varieties

SPAM is everywhere. It comes in different flavours and the way to chew it varies as well. This essay is not about meaty cuisine, but rather about the plague of the electronic age. Anywhere one has an opportunity to gain, spam will prevail. Several days ago, the MATLAB Web site got flooded by endless spam and many innocent contributors got false E-mail notifications, myself included.

Spam should not be associated with E-mail alone. An initiative by AOL et al. to eradicate spam through taxation has most recently been criticised harshly. On top of the need to purge spam, genuine mailers would begin to suffer financially, which only makes matters worse. This morning it took me approximately half an hour just to purge spam. As for the impact spam has on my life, I continue to tolerate spam through:

  • E-mail
  • Forum subscriptions with site addresses
  • Forum messages
  • Guestbook entries
  • WordPress comments
  • WordPress trackbacks
  • PHP-Nuke links
  • PHP-Nuke news submission
  • Wiki
  • Referrer spam
  • Zombie attacks

I have set up cleanup/purging cycles in my schedule, with frequency dependent on severity. The chores at hand are still very time-consuming and it gets worse by the day. I recently had a fierce argument over someone who used to post comment spam. This involved someone who was possibly in cahoots with the spammer. Some spammers deserve a death penalty. If not as practised law, then at least as an effective deterrent. Related item: WordPress Comment Spam Prevention

Hiding Your E-mail Address

Separate boxes
Separating ham and spam

ONE powerful technique for avoiding spam is to keep E-mail addresses (accounts) which are not public. They can reside rather happily alongside more public address(es), but the level of ‘noise’ in each then varies. Reading habits benefit from the separation.

Suspicion should arise when a private (undisclosed) mail account begins to receive spam. Then, one can only suspect that either:

  • A trusted person gave the E-mail address to a spammer or posted it publicly for ratbots to harvest
  • Somebody’s computer has been hijacked and address book data pulled from it, leading to misuse

This unfortunate scenario has recently hit me. In the end, it turned out that SpamAssassin had been disabled, so I reported the fault to my host. My private account remains clean and has been clean for over a year. I wholeheartedly recommend this tactic, which will be explained in greater depth if you follow the link above.

Akismet Problems

Dog scooping
Akismet cleans up your blog spam,
but false positives sometimes go unnoticed

Akismet is a comment spam prevention mechanism. It can tell genuine comments apart from ‘comment bombing’ and used to do so almost flawlessly. The Akismet filter has quickly gained popularity in its place of origin: WordPress blogs. I set up Akismet in my WordPress 2.0 test blog and mentioned this before in a writeup on ending comment spam using collaborative spam flagging.

Akismet can be used only with a key, which establishes some trusted identity. Nevertheless, its performance is said to have degraded recently. I have been wondering for quite a while what would prevent a guild of spammers from downloading and installing WordPress 2, getting an API key and then posting comments to their own blogs. They could begin marking comments improperly en masse. Only a trusted few should be able to flag messages. This is a necessity if robustness to fraud is ever to prevail. I even mentioned this before, roughly a month before the tool was publicly available.
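One way to picture the "trusted few" idea is to weight each flag by how much its key is trusted, so that freshly obtained keys carry no influence. The Python sketch below rests on assumptions of my own: the function name, the trust table, and the threshold are hypothetical and say nothing about how Akismet actually weighs reports.

```python
def weighted_verdict(votes, trust, threshold=0.5, default_trust=0.0):
    """Decide spam/ham from flags weighted by per-key trust.

    votes: mapping of API key -> True (spam) or False (ham)
    trust: mapping of API key -> weight between 0 and 1
    Returns True for spam, False for ham, None if no trusted key voted.
    """
    spam_weight = 0.0
    total_weight = 0.0
    for key, says_spam in votes.items():
        weight = trust.get(key, default_trust)  # unknown keys count for nothing
        total_weight += weight
        if says_spam:
            spam_weight += weight
    if total_weight == 0:
        return None  # no trusted opinion available
    return spam_weight / total_weight > threshold
```

With a default trust of zero, three new keys voting "ham" cannot outweigh one long-standing key voting "spam", which is exactly the fraud scenario described above.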

I have an API key for one blog and another test blog that ran Akismet without a key. That was back in the early days when spam-stopper, as it was named at the time, was actively developed and tested by a set of individuals. Ever since, I believe it has reached many hands and become too easy to gain access to, for malicious purposes as well.

There is hope of successfully reverting the learner to a more reliable state, provided that backups of it were made from time to time.
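Reverting a learner presupposes that its state was saved before the poisoning began. As a minimal sketch, assuming the learner's state can be represented as an ordinary Python object (here a dictionary of token counts), periodic snapshots and a rollback could look like this. `SnapshotStore` and its labels are hypothetical names used for illustration only.

```python
import copy


class SnapshotStore:
    """Keep labelled copies of a learner's state so it can be rolled back."""

    def __init__(self):
        self._snapshots = []  # list of (label, deep copy of the state)

    def save(self, label, model_state):
        # Deep-copy so later training cannot alter the stored snapshot.
        self._snapshots.append((label, copy.deepcopy(model_state)))

    def restore(self, label):
        # Return a fresh copy of the most recent snapshot with this label.
        for saved_label, state in reversed(self._snapshots):
            if saved_label == label:
                return copy.deepcopy(state)
        raise KeyError(label)
```

The learner would be saved on a schedule and, once bogus training data is detected, replaced with the last snapshot taken before the attack.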

Original styles created by Ian Main (all acknowledgements) • PHP scripts and styles later modified by Roy Schestowitz • Help yourself to a GPL'd copy