
Archive for the ‘Spam’ Category

The Initiative for Better E-mail

LAST year I got a bit frustrated with the nature of most E-mails that had been reaching my box. I then decided to write a short FAQ (while I was lying by a swimming pool, if I recall correctly). Over time, I found a way of reducing and managing the volume of mail, which had often led to unnecessary distractions. I even separated my E-mail into several ‘tiers’, which, unlike filters, I find very handy. Some other communication was routed to Wikis.

Over two years ago I came to discover that Knuth had given up on E-mail altogether. I have just come across another nice homepage which expresses the frustration when it comes to badly-formatted E-mails, so I thought I’d share it by quoting:

E-mail is my main form of communication and the best way to reach me is to email howcome@@@opera.@com. I have used email since 1985 and, unlike Donald Knuth, I plan to continue using it in the future. I recommend these rules for writing electronic mail:

  1. Write e-mail messages in plain text (not HTML) with around 70 characters per line.
  2. When quoting other messages, insert your own text underneath the quoted text so that the logical order of the text is preserved
  3. Avoid e-mail attachments: send URLs pointing to your attachments instead.
  4. In particular, never send documents in proprietary formats as e-mail attachments. PDF is acceptable if the formatting of the document is essential to understanding the document.

If you look at the source of the page (either the original or even this short post), you’ll find a nice trick for obfuscating E-mail addresses, which can prevent mass harvesting by ratbots for subsequent spamming.
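The exact trick used on that page is not reproduced here. As a rough illustration of the general idea, one common approach (an assumption, not necessarily the one used there) is to write every character of the address as an HTML character entity. Browsers still render the address normally, while naive harvesters that scan the raw source for `name@host` patterns miss it. The function name and the example address below are hypothetical.

```python
import html


def obfuscate_email(address: str) -> str:
    """Return the address with every character written as a decimal HTML entity."""
    return "".join(f"&#{ord(ch)};" for ch in address)


# Hypothetical address, used only for illustration.
encoded = obfuscate_email("user@example.com")
print(encoded)                 # entity-encoded form to paste into the HTML source
print(html.unescape(encoded))  # what a browser would display
```

This only defeats simple pattern matching; a harvester that decodes entities before scanning would still find the address.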

In Fight Against Spammers, Google Drops Pages

SEVERAL weeks ago I discussed some of the problems which Google are having with their cache. The links therein paint a full picture that comprises many speculations. Genuine pages from various Web sites across the Web are being dropped. After a while, some more evidence has been reaching the surface, e.g.

Something really weird happened when I had the password problem last week — I completely disappeared from Google.

As discussions, which are oddly enough being deleted (Google may be trying to hide the existence and scale of the problem), indicate, something quite major is happening ‘behind the scenes’. I submitted a relevant link to Digg. As the thread indicates, Google is indexing billions of spammy pages and is apparently dropping and neglecting genuine Web sites in the process. It is not deliberate on Google’s part, but the outcome is poorer search results and reduced traffic, if you are among the Webmasters affected.

Service to Centralise Plagiarism Awareness

WHEN Webmasters discover that someone stole their content, they often report this to Google. Blog plagiarism has truly become a plague, but stolen content, spam, site scraping, and copyright infringement ought to be reported to all search engines, which is why I set up this one particular Web page. It enables any Webmaster or user to report content misuse to three search engines at once (or in parallel at the least).

What was the motivator for this handy little frameset? I noticed that many people report undesirable content to just a single company called Google. But what about the others? Shouldn’t all be informed at once? I believe there should be an impartial, not-for-profit body that watches over content theft, somewhat like the DMCA, I suppose. Such a body (or site-accessible service with interfaces) could centralise knowledge regarding Web offenders and prevent various companies from doing this independently, merely reinventing the wheel to achieve the same (search) results.

Think big(ger). Let’s get the Web more organised, better structured, and immune to fraud. Never delegate this responsibility to companies, which are slowly making our Web more commercialised and semi-privately owned.

Chinese Spam on the Rise

More spam from more remote countries

Inevitably, with higher bandwidth, more connected nodes, and little supervision (other than endless censorship), Chinese spam is gaining steam. That said, 80% of all spam was recently said to be despatched from compromised Windows computers, which makes this crime passive. As the BBC reports:

The US is close to losing its place as the top spam sending nation on Earth.

Statistics from security firm Sophos show that China is fast catching up the US as a source of junk e-mail.

The amount of spam on my sites has become too overwhelming recently. Certain accounts I will just filter blindly, without bothering to check any spam boxes for false positives. When it takes nearly an hour going through spam, it becomes both unacceptable and impractical.

My Wikis are repeatedly falling victim to Chinese spam and that’s just the beginning of the story, which involves much beyond E-mail. The Web becomes a scary place when much of its traffic is malignant. It becomes hostile, as opposed to user-friendly. It deters the use of E-mail, much as hacking forces some sites to go offline, due to high maintenance costs (time/money).

Simultaneous Spam Reports

TO anyone who is interested: I have put together a page that enables content spam to be reported to several search engines in tandem. The purpose of this little ‘utility’ is to centralise various pages of interest, which encourages spam reports that reach more than just a single company. The ‘meat’ of the report can be conveniently copied and pasted from one frame to another. Report spammy sites that violate ethics.

Spam from the Future

SPAMMERS are not utterly dumb. They constantly learn how to get past the filters and receive more attention from their recipients. Subject lines are often fudged in such a way that they beg to be read, but it doesn’t stop there.

Consider, for instance, spam that goes ‘on top of the pile’ by having a future date (thus the title of this essay), assuming reverse chronological mail readers. Such spam can stand out among the pile of trite spam. Also consider spam with very odd dates, no dates, or ones which go years and decades into the past. In the latter case, this puts the message at the bottom edge of the inbox, which, again, attracts more attention.
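A filter could treat such dates as a warning sign. The following is a minimal sketch, not a description of how any particular filter works: it flags messages whose Date header is missing, unparsable, in the future, or implausibly old. The function name and both thresholds (`future_slack`, `max_age`) are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def date_is_suspicious(raw_message, now=None,
                       future_slack=timedelta(days=1),
                       max_age=timedelta(days=365)):
    """Return True if the Date header is missing, unparsable, future-dated or very old."""
    now = now or datetime.now(timezone.utc)
    header = message_from_string(raw_message)["Date"]
    if header is None:
        return True  # no date at all
    try:
        sent = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return True  # garbage in the Date header
    if sent.tzinfo is None:
        # Headers without a usable zone parse as naive datetimes; assume UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return sent > now + future_slack or sent < now - max_age
```

The one-day slack allows for clock skew and time zone mistakes on honest senders; a real deployment would tune both limits and treat the result as one signal among several rather than a verdict.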

Lastly, and perhaps most annoyingly, certain spam is being sent with a sender address which resides in the recipient’s domain, which can under many circumstances have it automatically whitelisted.
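One way to avoid that trap is to stop trusting the From header on its own. As a hedged sketch (the function names and addresses are hypothetical), the check below reports whether a message merely claims to come from the recipient's own domain, so that it can be examined further instead of being whitelisted outright.

```python
from email.utils import parseaddr


def _domain(address):
    """Return the lower-cased domain part of an address, or '' if there is none."""
    _local, at, domain = address.rpartition("@")
    return domain.lower() if at else ""


def claims_recipient_domain(from_header, recipient_address):
    """True if the From header claims the same domain as the recipient."""
    sender_address = parseaddr(from_header)[1]
    sender_domain = _domain(sender_address)
    return bool(sender_domain) and sender_domain == _domain(recipient_address)


# Hypothetical addresses, for illustration only.
print(claims_recipient_domain("Boss <boss@example.org>", "me@example.org"))
```

The From header is trivially forged, so a match says nothing about where the message really came from. Confirming the true origin needs sender authentication such as SPF or DKIM, which is outside the scope of this sketch.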

Collaborative E-mail Filtering

THE notion of community-driven decisions is not a foreign one. Voting is just one among many examples. The power of a broad audience lies in its handling of large samples of data, thereby making cohesive, sometimes conflicting, and yet reasonable choices.

The ideas were embraced by Google (I am aware of no other precedent) for the filtering of E-mail, although other anti-spam initiatives and services like spamcop.net began to accumulate and share knowledge as regards the origins of spam. In Google Mail, it is the users who help one another in discerning ham from spam. In the latter case, sharing of data (primarily blacklists) shapes the core of the methodology.

The technique which involves blacklists can be rather powerful. SpamAssassin, for instance, which I happily use as a second gate for my E-mail accounts, has never yet marked genuine messages as spam (false positives). This comes as an astounding statistic. SpamAssassin processes close to 200 messages per day and has been activated for about a year. Worth mentioning is the fact that SpamAssassin is free and open source software.

The more interesting technique involves identification and flagging of spam by many individuals. Algorithms can be further extended to take advantage of machine learning approaches such as boosting and support vector machines, which are excellent classifiers capable of generalising while retaining good specificity. Let us consider this a case where science meets industry (somewhat of a rarity).

More recently, such ideas were considered a possible cure to a Web 2.0 epidemic: comment spam. To recap, the key idea is one which involves collaborative spam flagging and assemblage of newly-acquired knowledge in a centralised database. The back end of this service remains obscure, for security purposes. The approach has been facing barriers and various problems, such as poisoning of the knowledge base or learner with bogus training data. This led to some doubts and skepticism in the past.
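Since the real back end is not public, the following is only a toy sketch of the general idea, under my own assumptions: users report messages to a shared store, and a message counts as spam once enough distinct users have flagged it. The class name, the fingerprinting scheme, and the threshold are all hypothetical.

```python
import hashlib
from collections import defaultdict


class SharedSpamDatabase:
    """Toy centralised store of spam reports, keyed by message fingerprint."""

    def __init__(self, min_reporters=3):
        self.min_reporters = min_reporters
        self._reports = defaultdict(set)  # fingerprint -> set of reporter ids

    @staticmethod
    def fingerprint(text):
        # Normalise case and whitespace so trivial variations hash alike.
        normalised = " ".join(text.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def report(self, reporter_id, text):
        """Record that reporter_id flagged this text as spam."""
        self._reports[self.fingerprint(text)].add(reporter_id)

    def is_spam(self, text):
        """True once enough distinct reporters have flagged the same text."""
        reporters = self._reports.get(self.fingerprint(text), ())
        return len(reporters) >= self.min_reporters
```

Counting distinct reporters rather than raw reports blunts the crudest form of poisoning, where one account floods the database, but it does nothing against many colluding accounts. An exact hash is also easy to evade by changing a single word, which is where the learning approaches mentioned earlier would come in.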

All in all, I am left wondering: what if Google Mail filters were extrapolated to the benefit of everyone? People already use Google Mail as a spam filter, peripherally, but they cannot contribute. The relationship is not truly reciprocal. Moreover, should such services ideally be handled and governed by one particular company? It boggles the mind. Adopting the methods and then making them publicly available is definitely worthwhile. The Akismet spam stopper, for instance, provides hooks for third-party software, which I think is a wonderful demonstration of communal spirit and openness.

A shot in the dark perhaps, but it would be nice to have the Akismet approach applied to E-mail as well (a la Google Mail). The APIs would be largely analogous, but with subject lines and headers included, which helps further. Maybe it can outperform SpamAssassin some day? We shall wait and see how spam can be combatted: spam in its entirety, that is, as opposed to just E-mail.

Original styles created by Ian Main (all acknowledgements) • PHP scripts and styles later modified by Roy Schestowitz • Help yourself to a GPL'd copy
|— Proudly powered by W o r d P r e s s — based on a heavily-hacked version 1.2.1 (Mingus) installation —|