Has Digg Been ‘Hacked’ by Pharma SPAM?

SEVERAL days ago I noticed that in Digg’s archives, particularly where prominent (popular) pages are involved, all the comments had become invisible and inaccessible. There was no way to get to the comments, even where there were hundreds of them. I lacked an explanation, but after a bit of exploration I believe I may have identified the cause. Look at the following screenshot. It shows a submission that was made popular 3 months ago and inherited a PageRank of 5. As you can see (mind the scroll bar), none of the original comments appear. Instead, only 4 spammy comments appear, along with links to Asian sites and drug Web sites.


Has Digg been compromised? Has Digg been hiding old comments deliberately, due to SPAM that targets the archives? Perhaps this is just baseless speculation, but it does not look good. In this one example, Digg is being turned into a link farm. Growing pains?

Handling SPAM in Large Social Content Management Systems

HANDLING SPAM where collaboration is involved can be a hard task, whereas in non-social systems, erasing SPAM (or content with vandalistic intent) may be easier. In the latter case there is virtually no interaction with the context. On social Web sites and forums, by contrast, erasing SPAM can amount to nuking a spammer along with some innocent people in the spammer’s surroundings. In the case of collaborative editing, such as this one which I discovered yesterday, judgment becomes difficult as well. It is a borderline call, because freedom of speech, authority, and censorship ought to be weighed.

I would like to throw in some random thoughts: earlier this morning, in Netscape, I had a look at some tag clouds and saw some arcane phrases dominating them. I decided to dive in and found that these sometimes came from spammers (brute-force tagging) whose accounts had been suspended. This led me to wonder whether, by leaving these links intact, the site gives spammers an incentive to return. There is a true dilemma when SPAM comes to a content management system. It is much easier when it comes to E-mail, unless excessive marketing and bulk mail that fills up mailboxes is concerned.

Returning to my main concern, Netscape thrives on high figures. High figures may be good for site vanity and for the integrity of the whole community (e.g. not purging the votes and comments of non-spammers), but what about a bury-like feature (à la Digg) that keeps these submissions out of the reach and sight of search engines (and sometimes human visitors too)? As it stands, spammers are being rewarded for slipping under the ‘radar’ of Web spiders.
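A bury-like feature need not delete anything. A minimal sketch, assuming a hypothetical `Submission` record with `buried` and `author_suspended` flags (not Netscape’s or Digg’s actual data model), is to emit the standard robots meta tag with `noindex,nofollow` on flagged pages, so that major search engines neither index the page nor follow its links, while the votes and comments of everyone else stay in place.

```python
from dataclasses import dataclass

# The robots meta tag honoured by major search engines:
# noindex keeps the page out of results, nofollow withholds link credit.
NOINDEX = '<meta name="robots" content="noindex,nofollow">'


@dataclass
class Submission:
    # Hypothetical fields, for illustration only.
    title: str
    buried: bool = False
    author_suspended: bool = False


def robots_meta(sub: Submission) -> str:
    """Return the tag to place in the page <head> for this submission."""
    if sub.buried or sub.author_suspended:
        return NOINDEX
    return ""  # no tag: crawlers index and follow as usual
```

Nothing is purged, so a wrongly buried story can be restored simply by clearing the flag.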

What if Netscape Became a Universal Aggregator?

THE title of this post was chosen for dramatic effect. Have a look at the following Netscape profile. I love this guy’s articles/essays, but I can’t help feeling that he’s using Netscape as his personal blog aggregator. Occasionally, for a change, he gets other people, maybe his regular readers, to submit his articles (probably without being asked). His domain was at some point blacklisted, after being buried en masse.

Like I said, I love what he writes, but it would be nice if he actually participated in Netscape (see profile stats). I am aware of another site that does the same thing (e.g. Linux screenshots).

There is a similar problem in I am aware of two people who keep doing this (Seopher and Locust). Others were told off for linking to their gateway/corridor pages (which rob actual articles), and they appear to have been banned or to have vanished out of discouragement. But that is a wholly separate problem. I am more concerned about the former, so I thought I’d share this, or at least give a heads-up to the Digg and Netscape communities.

I ought to emphasise the fact that these people only link to their own site/s. They use Netscape as a link farm that goes in one direction.

Identifying Personal E-mails and ‘Botmails’


STUDIES which analyse large volumes of communication have always been interesting. For instance, most of the E-mail traffic nowadays is identified as SPAM, and over 80% of it is said to come from compromised Windows PCs. However, for a change, this is not what I wish to discuss today. I don’t want to take yet another bite at the effects Windows has on the WWW. It leaves me bitter.

Earlier today I read that only 37% of all E-mail at the ‘average’ office is personal. The rest is not. Some E-mails these days are generated by a system rather than a human. Typically these are less interesting, less urgent, or can be ignored altogether. Some of that is mass mail, automated and despatched using address databases.

It is sometimes hard to tell a personal message (one to which a response would be polite) from one which is targeted at a wide audience and whose content is carefully doctored to appear personal. I would like to recommend and promote a personal tip of mine: a little method I came up with for telling computer-generated mail apart from human-generated mail. When entering your name (e.g. at the registration stage), always append extra spaces that serve no purpose but leave the name itself intact. Having done so, you have set a trap for the bot. Where the name is followed by punctuation, for example, you can see whether a human inserted the name properly. A naive algorithm will not bother to trim the spaces, so the automation becomes self-evident.
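The padding test can be automated. The sketch below assumes you registered with a name such as `"John   "` (a hypothetical example, with three trailing spaces) and checks whether that exact padded string turns up right before punctuation in a message, which is where a naive mail-merge would leave it and a human would not. It is a heuristic, not proof; some bots do strip whitespace.

```python
import re

PUNCTUATION = r"[,.!?;:]"


def looks_automated(body: str, registered_name: str) -> bool:
    """Return True if the padded name appears verbatim before punctuation.

    registered_name is the name exactly as typed at registration,
    deliberate trailing spaces included (e.g. "John   ", hypothetical).
    """
    if registered_name == registered_name.rstrip():
        raise ValueError("the registered name has no trailing padding to test for")
    # A human writes "Dear John," and drops the spaces; a naive
    # mail-merge pastes "John   " in whole, leaving a gap before the comma.
    pattern = re.escape(registered_name) + PUNCTUATION
    return re.search(pattern, body) is not None
```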

In other circumstances, having the recipients’ addresses in sight may help. Full headers can be very informative, and various Thunderbird extensions even summarise them with representative figures (e.g. routing information as a series of flags, mail client name as an icon, signature as an icon, etc.). This makes the information easier to digest and it adds a wealth of knowledge that is often missed. Lastly, never discount BCC tricks. A seemingly personal message can reach anyone ‘on the same wagon’.
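Two of these checks are easy to script with Python’s standard `email` package. The sketch counts `Received:` headers (roughly one per relay the message passed through) and flags a message as a likely BCC when your own address appears in neither `To:` nor `Cc:`. The function names are hypothetical, and mailing lists and aliases produce the same pattern, so treat a positive result as a hint only.

```python
from email import message_from_string
from email.utils import getaddresses


def probably_bcc(raw_message: str, my_address: str) -> bool:
    """True if my_address is absent from To/Cc, so the copy most likely
    reached me as a blind carbon copy (or via a list or alias)."""
    msg = message_from_string(raw_message)
    visible = msg.get_all("To", []) + msg.get_all("Cc", [])
    addresses = {addr.lower() for _, addr in getaddresses(visible)}
    return my_address.lower() not in addresses


def routing_hops(raw_message: str) -> int:
    """Count Received: headers; each relay normally prepends one."""
    return len(message_from_string(raw_message).get_all("Received", []))
```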

It’s a Sock Puppet Show at Social Bookmarking Sites

WITH success in any Web site comes some spam, which needs to be combatted effectively. Herein I will deal with social bookmarking Web sites in particular. Spam is not always automated. There is brute-force spam that is scripted, but there is also self-promotion that strikes in the form of mass submission. And people have begun trying to game Netscape, whose front page bears an admirable PageRank of 9. This keeps the Anchors and Chief Editor on their toes.

The problem has become somewhat universal across this new wave of sites built on contributor-driven content. Digg has had submission parasites, yet human moderation, as well as (community-driven) spam report widgets, had them eradicated early on. I usually report any suspicious submission as spam, at least as soon as I spot a distinct and objectionable pattern. When the same person always posts links to the same domain, for instance, that’s a red flag. Sometimes you can link the username to the domain’s affiliation, but sometimes consistency in the URL is enough. And ‘sock puppets’ (the same person with multiple identities that create a bogus sense of consensus) are another, albeit closely related, matter altogether.
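The ‘same person, same domain’ red flag can be expressed as a share of a user’s submissions. In this sketch the thresholds (at least 5 submissions, 80% or more to one domain) are assumptions chosen for illustration, not values any of these sites is known to use; it requires Python 3.9+ for `str.removeprefix`.

```python
from collections import Counter
from urllib.parse import urlparse


def top_domain_share(urls: list[str]) -> tuple[str, float]:
    """Return the most-submitted domain and its share of all submissions."""
    # hostname is already lower-cased; fold "www.example.com" into "example.com".
    domains = Counter(
        (urlparse(u).hostname or "").removeprefix("www.") for u in urls
    )
    domain, count = domains.most_common(1)[0]
    return domain, count / len(urls)


def is_red_flag(urls: list[str], min_submissions: int = 5,
                threshold: float = 0.8) -> bool:
    """Flag users with enough history who almost always post one domain."""
    if len(urls) < min_submissions:
        return False  # too little history to judge fairly
    _, share = top_domain_share(urls)
    return share >= threshold
```

The minimum-history guard is there so that a newcomer with two posts is not flagged on thin evidence.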

When enough stories get intercepted, links to the domain are banned on principle (for a month if not permanently), or particular Web addresses are blocked for good. This sends the appropriate message: abuse the system and your domain gets blacklisted. This may be better than banning users who could otherwise change their ways and contribute differently (in a positive way, that is). In fact, some people just haven’t grasped the concepts of social bookmarking, so they fail to see the wrongdoing.
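A domain ban with an optional expiry, as described above, might look like the following. The class name and the 30-day default are assumptions for illustration; `days=None` stands for a permanent ban, and a ban on a domain also covers its subdomains.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional
from urllib.parse import urlparse


class DomainBlacklist:
    """Domain bans that expire after a number of days, or never."""

    def __init__(self) -> None:
        # domain -> expiry time; None means the ban is permanent
        self._bans: Dict[str, Optional[datetime]] = {}

    def ban(self, domain: str, days: Optional[int] = 30,
            now: Optional[datetime] = None) -> None:
        now = datetime.now() if now is None else now
        expiry = None if days is None else now + timedelta(days=days)
        self._bans[domain.lower()] = expiry

    def is_banned(self, url: str, now: Optional[datetime] = None) -> bool:
        now = datetime.now() if now is None else now
        labels = (urlparse(url).hostname or "").split(".")
        # Check the host and every parent domain, so that a ban on
        # example.com also catches blog.example.com.
        for i in range(len(labels)):
            candidate = ".".join(labels[i:])
            if candidate in self._bans:
                expiry = self._bans[candidate]
                if expiry is None or now < expiry:
                    return True
        return False
```

Banning at the domain level, rather than the account level, leaves users free to change their ways and keep contributing.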

When banning users, caution is needed. A pissed-off innocent user is far worse than spam that successfully percolates, because people talk. They have blogs, so a good rant with proof can get heavy exposure very quickly, and it affects reputation. Look at what has happened at Digg recently.

An afterthought: one possible remedy for ‘sock puppets’ would be to demand that each newly registered user supplies a unique E-mail address and registers from an IP address that has not already been used for registration that same day. This cannot stop the creation of puppets altogether, nor protect against proxies and dynamic IPs. However, it definitely slows down the abuse and reduces the incentive to game the system.
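The afterthought translates into a small registration check. This sketch keeps everything in memory and assumes the calendar day is supplied by the caller; `RegistrationGate` is a hypothetical name, and a real site would keep this in its database and prune old days. Lower-casing the whole address is a simplification, since the local part of an address can technically be case-sensitive.

```python
from datetime import date
from typing import Dict, Set


class RegistrationGate:
    """Refuse sign-ups that reuse an e-mail address, or an IP address
    already used for a registration on the same day."""

    def __init__(self) -> None:
        self.emails: Set[str] = set()
        self.ips_by_day: Dict[date, Set[str]] = {}

    def try_register(self, email: str, ip: str, day: date) -> bool:
        email = email.strip().lower()
        used_today = self.ips_by_day.setdefault(day, set())
        if email in self.emails or ip in used_today:
            return False  # looks like another puppet: refuse
        self.emails.add(email)
        used_today.add(ip)
        return True
```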

Microsoft Windows is Creating Jobs

  • For malware developers
  • For spammers
  • For extortionate botmasters
  • For spam filter developers
  • For firewall developers
  • For anti-virus developers

All of the above are nasties or software that defends against them. All of them exist and prosper owing to the fact that Windows was never built with security in mind. I can’t help feeling bitter, as I am among the sufferers despite the fact that I touch no Microsoft software. In a matter of just one week, a 30-megabyte mail account got clogged up by SPAM. The volume that comes in is so great that I cannot afford to even look at all the subject lines; rather, I go by patterns and highlighting-type filters. It is unbearable, as I end up skipping some genuine mail.

Windows botnets have brought the Internet to a dark age. Some people ask themselves whether abandoning E-mail altogether is the better way. As for collaboration-based, Web 2.0-ish software, I have already been forced to disable much of its functionality (e.g. registrations, comments, and open Wikis). I also needed to block 2 IP addresses yesterday, due to persistent abuse involving heavy, continuous spidering of my main site. At least the abusers’ ISPs were alert and quickly took action. These attacks came to an end yesterday. They were not the first, though. It is a recurring pattern.
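Heavy, continuous spidering of the kind that forced those IP blocks can be spotted with a per-IP sliding window. The limits below (`max_requests`, `window` in seconds) are hypothetical defaults, and timestamps are assumed to arrive in non-decreasing order for each IP; the detector only flags, leaving the decision to block to a human or a firewall rule.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SpiderDetector:
    """Flag IPs that make more than max_requests within window seconds."""

    def __init__(self, max_requests: int = 120, window: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window = window
        # ip -> timestamps of that IP's requests inside the current window
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, ip: str, timestamp: float) -> bool:
        q = self.hits[ip]
        q.append(timestamp)
        # Drop requests that have slid out of the window.
        while q and q[0] <= timestamp - self.window:
            q.popleft()
        return len(q) > self.max_requests
```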

Several years ago I said that SPAM was a problem that did not affect me and that I would rather just ignore it. But I am afraid that this is no longer possible. And unless Microsoft protects its O/S (Vista has already been shown to be hijackable) or loses a very significant share of the market, things will not improve any time soon. They will only get much, much worse.

Stuff That Bothers Me

Here is an arbitrary list of items which contribute to hassle and even distress:

  • Monolithic content management systems such as PHP-Nuke (and its derivatives or siblings) still attract spam. My forum section has begun eating spam on a daily basis and it is very time-consuming. I restore from backup every couple of days, merely reverting to an older database state.
  • The blog’s CAPTCHA filter has been cracked, so I must cope with over 100 spam comments per day.
  • Some people post incoherent comments that are characterised by more than just poor grammar and typos. It makes one wonder whether point-and-click sobriety tests should replace CAPTCHA-based filters.
  • With the increase in the number of Windows zombies on the Web, the amount of junk mail that I receive has doubled within a few months. I am not alone in this, so at least I find some sympathy.

Speaking of which, the following showed up in the news last night:

Spam zombies give UK ISPs the fear

A massive 96 per cent of 50 ISP respondents cited the proliferation of botnets – networks of virus-infected PCs under the control of hackers – as a key business issue.

According to industry analyst firm Gartner, seven in 10 items of spam originate from infected PCs.

Let us take a moment to thank our friends at Microsoft. Owing to their so-easy-to-hijack operating system, we all choke on spam.

  • A lawsuit against Google over PageRank got bloggers buzzing. It was a masterpiece of incompetence. One such lawsuit was apparently successful, so algorithms that discriminate (albeit not deliberately) can lead to their operator paying fines.
  • Judging by one of the OSDL mailing lists, to which I have been subscribed for a while, these lists (much like xmms-dev) mainly attract spam, kooks, and posers.
  • Outlook Express and Outlook (same codebase; same rubbish; one word less) are a bit of a handful. I am tired of receiving E-mail where responses are top-posted (‘jeopardy-style’ composition, i.e. the answer comes first, then the question).
  • Making your software available exclusively for Windows is like selling and displaying your merchandise at a garbage site just because that is where most prospective customers reside.
  • Digg version 3 does not discourage dupes as effectively as its predecessor did, which makes it somewhat inferior. But I digress…
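Top-posting, as complained about in the Outlook item above, is simple enough to detect heuristically. This sketch assumes quoted lines start with `>`, or that the old message follows an Outlook-style `-----Original Message-----` separator, and calls a reply top-posted when new text appears above the quoted material and none below it. `is_top_posted` is a hypothetical helper, not part of any mail client.

```python
OUTLOOK_SEPARATOR = "-----Original Message-----"


def is_top_posted(body: str) -> bool:
    """True if new text sits above the quoted material and none below it."""
    kinds = []  # "text" or "quote" for each non-blank line, in order
    for line in body.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no signal
        if stripped == OUTLOOK_SEPARATOR:
            kinds.append("quote")  # everything below is the old message
            break
        kinds.append("quote" if stripped.startswith(">") else "text")
    if "quote" not in kinds:
        return False  # nothing quoted, so nothing to be top-posted over
    first_quote = kinds.index("quote")
    text_above = "text" in kinds[:first_quote]
    text_below = "text" in kinds[first_quote:]
    return text_above and not text_below
```

Interleaved (inline) replies have new text below the first quote, so they are not counted as top-posted.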

