Archive for October, 2005

Non-Evolving Search Engines and Operating Systems

3 Monkeys

Refuse to explore and stay a monkey forever

Google algorithms are very complex at present. The very core of these remains PageRank — a mechanism that often gets misused and leads to disasters (referrer spam among other link spamming techniques). For my recent zombie attacks I blame:

  • Google – for unintentionally leading to ‘link greed’, not understanding or anticipating streetsmarts and penetration of second- and third-world countries into the Internet
  • ISPs – for apathetically harbouring traffic that is pure spam or targeted attacks
  • Microsoft – for creating an operating system that is so easy for crooks to capture

Google’s algorithms have become like a horse that has lots of decorations upon it, but is nothing more than a brute-force horse underneath. You can take a pig, put it in a dress and take it out for dinner. But it’s still a pig in a dress, not a girlfriend.

Search must evolve. A flawed or limited principle at the heart of something is bound to fail no matter how many bits you hang atop to patch it up and improve it. This is why traditional page indexing is not a good method for approaching the problem of information extraction and discovery. Microsoft’s operating system, for instance, suffers from the very same problem: a flawed and overly complex operating system was built from ‘code spaghetti’. It was recently heard through the grapevine that Longhorn was thrown away and reset to ground zero, to be based on the XP-related Server 2003 code. This goes to show that weekly updates were merely patching a morbid mess. Microsoft recognise their inability to compete with Linux performance (uptime and flexibility, a reason for Monad). Linux just took the right approach (the right paradigm, if you like) all along and was therefore able to sweep along all the best programmers in the world.

Returning to Google: relying heavily on PageRank and making ad-hoc improvements will bring no real innovation. This is why I intimated Iuron a few days ago. It ought to turn a large pool of indexed pages into actual knowledge and provide definite answers rather than a linear scatter of related pages.

Also coming to mind are AltaVista and other antiquated search engines with very fundamental and not-so-cunning methods for scanning pages. These were very quickly replaced by backlink-based engines, i.e. link counting, in 1998. No progress has been made in nearly 8 years, however. One may begin to conceive a Google killer rather than a Windows killer. The required resources, however, in particular data centres, make the (financial) entry barrier too high to initiate a substantial enough threat. Proprietaries, however, are no concrete barrier, contrary to the case with operating systems. So, I remain optimistic and I might soon meet some professors whose expertise is the semantic Web.

With reference to the famous 3-monkey image on top (I also have one on top of my monitor), those who refuse to evolve (Ballmer) show ‘zoo symptoms’ already. I vividly recall the day when Scoble quoted Microsoft CEO, Steve Ballmer, saying that RSS has no future. This was roughly 7 months ago. Ballmer also said that Google would vanish in 5 years and promised that MSN search was bound to ‘kill’ Google. I say: live in the past, be the past.

Moving on to a different topic, the latest article with the theme of aging at Microsoft came out on the day when I first composed this item: Pity poor Microsoft’s midlife crisis. Another recent article I have just been informed of is At 30, Microsoft Grapples With Growing Up.

Recommended reading:

Q: What about all the people in the corporate environments who are forced to use MS products and aren’t allowed the option/choice to use Mac/Linux/UNIX?

A: Kick your boss’s ass, or, choose to work for a company who have decisions that you liked.

Aftermath of a Zombie Attack

Dynamite Monkey

As some of you may have read, my site came under a large-scale denial-of-service (DoS) attack some while ago. It managed to endure it, but I was used as a ‘human filter’, getting sporadic exercise, sleep and food for a couple of days. Below I present the aftermath in my logs, which leaves lessons to be learned.

Referrer spam

As you can see, the motives of the attacker may have been high Google listings through referrer spam. However, as around 50 GB of dynamically-generated pages were requested from my site in just 1 day, there might be more to it. I suspect that the hundreds of hijacked Windows boxes were programmed primarily to wreak havoc; they were targeting my heaviest pages specifically.
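For readers who want to draw the same kind of conclusions from their own logs, here is a minimal sketch in Python. It assumes the server writes Apache's "combined" log format; the function names and the sample lines are made up for illustration and are not taken from my logs. It tallies hits per referring host and bytes served per requested page, which is roughly what exposes both the spammed referrers and the heaviest targets.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed layout (Apache "combined" format):
# ip ident user [time] "request" status bytes "referer" "user-agent"
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referer>[^"]*)" "[^"]*"$'
)


def referrer_host(referer):
    """Return the host part of a referrer, or a placeholder if absent."""
    try:
        return urlsplit(referer).hostname or "(none)"
    except ValueError:
        return "(malformed)"


def tally(lines):
    """Count hits per referring host and bytes served per requested path."""
    hits, volume = Counter(), Counter()
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # skip lines that do not fit the assumed format
        hits[referrer_host(m["referer"])] += 1
        parts = m["request"].split()
        path = parts[1] if len(parts) > 1 else "(malformed)"
        volume[path] += 0 if m["bytes"] == "-" else int(m["bytes"])
    return hits, volume


# Invented sample lines, for illustration only.
sample = [
    '203.0.113.5 - - [20/Oct/2005:04:12:01 +0100] "GET /heavy.php HTTP/1.1" 200 51200 "http://spam.example.to/" "Mozilla/4.0"',
    '203.0.113.9 - - [20/Oct/2005:04:12:02 +0100] "GET /heavy.php HTTP/1.1" 200 51200 "http://spam.example.to/" "Mozilla/4.0"',
    '198.51.100.7 - - [20/Oct/2005:04:12:03 +0100] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]
hits, volume = tally(sample)
print(hits.most_common(5))
print(volume.most_common(5))
```

Sorting both counters by size shows at a glance whether the traffic is concentrated on a few referrers and a few expensive pages.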

Undoing Selection/Deselection

Thinking dentist

A long time ago I mentioned top software design/usability bugs. It occurred to me yesterday that there is yet another common deficiency, which is the inability to undo selection and deselection of files, entries and the like. This should become a very fundamental functionality in my opinion.

How many times in the past did you use the CTRL or SHIFT keys to establish and highlight a collection of files? One wrong click and the entire selection is gone, leading to a habit whereby files are handled in subsets, i.e. in smaller batches. In all operating systems I have come across, a selection of files, for example, forms a newline-separated list of the files with their full path. All of this is stored in the clipboard, so implementation of an undo stack should be trivial and incur no efficiency penalty.
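To illustrate how little is needed, here is a minimal sketch of such an undo stack in Python. It is not how any particular file manager works; the class name, method names and file paths are all hypothetical. Each change to the selection pushes the previous state onto a stack, so one wrong click can be reverted.

```python
from typing import FrozenSet, Iterable, List


class SelectionHistory:
    """Tracks a selection of paths and lets each change be undone or redone."""

    def __init__(self) -> None:
        self._current: FrozenSet[str] = frozenset()
        self._undo: List[FrozenSet[str]] = []
        self._redo: List[FrozenSet[str]] = []

    @property
    def selected(self) -> FrozenSet[str]:
        return self._current

    def _commit(self, new: FrozenSet[str]) -> None:
        if new == self._current:
            return  # nothing changed, so nothing to record
        self._undo.append(self._current)
        self._redo.clear()
        self._current = new

    def toggle(self, path: str) -> None:
        """CTRL+click: add the entry if absent, remove it if present."""
        self._commit(self._current ^ {path})

    def replace(self, paths: Iterable[str]) -> None:
        """Plain click: discard the old selection and select only these."""
        self._commit(frozenset(paths))

    def undo(self) -> None:
        if self._undo:
            self._redo.append(self._current)
            self._current = self._undo.pop()

    def redo(self) -> None:
        if self._redo:
            self._undo.append(self._current)
            self._current = self._redo.pop()


history = SelectionHistory()
history.toggle("/home/user/a.txt")
history.toggle("/home/user/b.txt")
history.replace(["/home/user/c.txt"])  # the one wrong click
history.undo()                         # the two-file selection is back
print(sorted(history.selected))
```

Storing whole snapshots keeps the sketch simple; for very large selections one could store only the difference between consecutive states instead.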

Health Obsession Sarcasm

Orange pills

A week or so ago I noticed a small swelling at the back of my neck. Such things are rather common, even among the healthy population, but they tend to draw my attention more than they should. This was not as worrying, however, as a similar tiny swelling near my bottom, which had me see my GP several months ago (in vain of course).

I came to the conclusion that I am overly obsessed with my health, even if all that there is are puny swellings that quickly fade away. Recently I was more paranoid about that abnormality near my neck, which has just disappeared. After all, unlike the other occurrence, it resides next to my second most-valued organ! [pun]

Windows Attacks the Web

Dear Windows users,

Please get your act together and always patch up your operating system, if not migrate to a less vulnerable operating system. The Macs, for example, are not so hard to use.

Your current machines are occasionally getting infected and then used as zombies in the midst of our network. Subsequently, under the control of evil hands, they commence a collateral onslaught on Web sites. Such sites, if not computers in your network, can be powered by Mac O/S, Linux or other UNIX variants. You must become responsible as you reside in a networked environment and can affect it tremendously without you being aware of it.

I am currently suffering from an international army of infected Windows machines. This is no laughing matter as I can confirm all zombies are Windows-driven and there are hundreds of them all over the world.

Windows out-of-the-box (i.e. unpatched) is somewhat of a weapon. It can easily drain bandwidth of other users and inject spam content. It can also lead to downtime of others in the global village that is the Internet. We are now hearing about a Dutch network of people who exploited vulnerable Windows machines world-wide. Be more cautious than ever before or else get disconnected by your ISP. Here at the University we already charge people if they continuously get disconnected due to viruses in their Windows machines. Viruses (‘virii’ in common slang) are causing a great deal of distress to network administrators, which is the reason for the monetary penalty.

If the attacks on my site do not come to a halt and no solution is found, the site might have to be isolated (READ: brought down), which is unacceptable. If not isolated, my site may take other sites down along with it in the future. I am not alone in the recent batch of attacks according to Tao of Mac.

Related items:

Under Zombie Attack

Devil

UNDER the quiet exterior of schestowitz.com, which continues to serve pages reasonably fast, there are actually many problems. For the past two weeks, zombie attacks have been launched against the site. As more Windows machines get infected around the world, the number of attacks surges, approaching tens of thousands per day at the moment. This is far beyond the scale that I am used to or can afford. This gives us yet another reason to hate that insecure, ‘hijackable’ O/S that is permitted to attack reliable and resilient Linux servers.

I have tried a variety of methods to combat the scary scale of these attacks, which get worse by the hour. If anybody knows some good solutions, please send me your advice as soon as possible, before the server collapses. Here are a few valid tools apart from the ad-hoc methods I have been using thus far:

The only glaring issue with the above is that they require ownership or power over the Web server. I contacted my hosts last night as we might have to collaborate on this. It is not only my sites that get penalised, but also other eCommerce sites that depend on QoS for their income.

UPDATE (5:30AM): Can Apache be configured to block requests based on referring URL (with regex)? I could exclude .to fairly cleanly. Please reply by E-mail if you can assist.

UPDATE (10:50AM): I have been told about modsecurity.org, but I still need root access to my host’s machines.

UPDATE (11:30AM): I have also been told about Patch-o-Matic netfilter/iptables.

UPDATE (11:40AM): The following Apache rule might work, but it is yet untested:

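RewriteEngine On
# Match referrers whose host ends in .to (the dot is escaped so it is literal)
RewriteCond %{HTTP_REFERER} ^https?://[^/]+\.to(:\d+)?(/|$) [NC]
# Refuse matching requests with 403 Forbidden
RewriteRule .* - [F]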
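Since the rule is untested, it may help to try candidate patterns against a few referrers before touching the server. Below is a minimal sketch in Python, whose `re` syntax is close enough to Apache's for patterns this simple. The names `LOOSE` and `STRICT` and all sample URLs are made up for illustration. It compares a pattern with an unescaped dot, where `.` matches any character, against one anchored on a host ending in `.to`.

```python
import re

# Unescaped dot: '.' matches any character, so "oto/" or "-to/" also match.
LOOSE = re.compile(r".to/")
# Assumed stricter form: the referring host itself must end in ".to".
STRICT = re.compile(r"^https?://[^/]+\.to(:\d+)?(/|$)", re.IGNORECASE)


def would_block(pattern, referrer):
    """Return True if the referrer matches, i.e. the request would be refused."""
    return pattern.search(referrer) is not None


# Invented referrers, for illustration only.
samples = [
    "http://spam-casino.to/",
    "http://example.com/photo/",
    "http://example.com/how-to/",
    "",
]
for ref in samples:
    print(repr(ref), would_block(LOOSE, ref), would_block(STRICT, ref))
```

The two legitimate-looking referrers are caught by the loose pattern but not by the stricter one, which is the kind of collateral damage worth checking for before blocking by regex.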

More details in a separate post to be published shortly.

E-mail Count

Stuffed mailboxes

I am never too sure as to how E-mail should be counted (if it can be counted at all). There is a certain amount of traffic that can be quantified when it comes to E-mail, but not all E-mails should be treated equally. Points to ponder:

  • Should spam be counted?
  • Or spam that sank into BoxTrapper and gets reviewed on occasion?
  • Should mailing lists be counted?
  • What about newsletters that anyone can sign up for?
  • E-mails with multiple recipients?
  • Uninvited E-mails that are not spam?
  • Automated messages?
  • E-mail one-liners?
  • E-mails sent to deprecated accounts that are rarely checked, if ever?

I have reached the conclusion that if E-mail bulk is ever to be counted, there must be some weighting applied to the variety of E-mail ‘types’. I have never counted my mail, but I am somewhat fed up with people who brag about the quantity of mail (or spam) they get. I have recently read somewhere that there are services that one can sign up for in order to increase the amount of incoming spam traffic. Amazing, is it not? More amazing is the fact that people may actually sign up and welcome the increase in junk. It’s a market niche, right?
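The weighting idea could be sketched as follows in Python. The categories loosely follow the points above, but the weights themselves are pure assumptions picked for illustration; anyone counting their mail would have to choose their own.

```python
from collections import Counter

# Hypothetical weights: 1.0 for a personal message, less for bulk traffic.
WEIGHTS = {
    "personal": 1.0,
    "uninvited": 0.5,
    "one_liner": 0.5,
    "mailing_list": 0.2,
    "newsletter": 0.1,
    "automated": 0.1,
    "spam": 0.0,
}


def weighted_count(message_types):
    """Sum the weights of all messages; unknown types count as zero."""
    tally = Counter(message_types)
    return sum(WEIGHTS.get(kind, 0.0) * n for kind, n in tally.items())


# Invented inbox: the raw count is dominated by spam and list traffic.
inbox = ["personal"] * 10 + ["mailing_list"] * 50 + ["spam"] * 200
print("raw count:", len(inbox))
print("weighted count:", weighted_count(inbox))
```

With weights like these, the person who brags about hundreds of messages a day may turn out to receive about as much real mail as anyone else.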

Original styles created by Ian Main (all acknowledgements) • PHP scripts and styles later modified by Roy Schestowitz • Help yourself to a GPL'd copy