Archive for the ‘Search’ Category

Search Engine Downtime Has a High Cost

BACK when I communicated with Google, I came to realise that they have engineers whose sole or main purpose is to ensure the site stays online at all times. A few days ago I had another odd realisation, but perhaps a very obvious one. To search engines, downtime is hugely damaging. If people are unable to search for something immediately, they will choose a different tool. They must. Because such downtime pushes them to test the water elsewhere, a failure can encourage them to switch to a rival.

Ordinary sites, as opposed to such complex tools, do not have this problem. How many of us use a single search engine exclusively? What would happen if one day we found that the grass is greener elsewhere? Search, as opposed to a flow of information, tends to serve an immediate need. It cannot be deferred until the favourite site returns. So defection can be a matter of availability, and its impact should not be underestimated. Downtime on a corporate network rarely has any long-term impact. Search tools are different, because their quality is a subjective thing.

Being Big But Doing No Evil?

The eternal Google cookie leaves room for worries and doubt

A Guardian writer opines that Google’s growth will make it fearsome.

As the world’s biggest search engine starts to compete with old media it risks becoming the ‘Microsoft of the internet’. Richard Wachman reports

Google Honours (a Little More) Privacy

Image of Googleplex in London (from ZDNet gallery)

As you have probably heard by now, Google has taken baby steps towards not being quite so evil. Privacy concerns will be addressed by anonymising log files.

SAN JOSE, Calif. – Google, the world’s largest search engine, is dramatically changing the way it treats personal information.

Alexa Versus Netcraft Ranks

I will start with a proposition that I repeat rather often: Alexa ranks are flawed. Usually, for most sites, they are utterly meaningless.

It is difficult to argue this when faced with Alexa-happy people, but the figures cannot be trusted. These ranks are driven by a toolbar that acts in a similar way to spyware. The A9 toolbar for Firefox used to have the same effect. Recently, however, Microsoft grabbed A9 by the balls and forced them to drop the toolbar. No more Alexa manipulation on Macs and Linux boxes. So where do we end up?

Alexa aligns with Webmasters’ surfing habits. Netcraft figures, on the other hand, align better with system administrators’ surfing habits. The two can intersect. The shown figures are, by default, calculated from a three-month average of pageviews. One can view daily reach, though, to see how it swings willy-nilly when a few regular visitors use the toolbar. The exception to this might be the very top sites, although the definition of traffic still matters.

Manipulation gets harder at the stage where top sites get ranked. Many people game Alexa as well. Do not trust Alexa ranks. Ever. Use Netcraft if you want something that’s not just an alternative, but is also better in the sense that fewer people game it. Here are some example statistics from two top sites.

  • Netcraft rank for Netscape: 341st
  • Netcraft rank for Digg: 867th
  • Alexa rank for Netscape: 479th
  • Alexa rank for Digg: 79th

See? No alignment between Netcraft and Alexa figures at all. Not even for top sites. These so-called ‘realistic’ figures collide and contradict one another. Alexa has become one of these “everybody steals, so I can as well” sorts of things… grossly biased. While people continue to game Alexa it remains a strange animal.
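To make the disagreement concrete, here is a minimal sketch in Python that compares the four ranks quoted above. The dictionary layout and the helper names `order_by` and `rank_ratio` are my own, purely for illustration. Nothing here queries Alexa or Netcraft; it only works on the numbers from the list.

```python
# Ranks as quoted in the list above (lower number = better rank).
ranks = {
    "Netscape": {"netcraft": 341, "alexa": 479},
    "Digg": {"netcraft": 867, "alexa": 79},
}


def order_by(source):
    """Return site names sorted from best to worst rank for one source."""
    return sorted(ranks, key=lambda site: ranks[site][source])


def rank_ratio(site):
    """Ratio of the Netcraft rank to the Alexa rank for one site."""
    r = ranks[site]
    return r["netcraft"] / r["alexa"]


for site in ranks:
    print(f"{site}: Netcraft/Alexa rank ratio = {rank_ratio(site):.2f}")

print("Netcraft order:", order_by("netcraft"))
print("Alexa order:   ", order_by("alexa"))
print("Orders agree:  ", order_by("netcraft") == order_by("alexa"))
```

With only two sites this proves little statistically, but it shows that the two sources do not merely differ in scale: they put the same two sites in the opposite order.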

Internet Dinosaurs

Evolve or perish

I love the Larry King show. I really do, I swear! I no longer get around to watching Larry King Live, but I have truly admired the guy for his eloquence and knowledge… until now. Have a look at the reason. While some people lag behind when it comes to understanding the Web (HTML), IBM celebrates the tenth anniversary of XML.

Assorted Search Engines Observations

WHILE Ask transforms itself into a search engine with technical merits (rather than many ads), Internet leviathan Google continues its climb. It appears as though this duo comprises the only prominent players that still gain market share. I have become rather fond of the Linux/Open Source news feed that Ask is offering.

Also, while on the issue of search engines, various sites build content from RSS feeds in a supposedly legal fashion. This transforms the way search engines operate. Have a look at this discussion, which is a good roundup. Google has already been sued in Europe over its news aggregator.

Lastly (yes, more on the subject of search engines): Politicians beware – Internet prolongs blunders

Divisive Web

According to an article that I recently read, the Internet could one day be broken down into separate networks that are isolated and selectively dispersed around the world. This means that the global nature of the Web, as well as the wealth of information, would cease to exist. Moreover, this heralds the final goodbye to a state where little or no censorship barriers can prevail. This changes one’s perspective entirely.

This worrisome move is entirely different from the issue of Net neutrality, which in itself separates the Web into multiple tiers. It is also reminiscent of rumours about ‘Googlenet’, where one submits a site to a dark privatised Web that gets indexed and closely monitored (obviating the need to crawl remote servers and use pings for distant notification).

In the long term, whether this is totally disastrous or not remains to be seen. Consider, for instance, the peculiar extension of resources that are made publicly available. Let’s take a look at the way that the Web has evolved in recent years. Only a tiny cross-section of the ‘visible’ Web involves content spammers (or scrapers), where visibility is grossly defined by search engines (internal sites and intranets aside). However, in reality, the content that exists on the Web and is both deliverable and spam can actually be the majority (spammers spawn colossal colonies of junk and dummy content). This leads to (or involves) blogalanches and ‘poisoning’ of the index/cache, and it subverts search results in the process. All this leads to chaos as search engines diverge from the correct search results and deliver something less meaningful. In the process of struggling for good spots (or visibility) in search engines, spam rises and leads to attacks of various sorts. Temptation leads to vandalism, which leads to further maintenance. The Web no longer seems like an appealing place to be. But can division of the Web help? I very much doubt it. It’s all about authorities controlling information. Brainwashing is the means for making others think alike, comply, and even be submissive.

Original styles created by Ian Main (all acknowledgements) • PHP scripts and styles later modified by Roy Schestowitz • Help yourself to a GPL'd copy