Archive for the ‘Search’ Category

Google Honours (a Little More) Privacy

Image of Googleplex in London (from ZDNet gallery)

As you have probably heard by now, Google has taken baby steps towards not being quite so evil. Privacy concerns will be addressed by anonymising log files.

SAN JOSE, Calif. – Google, the world’s largest search engine, is dramatically changing the way it treats personal information.

Alexa Versus Netcraft Ranks

Wikipedia statistics

I will start with a proposition that I repeat rather often: Alexa ranks are flawed. Usually, for most sites, they are utterly meaningless.

It is difficult to argue this when faced with Alexa-happy people, but the figures cannot be trusted. These ranks are driven by a toolbar that acts in a similar way to spyware. The A9 toolbar for Firefox used to have the same effect. Recently, however, Microsoft grabbed A9 by the balls and forced them to drop the toolbar. No more Alexa manipulation on Macs and Linux boxes. So where do we end up?

Alexa aligns with Webmasters’ surfing habits. Netcraft figures, on the other hand, align better with system administrators’ surfing habits. The two can intersect. The figures shown are, by default, calculated from a three-month average of pageviews. One can view daily reach, though, to see how it goes willy-nilly when a few regular visitors use the toolbar. The exception to this might be the very top sites, although the definition of traffic still matters.

Manipulation gets harder at that stage where top sites get ranked. Many people game Alexa as well. Do not trust Alexa ranks. Ever. Use Netcraft if you want something that’s not just an alternative, but is also better in the sense that fewer people game it. Here are some example statistics from two top sites.

  • Netcraft rank for Netscape: 341st
  • Netcraft rank for Digg: 867th
  • Alexa rank for Netscape: 479th
  • Alexa rank for Digg: 79th

See? No alignment between Netcraft and Alexa figures at all. Not even for top sites. These so-called ‘realistic’ figures collide and contradict one another. Alexa has become one of these “everybody steals, so I can as well” sorts of things… grossly biased. While people continue to game Alexa it remains a strange animal.
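To make the disagreement concrete, here is a minimal sketch that takes the four ranks listed above and checks whether the two services even order the two sites the same way. The variable and function names are my own, and two sites are far too small a sample to prove anything by themselves; this only spells out the arithmetic.

```python
# Ranks as quoted in the list above (lower number = more popular).
ranks = {
    "Netscape": {"netcraft": 341, "alexa": 479},
    "Digg": {"netcraft": 867, "alexa": 79},
}


def order_by(source):
    """Return site names sorted from best to worst rank for one service."""
    return sorted(ranks, key=lambda site: ranks[site][source])


netcraft_order = order_by("netcraft")
alexa_order = order_by("alexa")

# Ratio of Netcraft rank to Alexa rank for each site; a value far from 1
# means the two services place the site very differently.
ratios = {site: r["netcraft"] / r["alexa"] for site, r in ranks.items()}

print("Netcraft order:", netcraft_order)
print("Alexa order:   ", alexa_order)
print("Same order?    ", netcraft_order == alexa_order)
for site, ratio in ratios.items():
    print(f"{site}: Netcraft rank is {ratio:.2f}x its Alexa rank")
```

Netcraft puts Netscape ahead of Digg, whereas Alexa puts Digg well ahead of Netscape, so the two orderings are reversed.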

Internet Dinosaurs

Rabbit
Evolve or perish

I love the Larry King show. I really do, I swear! I no longer get around to watching Larry King Live, but I have truly admired the guy for his eloquence and knowledge… until now. Have a look at the reason. While some people lag behind when it comes to understanding the Web (HTML), IBM celebrates the tenth anniversary of XML.

Assorted Search Engines Observations

WHILE Ask transforms itself into a search engine with technical merits (rather than many ads), Internet leviathan Google continues its climb. It appears as though this duo comprises the only prominent players that still gain market share. I have become rather fond of the Linux/Open Source news feed that Ask is offering.

Also, while on the issue of search engines, various sites build content from RSS feeds in a supposedly legal fashion. This transforms the way search engines operate. Have a look at this discussion, which is a good roundup. Google has already been sued in Europe over its news aggregator.

Lastly (yes, more on the subject of search engines): Politicians beware – Internet prolongs blunders

Divisive Web

According to an article that I recently read, the Internet could one day be broken down into separate networks that are isolated and selectively dispersed around the world. This means that the global nature of the Web, as well as the wealth of information, would cease to exist. Moreover, this heralds that final goodbye to a state where little or no censorship barriers can prevail. This changes one’s perspective entirely.

This worrisome move is entirely different from the issue of Net neutrality, which in itself separates the Web into multiple tiers. It is also reminiscent of rumours about ‘Googlenet’, where one submits a site to a dark privatised Web that gets indexed and closely monitored (obviating the need to crawl remote servers and use pings for distant notification).

In the long term, whether this is totally disastrous or not remains to be seen. Consider, for instance, the peculiar extension of resources that are made publicly available. Let’s take a look at the way that the Web has evolved in recent years. Only a tiny cross-section of the ‘visible’ Web involves content spammers (or scrapers), where visibility is grossly defined by search engines (internal sites and intranets aside). However, in reality, spam can actually make up the majority of the deliverable content that exists on the Web (spammers spawn colossal colonies of junk and dummy content). This leads to (or involves) blogalanches and ‘poisoning’ of the index/cache, and it subverts search results in the process. All this leads to chaos as search engines diverge from the correct search results and deliver something less meaningful.

In the process of struggling for good spots (or visibility) in search engines, spam rises and leads to attacks of various sorts. Temptation leads to vandalism, which leads to further maintenance. The Web no longer seems like an appealing place to be. But can division of the Web help? I very much doubt it. It’s all about authorities controlling information. Brainwashing is the means for making others think alike, comply, and even be submissive.

Does Digestion of Rivals Make You Evil?

Google on a computer screen

GOOGLE is a prime example (sometimes even a role model) in various different contexts. However, Google is also known for its aggressive reach for (or, put otherwise, its appeal to) people whom it has not deservedly earned. The company has been accused of leading to ‘brain drain’ in the States, as well as disrupting operations in some companies. This was essentially caused by pulling employees away without prior notice. One notable embodiment of this behaviour is the snatching of developers from Open Source projects that concern Google, e.g. Vinton Cerf, Andy Morton, and Greg Stein. There are several other VIP figures that serve as equally valid examples.

There is no problem with this on the shallow level of principles. The developers have the will and the right to pursue their goals and follow their passions. But the snag comes at the end of my post.

Google has been thriving on three main choices/ideologies, in my opinion.

  • One smart move was the use of Linux, which increased performance, contributed to secrecy, and raised stability while reducing running costs;
  • Another was the use of new algorithms, which are very much like those you would find in academic trackers (e.g. publication farms like CiteSeer). Counting backlinks was not a hard idea to embrace or anything unprecedented, with the exception of Web search perhaps;
  • Lastly, the famous mantra and general laid-back culture put an innocent face on a to-be behemoth.
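For readers who have never seen backlink counting spelled out, here is a minimal sketch of the plain idea mentioned in the second point: a page scores higher the more distinct pages link to it. The tiny link graph and the `count_backlinks` helper are hypothetical, made up for illustration. This is not Google’s actual algorithm, which weighs links rather than merely counting them.

```python
from collections import Counter

# Hypothetical link graph: each site maps to the sites it links to.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "a.example"],
}


def count_backlinks(graph):
    """Count, for every site, how many distinct other sites link to it."""
    counts = Counter()
    for source, targets in graph.items():
        for target in set(targets):  # ignore repeated links from one source
            if target != source:     # ignore self-links
                counts[target] += 1
    return counts


backlinks = count_backlinks(links)

# Sites ordered from most to fewest backlinks.
ranking = [site for site, _ in backlinks.most_common()]
print(ranking)
```

In this toy graph, c.example comes first with three backlinks, while d.example has none and does not appear in the ranking at all. This is the same principle a citation index uses when it counts how often a paper is cited.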

Ultimately, you may find that Google’s continued success relied on technologies that were harnessed through acquisitions (compare with Microsoft). I am not at all fond of the direction our economy takes as it evolves. It kills startups (one could spot quite a few of these Google victims on eBay) and it shrinks the level of choice in industry. I am a strong advocate of diversity and co-existence, by means of reduced aggressiveness in human resources and management. When lawyers and investors take over a pool of intellect, alarm bells should be sounded.

Second Interview with Google

Image of Googleplex in London (from ZDNet gallery)

LAST night I had my second interview with Google (a *nix systems administration position). What is noteworthy is that I did not apply for a job. I was contacted by Google owing to my involvement and work on the Web. I am patiently waiting to hear their decision (should take several days), but I am pessimistic. Some questions were really hard and I needed hints. These questions were less analytical than I had expected.

Original styles created by Ian Main (all acknowledgements) • PHP scripts and styles later modified by Roy Schestowitz • Help yourself to a GPL'd copy