Archive for October, 2005

In Defence of Google Print

Book scanning

ERIC Schmidt, who is the CEO of Google, speaks out in defence of Google Print. Google Print is the controversial initiative to scan literature, infringing copyrights in the process. The intended service is to provide surfers with instant and comprehensive coverage of books from the shelf.

Imagine sitting at your computer and, in less than a second, searching the full text of every book ever written. Imagine an historian being able to instantly find every book that mentions the Battle of Algiers. Imagine a high school student in Bangladesh discovering an out-of-print author held only in a library in Ann Arbor. Imagine one giant electronic card catalog that makes all the world’s books discoverable with just a few keystrokes by anyone, anywhere, anytime.

C-Block Penalties

Google Cookie

WE have begun to hear more and more about link filtering, selective links, rel='nofollow' and link penalties in recent days. It appears as though sites might be penalised for having certain links associated with them. I am sceptical nonetheless, and I hereby present a case study as to why no SE could ever (successfully) deploy such an approach.

Think of a Web designer, Mr. X, who has built commercial sites for Mrs. Y and Mrs. Z.

Since Mr. X knows his Web host rather well and wants to centralise his bills, he registers the sites for his clients himself (and possibly ownership is also attributed to his own business, if not hosted locally). Once done, he does not neglect to add the new sites to his portfolio page. Moreover, he remembers to include a footer in his clients’ sites, which links back to him and potentially attracts some clients who liked his work.

Will a search engine penalise X, Y and Z as a consequence? Will they all run out of business because they work together, acknowledging one another reciprocally? Some links are exchanged for the benefit of the visitor (as illustrated above). Cohesiveness and communities are the way our Internet is built, and research at IBM has shown that.

The Web is not an isolated set of Web sites. Penalising for cross-site relationships would be a chaotic mistake. In fact, the only way to ever resolve this is to look for off-site links that depart from a ‘community’. But what if these are not relevant? What if all Chinese sites linked to one another because their native language is the same? That’s a community. A country can be a community. Bloggers are a community. You can never penalise for clannish patterns, even if the registrar happens to be identical.
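To make the idea of off-site links that depart from a ‘community’ concrete, here is a minimal sketch in Python. The link graph, the site names and the function name `links_leaving_community` are all made up for illustration, and I am not suggesting that any search engine works this way.

```python
def links_leaving_community(links, community):
    """Return (source, target) pairs that start inside the community and end outside it."""
    members = set(community)
    return [(src, dst) for src, dst in links if src in members and dst not in members]


# Hypothetical link graph: Mr. X's portfolio and his two clients link to one another.
links = [
    ("x-design.example", "y-shop.example"),
    ("x-design.example", "z-store.example"),
    ("y-shop.example", "x-design.example"),
    ("z-store.example", "x-design.example"),
    ("y-shop.example", "supplier.example"),
]
community = ["x-design.example", "y-shop.example", "z-store.example"]

outbound = links_leaving_community(links, community)
```

Only the link to the supplier leaves the group. Whether that link is relevant is precisely what such a count cannot tell.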

These communal patterns may have led to questioning of the PageRank system in the past. From an old article on PageRank, for example:

As Gary Stock noted here last May, Google “didn’t foresee a tightly-bound body of wirers. They presumed that technicians at USC would link to the best papers from MIT, to the best local sites from a land trust or a river study – rather than a clique, a small group of people writing about each other constantly. They obviously bump the rankings system in a way for which it wasn’t prepared.”

Although it’s tempting to suggest that bloggers broke PageRank™ it might equally be the case that the Blog Noise issue is emblematic rather than causal. Blog Noise – in the form of ‘trackbacks’, content-free pages and other chaff – is the most visible manifestation, but mindless list-generators are also to blame for Google’s poor performance.

While on the subject, another article from July presents a few more speculations about such C-block-reliant penalties.

Google’s possible purpose for filtering new links

While Google’s algorithm is not made public, it’s generally thought that Google intends to clamp down on link sales for PageRank and for ranking in the SERPs. Also on Google’s hit list are multiple interlinked sites, existing on the same ip c block, entirely for the purposes of link popularity and PageRank enhancement.

Purchased links tend to be added to a website in medium to large quantities, and often all at one time. Large quantities of incoming links, appearing all at once, might indeed trip a filter.

Google could suspect a high volume of links added at one time to be purchased, and therefore suspect. The possibility would be in keeping with Google’s strongly suspected policy of discouraging link sales. After all, Google’s guidelines point out that any type of linking schemes are against its policies.

The ip c block is the third series of numbers in the identity of an ISP. For example, in the c block is denoted as xxx. Google is able to readily identify those links.
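As the term is commonly used in SEO circles, two sites sit on the same C block when their IPv4 addresses share the first three numbers. A rough sketch of that comparison follows. The helper names `c_block` and `same_c_block` are hypothetical, the addresses are documentation-only placeholders, and none of this describes what Google actually does.

```python
import ipaddress


def c_block(ip):
    """Return the /24 network (first three octets) that an IPv4 address belongs to."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)


def same_c_block(ip_a, ip_b):
    """True when both addresses share their first three octets."""
    return c_block(ip_a) == c_block(ip_b)


# Documentation-only addresses (RFC 5737), used here purely as placeholders.
site_a = "192.0.2.10"
site_b = "192.0.2.200"
site_c = "198.51.100.10"

a_and_b_share = same_c_block(site_a, site_b)
a_and_c_share = same_c_block(site_a, site_c)
```

Under this assumption the first two placeholder sites would be grouped together and the third would not, which is all a C-block check can establish. It says nothing about why the sites share a host.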

TV First, Then Science

I quite liked the critical spin that a Slashdot contributor put on an article about the move to digital TV.

After budgets cuts led to the layoff of engineers and scientists at NASA Jet Propulsion Laboratory, a US Senate committee has approved a $3 billion dollar subsidy to assist Americans in their difficult transition to digital television in 2009.

TV X-Files

While we should all know that it is science that drives innovation, money gets spent where the long-term future is uncertain. Television and the advertisements that accompany its existence make up a tremendous industry. However, it is a well-established fact that the economy cannot safely propagate into the future (Wall Street and the ‘bubble effect’), whereas exploration and new discoveries are capable of putting the States at the forefront. This all comes at a very sensitive time, when the White House issues budgetary cuts on science and research while creationism and defence (or contrariwise armament) are better catered for. I am truly concerned.

Workout Reduction

Workout session
The local gym – photo captured in July

TODAY I decided to re-prioritise a few hobbies and activities in my life. Thus far things have gone rather well, but I was at times susceptible to pressure and found it hard to cope with the amount of work that came my way.

I have maintained a stringent habit of working out consistently for nearly 10 years. At the start, when I was only 14, I stuck to short sessions that verged on the high frequency of 7 times a week (i.e. a brief daily routine). This was soon reduced to just 6 and around 2001 was altered again and became just 5. A few years ago this was already down to 4 and a half, and this week I have finally settled at 4. This trend is somewhat worrying, at least to me. I am certainly not at the point in my life where I ought to head ‘downhill’, although I have admittedly reached a plateau. I still work out approximately 8 hours a week, but I find myself more bound to the Palm handheld throughout. I simply must record my thoughts. I soon realise that I must stay connected more often. That is truly what I enjoy, inevitably.

Earlier today there was an unusually nice atmosphere at the gym so I even fell asleep there. It is not a very rare exception, I might point out. The nice music that is played on the TV set and perhaps the lack of pressure led to this, not to mention the erratic sleep patterns which I have recently adopted. The workouts themselves appear to be getting easier, psychologically in particular, over the years. This may have plenty to do with the fact that I avoid self-imposed pressure and time constraints. I learned from past mistakes, I presume, as pressure becomes a deterrent strong enough to discourage exercise altogether.

Collaborative Effort to Crawl the Web


ONLY a few days ago, somebody made me aware of the Majestic-12 distributed search engine. The idea behind the engine is persistent use of other people’s computer power and bandwidth. The goal in mind is to crawl and potentially index the Web reasonably well.

This sudden ‘enlightenment’, to me at least, provided somewhat of an insight. It affected the matter of practicability in my Open Source Iuron, which is in its early stages and more of a proposal at this stage. As explained before, Iuron does not index pages; it aspires to gain actual knowledge from the Internet instead. This can potentially make PageRank (or equivalents) obsolete, I believe, thereby reducing spam and search engine cheats.

Within a few days, I will be meeting the person who is arguably the father of the Semantic Web. My project will be difficult to lift off the ground without some support. Nonetheless, this now appears to be a hindrance with a simple solution. It is, after all, the kind of project where the vast requirement for bandwidth and computer power can be obtained in more or less the same way as Majestic-12. Since it is Open Source, willingness on the public’s behalf should not be a considerable peril.
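For the curious, here is one way crawling work could be shared out among volunteers’ machines: hash each hostname and let the result pick a worker, so that all pages of a site go to the same machine. This is only a sketch under my own assumptions. The function names and example URLs are invented, and it is not a description of how Majestic-12 actually assigns its work.

```python
import hashlib
from urllib.parse import urlsplit


def assign_worker(url, worker_count):
    """Map a URL to a worker index by hashing its hostname."""
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha1(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % worker_count


def partition(urls, worker_count):
    """Group URLs into one list per worker."""
    buckets = [[] for _ in range(worker_count)]
    for url in urls:
        buckets[assign_worker(url, worker_count)].append(url)
    return buckets


# Hypothetical crawl frontier.
urls = [
    "http://a.example/index.html",
    "http://a.example/about.html",
    "http://b.example/",
    "http://c.example/news",
]
buckets = partition(urls, 3)
```

Keeping a whole site on one worker makes it easier for that worker to pace its requests politely, at the cost of an uneven load when one site is much larger than the others.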

On an unrelated topic, which is paranoia, I recently noticed a referral reduction from Google. It became conspicuously significant in recent days so I thought it was an attempt to silence me. It finally turns out to have been merely a side-effect of a large-scale update at Google’s end. Many Web sites were in fact affected by this and distress became apparent in a few newsgroups. It was even pointed out that was assigned PageRank 2!

Firefox Fork

Firefox in the dock

I have just become aware of Flock, which is an interesting fork of Firefox 1.5. The much-anticipated version 1.5 has not been formally released yet, which makes this a somewhat controversial scenario. Flock is now being promoted by, where a download of Flock warrants a free WordPress blog (at least for the time being).

With all due respect, I am always slightly apprehensive when it comes to adopting, and thus relying on, forks. I am also aware of the problem associated with forking one’s own application. Once you lag behind, the long-invested dedication can wind up being disposed of. Flock appears to me like the conspicuous rationale behind Mozilla becoming a foundation and going by the identity of

As regards and the Flock relationship, I might give it a try, but I would certainly seek plenty of convincing arguments before I do so. My past experiences with Firefox 1.5 betas (AKA Deer Park) have been fairly disappointing and led to regrets. I have made two such attempts to migrate to a version that was not finalised.

Having said that, a certain other fact is worrying me slightly more. According to ZDNet, the lead developer of Flock said:

“Please note that this is a developer preview and that there are still plenty of bugs, many of which we are aware of.”

To me that sounds as if the application is not quite ready for “prime time” so are possibly getting users on a dangerous wagon.

It was also said, however:

“In architecting our software, build systems and engineering processes, we have given considerable thought to how our code will be able to evolve alongside the Mozilla code, without forking it”

This sounds rather reassuring and I sure hope these folks will practise what they preach.

Linux to Penetrate Tablets, PDA, Mobile Market

Nokia 770

Some of you have possibly heard about the Nokia 770. For those who haven’t, have a look at this snippet from a recent review in the Washington Post.

The Finnish mobile firm had originally expected the Tablet, which runs on Linux-based software, to enter the market before the end of September.

I personally use a Palm handheld and I consider myself an avid Palm enthusiast. Yet, I was deeply disappointed by the move which had a Treo run Pocket PC. To me, this was beyond a simple “deal with the devil” as I also use Linux and I can imagine the implications of such a dangerous pact. Palm were committed to Linux for quite some time, but were recently taken over by a Japanese giant, so the future is uncertain and past promises or planned strategies should now be taken with a grain of salt.

I started to explore alternative paths and the Nokia 770 is one of these. I have just been reminded that it would not be surprising if Nokia dumped that rusty Symbian altogether and implemented a port of the existing GUI to run on top of the Linux kernel.

From recent articles about the Nokia 770, it sure seems as if the codebase will be made available for other devices as well. I believe the device will be ready by Christmas, so expect a homebred Linux distribution for mobile devices soon. It can then be sold to other vendors, which will mark yet another revolution: the entry of Linux into mobile phones, handheld PDAs, tablets, etc. First the servers, then the desktop and soon a penguin in everyone’s pocket, Palm included perhaps.

The remainder of this item slides off the theme of Linux and discusses Web 2.0-like transitions.

How come nobody has implemented a portable PDA synchronisation method? Many PDAs these days can establish an Internet connection and very many computers reside in a connected environment with an always-on connection. Would it not be rational to implement a Web-based Palm Desktop, for example? One which centralises data in on-line accounts? I still wait for a handheld that integrates and synchronises with all platforms more seamlessly. The Internet, being open, can easily bridge that gap and assist users from the fuller spectrum of operating systems. Sooner or later, this will all happen.
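The core of such a Web-based synchronisation could be very small. Below is a minimal sketch, assuming each record carries a last-modified stamp and that the newer copy simply wins. The record layout and the name `merge_records` are my own invention for illustration; this is not how Palm Desktop or any existing sync tool is known to work.

```python
def merge_records(device, server):
    """Merge two {record_id: (modified_at, value)} maps, keeping the newer copy of each record."""
    merged = dict(server)
    for record_id, (modified_at, value) in device.items():
        if record_id not in merged or modified_at > merged[record_id][0]:
            merged[record_id] = (modified_at, value)
    return merged


# Hypothetical address-book entries; timestamps are plain integers for brevity.
device = {"alice": (5, "alice@new.example"), "bob": (1, "bob@old.example")}
server = {"alice": (2, "alice@old.example"), "carol": (3, "carol@site.example")}

merged = merge_records(device, server)
```

A "newest copy wins" rule is the simplest choice and it silently discards the older edit when both sides changed the same record, so a real service would probably need something more careful.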

