Tuesday, January 17th, 2006, 6:25 am

Search Engines Dig Deeper

SEARCH engines are constantly finding new ways to improve their performance. Many methods are involved, and some of them are less ethical than others. Consider the following perplexing scenario: a search engine refines its own results by crawling and utilising its opponents’ data, which is ‘exposed’ to everyone. In legitimate cases this is known as “harvesting”; put negatively, it carries the connotation of “scraping”. The root of the idea is the use of one search engine to improve and refine the results of another. Scroogle and Webcrawler, for instance, depend purely on this concept. There is a cyclic trap here, too, as engines end up feeding on one another’s output. Search engine ‘poisoning’ comes to mind.

Think of MSN using the Google Directory, or Google using del.icio.us, a social link database that is now owned by Yahoo. Moreover, directories like DMOZ are non-profit, yet they are often open for use (or misuse) by profit-making companies. Link bases like del.icio.us have the potential to refine results. As they are publicly available, could anyone truly restrict rivals from accessing and using the potentially valuable data? These links are contributed and managed by the public and have no prescribed copyrights. The depth of exploration for search engines does not seem to be limited, which is worrying.

There seems to be a certain ethical and legal border beyond which crawling images and serving them within frames (Google Images) becomes questionable, let alone crawling public forums, UseNet included (Google Groups). These two examples, Google Images and Google Groups, are the ones I have chiefly in mind. How deep should one be allowed to crawl the data in existence, and how should it be attributed to the source? To dare is to win, but often this leads to a demise which is catalysed by public opinion. There is little doubt that company acquisitions are intended for more extensive data collection, on the assumption that the information, even if obtained by something akin to spying, becomes available. Information can become powerful, but it has a cost, which is often the death of privacy.

As I carry on with my drivel on the infiltration of search engines, I also discover the belated arrival of a European search engine.

In his New Year’s address outlining his administration’s plans for 2006, French President Jacques Chirac focused on plans for a European search engine to rival US internet companies such as Yahoo and Google. Some of the top tech labs in France and Germany are reportedly working on the ‘Quaero’ (Latin for ‘to search’) search engine.
