Re: Alexa Internet archive crawler gone wild?

__/ [Carol W] on Tuesday 03 January 2006 06:41 \__

> On Tue, 03 Jan 2006 04:57:44 +0000, Roy Schestowitz
> <newsgroups@xxxxxxxxxxxxxxx> wrote:
>>What's the benefit of permitting Alexa to crawl, though? Having the site
>>archived so that someone can look back at deleted content in the future?
>>Have we not learned that lesson yet?
> Actually it can be helpful to have an archived copy - even if that
> particular content or site becomes deleted at a later date. I have
> used the web archive to help locate some information or data that had
> been deleted or removed from the web.
...But if a site is no larger than a few gigabytes (particularly when
compressed), why not make use of private storage, which is often very
cheap? You can stack up a progressive backup for just a few quid. If
resilience is important, you can duplicate the content periodically. The Web
Archive is slower to access, and it tends to mix objects that were collected
at different points in time.

Another issue is people having access to content that was *accidentally*
made public, or even finding the roots of a site whose 'image' has evolved.
Having said that, the Web Archive can be useful to the user. I once wanted
to know which Palm models were considered state-of-the-art in 2002, so I
looked up the Palm front page from 2002. Going back further, it was
interesting to discover their old dependency on 3Com. Whether this served
/Palm/, who allowed Alexa to archive the site, remains the main question.
They could probably have refused Alexa access without it being frowned upon.

Lastly, as the OP points out, Alexa can have a noticeable cost, whether
that cost is latency when serving visitors and search engines (crawlers)
or even the traffic (hosting) bill. Rarely is there something to be

