
Re: bots

  • Subject: Re: bots
  • From: Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx>
  • Date: Thu, 11 May 2006 05:56:53 +0100
  • Newsgroups: alt.internet.search-engines
  • Organization: schestowitz.com / MCC / Manchester University
  • References: <4460f79c$0$541$ed2619ec@ptn-nntp-reader01.plus.net> <5680887.ZiombDuED8@schestowitz.com> <dWn8g.449765$7i1.304461@fe06.news.easynews.com> <_W78g.439472$7i1.117514@fe06.news.easynews.com> <1268649.RZZeWs0Bdj@schestowitz.com> <pmr8g.220305$2g2.193999@fe07.news.easynews.com>
  • Reply-to: newsgroups@xxxxxxxxxxxxxxx
  • User-agent: KNode/0.7.2
__/ [ www.1-script.com ] on Wednesday 10 May 2006 20:42 \__

> Roy Schestowitz wrote:
> 
> 
>> Ouch! I believe you have your own dedicated server, fortunately. Maybe
>> you should get another one and spray red "Y!" over it. Sorry, I know
>> it's no place for sarcasm...
> 
> Well, I do, but the throughput is not unmetered. So, if they keep at that
> rate (and they are not alone!) I'm going to start excluding bots based on
> their respective engine's ROI (not my idea:
> http://www.shoemoney.com/2006/05/03/search-engine-bot-followup-true-searchbot-roi/
> )


Also see the following:

        http://wantmoore.com/archives/2005/06/27/search-engine-query/
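
The gist of the ROI argument is simple log arithmetic: the bandwidth a
given bot burns versus the visitors its engine sends back. Here is a rough
sketch in Python (the log path, bot list and referrer patterns are my own
assumptions for illustration, not anything taken from the pages above):

# Rough per-bot "ROI" estimate from an Apache combined-format access log.
# Bandwidth is whatever each crawler pulled down; referrals are visits whose
# Referer field points back at that crawler's engine. Bot names, referrer
# substrings and the log path are assumptions for the sake of the example.
import re
from collections import defaultdict

BOTS = {                       # user-agent substring -> referrer substring
    "Googlebot": "google.",
    "Slurp": "yahoo.",         # Yahoo! Slurp
    "msnbot": "msn.",
}

# "METHOD url HTTP/x.x" status size "referer" "user-agent"
LOG_RE = re.compile(r'"[A-Z]+ \S+ [^"]+" \d{3} (\d+|-) "([^"]*)" "([^"]*)"')

crawl_bytes = defaultdict(int)   # bytes served to each bot
referrals = defaultdict(int)     # visits referred by each engine

with open("access.log") as log:  # hypothetical path
    for line in log:
        m = LOG_RE.search(line)
        if not m:
            continue
        size, referer, agent = m.groups()
        for bot, engine in BOTS.items():
            if bot in agent and size != "-":
                crawl_bytes[bot] += int(size)
            elif engine in referer.lower():
                referrals[bot] += 1

for bot in BOTS:
    mb = crawl_bytes[bot] / (1024.0 * 1024.0)
    hits = referrals[bot]
    ratio = hits / mb if mb else 0.0
    print("%-10s %8.1f MB crawled, %6d referrals, %6.2f referrals/MB"
          % (bot, mb, hits, ratio))

Whichever bots end up at the bottom of that list are the ones worth
excluding via robots.txt or user-agent rules.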


>> Replication can be done more efficiently than that. Since much of the
>> content (that you care about) is textual, one could compress content
>> and set it aside. Compression algorithms can reduce natural text to
>> about 10-20% of its original size. I don't know how large their
>> indices are (compared with full text, i.e. Google Cache), but dumping
>> of that data certainly does not depend on the way it's
>> stored/structured. If they don't back up their data and send it to a
>> remote location, they play a very risky game. I'm assuming that the
>> datacentres serve them as arrays of redundancy /already/.
> 
> There is no reason to believe that they are not ALREADY storing data in a
> compressed format. Google is big but they have to play by the same rules
> as everybody else that uses databases. They are (were?) backing data up
> until the point where the original data reaches half the size of their
> storing capacity. Beyond that you have to decide whether you freeze the
> size of your database (and Yahoo! is pushing theirs up, so there is no
> stopping here) and selectively drop some old data. That's hard for Google
> because for Google old==good. They can also buy more hard drives to the
> point where actually powering them may become a HUGE expense in itself,
> not even talking about maintenance and other associated expenses.
> 
> So, there is really no easy solution to your problems if you are the
> world's #1 search engine!
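
On the compression point above, the easiest way to see what a
general-purpose compressor actually achieves on one's own content is to
try it; a quick Python sketch (the file name is only a placeholder):

# How far do general-purpose compressors shrink natural text? Run this on a
# representative dump of your own pages; "corpus.txt" is just a placeholder.
import bz2
import zlib

with open("corpus.txt", "rb") as f:
    text = f.read()

deflated = zlib.compress(text, 9)    # DEFLATE (what gzip uses), max effort
bzipped = bz2.compress(text, 9)      # bzip2, largest block size

print("original: %9d bytes" % len(text))
print("zlib:     %9d bytes (%.1f%% of original)"
      % (len(deflated), 100.0 * len(deflated) / len(text)))
print("bzip2:    %9d bytes (%.1f%% of original)"
      % (len(bzipped), 100.0 * len(bzipped) / len(text)))

The ratio depends heavily on the corpus, of course, which is the whole
point of measuring it on your own data rather than taking a figure on
faith.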


If (old==good), then{

they could simply buy Alexa with their 6-petabyte (not certain about the
number, as it was said to contain just 100 terabytes in the past) Time
Machine. The architecture makes the data slow and cumbersome to access, but
it's there, nicely shelved in case you want to snatch it some time in the
future. One-terabyte hard drives are not far off, chronologically speaking...

        http://www.hitachigst.com/hdd/research/recording_head/pr/index.html

How about mobile machines with 3.5 petabytes (3,500,000 GB) of storage?

        http://www.pbs.org/cringely/pulpit/pulpit20051117.html

The future holds plenty of possibilities, unless high capacity leads to
melting.

}

Best wishes,

Roy

-- 
Roy S. Schestowitz      |    "Error, no keyboard - press F1 to continue"
http://Schestowitz.com  |  SuSE GNU/Linux   ¦     PGP-Key: 0x74572E8E
  5:45am  up 13 days 12:42,  8 users,  load average: 1.69, 1.04, 0.82
      http://iuron.com - help build a non-profit search engine
