
Re: Crawling Behavior

  • Subject: Re: Crawling Behavior
  • From: Roy Schestowitz <newsgroups@schestowitz.com>
  • Date: Wed, 10 Aug 2005 02:16:34 +0100
  • Newsgroups: alt.internet.search-engines
  • Organization: schestowitz.com / Manchester University
  • References: <ms0hf1p8371kts46c0uuf8a2103ic0gnt0@4ax.com> <ddagcg$9dh$1@phys-news1.kolumbus.fi> <ddakal$1oen$1@godfrey.mcc.ac.uk> <ddalro$mic$1@phys-news1.kolumbus.fi>
  • Reply-to: newsgroups@schestowitz.com
  • User-agent: KNode/0.7.2
Wÿrm wrote:

> 
> "Roy Schestowitz" <newsgroups@schestowitz.com> wrote in message
> ddakal$1oen$1@godfrey.mcc.ac.uk...
> 
> <snip>
> 
>> There are contradictions to this:
> 
> <snip>
> 
>> observers ___had___ pegged
>> the number at no higher than 8 billion documents."
>>
>> [/snip]
>>
>> NOTICE THE LAST SENTENCE...
> 
> Notice the word HAD in it ;) So they're talking about an older index.

You're right, but the previous poster put forward a link to a Yahoo site.
Any company tends to flatter itself, and all references to the high figures
(in the various sources I have come across) trace back to the Yahoo Blog.

> Or just try searching for THE in Yahoo and see the number of results.


'The' is a stop word. It doesn't (shouldn't) get indexed. Any results
returned for such a query should be taken with a grain of salt.
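
To illustrate why a query made up only of a stop word tells you little about
index size, here is a minimal sketch of an index that drops stop words at
build time. The stop list and the function names are hypothetical, and this
is not a claim about how Yahoo or any other engine actually handles 'the'.

```python
# Hypothetical stop list; real engines use their own (and may index stop words).
STOP_WORDS = {"the", "a", "an", "of", "and"}


def build_index(docs):
    """Map each non-stop word to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            if word in STOP_WORDS:
                continue  # stop words never reach the index
            index.setdefault(word, set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every non-stop word in the query."""
    terms = [w for w in query.lower().split() if w not in STOP_WORDS]
    if not terms:
        return set()  # a query of only stop words has nothing to match on
    results = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

In an index like this, whatever count comes back for 'the' would have to come
from somewhere other than the index itself, such as an estimate.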


>> Yahoo picked up some music from my site, which is impressive (for Yahoo),
>> but they barely have any images as far as I can tell. Yahoo also request
>> around 60% of the traffic that Google and MSNBot do, judging by
>> statistics that I see, so they appear to be somewhat slow.
> 
> I don't have images or music, so I can't be sure how well Google or
> Yahoo index those. I might have to do some tests when I have time.


Inktomi (Yahoo) do not appear to pick up many images. Their requests
certainly don't add up to quite the same amount of traffic (bandwidth or
hits). They also have bugs in their code that cause them to crawl sites
incorrectly and raise many errors. In my mind, Yahoo have the poorest
crawler (not search engine, but _crawler_) among the top 3 SEs.

 
> Out of curiosity I've been tracking the Google, Yahoo and MSN bots, and on
> my site Googlebot visits least. MSNBot hits pages about 25% more than
> Googlebot, and Yahoo about 2.35 times more.


Yahoo probably likes your site. An old site of mine gets crawled primarily
by MSN, slightly by Google, but is largely neglected by Yahoo.
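
Comparisons like these can be reproduced from a raw access log. Below is a
minimal sketch, assuming each request line carries the user-agent string (as
in the Apache combined log format) and assuming the crawlers identify
themselves with the substrings Googlebot, msnbot and Slurp (Inktomi/Yahoo).
The function names are made up for illustration.

```python
from collections import Counter

# Assumed user-agent substrings; check them against your own logs.
CRAWLER_TOKENS = {
    "Google": "googlebot",
    "MSN": "msnbot",
    "Yahoo": "slurp",
}


def count_crawler_hits(log_lines):
    """Tally requests per crawler by case-insensitive substring match."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for name, token in CRAWLER_TOKENS.items():
            if token in lowered:
                hits[name] += 1
                break  # count each request for one crawler only
    return hits


def relative_to(hits, baseline):
    """Express each crawler's hit count as a multiple of the baseline's."""
    base = hits[baseline]
    if base == 0:
        raise ValueError("baseline crawler has no hits")
    return {name: count / base for name, count in hits.items()}
```

Passing an open log file to count_crawler_hits and then calling
relative_to(hits, "Google") gives ratios of the kind quoted above.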


> My pages are mostly not changing, so Googlebot seems to be either the
> smartest or the laziest, at least in my case, and that's kind of a nice
> thing. One thing I have forgotten to check is which bot indexes new pages
> fastest. I might have to check that out later when I remember.


Site maps are supposed to speed indexing up, or so I imagine.


> Another interesting detail about how the bots have been hitting my pages
> is that Yahoo seems to have a very stable number of hits every day.
> Googlebot hits vary from almost none to about half of the pages on the
> site, depending on the day. MSNBot hits vary about as much as Google's.


They must be 'cycling' their attention. Perhaps the interval between heavy
crawls is more or less fixed...?
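
One way to put a number on "stable" versus "varying" is to count a crawler's
hits per day and compare the spread to the average. This is a rough sketch,
assuming Apache-style timestamps such as [10/Aug/2005:02:16:34 +0100] in each
log line; the function names and the choice of measure (standard deviation
divided by mean) are my own assumptions.

```python
import re
from collections import Counter
from statistics import mean, pstdev

# Captures the day part of a timestamp like [10/Aug/2005:02:16:34 +0100].
DATE_RE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def daily_hits(log_lines, token):
    """Count requests per day for lines whose text contains the given token."""
    per_day = Counter()
    wanted = token.lower()
    for line in log_lines:
        if wanted not in line.lower():
            continue
        match = DATE_RE.search(line)
        if match:
            per_day[match.group(1)] += 1
    return per_day


def variability(per_day):
    """Standard deviation of daily hits divided by their mean (0 = steady).

    Days with no hits at all do not appear in per_day, so a crawler that
    skips whole days will look steadier here than it really is.
    """
    counts = list(per_day.values())
    if not counts:
        return 0.0
    average = mean(counts)
    return pstdev(counts) / average if average else 0.0
```

A value near zero would match the steady daily pattern described for Yahoo,
and a larger value the swings described for Google and MSN.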

Roy

-- 
Roy S. Schestowitz
http://Schestowitz.com
