Re: Googlebot and robots.txt

_____/ On Friday 26 August 2005 07:26, [Tonnie] wrote : \_____

> Alan Cole wrote:
>> Is there any way to stop the Googlebot visiting a site quite so
>> often.... I know a robots.txt file can restrict their access to certain
>> pages but I don't want to do that as I still want all pages listed, I'd
>> just rather the Googlebot didn't visit so often as it eats up all of my
>> bandwidth.
>> It seems to visit my site every few days and manages to crawl through
>> something like 30,000 pages each time it does so.
> Perhaps Google can help:
> http://www.google.com/intl/en/webmasters/bot.html
> Tonnie

Beat you to it. *wink*

To add a little more, I respect Google's approach to this. They ask you to
contact them directly (I assume they reply too) instead of providing a
non-standardised method of communication. 'Extending' the well-agreed-upon
robots.txt sounds like something that other, more vicious companies would
be tempted to do. Hanging 'bits at the end', however, is something that I
suspect one (or more) of the crawlers did. Was it actually Google that
started to support wildcards?
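
For anyone unsure what wildcard support means in practice, here is a rough
Python sketch of how such patterns are commonly interpreted: '*' matches any
run of characters and a trailing '$' anchors the end of the URL path. The
function names are my own invention for illustration, and this is not taken
from any crawler's actual code.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a robots.txt path pattern using '*' and '$' into a regex.

    Hypothetical helper: the semantics assumed here are '*' = any sequence
    of characters, trailing '$' = end of path, otherwise prefix matching.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin the pieces with '.*' for '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def path_matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start, which gives the usual prefix behaviour.
    return robots_pattern_to_regex(pattern).match(path) is not None
```

With this, a rule such as "Disallow: /*.pdf$" would cover /docs/file.pdf but
not /docs/file.pdf?x=1, while a plain "/tmp" still behaves as a prefix match.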

I am not entirely fond of A9/Amazon's siteinfo.xml. Imagine a site whose
top-level directory is filled with a different data file for each
individual on-line service. That approach triggers errors and makes setting
up new Web sites a daunting and arduous task.


