Re: robots.txt

  • Subject: Re: robots.txt
  • From: Roy Schestowitz <newsgroups@schestowitz.com>
  • Date: Fri, 14 Jan 2005 03:40:47 +0000
  • Newsgroups: alt.internet.search-engines
  • References: <WWhFd.6854$C.5005@trnddc05> <Xns95DDD881F421castleamber@130.133.1.4> <xPBFd.15599$ig7.7937@trnddc04>
  • User-agent: KNode/0.7.2
star green wrote:

> "John Bokma" <postmaster@castleamber.com> wrote in message
> news:Xns95DDD881F421castleamber@130.133.1.4...
>> star green wrote:
>>
>>> I've been trying to figure out how to write a robots.txt file that will
>>> allow the robots to access the homepage (index.html), but no other
>>> page on the site.
>>
>> Note that not all robots honor robots.txt. The major search engine ones
>> do (most likely), but ratware bots don't.
>>
>> --
>> John
> 
> I know, I was just wondering if there's something that would work for most
> robots.
> 
> But, I guess there's nothing to be done short of completely reorganizing
> the site (which I don't wish to do) or listing every page in the
> robots.txt file.
> Ah, well.
> 
> Thanks to everyone for the advice!

Use a script. It can collect a listing of all the HTML files on your site
and write them to a file, one Disallow line per page, leaving out
index.html. Contact me if you need help with this.

Having said that, if your site has, let us say, 10,000 pages, then
robots.txt ends up with 10,000 lines, which is badly bloated.
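
For what it is worth, here is a rough sketch of the kind of script I have
in mind, written in Python. The function name build_robots_txt, the
"keep" parameter and the site path in the usage comment are placeholders
of mine, and it assumes the site is a plain directory tree of static
.html/.htm files whose URLs mirror the file paths. Adjust to taste.

```python
import os
from urllib.parse import quote


def build_robots_txt(site_root, keep=("index.html",)):
    """Build robots.txt text that disallows every HTML file under
    site_root except the root-level files named in `keep`.

    Assumption: URLs map one-to-one onto file paths below site_root.
    """
    disallowed = []
    for dirpath, _dirnames, filenames in os.walk(site_root):
        for name in filenames:
            if not name.lower().endswith((".html", ".htm")):
                continue  # only HTML pages are listed
            rel = os.path.relpath(os.path.join(dirpath, name), site_root)
            rel = rel.replace(os.sep, "/")
            if rel in keep:
                continue  # leave the homepage crawlable
            # Percent-encode spaces and other unsafe characters.
            disallowed.append("/" + quote(rel))

    lines = ["User-agent: *"]
    lines.extend("Disallow: " + path for path in sorted(disallowed))
    return "\n".join(lines) + "\n"


# Example usage (the path is hypothetical):
# with open("/var/www/mysite/robots.txt", "w") as fh:
#     fh.write(build_robots_txt("/var/www/mysite"))
```

Bear in mind that a Disallow line is matched as a prefix, so an entry such
as /about.html also covers any URL that merely starts with that string.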

Roy

-- 
Roy Schestowitz
http://schestowitz.com
