Re: robots.txt

Roy Schestowitz wrote:

> John Bokma wrote:
>> Roy Schestowitz wrote:
>>> CarolW. wrote:
>>>> One suggestion: you could put the other pages into a subdirectory
>>>> and then disallow the spiders from that folder.
>>> This can be a big overhaul in the absence of _relative_ links.
>> Just a search & replace :-D
> In theory, yes. How about Web sites with thousands of pages? You then
> need to use an fgrep-like tool that scans a large group of files and
> does the work. It's not easy.
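
CarolW.'s suggestion comes down to one robots.txt rule for the new subdirectory. Here is a minimal sketch that assumes the pages were moved into a hypothetical /archive/ directory. It uses Python's standard urllib.robotparser to confirm which URLs a well-behaved spider would skip:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; "/archive/" is an assumed directory name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""

def build_parser(robots_txt):
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    rp.modified()  # mark the rules as read so can_fetch() uses them
    return rp

rp = build_parser(ROBOTS_TXT)
for url in ("http://example.com/index.html",
            "http://example.com/archive/old.html"):
    print(url, rp.can_fetch("*", url))
```

robots.txt is advisory. Spiders that honour it will skip /archive/, but the rule neither hides nor protects those pages.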

I would use Perl, but I have been using it for over 10 years now :-D. And
yes, with Perl it's quite easy. If you want to knit some Unix tools
together, like grep, sed, etc., then yeah, it takes more time.
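
Here is what the "just a search & replace" approach might look like as a script. It is a rough Python sketch rather than Perl. MOVED is a hypothetical mapping from old page names to their new paths, and rewrite_text/rewrite_site are illustrative helper names. Because it is plain pattern matching, it also rewrites any href="..." text that appears outside tags, which is exactly Roy's objection below.

```python
import re
from pathlib import Path

# Hypothetical mapping: old href value -> new href value after the move.
MOVED = {"old.html": "archive/old.html", "faq.html": "archive/faq.html"}

def rewrite_text(text, moved):
    """Replace href="old" with href="new" for each moved page (plain regex)."""
    if not moved:
        return text
    names = "|".join(map(re.escape, moved))
    # Exact matches only: "old.html#top" or "./old.html" are left alone.
    # The lookbehind keeps attributes such as data-href from matching.
    pattern = re.compile(r"""((?<![\w-])(?i:href)\s*=\s*["'])(""" + names + r""")(["'])""")
    return pattern.sub(lambda m: m.group(1) + moved[m.group(2)] + m.group(3), text)

def rewrite_site(root, moved, glob="*.html"):
    """Rewrite every matching file under root in place; return changed paths."""
    changed = []
    for path in Path(root).rglob(glob):
        text = path.read_text(encoding="utf-8")
        new = rewrite_text(text, moved)
        if new != text:
            path.write_text(new, encoding="utf-8")
            changed.append(path)
    return changed
```

The mapping only suits pages that sit in the site root. Links inside the moved pages themselves (for example, back to index.html) would also need ../ prefixes, and that is where the lack of relative links makes the job bigger.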

> Not to mention that there is no guarantee that text (as opposed to
> tags) will remain unchanged...

Of course, you parse the HTML if you want to be sure. With Perl this is
just a piece of cake.
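
Parsing, rather than pattern matching, is what keeps the text between tags safe. The sketch below uses Python's standard html.parser module instead of a Perl module. The parser locates each start tag, and href values are rewritten only inside those tags. Body text and comments are left byte-for-byte unchanged. rewrite_hrefs and StartTagLocator are hypothetical names, and the moved mapping is supplied by the caller.

```python
import re
from html.parser import HTMLParser

# An href attribute inside one start tag; the leading \s skips names such
# as data-href. Limitation: "href=" inside another attribute's quoted
# value (e.g. title="see href=x") would also match.
HREF_ATTR = re.compile(r'''(\s(?i:href)\s*=\s*["']?)([^"'\s>]+)''')

class StartTagLocator(HTMLParser):
    """Record (line, column, raw text) for every start tag, incl. <br/>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tags = []

    def handle_starttag(self, tag, attrs):
        line, col = self.getpos()  # position of the tag's opening '<'
        self.tags.append((line, col, self.get_starttag_text()))

def rewrite_hrefs(html, moved):
    """Return html with href values found in `moved` replaced, in tags only."""
    locator = StartTagLocator()
    locator.feed(html)
    locator.close()

    # getpos() gives 1-based lines and 0-based columns; map to string offsets.
    line_starts = [0] + [i + 1 for i, ch in enumerate(html) if ch == "\n"]

    def fix(m):
        return m.group(1) + moved.get(m.group(2), m.group(2))

    out = html
    # Splice from the end backwards so earlier offsets stay valid.
    for line, col, raw in reversed(locator.tags):
        start = line_starts[line - 1] + col
        if out[start:start + len(raw)] != raw:
            continue  # unexpected position: leave the tag alone, don't guess
        out = out[:start] + HREF_ATTR.sub(fix, raw) + out[start + len(raw):]
    return out
```

Splicing the original tag text, instead of re-serialising the parse tree, preserves the existing quoting, attribute order, and whitespace. As a result, a diff of the before and after files shows only the changed links.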

> with large sites it's difficult to test
> and validate...

You test it locally first, with automated checks, before you upload.
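
One simple automated check walks the local copy of the site and confirms that every relative link points at a file that exists. The minimal Python sketch below assumes the site lives in one local directory, treats URLs with a scheme or host as out of scope, and assumes index.html as the directory index name. broken_links and LinkCollector are hypothetical names.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlsplit, unquote

class LinkCollector(HTMLParser):
    """Collect href and src values from every start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)

def broken_links(root):
    """Return (page, link) pairs whose local target is missing under root."""
    root = Path(root)
    broken = []
    for page in sorted(root.rglob("*.html")):
        collector = LinkCollector()
        collector.feed(page.read_text(encoding="utf-8"))
        collector.close()
        for link in collector.links:
            parts = urlsplit(link)
            if parts.scheme or parts.netloc or not parts.path:
                continue  # external URL, mailto:, or same-page #fragment
            path = unquote(parts.path)
            if path.startswith("/"):
                target = root / path.lstrip("/")  # site-root-relative link
            else:
                target = page.parent / path  # page-relative link
            if path.endswith("/"):
                target = target / "index.html"  # assumed index file name
            if not target.exists():
                broken.append((page.relative_to(root).as_posix(), link))
    return broken
```

Run this after the search & replace and before uploading. An empty list means every local link still resolves.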

> Anyway, that's just the con, which one has to be aware of...

It all depends on how much one gains :-D.

