Re: robots.txt

Subject: Re: robots.txt
From: Roy Schestowitz <newsgroups@schestowitz.com>
Date: Fri, 14 Jan 2005 03:36:48 +0000
Newsgroups: alt.internet.search-engines
References: <WWhFd.6854$C.5005@trnddc05> <41e5abce.28610794@news.prodigy.net> <cs4rk1$2tjb$1@godfrey.mcc.ac.uk> <Xns95DDED4CA3B7castleamber@130.133.1.4>
User-agent: KNode/0.7.2

John Bokma wrote:

> Roy Schestowitz wrote:
> 
>> CarolW. wrote:
>> 
>>> One suggestion: Could put the other pages into a subdirectory folder
>>> then disallow the spiders from those folders.
>> 
>> This can be a big overhaul in the absence of _relative_ links.
> 
> Just a search & replace :-D

In theory, yes. But what about Web sites with thousands of pages? You then
need an fgrep-like tool that scans a large group of files and does the
replacement for you. It's not easy.
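
As a rough illustration of the scanning step, here is a minimal Python sketch that walks a directory tree and lists the HTML files containing a literal string, much as a recursive fgrep would. The function name `files_containing`, the suffix list, and the example path are assumptions made up for this illustration; nothing here comes from a particular tool.

```python
from pathlib import Path


def files_containing(root, needle, suffixes=(".html", ".htm")):
    """Return sorted paths under `root` whose text contains the literal `needle`.

    Hypothetical helper: the name and the default suffixes are assumptions.
    """
    matches = []
    for path in Path(root).rglob("*"):
        # Only look at regular files with an HTML-like extension.
        if path.is_file() and path.suffix.lower() in suffixes:
            # errors="replace" keeps the scan going on oddly encoded pages.
            text = path.read_text(encoding="utf-8", errors="replace")
            if needle in text:
                matches.append(path)
    return sorted(matches)
```

Calling `files_containing("site", "/old/page.html")` would only report which files need attention; it changes nothing on disk, which makes it a reasonably safe first pass on a large site.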

Not to mention that there is no guarantee that visible text (as opposed to
tags) will remain unchanged: a blind replace also hits any body text that
happens to contain the same string. With large sites it's difficult to test
and validate the result.
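
One way to reduce the risk to visible text is to rewrite only `href` and `src` attribute values inside tags. The sketch below does that with regular expressions. It is an assumption-laden illustration, not a real HTML parser: it will mishandle comments, scripts, and attribute values containing `>`. The name `move_links` and the example prefixes are hypothetical.

```python
import re

# A tag: "<", anything that is not an angle bracket, then ">".
TAG_RE = re.compile(r"<[^<>]+>")

# href="..." or src="..." with single or double quotes.
ATTR_RE = re.compile(
    r"""(?P<attr>\b(?:href|src)\s*=\s*)(?P<q>["'])(?P<url>.*?)(?P=q)""",
    re.IGNORECASE,
)


def move_links(html, old_prefix, new_prefix):
    """Replace `old_prefix` at the start of href/src values, inside tags only."""

    def fix_attr(m):
        url = m.group("url")
        if url.startswith(old_prefix):
            url = new_prefix + url[len(old_prefix):]
        return m.group("attr") + m.group("q") + url + m.group("q")

    # Apply the attribute rewrite to each tag; text between tags is untouched.
    return TAG_RE.sub(lambda t: ATTR_RE.sub(fix_attr, t.group(0)), html)
```

For example, moving `/old/` to `/private/old/` changes the link target but leaves the same string alone where it appears as page text:

```python
page = '<p>See /old/a.html or <a href="/old/a.html">/old/a.html</a></p>'
print(move_links(page, "/old/", "/private/old/"))
```

Even so, the output still needs checking on a large site, since this approach only narrows where replacements can happen.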

Anyway, that's just the downside, which one has to be aware of.

-- 
Roy Schestowitz
http://schestowitz.com

Follow-Ups:
- Re: robots.txt
  - From: John Bokma <postmaster@castleamber.com>

References:
- Re: robots.txt
  - From: from_you@nomail.com (CarolW.)
- Re: robots.txt
  - From: Roy Schestowitz <newsgroups@schestowitz.com>
