Re: robots.txt

Subject: Re: robots.txt
From: John Bokma <postmaster@castleamber.com>
Date: 14 Jan 2005 05:56:14 GMT
Newsgroups: alt.internet.search-engines
Organization: Castle Amber - software development
References: <WWhFd.6854$C.5005@trnddc05> <41e5abce.28610794@news.prodigy.net> <cs4rk1$2tjb$1@godfrey.mcc.ac.uk> <Xns95DDED4CA3B7castleamber@130.133.1.4> <cs7eot$1gc1$2@godfrey.mcc.ac.uk>
User-agent: Xnews/5.04.25
Xref: news.mcc.ac.uk alt.internet.search-engines:53819

Roy Schestowitz wrote:

> John Bokma wrote:
> 
>> Roy Schestowitz wrote:
>> 
>>> CarolW. wrote:
>>> 
>>>> One suggestion: Could put the other pages into a subdirectory
>>>> folder then disallow the spiders from those folders.
>>> 
>>> This can be a big overhaul in the absence of _relative_ links.
>> 
>> Just a search & replace :-D
> 
> In theory, yes. How about Web sites with thousands of pages? You then
> need to use an fgrep-like tool which scans a large group of files and
> does the work. It's not easy.

I would use Perl, but then I have been using it for over 10 years now
:-D. And yes, with Perl it's quite easy. If you want to knit some Unix
tools together, like grep, sed, etc., then yes, it takes more time.
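
To give an idea of how little code a bulk search & replace over a whole
site takes, here is a minimal sketch. It is in Python rather than Perl,
purely for illustration, and the function names and the example prefixes
("/extra/" moving to "/private/extra/") are made up. It assumes the
links are absolute paths in double-quoted href/src attributes and that
the files are UTF-8.

```python
import os


def rewrite_prefix(text, old_prefix, new_prefix):
    """Plain string replace on href="..." and src="..." prefixes.

    Deliberately naive: it also changes body text that happens to
    contain the same characters, and misses single-quoted attributes.
    """
    for attr in ("href", "src"):
        text = text.replace(f'{attr}="{old_prefix}', f'{attr}="{new_prefix}')
    return text


def rewrite_tree(root, old_prefix, new_prefix):
    """Apply rewrite_prefix to every .html/.htm file below root.

    Returns the sorted list of files that were actually modified.
    """
    changed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as fh:
                before = fh.read()
            after = rewrite_prefix(before, old_prefix, new_prefix)
            if after != before:
                with open(path, "w", encoding="utf-8") as fh:
                    fh.write(after)
                changed.append(path)
    return sorted(changed)
```

Run it on a copy of the site, e.g. rewrite_tree("site-copy", "/extra/",
"/private/extra/"), and diff the result before trusting it.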

> Not to mention that there is no guarantee that text (as opposed to
> tags) will remain unchanged...

Of course you parse the HTML if you want to be sure. With Perl this is
a piece of cake.
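
A rough sketch of what "parse the HTML" means here: only href/src
attribute values inside tags are rewritten, while text between tags is
passed through untouched. It uses Python's standard html.parser instead
of Perl, just as an illustration. The class and function names are
hypothetical, and the re-serialized markup is normalized (lower-case tag
names, double-quoted attributes), so it is not byte-identical to the
input.

```python
from html import escape
from html.parser import HTMLParser


class LinkRewriter(HTMLParser):
    """Re-emit a document, changing only link attributes in tags."""

    LINK_ATTRS = {"href", "src"}

    def __init__(self, old_prefix, new_prefix):
        # convert_charrefs=False keeps &amp; and friends in body text
        # as separate events, so they can be written back unchanged.
        super().__init__(convert_charrefs=False)
        self.old_prefix = old_prefix
        self.new_prefix = new_prefix
        self.out = []

    def _tag(self, tag, attrs, closing=""):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # bare attribute such as "disabled"
                parts.append(name)
                continue
            if name in self.LINK_ATTRS and value.startswith(self.old_prefix):
                value = self.new_prefix + value[len(self.old_prefix):]
            # The parser hands over unescaped values; escape them again.
            parts.append(f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + closing + ">"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs, " /"))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)  # body text: never modified

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")


def rewrite_links(html_text, old_prefix, new_prefix):
    parser = LinkRewriter(old_prefix, new_prefix)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

With this, a page that shows the characters href="/extra/x" as visible
text keeps that text as is, while a real <a href="/extra/..."> link in
the same page is moved to the new prefix.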

> with large sites it's difficult to test
> and validate...

You test it first locally, automated, before you upload.
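
One way such a local, automated test could look, sketched in Python for
illustration: walk the local copy and report every href/src that points
at a local file which does not exist. All names are hypothetical, and it
assumes UTF-8 files, "/" mapping to the root of the local copy, and
index.html as the directory index.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


class LinkCollector(HTMLParser):
    """Collect every non-empty href/src value found in start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def local_target(root, page_path, link):
    """Map a link to a local file path, or None if it is not local."""
    parts = urlsplit(link)
    if parts.scheme or parts.netloc or not parts.path:
        return None  # external URL, mailto:, or fragment-only link
    path = unquote(parts.path)
    if path.startswith("/"):
        target = os.path.join(root, path.lstrip("/"))
    else:
        target = os.path.join(os.path.dirname(page_path), path)
    if target.endswith("/"):
        target = os.path.join(target, "index.html")  # assumed index name
    return os.path.normpath(target)


def broken_links(root):
    """Return sorted (page, link) pairs whose local target is missing."""
    broken = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith((".html", ".htm")):
                continue
            page = os.path.join(dirpath, name)
            collector = LinkCollector()
            with open(page, encoding="utf-8") as fh:
                collector.feed(fh.read())
            collector.close()
            for link in collector.links:
                target = local_target(root, page, link)
                if target is not None and not os.path.exists(target):
                    broken.append((page, link))
    return sorted(broken)
```

An empty list from broken_links("site-copy") after the rewrite is the
signal that it is safe to upload, at least as far as local links go.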

> Anyway, that's just the con, which one has to be aware of...

It all depends on how much one gains :-D.

-- 
John -> http://johnbokma.com/  Firefox: http://johnbokma.com/firefox/
                           Perl SEO tools: http://johnbokma.com/perl/
                 Experienced (web) developer: http://castleamber.com/
Get SEO help: http://johnbokma.com/websitedesign/seo-expert-help.html
