Roy Schestowitz wrote:
> star green wrote:
>
>> "John Bokma" <postmaster@castleamber.com> wrote in message
>> news:Xns95DDD881F421castleamber@130.133.1.4...
>>> star green wrote:
>>>
>>>> I've been trying to figure out how to write a robots.txt file that
>>>> will allow the robots to access the homepage (index.html), but no
>>>> other page on the site.
>>>
>>> Note that not all robots honor robots.txt. The major search engines'
>>> robots most likely do, but ratware bots don't.
>>>
>>> --
>>> John
>>
>> I know, I was just wondering if there's something that would work for
>> most robots.
>>
>> But, I guess there's nothing to be done short of completely
>> reorganizing the site (which I don't wish to do) or listing every
>> page in the robots.txt file.
>> Ah, well.
>>
>> Thanks to everyone for the advice!
>
> Use a script. You can get a listing of all HTML files on your site and
> output them to a file. Contact me if you need help on this.

Script? Just run find . -name "*.html" -print and pipe the output into
sed to turn each path into a Disallow line :-D
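
If you'd rather not fight with sed, here is a rough Python sketch of the
same idea: walk the site directory and emit one Disallow line for every
.html file except the homepage. The function name build_robots_txt and
the keep parameter are my own inventions for illustration, and I'm
assuming the homepage is index.html at the top of the site, as in the
original question.

```python
import os


def build_robots_txt(site_root, keep=("index.html",)):
    """Return robots.txt text that disallows every .html file under
    site_root, except the root-relative paths listed in keep."""
    lines = ["User-agent: *"]
    for dirpath, dirnames, filenames in os.walk(site_root):
        dirnames.sort()  # makes the traversal order deterministic
        for name in sorted(filenames):
            if not name.endswith(".html"):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), site_root)
            url_path = rel.replace(os.sep, "/")  # URL paths always use "/"
            if url_path in keep:
                continue  # leave the homepage crawlable
            lines.append("Disallow: /" + url_path)
    return "\n".join(lines) + "\n"
```

Calling build_robots_txt(".") from the site's top directory and saving
the result as robots.txt should do it. Only the top-level index.html is
kept; an index.html in a subdirectory still gets a Disallow line.
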
> Having said that, if your site has, let us say, 10,000 pages, then
> robots.txt becomes overly inflated.

You can probably use mod_rewrite to map robots.txt to a CGI script that
generates it on the fly :-D
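
A minimal sketch of what such a CGI script might look like, in Python.
The names robots_body and cgi_response are made up for this example, and
I'm assuming the server passes the site directory in DOCUMENT_ROOT (as
Apache does for CGI scripts). The rewrite rule itself isn't shown; in a
.htaccess it would be something like
RewriteRule ^robots\.txt$ /cgi-bin/robots.py [L]
but that is untested and depends on your server setup.

```python
#!/usr/bin/env python3
import os
import sys


def robots_body(site_root):
    """Build a Disallow line for every .html file under site_root,
    except the top-level index.html."""
    lines = ["User-agent: *"]
    for dirpath, dirnames, filenames in os.walk(site_root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            if name.endswith(".html"):
                rel = os.path.relpath(os.path.join(dirpath, name), site_root)
                path = rel.replace(os.sep, "/")
                if path != "index.html":
                    lines.append("Disallow: /" + path)
    return "\n".join(lines) + "\n"


def cgi_response(site_root):
    """Return the CGI response: header line, blank line, then the body."""
    return "Content-Type: text/plain\n\n" + robots_body(site_root)


if __name__ == "__main__":
    # Assumption: DOCUMENT_ROOT is set by the web server; fall back to
    # the current directory when running the script by hand.
    sys.stdout.write(cgi_response(os.environ.get("DOCUMENT_ROOT", ".")))
```

This walks the whole site on every request for robots.txt, so on a big
site you'd probably want to cache the output somewhere.
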
--
John -> http://johnbokma.com/ Firefox: http://johnbokma.com/firefox/
Perl SEO tools: http://johnbokma.com/perl/
Experienced (web) developer: http://castleamber.com/
Get SEO help: http://johnbokma.com/websitedesign/seo-expert-help.html