__/ [ Eric Johnston ] on Thursday 13 April 2006 11:07 \__
>
> "Per-Erik Skramstad" <webmaster@xxxxxxxxxxxxxxxxxxxxxx> wrote in message
> news:443cfa7d$1@xxxxxxxxxxxxxxxxxxxx
>> Big Bill wrote:
>>> On Wed, 12 Apr 2006 14:48:30 +0200, Per-Erik Skramstad
>>> <webmaster@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>>> What am I to write in the robots.txt file to make robots ignore a whole
>>>> folder? I tried Disallow: /foldername/ but it didn't seem to help
>>>
>>> Show us your robots.txt and we'll have a look at it.
>>
>> http://www.rsil.no/robots.txt
>>
>>
>> --
>>
>> Per-Erik Skramstad
>> http://www.korrekturavdelingen.no
>
> Just some guesses / ideas ...
>
> Try removing the spurious space character at the end of the line Disallow:
> /WORDogPDF/
> but retain a carriage return or line feed character to properly terminate
> the line.
>
> There may be some confusion about your URL name
> ht tp://www.rsil.n o/WORDogPDF gets changed to ht tp://rsil.n o/WORDogPDF/
> ht tp://www.rsil.n o/WORDogPDF/ stays as ht tp://www.rsil.n o/WORDogPDF/
> Why ?
> You use relative addressing for your page-to-page links, and once someone or
> something has gone to ht tp://rsil.n o/ there is an entire duplicate of your
> site for them to browse. This will confuse search engines and possibly
> generate a duplicate-site penalty. This has something to do with server
> configuration and DNS setup. If you read any of your pages there is no
> indication of what the URL of the site is. I think you can use something like
> <base href="ht tp://ww w.demo.com/" /> in the head to clarify.
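To illustrate the point about relative links, here is a rough sketch using
Python's standard urllib.parse.urljoin. The host names are placeholders
(example.com), not the real site, and resolve() is just a helper name made up
for this example. It shows that a relative link inherits whichever host the
visitor arrived on, while a fixed base (what <base href> provides) pins every
link to a single host.

```python
from urllib.parse import urljoin

# Placeholder host names; substitute the real site when trying this out.
WWW_PAGE = "http://www.example.com/index.html"
BARE_PAGE = "http://example.com/index.html"
FIXED_BASE = "http://www.example.com/"  # what a <base href> would declare


def resolve(page_url: str, href: str) -> str:
    """Resolve a link the way a browser or crawler would for a given page."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    # The same relative link ends up on two different hosts.
    print(resolve(WWW_PAGE, "WORDogPDF/"))    # http://www.example.com/WORDogPDF/
    print(resolve(BARE_PAGE, "WORDogPDF/"))   # http://example.com/WORDogPDF/
    # With a fixed base, the link always lands on the same host.
    print(resolve(FIXED_BASE, "WORDogPDF/"))  # http://www.example.com/WORDogPDF/
```

A server-side redirect from one host name to the other would probably be the
more thorough fix, but that depends on the hosting setup.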
I don't want to step on anybody's toes, but the one-stop place for quick
reference is probably:
http://www.robotstxt.org/wc/robots.html
It explains everything that is supported and does so fairly well.
Fortunately, no search engines have yet diverged from the standard by
supporting wildcards.
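If you want to check a rule before waiting for a crawler to come round, a
rough sketch along these lines may help. It uses Python's standard
urllib.robotparser, which does plain prefix matching as described in the
standard. The robots.txt text below is an assumed minimal example, not the
actual file from the site in question, and example.com is a placeholder host.

```python
from urllib.robotparser import RobotFileParser

# Assumed minimal robots.txt; not the real file from the site discussed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /WORDogPDF/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Anything under the disallowed folder is blocked ...
    print(is_allowed(ROBOTS_TXT, "http://www.example.com/WORDogPDF/file.pdf"))
    # ... while the rest of the site stays crawlable.
    print(is_allowed(ROBOTS_TXT, "http://www.example.com/index.html"))
```

Note that this particular parser strips whitespace around the path, so a
trailing space would make no difference to it. Other crawlers may well be
stricter, so tidying the line is still worth doing.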
> If a search engine read the content before you added the robots.txt it may
> take many months or several years for it to be removed from the search
> engine records. Try asking the search engines to delete. Google have a
> special file deletion process that works well, but you need to check again
> about 6 months later in case it re-appears. If other sites have already
> put links to the unwanted file you can expect calls for the file for ever
> more....
>
> Best regards, Eric.
Have a look here:
http://services.google.com:8882/urlconsole/controller
This uses your robots.txt for instructions.
Best wishes,
Roy
--
Roy S. Schestowitz | Apache: commercial software's days are numbered
http://Schestowitz.com | SuSE Linux ¦ PGP-Key: 0x74572E8E
3:10am up 43 days 16:53, 6 users, load average: 1.18, 1.04, 1.03
http://iuron.com - next generation of search paradigms