
Re: Robots.txt - asterisk

  • Subject: Re: Robots.txt - asterisk
  • From: Roy Schestowitz <newsgroups@schestowitz.com>
  • Date: Tue, 26 Jul 2005 04:27:54 +0100
  • Newsgroups: alt.internet.search-engines
  • Organization: schestowitz.com / Manchester University
  • References: <1122227122.544038.152980@g14g2000cwa.googlegroups.com> <op.sufluamy584cds@borek> <nSVEe.15656$vv6.12994@newsfe6-gui.ntli.net> <dc1cld$1tik$1@godfrey.mcc.ac.uk> <DI8Fe.11215$Hd4.234@newsfe2-gui.ntli.net>
  • Reply-to: newsgroups@schestowitz.com
  • User-agent: KNode/0.7.2
Wolfman's Brother wrote:

> Roy Schestowitz wrote:
>> Wolfman's Brother wrote:
>> 
>> 
>>>Borek wrote:
>>>
>>>>On Sun, 24 Jul 2005 19:45:22 +0200, ted <occasionaluse@hotmail.com>
>>>>wrote:
>>>>
>>>>
>>>>>Disallow: /*.gif$
>>>>
>>>>
>>>>If I recall correctly wildcards are not allowed by the standard.
>>>>
>>>>Check at www.robotstxt.org
>>>>
>>>>Best,
>>>>Borek
>>>
>>>To be pedantic about it: it's not that wildcards aren't ALLOWED by the
>>>standard, but that they aren't HANDLED by it. So a "*" is not illegal;
>>>it simply means the literal character "*" rather than having some
>>>special meaning.
>>>
>>>However .. Google's handling of robots.txt DOES include special meanings
>>>for wildcards, and in that sense is non-standard.
>>>
>>>Chris
>>>--
>>>http://www.lowth.com/rope - Scriptable packet match logic for IPCop and
>>>                             other linux-based firewalls.
>> 
>> 
>> From http://www.robotstxt.org/wc/faq.html#info
>> 
>> <snip>
>> 
>> Two common errors:
>> 
>>     * Wildcards are _not_ supported: instead of 'Disallow: /tmp/*'
>>       just say 'Disallow: /tmp/'.
>>     
>> ...
>> 
>> </snip>
>> 
>> Roy
>> 
> 
> Quite so.
> 
> "Disallow: /tmp/*" is not wrong, but means: disallow access to files
> whose names start with the six characters "/" "t" "m" "p" "/" "*"
> 
> If you mean: Disallow all files in /tmp, you say..
> 
> Disallow: /tmp/
> 
> But if you have files in /tmp/ that have a "*" as the first character of
> their names (heaven alone knows why you'd want to), which you want to
> disallow, then you'd say..
> 
> Disallow: /tmp/*
> 
> Only trouble is that Google will read this differently. The moral is:
> don't put "*" into your file names.
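
For anyone who wants to see the two readings side by side, here is a minimal
sketch in Python. The function names are my own invention, and the wildcard
version only approximates the Google-style behaviour discussed in this thread
('*' matches any run of characters, a trailing '$' anchors the end of the
path); it is not taken from any crawler's actual code.

```python
import re


def matches_standard(rule: str, path: str) -> bool:
    """Original convention: the Disallow value is a plain path prefix,
    so '*' is just a literal character."""
    return path.startswith(rule)


def matches_wildcard(rule: str, path: str) -> bool:
    """Assumed Google-style reading: '*' matches any run of characters
    and a trailing '$' anchors the match at the end of the path."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape everything literally, then join the pieces with '.*' for each '*'.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored:
        pattern += "$"
    return re.match(pattern, path) is not None


if __name__ == "__main__":
    for rule, path in [
        ("/tmp/", "/tmp/foo.html"),
        ("/tmp/*", "/tmp/foo.html"),
        ("/tmp/*", "/tmp/*file"),
        ("/*.gif$", "/images/a.gif"),
    ]:
        print(rule, path, matches_standard(rule, path), matches_wildcard(rule, path))
```

Under the prefix reading, "/tmp/*" only catches paths that really contain the
asterisk, while the wildcard reading catches everything under /tmp/.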

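Huh? Wha? *smile* I didn't even know it was possible to prefix a file name
with an asterisk until I checked. On *NIX the name is stored literally as
*file; it is the shell that needs it written as \*file (note the escape
character), because an unquoted * is a glob wildcard. Tools such as tab
completion add the backslash quite transparently.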

In any case, prefixing filenames with symbols is bad practice. Likewise my
bad habit of capitalising directory names, which leads poorer browsers
like Lynx to generate many 404s in my error logs and disappoints the visitor.

Windows, on the other hand, imposes unexplained limitations on path length
and filename length. This is why I had to steer away from it altogether and
can never come back. Good riddance, too.

Roy

-- 
Roy S. Schestowitz
http://Schestowitz.com
