
Re: ?google=nocrawl

__/ [ Borek ] on Sunday 19 March 2006 13:29 \__

> On Sun, 19 Mar 2006 12:48:39 +0100, Roy Schestowitz
> <newsgroups@xxxxxxxxxxxxxxx> wrote:
>>>> I read that yesterday morning, but then I thought: "might as well use
>>>> robots.txt". The *last* thing you would want to embed in a page is a
>>>> vendor-specific (branding) information. AdSense bits are more than
>>>> enough and _even that_ raises doubt over the Openness of the Web. Also
>>>> think of ms-objects in Office-generated 'HTML'.
>>> I don't think I am going to use it, however, I like the idea of a URL
>>> part working as a limiter (?) - it may take any form, even
>>> "Borek_asks_to_end_crawling_here" :). Too bad robots.txt doesn't allow
>>> wildcards.
>> I don't think I am *ever* going to use it. Either way, I loathe the idea
>> of a URL part b0rking using a limiter (?). It should have taken a
>> *general* form, even "All_search_engines_should_not_index_further" :{.

I guess that, above all, I was trying to be sarcastic. Note how I echoed your
paragraph and exaggerated as I went along.

> What I mean is - I like the idea of using this type of solution to block
> all well behaving SE crawlers visiting my site from crawling parts of the
> site I don't want to be crawled - and I will be happy to see it as
> approved standard, not a hack. Right now I have to occasionally use some
> hacks to avoid duplicate content problems, this solution will help.

I take your point. *smile*

>> It's a good thing that robots.txt doesn't allow wildcards.
> Why?

It needs to be negotiated before someone stands up and states that robots.txt
wildcards will be supported by search engine X. If they were supported, what
would be permitted? To what extent? Can it be formalised? This has to be an
initiative that is agreed upon by all peers that are affected. Webmasters
need to be informed too.
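
To illustrate why the semantics would need to be agreed upon first, here is a
minimal Python sketch. Both readings of "*" below are my own assumptions, not
the behaviour of any particular search engine, and the rule and paths are made
up for the example. The same rule blocks different URLs depending on which
reading a crawler picks.

```python
import re
from fnmatch import fnmatchcase


def blocks_as_prefix(pattern, path):
    """Reading A (assumed): '*' matches any run of characters and the
    rule only has to match the beginning of the path."""
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.match(regex, path) is not None


def blocks_as_glob(pattern, path):
    """Reading B (assumed): shell-style glob that must match the whole path."""
    return fnmatchcase(path, pattern)


rule = "/*.cgi"  # hypothetical wildcard rule
paths = ["/search.cgi", "/search.cgi?abc", "/index.html"]

# Map each path to (blocked under reading A, blocked under reading B).
verdicts = {p: (blocks_as_prefix(rule, p), blocks_as_glob(rule, p)) for p in paths}

for path, (a, b) in verdicts.items():
    print(f"{path:20} prefix-style: {a!s:5}  glob-style: {b}")
```

The two readings disagree on "/search.cgi?abc", which is exactly the kind of
URL being discussed in this thread.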

As an example, think of bgsound in HTML. Microsoft simply invented some
method for playing sound in HTML although it never fit into that format.
Later, people would whine because Netscape or Firefox do not play sound,
thus IE "must be preferable". If everyone took this approach of extending
unilaterally, just imagine what an incompatibility-ridden mess the Net would
be today. Netscape are guilty of some of that too, but who was truly their
competition at the time? They probably developed browsing somewhat
selfishly.

> example.com/some_sript.cgi?abc
> abc parameter fits [a-c]* - any length, any combination. Try to block
> crawlers (using robots.txt) to not allow parameters longer than 3
> characters.

I believe that less technically-inclined Webmasters would find this confusing
and then make unfortunate mistakes. I guess the same could be argued w.r.t.
the rest of site development though.
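
For what it's worth, the quoted case can be expressed with plain prefix
matching, only not compactly. The Python sketch below is one possible
workaround under my own assumptions, not necessarily the hack Borek used:
any parameter longer than 3 characters must begin with one of the 3^4 = 81
four-character strings over {a, b, c}, so one Disallow line per such prefix
covers them all. The function names are made up for the example; the script
path is copied from the quoted text as written.

```python
from itertools import product

SCRIPT = "/some_sript.cgi?"  # path from the quoted example, spelling kept
ALPHABET = "abc"             # the parameter fits [a-c]*
MAX_LEN = 3                  # longest parameter that crawlers may fetch


def disallow_rules(script=SCRIPT, alphabet=ALPHABET, max_len=MAX_LEN):
    """Return one prefix rule per parameter string of length max_len + 1."""
    return [
        script + "".join(chars)
        for chars in product(alphabet, repeat=max_len + 1)
    ]


def is_blocked(url_path, rules):
    """Prefix matching, which is all that wildcard-free robots.txt offers."""
    return any(url_path.startswith(rule) for rule in rules)


rules = disallow_rules()
robots_txt = "User-agent: *\n" + "\n".join("Disallow: " + r for r in rules)

print(len(rules), "Disallow lines")
print(is_blocked("/some_sript.cgi?abc", rules))   # 3 characters
print(is_blocked("/some_sript.cgi?abca", rules))  # 4 characters
```

The rule count grows exponentially with the permitted length and with the
size of the alphabet, which is rather the point: it works, but it is easy to
get wrong by hand.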

> And it is not a case made especially for this post, that's the real
> problem I had to hack last summer.

You could perhaps have gotten a sponsorship from Google. They might even have
sent you a masseuse and a red bouncing ball.

Roy S. Schestowitz
http://Schestowitz.com  |    SuSE Linux     ¦     PGP-Key: 0x74572E8E
  1:30pm  up 11 days  6:07,  11 users,  load average: 2.12, 1.86, 1.80
      http://iuron.com - help build a non-profit search engine
