
Re: Allow robot access to protected content

__/ [ John Bokma ] on Thursday 08 June 2006 05:23 \__

> Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx> wrote:
>> __/ [ Borek ] on Wednesday 07 June 2006 20:03 \__
>>> On Wed, 07 Jun 2006 20:54:28 +0200, Sholom <sdeen@xxxxxxxxxxxx>
>>> wrote:
>>>> Anyone know how to allow Google's robots to index protected content?
>>>> My company has a site that requires a subscription to access the
>>>> info, but we'd like to have google index those pages. I see there
>>>> are many sites who've managed this.
>>> Easy way to get banned.
>>> I hate sites that are indexed but not accessible. Usually I do two
>>> things at the same time - first, I read cached content. Second, I
>>> report such site to Google.
>> There is a way around this. Change user-agent string to googlebot and
>> you're in.
> If they check for that, yup. Some sites check for the crawlers, based on
> IP or name.

Failing that, if you have no browser extension for the job, wget can fetch
the page in question: it has a "--user-agent" option for exactly this.
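
For anyone who would rather script it, here is a minimal sketch of the same
idea in Python, using only the standard library. The user-agent string, the
example.com URL, and the helper names build_request and fetch are my own
placeholders, not anything from this thread. As John says, this only gets you
in when the site checks the crawler's name; a site that checks by IP will not
be fooled.

```python
import urllib.request

# Assumed crawler-style user-agent string (placeholder; substitute your own).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Return a Request object that will send the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Fetch the page body as bytes, sending the custom User-Agent header."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Hypothetical URL; this only builds the request and does not touch the network.
    req = build_request("http://example.com/")
    print(req.get_header("User-agent"))
```

Calling fetch("http://example.com/") would then do roughly what wget does
with "--user-agent" set to the same string.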

>> To be honest, I didn't know this trick until somebody told
>> me last week.
> Wasn't me, but 2+ years ago:
> http://johnbokma.com/mexit/2004/04/24/changinguseragent.html
> Funny, I notice that I have a link to report spam with google on my site
> :-D My site is getting too big. Or maybe I should say: a site is getting
> good when you limit Google to your site when looking for some info (which
> I do now and then, I even made a special keymark for it :-D)

*smile* I can remember the time when I stopped maintaining the sitemap and
lost that visual, conceptual idea of how my site was constructed. It is now
something of a messy web, which I sometimes try to restructure. Same
situation with E-mail accounts, Web hosts, and domain names.

Best wishes,


Roy S. Schestowitz      | Othello for Win32/Linux: http://othellomaster.com
http://Schestowitz.com  | Free as in Free Beer ¦  PGP-Key: 0x74572E8E
  8:15am  up 41 days 13:48,  11 users,  load average: 0.95, 0.81, 0.77
      http://iuron.com - semantic engine to gather information
