__/ [ John Bokma ] on Thursday 08 June 2006 05:23 \__
> Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx> wrote:
>
>> __/ [ Borek ] on Wednesday 07 June 2006 20:03 \__
>>
>>> On Wed, 07 Jun 2006 20:54:28 +0200, Sholom <sdeen@xxxxxxxxxxxx>
>>> wrote:
>>>
>>>> Anyone know how to allow Google's robots to index protected content?
>>>>
>>>> My company has a site that requires a subscription to access the
>>>> info, but we'd like to have google index those pages. I see there
>>>> are many sites who've managed this.
>>>
>>> Easy way to get banned.
>>>
>>> I hate sites that are indexed but not accessible. Usually I do two
>>> things at the same time - first, I read cached content. Second, I
>>> report such site to Google.
>>
>> There is a way around this. Change user-agent string to googlebot and
>> you're in.
>
> If they check for that, yup. Some sites check for the crawlers, based on
> IP or name.

Failing that, if you have no browser extension for changing the user-agent,
wget can be used to fetch the page in question. It has a "--user-agent"
option for this.
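
For anyone who would rather script it, here is a rough sketch of the same idea using Python's standard library. The URL is a made-up placeholder, and the agent string is the one Googlebot is commonly reported to send, so treat both as assumptions. It builds the request without sending anything.

```python
from urllib.request import Request

# Both values are assumptions for illustration; neither comes from this thread.
PAGE_URL = "http://example.com/members/article.html"  # hypothetical page
CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url, user_agent):
    """Return a Request that carries the given User-Agent header.

    Nothing is sent over the network here; pass the result to
    urllib.request.urlopen() to actually fetch the page.
    """
    return Request(url, headers={"User-Agent": user_agent})


req = build_request(PAGE_URL, CRAWLER_UA)
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

As John says above, this only gets you in where the site checks the name. A site that checks the crawler's IP will not be fooled by a changed header.
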
>> To be honest, I didn't know this trick until somebody told
>> me last week.
>
> Wasn't me, but 2+ years ago:
> http://johnbokma.com/mexit/2004/04/24/changinguseragent.html
>
> Funny, I notice that I have a link to report spam with google on my site
> :-D My site is getting too big. Or maybe I should say: a site is getting
> good when you limit Google to your site when looking for some info (which
> I do now and then, I even made a special keymark for it :-D)

*smile* I can remember the time when I stopped maintaining the sitemap and
lost that visual, conceptual picture of how my site was constructed. It is
now something of a messy web, which I sometimes try to restructure. It's the
same situation with E-mail accounts, Web hosts, and domain names.

Best wishes,

Roy

--
Roy S. Schestowitz | Othello for Win32/Linux: http://othellomaster.com
http://Schestowitz.com | Free as in Free Beer ¦ PGP-Key: 0x74572E8E
8:15am up 41 days 13:48, 11 users, load average: 0.95, 0.81, 0.77
http://iuron.com - semantic engine to gather information