
Re: Hidden Web Pages Access

__/ [Andrew] on Friday 04 November 2005 16:49 \__

> In comp.infosystems.www.authoring.html on Tuesday 01 November 2005 05:56,
> Roy Schestowitz wrote:
> 
>> __/ [Leif K-Brooks] on Tuesday 01 November 2005 01:16 \__
>> 
>>> Roy Schestowitz wrote:
>>>> This will not prevent spyware like the Alexa/A9/Amazon toolbars
>>>> (among others) from crawling your password-protected pages, but it
>>>> will at least turn away human visitors who ought to remain outside.
>>>> To understand how this works (essentially JavaScript), look at the
>>>> source, change it and save it.
>>> 
>>> Anyone with a clue can turn off JavaScript support in their browser.
>>> Security should not depend on clueless attackers.
>> 
>> No JavaScript, no entry. *smile*
>> 
>> ...still better than ActiveX
>> 
>> ActiveX enabled, anybody in (including hijackers)
>> 
>> Roy
> 
> There is a flaw in this method: if someone visits your "private" page
> and then follows a link to a completely different website, the private
> URL will be passed to that site as the referrer. Many websites' referrer
> logs are publicly available (with or without the webmaster's knowledge or
> intention), so the links could potentially be picked up by search engines
> and your private content could appear in a search engine's results.


I never thought about this route. Thanks for pointing that out.
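
If outbound links from the 'private' pages are ever needed, I suppose one
could route them through a small in-between page; after a meta refresh,
browsers generally send either no referrer at all or the redirect page's
own address, not the private URL. A rough sketch (redirect.html and the
destination URL are merely placeholders):

    <!-- redirect.html: link here from the private page instead of
         linking to the external site directly -->
    <html>
    <head>
    <meta http-equiv="refresh" content="0;url=http://example.com/">
    </head>
    <body>
    <p>Redirecting to <a href="http://example.com/">example.com</a>...</p>
    </body>
    </html>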


> A partial solution, which I recommend you use, is to put the following in
> the head section of each private page.
> 
> <meta name="ROBOTS" content="NOINDEX,NOFOLLOW,NOARCHIVE">


If the page contains sensitive content, I suppose 'shielding' it would
indeed be worthwhile. I would only like to stress that the information I
'hide' is not confidential; it simply should not be too easily accessible.
Genuinely private material, like Palm data, has always been
password-protected.
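
For that genuinely private material I rely on server-side protection
rather than scripts. Assuming an Apache host, an .htaccess file along
these lines in the protected directory is enough (the AuthUserFile path is
only an example; the .htpasswd file itself is generated with Apache's
htpasswd utility):

    AuthType Basic
    AuthName "Private area"
    AuthUserFile /home/roy/.htpasswd
    Require valid-user

Unlike the JavaScript approach, this is enforced by the server, so it
cannot be bypassed by reading the page source or disabling scripts.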

 
> This only works with some search engines (but the major ones should all act
> on it).
> 
> The preferred method of controlling search engine spiders is to use a
> robots.txt file but this will have two drawbacks:
> 
> 1. You might not have access to the root directory of the domain or
> subdomain, which is where the robots.txt needs to go.
> 2. In any event, some people look at a site's robots.txt to "discover"
> directories the site owner would rather weren't known about, hence it is
> definitely *not* recommended for your situation.


Yes, I once thought about it. Pages and sections where I deny crawlers
access at the robots.txt level (a sketch follows below) are either:

- Sections that contain names, which I would rather people did not
'Google' (or 'Yahoo', etc.)

- Sections that are too extensive to be crawled, as they would only add
'noise' to the search engines' indices.
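
For illustration, the relevant part of such a robots.txt might read as
follows (the directory names here are made up, which only underlines your
point: the file itself advertises what it is meant to hide):

    User-agent: *
    Disallow: /people/
    Disallow: /extensive-archive/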

Roy

-- 
Roy S. Schestowitz      | Useless fact: A dragonfly only lives for one day
http://Schestowitz.com  |    SuSE Linux     |     PGP-Key: 0x74572E8E
  5:15am  up 2 days  1:13,  4 users,  load average: 0.25, 0.46, 0.42
      http://iuron.com - next generation of search paradigms
