__/ [Andrew] on Friday 04 November 2005 16:49 \__
> In comp.infosystems.www.authoring.html on Tuesday 01 November 2005 05:56,
> Roy Schestowitz wrote:
>
>> __/ [Leif K-Brooks] on Tuesday 01 November 2005 01:16 \__
>>
>>> Roy Schestowitz wrote:
>>>> This will not prevent spyware like the Alexa/A9/Amazon toolbars (among
>>>> others) from crawling your password-protected pages, but it will at
>>>> least turn away human visitors who ought to remain outside. To
>>>> understand how this works (it is essentially JavaScript), look at the
>>>> source, change it and save it.
>>>
>>> Anyone with a clue can turn off JavaScript support in their browser.
>>> Security should not depend on clueless attackers.
>>
>> No JavaScript, no entry. *smile*
>>
>> ...still better than ActiveX
>>
>> ActiveX enabled, anybody in (including hijackers)
>>
>> Roy
>
> There is a flaw with this method, which is that if someone is visiting your
> "private" page and then goes to a completely different website, the private
> URL will be passed as the referrer to that website. Many websites' referrer
> logs are publicly available (this may or may not be with the
> intention/knowledge of the webmaster) and therefore, potentially, the links
> could be accessed by search engines, so your private content could appear
> in a search engine's results.
I never thought about this route. Thanks for pointing that out.
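To make the leak concrete, here is a minimal sketch (URLs are hypothetical, and the header object only simulates what the browser fills in automatically): when a visitor on a "private" page follows a link to an external site, the request to that site carries the private page's address in the Referer header, and from there it lands in the external site's logs.

```javascript
// Sketch of the outbound request a browser makes when a visitor clicks
// an external link from a password-protected page. The browser, not any
// page code, sets the Referer header; URLs here are hypothetical.
const privatePage = "http://example.org/private/secret-page.html";

const outboundRequestHeaders = {
  Host: "example.com", // the unrelated site being visited next
  // Filled in automatically by the browser with the page the visitor
  // came from -- i.e. the "secret" URL:
  Referer: privatePage,
};

// This is the value that ends up in example.com's referrer log.
console.log(outboundRequestHeaders.Referer);
```

If that log is public and gets crawled, the "secret" URL becomes just another link for a search engine to follow.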
> A partial solution, which I recommend you use, is to put the following in
> the head section of each private page.
>
> <meta name="ROBOTS" content="NOINDEX,NOFOLLOW,NOARCHIVE">
If the page contains sensitive content, I suppose 'shielding' it would indeed
be worthwhile. I would only like to stress that the information I 'hide' is
not confidential; it simply should not be easily accessible. Private material
like Palm data has always been password-protected.
> This only works with some search engines (but the major ones should all act
> on it).
>
> The preferred method of controlling search engine spiders is to use a
> robots.txt file but this will have two drawbacks:
>
> 1. You might not have access to the root directory of the domain or
> subdomain, which is where the robots.txt needs to go.
> 2. In any event, some people look at a site's robots.txt to "discover"
> directories the site owner would rather weren't known about, hence it is
> definitely *not* recommended for your situation.
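To illustrate the second drawback: robots.txt is itself a public file that anyone can fetch, so (with hypothetical paths) it reads like a map of exactly what the site owner wanted kept quiet:

```
# Fetchable by anyone at http://example.org/robots.txt
# Each Disallow line advertises the very path it is meant to hide.
User-agent: *
Disallow: /private/
Disallow: /palm-data/
```

Well-behaved crawlers will skip those directories, but a curious human will go straight to them.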
Yes, I once thought about that. Pages and sections where I deny crawlers
access at the robots.txt level are either:
- Sections that contain names, which I would rather people did not 'Google'
  (or 'Yahoo', etc.)
- Sections that are too extensive to crawl, as they would only add 'noise' to
  the search engines' indices.
Roy
--
Roy S. Schestowitz | Useless fact: A dragonfly only lives for one day
http://Schestowitz.com | SuSE Linux | PGP-Key: 0x74572E8E
5:15am up 2 days 1:13, 4 users, load average: 0.25, 0.46, 0.42
http://iuron.com - next generation of search paradigms