Re: Report for pages with NO hits?

  • Subject: Re: Report for pages with NO hits?
  • From: Roy Schestowitz <newsgroups@schestowitz.com>
  • Date: Thu, 16 Jun 2005 03:31:23 +0100
  • Newsgroups: alt.www.webmaster
  • References: <MPG.1d1a3ed76df8cae29897de@newshost.allthenewsgroups.com> <Xns96769F5AFD27Egandalfparker@208.201.224.154> <Xns9676D5BA07BF4castleamber@130.133.1.4>
  • User-agent: KNode/0.7.2
John Bokma wrote:

> Gandalf  Parker wrote:
> 
>> Byron <spamagnet@dorrk.com> wrote in
>> news:MPG.1d1a3ed76df8cae29897de@newshost.allthenewsgroups.com:
>> 
>>> Is there a piece of software that can cross-reference the site root
>>> with the log files and produce a list of files that have had hits in
>>> the last 6 months?
>> 
>> You mean have NOT had hits? I could write one I think. Basically it
>> would list the files in the directory, then search for each one in the
>> logs.
> 
> Ouch, O(n^2).
> 
> Better, hash each page URI from the log, preferably with a count. ( O(n) )
> 
> Next, find each page in the htdocs directory, turn it into a relative URI,
> check if it's in the hash, if not, add it and set the count to 0. ( O(n)
> ).
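John's two-pass idea can be sketched in Python. This is a minimal illustration, not code from the thread. It assumes logs in Apache/NCSA Common or Combined Log Format, a document root whose file paths map directly onto URIs, and that `index.html` serves directory requests. The function names `hash_log_uris` and `pages_with_no_hits` are hypothetical. Restricting to the last six months is left out: it assumes the log lines passed in already cover that period.

```python
import os
import re
from collections import Counter
from urllib.parse import unquote

# Assumption: Common/Combined Log Format, where the request appears as
# "METHOD /path HTTP/x.y" in the first double-quoted field of each line.
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+)')


def hash_log_uris(log_lines):
    """Pass 1, linear in log size: hash each requested URI with a hit count."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            # Drop any query string and decode %XX escapes so the URI can
            # be compared with on-disk file names.
            uri = unquote(match.group(1).split("?", 1)[0])
            if uri.endswith("/"):
                uri += "index.html"  # assumption: index.html is the directory index
            hits[uri] += 1
    return hits


def pages_with_no_hits(docroot, hits):
    """Pass 2, linear in file count: turn each file under docroot into a
    relative URI, add it to the hash with count 0 if it is absent, then
    report every entry whose count is still 0."""
    for dirpath, _dirs, files in os.walk(docroot):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), docroot)
            uri = "/" + rel.replace(os.sep, "/")
            if uri not in hits:
                hits[uri] = 0
    return sorted(uri for uri, count in hits.items() if count == 0)
```

Each log line and each file is visited once, so the cost is proportional to log size plus file count rather than their product. A call would look something like `pages_with_no_hits("/var/www/htdocs", hash_log_uris(open("access.log")))`, where both paths are placeholders.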

Or just write a naive shell script and let the machine sweat overnight. The
real problems crop up when re-using the scripts for very large sites
(>10,000 files) with very big (>100 MB) logs.
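For comparison, the naive approach Gandalf describes (list the files, then search the log for each one) might look like the following Python sketch. It is hypothetical, roughly what a shell loop calling `grep -q` once per path would do, and it assumes the same Common Log Format request lines. Because it re-reads the whole log for every file, its running time grows with the number of files times the log size, which is why it is tolerable overnight on a small site but painful at the >10,000-file, >100 MB scale mentioned above.

```python
import os


def pages_with_no_hits_naive(docroot, log_path):
    """Naive approach: for every file under docroot, re-read the entire log
    and look for its URI. Work grows as (number of files) x (log size)."""
    unhit = []
    for dirpath, _dirs, files in os.walk(docroot):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), docroot)
            uri = "/" + rel.replace(os.sep, "/")
            # Match the path as a whole token in the request line, e.g.
            # "GET /page.html HTTP/1.1"; requests with a query string are missed.
            needle = " " + uri + " "
            with open(log_path, encoding="utf-8", errors="replace") as log:
                found = any(needle in line for line in log)
            if not found:
                unhit.append(uri)
    return sorted(unhit)
```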

Roy

-- 
Roy S. Schestowitz
http://Schestowitz.com