John Bokma wrote:
> Gandalf Parker wrote:
>
>> Byron <spamagnet@dorrk.com> wrote in
>> news:MPG.1d1a3ed76df8cae29897de@newshost.allthenewsgroups.com:
>>
>>> Is there a piece of software that can cross-reference the site root
>>> with the log files and produce a list of files that have had hits in
>>> the last 6 months?
>>
>> You mean have NOT had hits? I could write one I think. Basically it
>> would list the files in the directory, then search for each one in the
>> logs.
>
> Ouch, O(n^2).
>
> Better, hash each page URI from the log, preferably with a count. ( O(n) )
>
> Next, find each page in the htdocs directory, turn it into a relative URI,
> check if it's in the hash, if not, add it and set the count to 0. ( O(n) )
Or just write a naive shell script and let the machine sweat overnight. The
real problems crop up when re-using the scripts for very large sites
(>10,000 files) with very big (>100 MB) logs.
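
For sites of that size, the two-pass hash approach John describes is not much more work than the naive script. A rough Python sketch follows. It is not a tested tool, and it makes several assumptions: the log is in Apache Common Log Format, the document root maps one-to-one onto URIs, and the function names (count_hits, site_uris, report) are made up for illustration. It does not filter by date, so the "last 6 months" cut-off would have to come from choosing which log files to feed in, and it does not map a request for "/" onto an index file.

```python
import os
import re
from collections import Counter

# Assumption: Common Log Format, where the request appears as
# "GET /some/path HTTP/1.1" inside double quotes.
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) HTTP/[\d.]+"')


def count_hits(log_lines):
    """Pass 1: hash each requested URI with a hit count (one pass over the log)."""
    hits = Counter()
    for line in log_lines:
        m = REQUEST_RE.search(line)
        if m:
            # Drop any query string so /a.html?x=1 counts towards /a.html.
            hits[m.group(1).split("?", 1)[0]] += 1
    return hits


def site_uris(docroot):
    """Yield every file under docroot as a root-relative URI."""
    for dirpath, _dirs, files in os.walk(docroot):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), docroot)
            yield "/" + rel.replace(os.sep, "/")


def report(docroot, log_lines):
    """Pass 2: return {uri: count} for every file on disk; 0 means no hits."""
    hits = count_hits(log_lines)
    return {uri: hits.get(uri, 0) for uri in site_uris(docroot)}
```

Passing an open file object as log_lines reads the log one line at a time, so a 100 MB log never has to fit in memory; only the hash of distinct URIs does. Files that come back with a count of 0 are the candidates for removal.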
Roy
--
Roy S. Schestowitz
http://Schestowitz.com