__/ [Roy Schestowitz] on Saturday 31 December 2005 18:48 \__
> __/ [Brian Wakem] on Saturday 31 December 2005 18:21 \__
>
>> Roy Schestowitz wrote:
>>
>>> Does anybody know a (preferably free) tool that will extract crawlers
>>> data from raw log files and produce a day-by-day breakdown of traffic
>>> from each search engine? I can see total (aggregated) volumes using
>>> available tools, but they tend to be more (human) visitor-oriented.
>>>
>>> [...]
>>
>>
>> I only monitor the big 3.
>>
>>
>> [code]
>> $ ./bot /usr/local/apache2/logs/access_log
>> Googlebot 10595
>> Yahoo! Slurp 1326
>> msnbot 12
>
> Thanks a bunch, Brian. Since Perl is double-dutch to me, is there any way
> of having the above script separate the numbers by day? I only had a
> shallow look and I suspect the functionality is there, somewhere.
>
> It's important to me as I suspect a certain batch of links (WordPress
> support) encouraged a lot of crawling, but I can't tell to what extent, if
> at all. I haven't kept track of a daily running sum, so I need to look at
> this in retrospect. Visitors and AWStats haven't got this functionality. I
> don't know about Analytics, but I can never use it properly.
Never mind that. I can slice the log files, which is not ideal (somewhat
manual), but should work nonetheless. I just need to find an editor that can
handle large files or (on second thought) make use of fgrep, something along
the lines of
fgrep "29/Dec/" /usr/local/apache2/logs/access_log > 29_dec_log
fgrep "30/Dec/" /usr/local/apache2/logs/access_log > 30_dec_log
...
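For what it's worth, a single awk pass over the log could give the per-day
counts directly, with no slicing. A rough sketch, assuming the standard
Apache combined log format (the sample lines below are made up; in real use
the input would be /usr/local/apache2/logs/access_log, and the Googlebot
pattern would be swapped for Slurp or msnbot as needed):

```shell
# Made-up sample lines standing in for /usr/local/apache2/logs/access_log
log='1.2.3.4 - - [29/Dec/2005:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"
1.2.3.4 - - [30/Dec/2005:11:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"
5.6.7.8 - - [30/Dec/2005:12:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"'

# Split each line on [ and ] so $2 is the timestamp, then use only the
# day part (everything before the first colon) as the counting key.
counts=$(printf '%s\n' "$log" |
  awk -F'[][]' '/Googlebot/ { split($2, d, ":"); day[d[1]]++ }
                END { for (k in day) print k, day[k] }' | sort)
echo "$counts"
```

Run against the real access_log, that should print one line per day,
e.g. "30/Dec/2005" followed by the number of Googlebot hits that day.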
Thanks again,
Roy