__/ [Borek] on Sunday 01 January 2006 17:53 \__
> On Sat, 31 Dec 2005 18:39:33 +0100, Roy Schestowitz
> <newsgroups@xxxxxxxxxxxxxxx> wrote:
>> Does anybody know a (preferably free) tool that will extract crawlers
>> data from raw log files and produce a day-by-day breakdown of traffic
>> from each search engine? I can see total (aggregated) volumes using
>> available tools, but they tend to be more (human) visitor-oriented.
> My logs are already cut into days, so all I need to do is
> grep -c inktomisearch.com statslog.20060101.txt
> grep -c msnbot statslog.20060101.txt
> grep -c googlebot statslog.20060101.txt
> But then - even if you have all logs kept in one file - it should
> be enough to modify above to something like
> egrep -c "googlebot.*31/Dec/2005" yourlogfilename
> It is only a question of finding proper string to search for :)
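For a day-by-day breakdown, those one-liners could be wrapped in a loop over the dated log files. An untested sketch, assuming per-day logs named statslog.YYYYMMDD.txt as above; the sample log lines here are fabricated purely for illustration:

```shell
#!/bin/sh
# Sketch: count crawler hits per day and emit a CSV.
# The two sample logs below are fabricated illustration data.
mkdir -p /tmp/botstats && cd /tmp/botstats
printf 'googlebot hit /\nmsnbot hit /\ngooglebot hit /about\n' > statslog.20051230.txt
printf 'crawl.inktomisearch.com hit /\ngooglebot hit /\n' > statslog.20051231.txt

echo "date,googlebot,msnbot,inktomi" > bots.csv
for f in statslog.*.txt; do
    day=${f#statslog.}; day=${day%.txt}          # strip prefix/suffix to get YYYYMMDD
    g=$(grep -c googlebot "$f")
    m=$(grep -c msnbot "$f")
    i=$(grep -c inktomisearch.com "$f")
    echo "${day},${g},${m},${i}" >> bots.csv
done
cat bots.csv
```

The resulting CSV can then be opened in Calc or fed to gnuplot for graphing.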
I only began to think about this approach as the discussion progressed. I
suppose there is also a variety of other analyses, such as page requests,
hits, and frequency, plus visualisation of the results, e.g. as a graph.
Calc or gnuplot can do that, but not in a serialised (automated) fashion,
which is the advantage of self-contained stats packages.
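That said, the graphing step itself is easy to script. A hedged sketch, assuming a bots.csv with columns date,googlebot,msnbot,inktomi (the column layout is my assumption, not from the thread); gnuplot is only invoked if it is actually installed:

```shell
#!/bin/sh
# Sketch: write a gnuplot script that charts per-day crawler counts
# from an assumed bots.csv (date,googlebot,msnbot,inktomi).
cat > /tmp/bots.gp <<'EOF'
set datafile separator ","
set xdata time
set timefmt "%Y%m%d"
set format x "%d/%m"
set terminal png size 640,400
set output "bots.png"
plot "bots.csv" using 1:2 with lines title "googlebot", \
     ""         using 1:3 with lines title "msnbot"
EOF
# Render only when gnuplot and the data file are present
command -v gnuplot >/dev/null 2>&1 && [ -f bots.csv ] && gnuplot /tmp/bots.gp
```

Run from cron once a day and the graph stays current, which gets part of the way towards what the stats packages do automatically.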
John Bokma wrote a script to analyse Google crawling behaviour and write the
output as CSV or tab-delimited values.