Re: Daily Crawlers Breakdown

  • Subject: Re: Daily Crawlers Breakdown
  • From: Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx>
  • Date: Sun, 01 Jan 2006 17:13:16 +0000
  • Newsgroups: alt.internet.search-engines
  • Organization: schestowitz.com / MCC / Manchester University
  • References: <dp6fpl$8p4$3@godfrey.mcc.ac.uk> <41o0gtF1f4phkU1@individual.net> <dp6jqk$9ve$1@godfrey.mcc.ac.uk> <41o3j2F1f3h7rU1@individual.net>
  • Reply-to: newsgroups@xxxxxxxxxxxxxxx
  • User-agent: KNode/0.7.2
__/ [Brian Wakem] on Saturday 31 December 2005 19:13 \__

> Roy Schestowitz wrote:
> 
>> __/ [Brian Wakem] on Saturday 31 December 2005 18:21 \__
>>> $ ./bot /usr/local/apache2/logs/access_log
>>> Googlebot       10595
>>> Yahoo! Slurp    1326
>>> msnbot          12
>> 
>> Thanks a bunch, Brian. Since Perl is double-dutch to me, is there any way
>> of having the above script separate the numbers by day? I only had a
>> shallow look and I suspect the functionality is there, somewhere.
> 
> 
> It just totals up all the bot hits in the file.  Our logs are rotated daily
> so grouping by date has never been an issue.
> 
> 
> The following should work, though I don't have a multi-day log to test.
> 
> The dates will not sort correctly; I haven't got time to write a sub to
> convert them into a sortable format.
> 
> 
> 
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> my @bots = ('Googlebot','Yahoo! Slurp','msnbot');
> my %date;
> open (LOG, "< $ARGV[0]") or die "Can't open log ($ARGV[0]) - $!";
> while(<LOG>){
>         chomp;
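>         # $1: the dd/Mon/yyyy part of the timestamp.
>         # $2: the last two quoted fields (referrer and user agent).
>         if (m!\[(\d+/\w+/\d{4}):.*("(.*?)" ".*?"$)!) {
>                 my $date = $1;
>                 my $ua = $2;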
>                 foreach (@bots) {
>                         if (index ($ua, $_) != -1){
>                                 $date{$date}{$_}++;
>                                 last;
>                         }
>                 }
>         }
> }
> close LOG;
> 
> foreach my $date( sort keys %date ) {  # plain string sort, not chronological
>         foreach my $ua( sort keys %{$date{$date}} ) {
>                 printf "%-15s%-18s%d\n",$date,$ua,$date{$date}{$ua};
>         }
>         print "\n";
> }
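Brian mentions that the dates will not sort correctly. Apache writes them as dd/Mon/yyyy, so as strings 01/Jan/2006 would come before 31/Dec/2005. Below is a minimal sketch of the missing conversion sub, assuming the English three-letter month abbreviations used in Apache's default log timestamps. The names `sortable_date` and `%month_num` are hypothetical, not part of Brian's script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lookup table: Apache's three-letter month names -> month numbers.
my %month_num = (
    Jan => 1, Feb => 2, Mar => 3, Apr => 4,  May => 5,  Jun => 6,
    Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12,
);

# Hypothetical helper: turn "31/Dec/2005" into "2005-12-31",
# a form that sorts chronologically as a plain string.
sub sortable_date {
    my ($apache_date) = @_;
    my ($day, $mon, $year) = split m{/}, $apache_date;
    my $m = $month_num{$mon}
        or die "Unknown month '$mon' in '$apache_date'\n";
    return sprintf '%04d-%02d-%02d', $year, $m, $day;
}

# Sample keys, in the same format the script stores in %date.
my @dates = ('01/Jan/2006', '30/Dec/2005', '31/Dec/2005');

# Sort by the converted key; prints 30/Dec/2005, 31/Dec/2005, 01/Jan/2006, one per line.
for my $d (sort { sortable_date($a) cmp sortable_date($b) } @dates) {
    print "$d\n";
}
```

To plug it into the script above, the outer loop could read `foreach my $date( sort { sortable_date($a) cmp sortable_date($b) } keys %date ) {`, with `$a` and `$b` swapped for newest-first output.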

Thanks again, Brian. I have just tested it and it works brilliantly. It doesn't
show me the numbers I was hoping to see, but it's a valuable tool which I
will definitely use again in the future. I still have (and use) your
"extract URLs from HTML" Perl one-liner. It was tailored for Borek, but it
has simplified my own tasks too.

Roy
