On Fri, 12 May 2006 17:15:08 +0100, Roy Schestowitz
<newsgroups@xxxxxxxxxxxxxxx> opined:
> __/ [ David Cary Hart ] on Friday 12 May 2006 17:01 \__
>
> > Who is using the UA and why? I cannot even find where to obtain it.
> > This seems to be a Hoover yet I have yet to see an instance where
> > any have checked robots.txt. I have been redirecting these. Am I
> > losing legitimate traffic?
>
> What's the nature of the site? MATLAB, which is heavily based on
> Java, can be used as a (fairly rudimentary) Web browser, so
> denying it might not be a good idea. It is primarily used for
> up-to-date documentation. That is why I am asking about the nature
> of your site.
Anti-spam and DNSBL.
>
> Additionally, see if the paths (sequence of requested files)
> seem to characterise these as human visitors rather than some
> experimental bot, a lamer, or a leech.
Multiple simultaneous GETs. Never a referrer. I am watching it
closely for false positives. The offending addresses are added to the
firewall for 30 minutes (after notification), and each one generates
an email to root.
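
For anyone who wants to reproduce the check from their own logs, here is a
rough sketch of the heuristic described above (bursts of referrer-less GETs
carrying a Java/ user agent). It is not the script I run; it assumes Apache's
"combined" log format, and the function name and the burst threshold of 3 are
made up for illustration.

```python
import re
from collections import Counter

# Assumption: the server writes Apache "combined" format log lines.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def suspect_ips(lines, min_burst=3):
    """Return IPs with >= min_burst referrer-less Java/ GETs in one second.

    min_burst is a hypothetical threshold; tune it against real traffic.
    """
    bursts = Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not in the assumed format
        if not m["request"].startswith("GET "):
            continue
        if m["referrer"] not in ("", "-"):
            continue  # a referrer was sent, so not our pattern
        if not m["agent"].startswith("Java/"):
            continue
        # Log timestamps have one-second resolution, so equal timestamps
        # from one IP are treated as "simultaneous".
        bursts[(m["ip"], m["ts"])] += 1
    return sorted({ip for (ip, _), n in bursts.items() if n >= min_burst})
```

Feed it the lines of an access log (for example, an open file object) and it
returns the addresses worth a closer look. What happens next, firewall rule or
email, is left out on purpose.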
> Don't neglect the
> possibility of spoofing. In the past week alone, two people in
> the search engine newsgroups reported being whacked by Google
> or Yahoo. Upon closer inspection, these were probably fakers (at
> least one of them confirmed).
>
> wget -r --user-agent="Java/1.2.4_55" your_site_URL
That's how I test the setup. Spoofs are more likely to generate a
Mozilla string. Leeches are easy to spot because their consecutive
requests all lack a referrer.
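
If wget is not to hand, the same kind of probe can be sketched in Python.
This is only an illustration: the helper name is hypothetical, and the
user-agent string is simply the one from the quoted command. It builds a GET
request with a Java-style User-Agent and no Referer header; nothing is sent
until you pass the request to urlopen yourself.

```python
import urllib.request

def build_probe(url, agent="Java/1.2.4_55"):
    """Build (but do not send) a GET request with a Java-style User-Agent.

    No Referer header is set, which mimics the traffic under discussion.
    """
    return urllib.request.Request(url, headers={"User-Agent": agent})

# To actually send it against your own site:
#   urllib.request.urlopen(build_probe("http://your.site.example/"), timeout=10)
```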
>
--
Displayed Email Address is a SPAM TRAP
Our DNSRBL - Eliminate Spam: http://www.TQMcube.com
Multi-RBL Check: http://www.TQMcube.com/rblcheck.php
The Dirty Dozen Spammiest Ranges: http://tqmcube.com/dirty12.php