Roy Schestowitz
What is it for?
Rather than performing numerous Google searches and browsing to determine PageRank, let Perl summarise large sets of figures in a single text file. This can be repeated at regular intervals by setting up a cron job.
Installation
Two Perl scripts are necessary, a PageRank script and a Google Suggest script (whether you need both depends on which figures you wish to collect from Google). The PageRank script requires the WWW::Google::PageRank module. Open a shell as root and type in:
perl -MCPAN -e shell
This starts the CPAN shell, which on its first run walks you through configuring the CPAN module. After that, new modules can be installed quickly. Once CPAN is set up, simply type in:
install WWW::Google::PageRank
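You may want to skip the interactive shell, for example when scripting the setup. In that case the module can be checked for and installed directly from the command line. The following is a minimal sketch that assumes a POSIX shell and a CPAN that has already been configured once. The helper names have_perl_module and ensure_perl_module are hypothetical and are not part of CPAN.

```shell
#!/bin/sh
# Minimal sketch: install a Perl module through CPAN only if it is missing.
# have_perl_module and ensure_perl_module are hypothetical helper names.

# Succeeds (exit status 0) if perl can load the named module.
have_perl_module() {
  perl -M"$1" -e 1 >/dev/null 2>&1
}

# Installs the module non-interactively via CPAN.pm when perl cannot load it.
# Run as root for a system-wide install, as with the interactive shell above.
ensure_perl_module() {
  if have_perl_module "$1"; then
    echo "$1 is already installed"
  else
    perl -MCPAN -e "install('$1')"
  fi
}
```

For example, running `ensure_perl_module WWW::Google::PageRank` as root installs the module only if perl cannot already load it.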
Having got the PageRank script (gpr.pl) and the Google Suggest script (gsuggest.pl), set up a file named google, for example, and set its permissions to be executable:
chmod 700 google
The file should contain a list of tasks to run. In my case, for instance:
echo 'Schestowitz' :
Needless to say, all of these files must be reachable, either through a directory listed in your PATH or by being placed in the same directory.
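A fuller task file in the spirit of the example above might look like the sketch below. It is illustrative only. The arguments passed to gpr.pl and gsuggest.pl are assumptions, so check each script's own usage first. The URL is a placeholder, and run_task and google_tasks are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical contents of the "google" task file. How gpr.pl and gsuggest.pl
# are called here (one URL or one query per call) is an assumption.

# Script names; override with absolute paths when running under cron.
GPR=${GPR:-gpr.pl}
GSUGGEST=${GSUGGEST:-gsuggest.pl}

# Run a command if it can be found; otherwise record its absence in the
# report itself (stdout), rather than only on stderr.
run_task() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    echo "($1 not found)"
  fi
}

# One labelled block per query or page, as in the echo line above.
google_tasks() {
  echo 'Schestowitz' :
  run_task "$GSUGGEST" 'Schestowitz'

  # Placeholder URL: list the pages whose PageRank you want to track.
  echo 'http://www.example.com/' :
  run_task "$GPR" 'http://www.example.com/'
}

google_tasks
```

Cron typically starts jobs with a minimal PATH. Pointing GPR and GSUGGEST at absolute paths (e.g. `[YOUR_PATH]/gpr.pl`) avoids depending on PATH at all.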
Usage
Then, by running:
google >output.txt
all results should be put in a single text file.
Automation
A cron job such as:
50 20 * * * [YOUR_PATH]/google >[YOUR_PATH]/output.txt
will update the file every night (at 20:50). If you are not familiar with cron jobs, now is a good time to learn about them. To see only the differences, i.e. how the numbers have changed, one can do the following:
30 22 * * * [YOUR_PATH]/google >[YOUR_PATH]/GoogleCron/new
[YOUR_PATH]/GoogleCron/old needs to be set up as the 'template' (base) file against which new results are compared, but there is plenty of room to extend this idea.
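The comparison between GoogleCron/new and GoogleCron/old can be done with diff. What follows is a minimal sketch assuming a POSIX shell. report_changes is a hypothetical function name, and creating the base file automatically on the first run is an added convenience, not part of the original recipe.

```shell
#!/bin/sh
# Minimal sketch of the "differences only" idea. report_changes is a
# hypothetical helper name; the comparison is done with diff.
#   $1: command that prints the report, e.g. [YOUR_PATH]/google
#   $2: directory for the old/new files, e.g. [YOUR_PATH]/GoogleCron
report_changes() {
  cmd=$1
  dir=$2
  mkdir -p "$dir" || return 1
  "$cmd" >"$dir/new"

  # First run: the fresh results become the base ('template') file.
  if [ ! -f "$dir/old" ]; then
    cp "$dir/new" "$dir/old"
    echo "Created base file $dir/old"
    return 0
  fi

  # Print what changed relative to the base file; nothing if identical.
  diff "$dir/old" "$dir/new"
  # diff exits 0 (same) or 1 (different); only 2 signals real trouble.
  [ $? -le 1 ]
}
```

You could keep the function in a file (here the hypothetical report_changes.sh) and call it from cron: `30 22 * * * . [YOUR_PATH]/report_changes.sh && report_changes [YOUR_PATH]/google [YOUR_PATH]/GoogleCron`. Cron normally mails a job's output to the crontab owner, so after the first run a message would arrive only when the figures differ from the base. Because old stays fixed, this measures drift from the base. Copying new over old after each run would instead report night-to-night changes.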
Greedy Queries
If the above does not provide sufficient information or is too static, follow the steps below. There is a way of keeping track of SERPs, links, and the corresponding PageRank. The Perl code is very bandwidth-greedy, so it should be used with great restraint.
Detecting Changes in Regular Pages
You can, in principle, fetch a Web page every night and have changes flagged to you. The approach is similar to the one above, but it relies on
35 22 * * * cd [YOUR_PATH]/Syndication/
The file named Links
See the HTML syndication page, which provides a more systematic way of syndicating standard, static Web pages.
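The nightly page check described under Detecting Changes in Regular Pages can be sketched in the same style. This is a minimal sketch and not the syndication setup itself. It assumes wget is installed, though FETCH can be pointed at any command that prints the page, such as curl -s. watch_page is a hypothetical function name. Unlike the GoogleCron example, the saved copy is replaced after every run, so each report shows changes since the previous fetch.

```shell
#!/bin/sh
# Minimal sketch: fetch a page, compare it with the copy saved last time,
# and print a diff when it has changed. wget is an assumed default.
FETCH=${FETCH:-wget -q -O -}

#   $1: URL to watch
#   $2: file name (in the current directory) for the saved copy
watch_page() {
  url=$1
  name=$2
  # $FETCH is deliberately unquoted so "wget -q -O -" splits into words.
  $FETCH "$url" >"$name.new" || { rm -f "$name.new"; return 1; }
  if [ -f "$name" ] && ! cmp -s "$name" "$name.new"; then
    echo "Changed: $url"
    diff "$name" "$name.new"
  fi
  # The fresh copy becomes the reference for the next run.
  mv "$name.new" "$name"
}
```

For example, with the function stored in a hypothetical watch_page.sh: `35 22 * * * cd [YOUR_PATH]/Syndication/ && . ./watch_page.sh && watch_page http://www.example.com/ example`.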
This page was last modified on April 23rd, 2005. Maintained by Roy Schestowitz.