Roy Schestowitz |
Introduction
Many of us have used or heard of feeds, also referred to as RSS or XML. Content can be delivered in large batches from various sources and then fed, in digestible form, to the user who wishes to keep track of changes across the Internet. The ability to track changes to (or syndicate) a Web site has thus far been limited to sites that provide feeds. Although feeds are a rising trend, many small-scale, simple sites cannot afford the extra complexity. Apparently, the only alternative is to navigate repeatedly to the same site and attempt, hard as it may be, to identify changes, additions and news. This page introduces a way of converting pages into pseudo-feeds, all for the convenience of the user.
Examples of Practical Uses
All of the above is, in principle, very reminiscent of the advantages of using feeds (RSS being the technical term).
Tools
The method that follows allows any static Web page (or a page that is generated on the fly but cannot be syndicated) to be tracked. The idea is a rather simple one: fetch a copy of the page every day (or hour, or week) and compare it with the previously fetched copy. The power of this method is a result of two powerful *NIX tools:
With a report of change -- that is the output of the
Code
As an example, a script is included which scans the Manchester University Web site for news developments:
Here is a step-by-step explanation of what the script does (also see notes within the file)
Explanation: Go to the location of the script, where much of the file handling will be done

echo ''> /home/roy/Desktop/NEW.txt
Explanation: Clean up the file that summarises previous changes

OLD=mu_old
Explanation: The file name of the old version of the page

NEW=mu.txt
Explanation: The file name of the newer version of the page

SITE=http://www.manchester.ac.uk/press
Explanation: The full address of the page to syndicate

FILENAME=index.html
Explanation: The file name that is expected to be downloaded from the address above

wget -l0 -H -t0 -nd -N -np $SITE
Explanation: Download the page (file) from the supplied address

mv $FILENAME $NEW
Explanation: Rename the downloaded file so that its name becomes more meaningful (avoids name clashes)

diff $OLD $NEW >/home/roy/Desktop/$NEW
Explanation: Check for differences between the old and the new version and write them to a file on the Desktop

mv $NEW $OLD
Explanation: The version of the page that has just been fetched is now considered old. This ensures that the next run of the script will have a notion of 'old' data.

echo $NEW':' >> /home/roy/Desktop/NEW.txt
Explanation: Write the name of the file in the summary (changes index)

test -s /home/roy/Desktop/$NEW
Explanation: Test whether the file of differences is empty (indicating no change)

echo $? >> /home/roy/Desktop/NEW.txt
Explanation: Record the exit status of the last command: 0 means the file of differences is non-empty (the page has changed), 1 means it is empty (no change)

echo ''>> /home/roy/Desktop/NEW.txt
echo ''>> /home/roy/Desktop/NEW.txt
echo ''>> /home/roy/Desktop/NEW.txt
Explanation: Enter a few blank lines to separate the results of different syndications
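
The compare-and-rotate steps above can also be gathered into one reusable function. What follows is only a minimal sketch, not the original script: the function name compare_snapshots, its three arguments and the report file name are hypothetical, and the handling of a missing old copy on the very first run is an added assumption.

```shell
#!/bin/sh
# Sketch only: compare_snapshots and its arguments are hypothetical names,
# not part of the original script.

# compare_snapshots OLD NEW REPORT
# Writes the differences between OLD and NEW to REPORT, then makes NEW
# the OLD copy for the next run.
# Returns 0 if the page changed, 1 if it did not.
compare_snapshots() {
    old=$1
    new=$2
    report=$3

    # Assumption: on the first run there is no previous copy,
    # so start from an empty one.
    [ -f "$old" ] || : > "$old"

    # diff exits with status 1 when the files differ; that is not an error here.
    diff "$old" "$new" > "$report" || true

    # The copy that has just been fetched becomes the 'old' data.
    mv "$new" "$old"

    # A non-empty report means that the page has changed.
    test -s "$report"
}

# Example use (commented out because it needs network access):
# wget -q -O mu.txt http://www.manchester.ac.uk/press
# if compare_snapshots mu_old mu.txt mu_report.txt; then
#     echo "page changed"
# fi
```

Passing the file names as arguments, rather than fixing them at the top of the script, would let the same function track several pages in turn.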
This page was last modified on June 28th, 2005 | Maintained by Roy Schestowitz |