Roy Schestowitz |
Introduction

This simple yet powerful script scans a collection of files (HTML files in this particular example, although any text file type would do). It looks for a string described by a regular expression and replaces it with another. In simpler terms, it is a command-line "search and replace". Why the command line? When a large number of static pages have to be changed in the same way, scripting is the only practical answer.

Explanation

Below is some template code. By varying it, one can add a file footer, add header information or fix a frequently repeated typo. For instance, one can replace all occurrences of "dogg" with "dog" in text files in the current directory. Have a glance at the following code (the expressions are replaced by the generic placeholders OLD_TEXT and NEW_TEXT):

    find . -type f -name '*' -print | while IFS= read -r filename
    do
      # write the edited copy to a temporary file next to the original
      sed 's/OLD_TEXT/NEW_TEXT/i' "$filename" > "$filename.xxxxx"
      mv "$filename.xxxxx" "$filename"  # replace the original file with the output
    done

The trailing i flag makes the match case-insensitive (a GNU sed extension). Only the first match on each line is replaced. Note also that find descends into subdirectories unless -maxdepth 1 is given, as in the example below.

Example

To give a practical example, the code below adds a link to the RSS feed of the site. The link is inserted on a new line immediately after the opening <head> tag of each file (GNU sed reads the \n in the replacement as a newline).

    # add_header - change header of all files in the current directory
    find . -maxdepth 1 -type f -name '*.html' -print | while IFS= read -r filename
    do
      sed 's/<head>/<head>\n<link rel="alternate" type="application\/rss+xml" title="Your site" href="\/feed.php">/i' "$filename" > "$filename.xxxxx"
      mv "$filename.xxxxx" "$filename"  # replace the original file with the output
    done

Recursion

The script above applies the changes to all HTML files in the current directory (note that it might need changing to account for [...]). To reach subdirectories as well, an extra script, global, should be used. It can be downloaded from this site, although it originated with another Linux user group.
Suppose the files to be processed are located under ~/my_files, along with the example script above, add_header, and global. The files can reside at any directory depth and will be reached by the following:
    cd ~/my_files/
    global ~/my_files/add_header

Note that the full path of add_header needs to be specified, because the recursion affects how relative paths are resolved.
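The global script itself is not reproduced on this page. To illustrate why the full path matters, here is a minimal sketch of what a helper of that kind might do. The function name run_in_each_dir is hypothetical, and this is an assumption about the general approach, not the actual global script: it simply runs the given command once in the current directory and once in every directory below it.

```shell
#!/bin/sh
# run_in_each_dir - hypothetical stand-in for a "global"-style helper.
# Assumption: this is NOT the actual global script, only an illustration of
# running one command in the current directory and in every directory below it.
run_in_each_dir() {
  find . -type d -print | while IFS= read -r dir
  do
    # The subshell keeps each directory change from affecting the next pass;
    # stdin is redirected so the command cannot swallow the directory list.
    ( cd "$dir" && "$@" ) </dev/null
  done
}

# Usage (the full path matters, because the command is started from a
# different working directory on every pass):
#   cd ~/my_files/
#   run_in_each_dir ~/my_files/add_header
```

With a relative path such as ./add_header, the command would be looked up afresh inside each subdirectory and would fail wherever no copy of the script exists, which is presumably the behaviour the note above warns about.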
Acknowledgement

Thanks to Toby Inkster for suggesting a way to handle regular expressions.
This page was last modified on June 12th, 2005 | Maintained by Roy Schestowitz |