Roy Schestowitz

Command-line Recursive Search and Replace

Editing a large number of files using shell scripts

Introduction

This simple yet powerful script scans a collection of files (HTML files in this example, although any text file type would do). It looks for a string described by a regular expression and replaces it with another. In simpler terms, it is a command-line "search and replace". Why the command line? When a large number of static pages need the same change, a script is far quicker and less error-prone than editing each file by hand.

Explanation

Below is some template code. By varying it, one can add a file footer, add header information or fix a typo that is repeated across many files. For instance, one can replace all occurrences of "dogg" with "dog" in text files under the current directory.

Consider the following code, in which OLD_TEXT and NEW_TEXT are placeholders for the actual expressions:

find . -type f -print |
  while IFS= read -r filename
  do
    # GNU sed: "i" ignores case; add "g" to replace every match on a line
    sed 's/OLD_TEXT/NEW_TEXT/i' "$filename" >"$filename.xxxxx" &&
      mv "$filename.xxxxx" "$filename" # replace the original with the edited copy
  done
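
To make the "dogg" case concrete, here is a minimal sketch that runs the same loop against a scratch directory, so no real files are touched. The directory and file names are made up for illustration, and the g flag is added so that every occurrence on a line is replaced rather than only the first.

```shell
#!/bin/sh
# Demo in a scratch directory; all names here are hypothetical.
demo_dir=$(mktemp -d)
printf 'The dogg barked. Another dogg replied.\n' >"$demo_dir/a.txt"
mkdir "$demo_dir/sub"
printf 'No typo here.\n' >"$demo_dir/sub/b.txt"

find "$demo_dir" -type f -name '*.txt' -print |
  while IFS= read -r filename
  do
    # "g" replaces every match on a line, not just the first
    sed 's/dogg/dog/g' "$filename" >"$filename.tmp" &&
      mv "$filename.tmp" "$filename"
  done

cat "$demo_dir/a.txt"   # prints: The dog barked. Another dog replied.
```

Note that the pattern also matches inside longer words such as "doggy", so it is worth checking the result on a copy of the files first.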

Example

As a practical example, the code below adds a link to the site's RSS feed to every page. The link element is inserted on a new line immediately after the opening <head> tag.

# add_header - add an RSS feed link to the header of all HTML files in the current directory

find . -maxdepth 1 -type f -name '*.html' -print |
  while IFS= read -r filename
  do
    # GNU sed: "\n" in the replacement starts a new line; "i" ignores case
    sed 's/<head>/<head>\n<link rel="alternate" type="application\/rss+xml" title="Your site" href="\/feed.php">/i' "$filename" >"$filename.xxxxx" &&
      mv "$filename.xxxxx" "$filename" # replace the original with the edited copy
  done
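
Running a script like this twice would insert the link twice. One possible guard, sketched below under the assumption that GNU sed and grep are available, is to skip any file that already contains the feed link. The function name add_feed_link and the scratch directory are hypothetical and exist only for this demonstration.

```shell
#!/bin/sh
# Sketch: skip files that already contain the feed link, so a second run
# does not insert a duplicate. The demo file lives in a scratch directory.
demo_dir=$(mktemp -d)
printf '<html>\n<head>\n<title>Test</title>\n</head>\n</html>\n' >"$demo_dir/index.html"

# add_feed_link DIR: hypothetical helper, not part of the original script
add_feed_link() {
  find "$1" -maxdepth 1 -type f -name '*.html' -print |
    while IFS= read -r filename
    do
      # grep -qF exits 0 when the fixed string is already present
      if grep -qF 'application/rss+xml' "$filename"; then
        continue
      fi
      sed 's/<head>/<head>\n<link rel="alternate" type="application\/rss+xml" title="Your site" href="\/feed.php">/i' "$filename" >"$filename.tmp" &&
        mv "$filename.tmp" "$filename"
    done
}

add_feed_link "$demo_dir"
add_feed_link "$demo_dir"   # the second run leaves the file unchanged
grep -c 'rss+xml' "$demo_dir/index.html"   # prints: 1
```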

Recursion

On its own, the script above applies the changes to HTML files in the current directory only (note that the -name pattern might need changing to match .htm files rather than .html).

To apply this function to files in subdirectories at any depth (making its operation recursive), the script global should be used. It can be downloaded from this site, although it originates from another Linux users' group. Let us say that the files to be processed are located under ~/my_files, and so are global and the example script above, add_header. The files to be processed can reside at any directory depth and will be reached by the following:

cd ~/my_files/
global ~/my_files/add_header

Note that the full path of add_header needs to be specified: the recursion moves from directory to directory, so a relative path would stop resolving once the working directory changes.
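
The contents of global are not reproduced here. As a rough illustration of why the full path matters, the sketch below defines a hypothetical stand-in, run_everywhere, which assumes that such a wrapper simply changes into each directory in turn and runs the given command there. It is not the original script, and all names in it are made up.

```shell
#!/bin/sh
# run_everywhere: a hypothetical stand-in for "global", NOT the original script.
# It runs the command given as its arguments once in the current directory
# and once in every directory below it.
run_everywhere() {
  find . -type d -print |
    while IFS= read -r dir
    do
      # The subshell confines each cd to a single iteration;
      # </dev/null stops the command from consuming find's output.
      ( cd "$dir" && "$@" </dev/null )
    done
}

# Demo in a scratch tree (all names below are made up for illustration).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/a/b"
printf ': >visited.txt\n' >"$demo_dir/mark_dir"   # tiny script: leave a marker file

cd "$demo_dir" || exit 1
# The script is named by its full path; a relative "./mark_dir" would only
# resolve in the top directory, not after cd into a/ or a/b/.
run_everywhere sh "$demo_dir/mark_dir"
find . -name visited.txt -print
```

Where find is available anyway, a similar effect can probably be had without a wrapper by dropping -maxdepth 1 from add_header, since find descends into subdirectories by default.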

Acknowledgement

Thanks to Toby Inkster for suggesting a way to handle regular expressions.


This page was last modified on June 12th, 2005. Maintained by Roy Schestowitz.