
Multiple Regex Replacements Based On Lists In Multiple Files

I have a folder with multiple text files inside that I need to process and format using multiple replacement lists looking like this: old string1~new string1 old string2~new strin

Solution 1:

Unless your Python code is really bad, switching to awk is unlikely to make it more maintainable. That said, it is pretty simple in awk, though it does not scale well:

cat replacement-list-files* | awk 'FILENAME == "-" { 
  split( $0, a, "~" ); repl[ a[1] ] = a[2]; next }
  { for( i in repl ) gsub( i, repl[i] ) }1' - input-file

Note that this works on one file at a time. Replace 1 with something like { print > ( FILENAME ".new" ) } to work on multiple files, but then you have to deal with closing the files if you want to work on a large number of files, and it quickly becomes an unmaintainable mess. Stick with Python if you already have a working solution.
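For what it's worth, the multi-file variant described above might look roughly like the sketch below. The helper name `replace_all` is made up for illustration, and it assumes a single list file fed on standard input and output written next to each input as `<input>.new`. It closes each output file as soon as the next input starts, so only one output is open at a time.

```shell
# Hypothetical helper: apply every old~new pair to several inputs in one awk run.
# Usage: replace_all LIST_FILE INPUT_FILE...   (writes INPUT_FILE.new for each input)
replace_all() {
  list_file=$1; shift
  awk '
    # Lines arriving on stdin are the old~new pairs.
    FILENAME == "-" { split($0, a, "~"); repl[a[1]] = a[2]; next }
    # First line of a new input: close the previous output, pick the next one.
    FNR == 1 { if (out != "") close(out); out = FILENAME ".new" }
    # Old strings are treated as regular expressions, as in the one-liner above.
    { for (i in repl) gsub(i, repl[i]); print > out }
  ' - "$@" < "$list_file"
}
```

Called as `replace_all pairs.txt one.txt two.txt` (file names are placeholders), this leaves the originals untouched and writes `one.txt.new` and `two.txt.new`. An empty input produces no `.new` file, since no line ever reaches the output rule.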

Solution 2:

Here's the regular expression replacement script (mostly just cosmetically different from what @WilliamPursell posted):

   awk -F'~' '
   NR==FNR{ map[$1] = $2; next }
   {
      for (old in map) {
         gsub(old,map[old])
      }
      print
   }
   ' /wherever/mappingFile file

but here's the string replacement script that I think you really need:

   awk -F'~' '
   NR==FNR{ map[$1] = $2; next }
   {
      for (old in map) {
         rlength = length(old)
         # never terminates if the new string contains the old one
         while (rstart = index($0,old)) {
            $0 = substr($0,1,rstart-1) map[old] substr($0,rstart+rlength)
         }
      }
      print
   }
   ' /wherever/mappingFile file
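If a new string can contain its own old string (say `cat~cats`), a loop that searches the whole line again after each substitution never finishes. One possible way around that, sketched here with a made-up helper name `literal_replace`, is to keep the already-processed text in a separate variable and search only the unread tail of the line. This is a variation on the string-replacement idea, not the script as originally posted.

```shell
# Hypothetical helper: literal (non-regex) replacement of every old~new pair.
# Usage: literal_replace MAPPING_FILE INPUT_FILE   (result goes to stdout)
literal_replace() {
  awk -F'~' '
    NR == FNR { map[$1] = $2; next }      # first file: load the pairs
    {
      line = $0
      for (old in map) {
        if (old == "") continue           # ignore blank mapping lines
        done_part = ""
        # Copy the text before each match plus the replacement, then keep
        # scanning only the unread tail so inserted text is never matched again.
        while ((pos = index(line, old)) > 0) {
          done_part = done_part substr(line, 1, pos - 1) map[old]
          line = substr(line, pos + length(old))
        }
        line = done_part line
      }
      print line
    }
  ' "$1" "$2"
}
```

With the pair `a.c~X`, the input `abc a.c` comes out as `abc X` here, whereas the `gsub` version turns it into `X X` because `.` matches any character. Like any `NR==FNR` script, this assumes the mapping file is not empty.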

In either case just enclose it in a shell loop to affect multiple files:

for file in *
do
   awk -F'~' '...' /wherever/mappingFile "$file" > tmp && mv tmp "$file"
done
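A slightly more defensive take on that loop could look like the sketch below. The function name `replace_in_place` is hypothetical, and it makes a few assumptions beyond the original: it uses `mktemp` for a unique temporary file, skips anything that is not a regular file, and copies the result back with `cat` rather than `mv` so the file keeps its original permissions.

```shell
# Hypothetical helper: rewrite each file in place using regex old~new pairs.
# Usage: replace_in_place MAPPING_FILE FILE...
replace_in_place() {
  mapping=$1; shift
  for file in "$@"; do
    [ -f "$file" ] || continue              # skip directories and the like
    tmp=$(mktemp) || return 1               # unique temporary file per input
    if awk -F'~' '
         NR == FNR { map[$1] = $2; next }
         { for (old in map) gsub(old, map[old]); print }
       ' "$mapping" "$file" > "$tmp"
    then
      cat "$tmp" > "$file"                  # overwrite only if awk succeeded
    fi
    rm -f "$tmp"
  done
}
```

For example, `replace_in_place /wherever/mappingFile *` processes every regular file in the current directory. If the mapping file lives in that same directory it gets rewritten too, so it is safer to keep it elsewhere.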
