Multiple Regex Replacements Based On Lists In Multiple Files
Solution 1:
Unless your Python code is really bad, switching to awk is unlikely to make it more maintainable. That said, it's pretty simple in awk, but it does not scale well:
cat replacement-list-files* | awk 'FILENAME == "-" {
split( $0, a, "~" ); repl[ a[1] ] = a[2]; next }
{ for( i in repl ) gsub( i, repl[i] ) }1' - input-file
Note that this works on one file at a time. To work on multiple files, replace the trailing 1 with something like { print > ( FILENAME ".new" ) }, but then you have to close the output files yourself if you want to process a large number of them, and it quickly becomes an unmaintainable mess. Stick with Python if you already have a working solution.
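For what it's worth, here is a minimal sketch of what that multi-file variant might look like. It is not the code above: the function name replace_all is made up for illustration, and it assumes the mappings live in a single "old~new" file passed as the first argument rather than arriving on stdin. Each input file gets a sibling with a .new suffix, and the previous output is closed whenever a new input file starts.

```shell
# Hypothetical helper: replace_all MAPFILE FILE...
# MAPFILE holds one "regex~replacement" pair per line (assumed layout).
replace_all() {
    map_path=$1
    shift
    awk -F'~' '
        NR == FNR { repl[$1] = $2; next }   # first file: load the mappings
        FNR == 1 {                          # a new input file is starting
            if (out != "") close(out)       # close the previous output file
            out = FILENAME ".new"
        }
        {
            for (old in repl) gsub(old, repl[old])
            print > out
        }
    ' "$map_path" "$@"
}
```

Called as replace_all mappings.txt a.txt b.txt, this should leave a.txt.new and b.txt.new behind. An empty mapping file breaks the NR == FNR test, so treat this as a starting point only.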
Solution 2:
Here's the regular expression replacement script (mostly just cosmetically different from what @WilliamPursell posted):
awk -F'~' '
NR==FNR{ map[$1] = $2; next }
{
    for (old in map) {
        gsub(old,map[old])
    }
    print
}
' /wherever/mappingFile file
but here's the string replacement script that I think you really need:
awk -F'~' '
NR==FNR{ map[$1] = $2; next }
{
    for (old in map) {
        rlength = length(old)
        while (rstart = index($0,old)) {
            $0 = substr($0,1,rstart-1) map[old] substr($0,rstart+rlength)
        }
    }
    print
}
' /wherever/mappingFile file
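To make the regex-versus-string difference concrete, here is a hedged sketch of the string version wrapped in a shell function. The name literal_replace is hypothetical. It also differs from the script above in one way: it consumes each line left to right, so text that has already been replaced is never rescanned. The while loop above rescans the whole line each time, which means a mapping such as x~xx would loop forever.

```shell
# Hypothetical helper: literal_replace MAPFILE FILE
# Treats the left-hand side of each "old~new" pair as a plain string.
literal_replace() {
    awk -F'~' '
        NR == FNR { map[$1] = $2; next }
        {
            for (old in map) {
                out = ""; rest = $0; len = length(old)
                # Consume the line left to right so replaced text is never rescanned.
                while (len > 0 && (pos = index(rest, old))) {
                    out  = out substr(rest, 1, pos - 1) map[old]
                    rest = substr(rest, pos + len)
                }
                $0 = out rest
            }
            print
        }
    ' "$1" "$2"
}
```

With a mapping of a.c~X and an input line of "abc a.c", this leaves "abc" alone and changes only the literal "a.c", whereas the gsub version treats the dot as "any character" and replaces both words.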
In either case just enclose it in a shell loop to affect multiple files:
for file in *
do
    awk -F'~' '...' /wherever/mappingFile "$file" > tmp && mv tmp "$file"
done