Regex Syntax For Replacing Multiple Strings: Where Have I Gone Wrong?
I have a dataframe with the column 'purpose' that has a lot of string values that I want to standardize by finding a string and replacing it. For instance, some very similar values
Solution 1:
You are making the regular expression too restrictive and using the wrong character for alternation. You can use \b to match a word boundary, | to separate alternative patterns, and the re.IGNORECASE flag to cover case issues. So for example:
import re  # needed for re.IGNORECASE
credit_data.purpose.str.replace(r'\b(real estate|housing|house|property)\b',
                                'Real Estate', regex=True, flags=re.IGNORECASE)
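To see what the word boundaries buy you, here is a minimal sketch using only the standard re module on a few made-up values (the sample strings are my own assumption, not taken from your dataframe). It applies the same pattern to each string one at a time, which is a reasonable stand-in for checking what the pattern will and will not match:

```python
import re

# Same pattern as above: any of the alternatives, as a whole word, any case.
pattern = re.compile(r'\b(real estate|housing|house|property)\b', re.IGNORECASE)

# Hypothetical sample values, for illustration only.
samples = ['House purchase', 'buy PROPERTY', 'household goods', 'car']

# Replace only the matched word; the rest of each string is kept.
cleaned = [pattern.sub('Real Estate', s) for s in samples]
print(cleaned)
```

Note that 'household goods' is left alone, because \b requires "house" to end at a word boundary. Also keep in mind that str.replace returns a new Series, so assign the result back to the column if you want to keep it.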
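If you want to replace the entire string rather than just the matched word, you can put .* (any run of characters) on both sides of the group so the pattern consumes the whole value: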
credit_data.purpose.str.replace(r'.*(real estate|housing|house|property).*',
'Real Estate', regex=True, flags=re.IGNORECASE)
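One thing to watch with the whole-string version: it has no \b, so it also matches the keywords inside longer words. A small sketch with the standard re module, again on made-up sample values (an assumption on my part), compares it with a variant that keeps the word boundaries inside the .* wrappers:

```python
import re

# Whole-string pattern as written above (no word boundaries).
loose = re.compile(r'.*(real estate|housing|house|property).*', re.IGNORECASE)

# Variant that keeps \b, so keywords must still appear as whole words.
strict = re.compile(r'.*\b(real estate|housing|house|property)\b.*', re.IGNORECASE)

# Hypothetical sample values, for illustration only.
samples = ['House purchase', 'loan for housing costs', 'household goods', 'car']

loose_result = [loose.sub('Real Estate', s) for s in samples]
strict_result = [strict.sub('Real Estate', s) for s in samples]
print(loose_result)
print(strict_result)
```

With the loose pattern 'household goods' becomes 'Real Estate' as well; with the strict one it stays unchanged. Which behaviour you want depends on your data.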