
Regex Syntax For Replacing Multiple Strings: Where Have I Gone Wrong?

I have a dataframe with the column 'purpose' that has a lot of string values that I want to standardize by finding a string and replacing it. For instance, some very similar values

Solution 1:

You are making the regular expression too restrictive and using the wrong character for alternation. You can use \b to match a word boundary, | to separate alternative patterns, and the re.IGNORECASE flag to cover differences in case. For example:

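import re  # provides re.IGNORECASE
# str.replace returns a new Series; assign it back to keep the change
credit_data.purpose.str.replace(r'\b(real estate|housing|house|property)\b',
    'Real Estate', regex=True, flags=re.IGNORECASE)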
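
To see what the word-boundary pattern does, here is a minimal sketch on a made-up Series. The sample values are hypothetical and only stand in for the 'purpose' column, which the question does not show in full.

```python
import re

import pandas as pd

# Hypothetical sample values; the real 'purpose' column is not shown in the question.
purpose = pd.Series(["Buy a house", "HOUSING loan", "real estate purchase", "car repair"])

pattern = r"\b(real estate|housing|house|property)\b"

# Only the matched word or phrase is replaced; the rest of each string is kept.
cleaned = purpose.str.replace(pattern, "Real Estate", regex=True, flags=re.IGNORECASE)

print(cleaned.tolist())
# ['Buy a Real Estate', 'Real Estate loan', 'Real Estate purchase', 'car repair']
```

Note that this keeps the surrounding words, so "Buy a house" becomes "Buy a Real Estate" rather than just "Real Estate".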

If you want to replace the entire string, you can put .* (any run of characters) on both sides of the group.

credit_data.purpose.str.replace(r'.*(real estate|housing|house|property).*',
    'Real Estate', regex=True, flags=re.IGNORECASE)
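
The whole-string variant can be checked outside pandas with the standard re module. This is only a sketch: the sample strings are hypothetical, and re.sub is used here as a stand-in for what str.replace with regex=True does to each element.

```python
import re

pattern = r".*(real estate|housing|house|property).*"

# Hypothetical sample values standing in for the 'purpose' column.
samples = ["Buy a house", "HOUSING loan", "property tax", "car repair"]

# Any string containing one of the keywords is replaced in full;
# strings without a keyword are left untouched.
normalized = [re.sub(pattern, "Real Estate", s, flags=re.IGNORECASE) for s in samples]
print(normalized)
# ['Real Estate', 'Real Estate', 'Real Estate', 'car repair']

# By default '.' does not match a newline, so only the matching line is replaced.
multiline = "old house\nnear park"
partial = re.sub(pattern, "Real Estate", multiline, flags=re.IGNORECASE)

# Adding re.DOTALL lets '.' cross newlines, so the whole value is replaced.
full = re.sub(pattern, "Real Estate", multiline, flags=re.IGNORECASE | re.DOTALL)
```

If the column can contain multi-line values, combining the flags as in the last call (flags=re.IGNORECASE | re.DOTALL) is one way to make the replacement cover the whole value.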
