
Pandas Str.contains - Search For Multiple Values In A String And Print The Values In A New Column

I just started coding in Python and want to build a solution that searches a string to see whether it contains any of a given set of values. I've found a similar solution in R which us

Solution 1:

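Separate the values with | so that the search is interpreted as a regular expression. regex=True is already the default for str.contains, so setting the flag only makes that explicit: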

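import numpy as np

whatIwant = df['Column_with_text'].str.contains('value1|value2|value3',
                                                 case=False, regex=True)

# np.where needs both a "true" and a "false" value; use NaN where nothing matched
df['New_Column'] = np.where(whatIwant, df['Column_with_text'], np.nan)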
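
For a version that can be run end to end, here is a minimal sketch. The sample rows are made up for illustration; the column names follow the answer. np.where is given an explicit NaN fallback as its third argument, since it cannot be called with a condition and only one value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    "Column_with_text": [
        "This row mentions VALUE1.",
        "Nothing to see here.",
        "value3 shows up in this one.",
    ]
})

# Boolean mask: True where any of the alternatives occurs, ignoring case.
mask = df["Column_with_text"].str.contains(
    "value1|value2|value3", case=False, regex=True
)

# Keep the original text where the mask is True, NaN elsewhere.
df["New_Column"] = np.where(mask, df["Column_with_text"], np.nan)

print(df)
```

If you would rather stay within pandas, df["Column_with_text"].where(mask) should give the same result, because Series.where fills non-matching rows with NaN by default.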

------ Edit ------

Based on the updated problem statement, here is an updated answer:

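You need to define a capture group in the regular expression using parentheses and use the extract() function to return the value found within the capture group. Lowercasing the text with lower() makes the match case-insensitive, so the search terms in pattern (the values joined with |) must be lowercase as well. Note that extract() returns only the first match in each row.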

pattern = 'value1|value2|value3'  # example: the search terms joined with |, all lowercase
df['MatchedValues'] = df['Text'].str.lower().str.extract('(' + pattern + ')', expand=False)
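
As a rough, self-contained sketch of the extract() approach: the sample rows and the terms list below are assumptions for illustration, not part of the original question. Building the pattern with re.escape guards against search terms that contain regex metacharacters, and passing flags=re.IGNORECASE is an alternative to lower() that keeps the original casing of the matched text.

```python
import re
import pandas as pd

# Hypothetical sample data and search terms, only for illustration.
df = pd.DataFrame({
    "Text": [
        "I want to buy some Apples.",
        "No fruit here.",
        "Grapes and oranges.",
    ]
})
terms = ["apples", "oranges", "grapes"]

# Join the escaped terms into one alternation: 'apples|oranges|grapes'.
pattern = "|".join(re.escape(t) for t in terms)

# Variant from the answer: lowercase the text, then extract the first match.
df["MatchedValues"] = df["Text"].str.lower().str.extract(
    "(" + pattern + ")", expand=False
)

# Alternative: match case-insensitively and keep the original casing.
df["MatchedOriginalCase"] = df["Text"].str.extract(
    "(" + pattern + ")", flags=re.IGNORECASE, expand=False
)

print(df)
```

Rows without any match get NaN in both columns. In the third row only the leftmost match is returned, even though two terms occur.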

Solution 2:

Here is one way:

import numpy as np

foods = ['apples', 'oranges', 'grapes', 'blueberries']

def matcher(x):
    # Return the first entry of foods found in x (case-insensitive), else NaN
    for i in foods:
        if i.lower() in x.lower():
            return i
    return np.nan

df['Match'] = df['Text'].apply(matcher)

#                                           Text        Match
# 0                   I want to buy some apples.       apples
# 1             Oranges are good for the health.      oranges
# 2                  John is eating some grapes.       grapes
# 3  This line does not contain any fruit names.          NaN
# 4            I bought 2 blueberries yesterday.  blueberries
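
Both solutions report at most one value per row. If a row can contain several of the search terms and you want all of them, one possible variation (not part of the original answers) is str.findall, which returns every match as a list. The sample rows below are made up; foods is the list from Solution 2.

```python
import re
import pandas as pd

foods = ["apples", "oranges", "grapes", "blueberries"]

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    "Text": [
        "I want to buy some apples.",
        "Oranges and grapes are good for the health.",
        "This line does not contain any fruit names.",
    ]
})

# One alternation pattern built from the (escaped) search terms.
pattern = "|".join(re.escape(f) for f in foods)

# findall returns a list of all matches per row (an empty list if none).
found = df["Text"].str.lower().str.findall(pattern)

# Join each list into a single comma-separated string for display.
df["AllMatches"] = found.str.join(", ")

print(df)
```

Rows without a match end up with an empty string here rather than NaN, so replace the empty strings afterwards if you need missing values.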
