
Regex To Extract A Set Number Of Words Around A Matched Word

I was looking around for a way to grab the words around a found match, but the solutions I found were much too complicated for my case. All I need is a regex statement to grab, let's say 10, words befor

Solution 1:

This regex will get you started:

((?:\w*\s*){2})\s*word3\s*((?:\s*\w*){2})

Group 1 will hold the words before your target (word3 in this pattern) and group 2 will hold the words that come after.

In the example I chose to capture 2 words, but you can adjust this by changing the {2} quantifiers.
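As a rough illustration, here is how that pattern could be applied from Python. The helper name split_context and the sample sentence are made up for this sketch; only the regex itself comes from the answer.

```python
import re

# Pattern from the answer above: two words on each side of the literal "word3".
PATTERN = re.compile(r'((?:\w*\s*){2})\s*word3\s*((?:\s*\w*){2})')

def split_context(text):
    """Return (words_before, words_after) around "word3", or None if no match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Each group is a raw string such as "alpha beta "; split() drops the spacing.
    return m.group(1).split(), m.group(2).split()

# Hypothetical sample input.
print(split_context("alpha beta word3 gamma delta"))
# (['alpha', 'beta'], ['gamma', 'delta'])
```

Note that \w* and \s* can both match nothing, so on other inputs the groups may come back with fewer than two words.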

Let me know how it goes and if it works on your input.

You can improve your question by reading this short piece of advice: http://worksol.be/regex.html


Solution 2:

Here's a likely definition of "word": A string of non-space characters. Here's another: A string of letters and digits, but no punctuation. Python has convenient shortcuts for both.

\w is any "word" character with the second meaning (letters, digits, and the underscore), and \W is any other character. Use it like this:

import re
m = re.search(r'((\w+\W+){0,4}grab(\W+\w+){0,4})', sentence)
print(m.groups()[0])

If you prefer the first definition, just use \S (any character that's not a space) and \s (any space character):

re.search(r'((\S+\s+){0,4}grab(\s+\S+){0,4})', sentence)
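To see how the two definitions of "word" differ in practice, here is a small sketch that runs both patterns on the same sentence. The sentence is invented for the example, and the limit is lowered from {0,4} to {0,2} only to keep the output short.

```python
import re

# Same shape as the two patterns above, but with {0,2} instead of {0,4}.
# Non-capturing groups (?:...) are used because only the whole match is needed.
WORD_CHARS = re.compile(r'(?:\w+\W+){0,2}grab(?:\W+\w+){0,2}')
NON_SPACE = re.compile(r'(?:\S+\s+){0,2}grab(?:\s+\S+){0,2}')

# Hypothetical sample sentence with an apostrophe and a hyphen.
sentence = "I really don't want to grab the so-called fruit"

by_word_chars = WORD_CHARS.search(sentence).group(0)
by_non_space = NON_SPACE.search(sentence).group(0)

print(by_word_chars)  # want to grab the so
print(by_non_space)   # want to grab the so-called
```

With \w, the hyphen ends the word, so "so-called" counts as two words and only "so" is kept. With \S, anything between spaces is one word.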

You'll notice I'm matching zero to four words before and after. That way, if your word is third in the sentence, you'll still get a match, just with only two words in front. (The quantifiers are "greedy", so you'll always get four words when four are available.)
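A minimal sketch of that behaviour, wrapped in a helper so the target word and the word count can be changed. The function name grab_context, its parameters, and the sample sentences are assumptions made for illustration, not part of the answer.

```python
import re

def grab_context(sentence, target="grab", n=4):
    """Return the target with up to n words on each side, or None if absent."""
    # re.escape keeps any regex metacharacters in the target literal.
    pattern = r'((?:\w+\W+){0,%d}%s(?:\W+\w+){0,%d})' % (n, re.escape(target), n)
    m = re.search(pattern, sentence)
    return m.group(1) if m else None

# Target is the third word: only two words are available before it.
print(grab_context("one two grab four five six seven eight"))
# one two grab four five six seven

# Plenty of words on both sides: greedy matching takes four each way.
print(grab_context("a b c d e f grab g h i j k l"))
# c d e f grab g h i j
```

Returning None when the target is missing avoids the AttributeError you would get from calling .group() on a failed search.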
