Regex To Extract A Set Number Of Words Around A Matched Word
Solution 1:
This regex will get you started:
((?:\w*\s*){2})\s*word3\s*((?:\s*\w*){2})
Group 1 will hold the words before your target, and group 2 will hold the words that come after it.
In the example I chose to capture 2 words on each side, but you can adjust the {2} counts at will.
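As a rough sketch of how the pattern above might be used from Python (the question's language is assumed here, and the helper name words_around and its parameters are made up for illustration), the word count can be turned into an argument:

```python
import re

def words_around(text, target, n=2):
    """Return (words_before, words_after) for the first match, or None."""
    # Same shape as the pattern above; the target is escaped and the
    # repeat count n is substituted in (both are illustrative choices).
    pattern = r'((?:\w*\s*){%d})\s*%s\s*((?:\s*\w*){%d})' % (
        n, re.escape(target), n)
    m = re.search(pattern, text)
    if m is None:
        return None
    # Each group is a raw string with spaces, so split it into words.
    return m.group(1).split(), m.group(2).split()

print(words_around("word1 word2 word3 word4 word5", "word3"))
# (['word1', 'word2'], ['word4', 'word5'])
```

Note that the pattern has no word boundaries, so the target would also match inside a longer word; wrapping it in \b on both sides is one way to tighten that if it matters for your input.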
Let me know how it goes and if it works on your input.
You can improve your question by reading this short piece of advice: http://worksol.be/regex.html
Solution 2:
Here's a likely definition of "word": a string of non-space characters. Here's another: a string of letters and digits, but no punctuation. Python has convenient shortcuts for both. \w is any "word" character with the second meaning (letters and digits, plus the underscore), and \W is any other character. Use it like this:
import re
m = re.search(r'((\w+\W+){0,4}grab(\W+\w+){0,4})', sentence)
if m:
    print(m.group(1))
If you prefer the first definition, just use \S (any character that's not whitespace) and \s (any whitespace character):
re.search(r'((\S+\s+){0,4}grab(\s+\S+){0,4})', sentence)
You'll notice I'm matching zero to four words before and after. That way, if your word is third in the sentence, you'll still get a match. (The quantifiers are "greedy", so you'll always get four words if that's possible.)
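To make the difference between the two definitions concrete, here is a small sketch; the helper name context, its parameters, and the sample sentences are invented for illustration:

```python
import re

def context(sentence, word, n=4, split_on_space=False):
    """Return up to n words on each side of word, or None if absent."""
    # Pick the definition of "word": runs of \S, or runs of \w.
    if split_on_space:
        tok, sep = r'\S+', r'\s+'
    else:
        tok, sep = r'\w+', r'\W+'
    pattern = r'(?:%s%s){0,%d}%s(?:%s%s){0,%d}' % (
        tok, sep, n, re.escape(word), sep, tok, n)
    m = re.search(pattern, sentence)
    return m.group(0) if m else None

s = "The quick brown fox will grab the lazy dog's bone, then run away."
print(context(s, "grab"))
# quick brown fox will grab the lazy dog's
print(context(s, "grab", split_on_space=True))
# quick brown fox will grab the lazy dog's bone,
print(context("Please grab it", "grab"))
# Please grab it
```

With \w the apostrophe splits "dog's" into two words, so the match stops one token earlier than with \S. The last call shows the {0,4} range at work: only one word is available on each side, and the search still succeeds.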