How To Split A String And Keep The Separators In It
@edzech asked how it is possible to split a string and keep the separators in it. His question was marked as a duplicate, although the approach here differs from the one in the 'duplicate'.
Solution 1:
In the proposed solution, a single opening < or closing > that is not part of a <> pair is excluded from the result.
If you also want to keep an unpaired < or >, you could use:
<[^<>]*>|(?:(?!<[^<>]*>).)+
Explanation
- <[^<>]*> Match an opening <, then 0+ times any char other than < or >, then a closing >
- | Or
- (?:(?!<[^<>]*>).)+ Tempered greedy token: match any char if what is directly to the right is not the opening-till-closing pattern
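The explanation above can also be written into the pattern itself. This is a minimal sketch, assuming Python's re.VERBOSE flag. The names TOKEN and split_keep_separators are illustrative and not part of the original answer.

```python
import re

# Same pattern as above, spelled out with inline comments (re.VERBOSE
# ignores the whitespace and the "#" comments inside the pattern).
TOKEN = re.compile(
    r"""
    <[^<>]*>              # a complete pair: "<", any non-bracket chars, ">"
    |                     # or
    (?:(?!<[^<>]*>).)+    # one or more chars, none of which starts a complete pair
    """,
    re.VERBOSE,
)


def split_keep_separators(text):
    """Return the <...> pairs and the text between them, in order."""
    return TOKEN.findall(text)
```

Because every character is matched by one of the two alternatives, joining the returned pieces should give back the original string, as long as it contains no newlines (the dot does not match them by default).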
For example:
import re
content = "<abc>d<e><f>ghi<j>test><g>"
result = re.findall(r"<[^<>]*>|(?:(?!<[^<>]*>).)+", content)
print(result)
Result
['<abc>', 'd', '<e>', '<f>', 'ghi', '<j>', 'test>', '<g>']
Solution 2:
Here is a solution using re.findall with two alternatives.
import re
content = "<abc>d<e><f>ghi<j>"
result = re.findall(r"<.*?>|[^<>]+", content)
print(result)
Output:
['<abc>', 'd', '<e>', '<f>', 'ghi', '<j>']
Explanations:
- regex <.*?> means everything that matches <content>
- regex [^<>]+ means everything else
In brief, findall returns every piece that matches <content> and, between those, every run of other characters. That way, the content is split without losing the separators.
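To see how the two patterns differ, here is a small comparison run on the input from Solution 1, which contains an unpaired >. It is only a sketch. The variable names are made up for illustration.

```python
import re

PAIRS_ONLY = r"<.*?>|[^<>]+"                  # pattern from Solution 2
KEEP_STRAY = r"<[^<>]*>|(?:(?!<[^<>]*>).)+"   # pattern from Solution 1

content = "<abc>d<e><f>ghi<j>test><g>"

pairs_only = re.findall(PAIRS_ONLY, content)
keep_stray = re.findall(KEEP_STRAY, content)

print(pairs_only)
# ['<abc>', 'd', '<e>', '<f>', 'ghi', '<j>', 'test', '<g>']
print(keep_stray)
# ['<abc>', 'd', '<e>', '<f>', 'ghi', '<j>', 'test>', '<g>']

# Only the second pattern lets the pieces be joined back into the input.
print("".join(pairs_only) == content)  # False: the unpaired ">" was dropped
print("".join(keep_stray) == content)  # True
```

If the input is known to contain only well-formed <...> pairs, both patterns should give the same list, and the shorter one may be easier to read.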