How To Split A String And Keep The Separators In It
@edzech asked how to split a string while keeping the separators in it. His question was marked as a duplicate, although the approach here differs from the one in the 'duplicate'.
Solution 1:
In the proposed solution, a single opening < or closing > that is not part of a <...> pair is excluded from the result. If you also want to keep a lone < or >, you could use:
<[^<>]*>|(?:(?!<[^<>]*>).)+
Explanation
- <[^<>]*> matches an opening <, then zero or more characters that are neither < nor >, then a closing >
- | means "or"
- (?:(?!<[^<>]*>).)+ is a tempered greedy token: it matches any character, one or more times, as long as the opening-to-closing pattern does not start at that position
For example:
import re
content = "<abc>d<e><f>ghi<j>test><g>"
result = re.findall(r"<[^<>]*>|(?:(?!<[^<>]*>).)+", content)
print(result)
Result
['<abc>', 'd', '<e>', '<f>', 'ghi', '<j>', 'test>', '<g>']
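To see which branch of the alternation produced each piece, the same pattern can be wrapped in named groups. This is only a minimal sketch: the group names `pair` and `text` and the helper `label_pieces` are hypothetical names introduced here for illustration and are not part of the original answer.

```python
import re

# Same pattern as above, with each branch wrapped in a named group.
# "pair" and "text" are illustrative names, not from the original answer.
TOKEN = re.compile(r"(?P<pair><[^<>]*>)|(?P<text>(?:(?!<[^<>]*>).)+)")

def label_pieces(content):
    """Return (branch_name, piece) tuples in order of appearance."""
    # m.lastgroup is the name of the named group that matched.
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(content)]

pieces = label_pieces("<abc>d<e><f>ghi<j>test><g>")
for kind, piece in pieces:
    print(kind, piece)
```

The stray > after "test" is picked up by the tempered greedy token branch, so it ends up inside the piece test> rather than being dropped.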
Solution 2:
Here is another solution, using a simpler pattern:
import re
content = "<abc>d<e><f>ghi<j>"
result = re.findall(r"<.*?>|[^<>]+", content)
print(result)
Output:
['<abc>', 'd', '<e>', '<f>', 'ghi', '<j>']
Explanations:
- regex <.*?> matches everything of the form <content>
- regex [^<>]+ matches a run of characters that are neither < nor >, i.e. everything else
In brief, findall will find everything that matches <content> and, otherwise, everything else. That way, the content is split without losing the separators.
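One way to check whether a pattern really keeps every character is to join the pieces back together and compare them with the input. The sketch below runs both patterns from this page on the Solution 1 input, which contains a stray >. The helper name `split_keep` and the constant names are assumptions made for this example.

```python
import re

PATTERN_1 = r"<[^<>]*>|(?:(?!<[^<>]*>).)+"  # pattern from Solution 1
PATTERN_2 = r"<.*?>|[^<>]+"                 # pattern from Solution 2

def split_keep(pattern, content):
    """Split content into pieces with re.findall (hypothetical helper)."""
    return re.findall(pattern, content)

content = "<abc>d<e><f>ghi<j>test><g>"  # note the stray ">" after "test"
pieces_1 = split_keep(PATTERN_1, content)
pieces_2 = split_keep(PATTERN_2, content)

# Round-trip check: joining the pieces should rebuild the input.
print("".join(pieces_1) == content)  # True: the stray ">" is kept in "test>"
print("".join(pieces_2) == content)  # False: the stray ">" is skipped
```

When every < and > in the input belongs to a pair, both patterns give the same pieces; the difference only shows up with unpaired brackets.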