
How To Split A String And Keep The Separators In It

@edzech asked how to split a string and keep the separators in it. The question was marked as a duplicate, although the approach here differs from the one in the 'duplicate'.

Solution 1:

In the proposed solution, a single opening < or closing > that is not part of a <...> pair is excluded from the result.

If you also want to keep such an unpaired < or >, you could use:

<[^<>]*>|(?:(?!<[^<>]*>).)+

Explanation

  • <[^<>]*> Match an opening <, then 0+ characters that are neither < nor >, then a closing >
  • | Or
  • (?:(?!<[^<>]*>).)+ Tempered greedy token: match one character at a time (any character except a newline), as long as a complete <...> pair does not start at that position
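
The breakdown above can also be written directly into the pattern. The following is a minimal sketch, assuming Python's re.VERBOSE flag; the name TOKEN and the two sample inputs are illustrative and not part of the original answer.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE
# so that each part can carry its own comment.
TOKEN = re.compile(
    r"""
    <[^<>]*>              # a complete <...> pair with no < or > inside
    |                     # or
    (?:(?!<[^<>]*>).)+    # 1+ chars, none of which starts a complete pair
    """,
    re.VERBOSE,
)

paired = TOKEN.findall("a<b>c")
unpaired = TOKEN.findall("x<y")

print(paired)    # ['a', '<b>', 'c']
print(unpaired)  # ['x<y']
```

In the second input the < never closes, so the lookahead never finds a complete pair and the whole string is kept as one piece.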


For example:

import re
content = "<abc>d<e><f>ghi<j>test><g>"
result = re.findall(r"<[^<>]*>|(?:(?!<[^<>]*>).)+", content)
print(result)

Result

['<abc>', 'd', '<e>', '<f>', 'ghi', '<j>', 'test>', '<g>']
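
As a possible alternative that is not part of the original answer, re.split also keeps the separators when the separator pattern is wrapped in a capturing group. The sketch below assumes the same input string; the filter removes the empty strings that re.split produces between adjacent separators and at the ends.

```python
import re

content = "<abc>d<e><f>ghi<j>test><g>"

# The capturing group makes re.split return the separators too.
# Empty strings appear where two separators touch or where the
# string starts or ends with a separator, so they are filtered out.
parts = [p for p in re.split(r"(<[^<>]*>)", content) if p]

print(parts)
# ['<abc>', 'd', '<e>', '<f>', 'ghi', '<j>', 'test>', '<g>']
```

For this input the result is the same list as the findall version above, including the unpaired > kept inside 'test>'.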

Solution 2:

Here is a shorter solution that uses re.findall with two alternatives.

import re

content = "<abc>d<e><f>ghi<j>"
result = re.findall(r"<.*?>|[^<>]+", content)

print(result)

Output:

['<abc>', 'd', '<e>', '<f>', 'ghi', '<j>']

Explanations:

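  • regex <.*?> matches an opening <, then as few characters as possible up to the next >, i.e. a <content> separator
  • regex [^<>]+ matches one or more characters that are neither < nor >, i.e. the text between separators

In brief, findall returns every <content> separator and every run of text between separators, in order. That way, the content is split without losing the separators. An unpaired > or a < that is never closed matches neither alternative and is dropped from the result.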
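
To see how the two patterns differ on unpaired brackets, here is a small comparison sketch. The patterns are copied from Solution 1 and Solution 2; the names TEMPERED and SIMPLE and the join-based check are illustrative additions.

```python
import re

# Patterns copied from Solution 1 and Solution 2.
TEMPERED = r"<[^<>]*>|(?:(?!<[^<>]*>).)+"
SIMPLE = r"<.*?>|[^<>]+"

content = "<abc>d<e><f>ghi<j>test><g>"  # contains one unpaired '>'

tempered_parts = re.findall(TEMPERED, content)
simple_parts = re.findall(SIMPLE, content)

# A split keeps everything only if joining the parts restores the input.
print("".join(tempered_parts) == content)  # True
print("".join(simple_parts) == content)    # False: the unpaired '>' is lost

# With an unclosed '<' followed by a pair, the lazy <.*?> spans both.
print(re.findall(SIMPLE, "<a<b>"))    # ['<a<b>']
print(re.findall(TEMPERED, "<a<b>"))  # ['<a', '<b>']
```

If the input is guaranteed to contain only well-formed <...> pairs, both patterns give the same result and the shorter one is enough.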
