
Fast Way To Replace A String According To Dict

I want to replace the characters of a very long string according to a dictionary. My code looks like this: def rep(self, mystr, dict): new_pstr = '' for char in mystr: try: ne

Solution 1:

Consult this tutorial; it may help you.

The method translate() returns a copy of the string in which all characters have been translated using table (constructed with the maketrans() function in the string module), optionally deleting all characters found in the string deletechars.

Example code:

The following example replaces each vowel with a digit and deletes the 'x' and 'm' characters from the string:

#!/usr/bin/python

from string import maketrans   # Required to call maketrans function.

intab = "aeiou"
outtab = "12345"
trantab = maketrans(intab, outtab)

text = "this is string example....wow!!!"  # avoid shadowing the built-in str
print text.translate(trantab, 'xm')  # second argument: characters to delete

This produces the following result:

th3s 3s str3ng 21pl2....w4w!!!

The source of this code.
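
The example above is Python 2. On Python 3, string.maketrans no longer exists and str.translate takes a single argument, so the characters to delete go into the table instead. A minimal sketch of the same example, assuming Python 3:

```python
# Python 3 sketch of the Python 2 example above.
intab = "aeiou"
outtab = "12345"

# The optional third argument lists characters to delete from the string.
trantab = str.maketrans(intab, outtab, "xm")

text = "this is string example....wow!!!"
result = text.translate(trantab)
print(result)  # th3s 3s str3ng 21pl2....w4w!!!
```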


Solution 2:

It looks like you want to convert the characters to their closest ASCII equivalents. The Unidecode library will do that for you: decode the byte string and pass the result to unidecode.unidecode:

In [8]: s = "tériniñ yiriklişip kétişi havadiki nemlikniñ tövenlep ketkenlikidin bolup ، bu vaqitta tére téximu qurğaqlişip kétidu ، tériniñ ilastikiliqi acizlap ، xünük bolup qalidu. şuña xanim – qizlar bundaq vaqitta tére qurğaqlişişniñ aldini alidiğan çare– tedbirlerni qolliniş kérek. nemlikni saqlaşta yuquri dericilik su toluqlaş yüzlüki، hesel ve örük méğizi méyiğa muvapiq miqdarda un arilaşturup melhem qilip yüzge çaplap bers e، yaki nemxuşluqi yuquri bolğan tére nemleştürüş vazilin méyi sürüp berse، qurğaq térige su toluqlaşqa paydiliq."

In [9]: unidecode.unidecode(s.decode("utf-8"))
Out[9]: 'terinin yiriklisip ketisi havadiki nemliknin tovenlep ketkenlikidin bolup , bu vaqitta tere teximu qurgaqlisip ketidu , terinin ilastikiliqi acizlap , xunuk bolup qalidu. suna xanim - qizlar bundaq vaqitta tere qurgaqlisisnin aldini alidigan care- tedbirlerni qollinis kerek. nemlikni saqlasta yuquri dericilik su toluqlas yuzluki, hesel ve oruk megizi meyiga muvapiq miqdarda un arilasturup melhem qilip yuzge caplap bers e, yaki nemxusluqi yuquri bolgan tere nemlesturus vazilin meyi surup berse, qurgaq terige su toluqlasqa paydiliq.'
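
If installing Unidecode is not an option, a rougher standard-library approximation is to decompose each character and drop the combining marks. This is a different technique from Unidecode, not a reimplementation of it: it only handles letters that decompose into a base letter plus accents, and it leaves everything else (for example the Arabic comma '،') unchanged. A sketch assuming Python 3, where strip_accents is a hypothetical helper name:

```python
import unicodedata

def strip_accents(text):
    # Hypothetical helper: split accented letters into base letter + combining
    # marks (NFKD), then keep only the characters that are not combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

sample = "tériniñ yiriklişip kétişi"
print(strip_accents(sample))  # terinin yiriklisip ketisi
```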

Or if you have multiple characters as values, this is a faster working version of your own logic:

In [27]: from itertools import chain
In [28]: d = {k:v[0] for k,v in d.items()}
In [29]:  "".join([d[ch] if ch in d else ch for ch in chain.from_iterable(s)])
Out[29]: 'terinig yiriklisip ketisi yavadiki nemliknig tuvenlep ketkenlikidin bulup ، bu vakitta tere teximu kurgaklisip ketidu ، terinig ilastikiliki acizlap ، xvnvk bulup kalidu. suga xanim – kizlar bundak vakitta tere kurgaklisisnig aldini alidigan çare– tedbirlerni kullinis kerek. nemlikni saklasta yukuri dericilik su tuluklas yvzlvki، yesel ve urvk megizi meyiga muvapik mikdarda un arilasturup melyem kilip yvzge çaplap bers e، yaki nemxusluki yukuri bulgan tere nemlestvrvs vazilin meyi svrvp berse، kurgak terige su tuluklaska paydilik.'

Also, the correct way to use str.translate with a unicode string is to build the table from the ord of each character:

table = {ord(k): ord(ch) for k, v in d.items() for ch in v}

s.translate(table)
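
To make the ord-keyed table concrete, here is a small self-contained sketch assuming Python 3. The dictionary contents are made up for illustration; the question's real dictionary is not shown. In Python 3 the table values may be code points, strings, or None (which deletes the character), so the replacement does not need to go through ord:

```python
# Hypothetical mapping: each key maps to a list of candidate replacements,
# and only the first candidate is used (the v[0] choice made earlier).
d = {
    "é": ["e"],
    "ñ": ["n", "g"],
    "ş": ["s", "sh"],
    "ö": ["o"],
}

# str.translate looks characters up by code point, so the keys must be ord(k).
table = {ord(k): v[0] for k, v in d.items()}

s = "tériniñ yiriklişip tövenlep"
translated = s.translate(table)
print(translated)  # terinin yiriklisip tovenlep
```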

Solution 3:

I got the final answer by combining the answers from @Padraic Cunningham and @PRVS. This version is 100 times faster than my original code.

new_d = {ord(k): ord(v[0]) for k, v in d.items()}  # ord for Unicode characters
mystr.translate(new_d)  # pass the ord-keyed table, not the original dict

If your string doesn't contain any Unicode characters, please check @PRVS's answer.
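
To check a speed claim like this on your own data, the two approaches from the answers can be timed side by side. A rough sketch assuming Python 3; the mapping, the sample text, and the function names are hypothetical, and the ratio you measure will depend on the input and the interpreter, so no particular speedup is assumed here:

```python
import timeit

# Hypothetical single-character mapping and sample text, for illustration only.
d = {"é": "e", "ñ": "n", "ş": "s", "ö": "o", "ü": "u"}
text = "tériniñ yiriklişip kétişi tövenlep xünük " * 2000

def replace_with_join(s, mapping):
    # Per-character dictionary lookup, joined back into one string.
    return "".join([mapping.get(ch, ch) for ch in s])

# Build the translation table once, keyed by code point.
table = {ord(k): ord(v) for k, v in d.items()}

def replace_with_translate(s, table):
    # A single call into str.translate does the whole replacement.
    return s.translate(table)

# Both approaches should agree on the result before timing them.
assert replace_with_join(text, d) == replace_with_translate(text, table)

join_time = timeit.timeit(lambda: replace_with_join(text, d), number=10)
translate_time = timeit.timeit(lambda: replace_with_translate(text, table), number=10)
print("join:      %.4f s" % join_time)
print("translate: %.4f s" % translate_time)
```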

