Python Googletrans Encoding Weird Chars
I have a UI that takes German text, among other things, and translates it into English sentences.
# -*- coding: utf-8 -*-
from googletrans import Translator
def tr(s) transl
Solution 1:
The translated text contains two embedded zero-width space ('\u200b') characters.
>>> res = t.translate(wordDE, src='de', dest='en').text
>>> res
'Pascal and PHP are programming languages \u200b\u200bfor software developers and engineers.'
The text editor appears to be decoding the file as cp1252 (or a similar Microsoft 8-bit encoding), hence the mojibake:
>>> res.encode('utf-8').decode('cp1252')
'Pascal and PHP are programming languages â€‹â€‹for software developers and engineers.'
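Each zero-width space occupies three bytes in UTF-8, and cp1252 maps each of those bytes to its own visible character. The following is a minimal sketch using only the standard library to make that byte-level mapping explicit; the variable names are illustrative and not part of the original code.

```python
import unicodedata

zwsp = '\u200b'  # ZERO WIDTH SPACE

# UTF-8 encodes U+200B as the three bytes E2 80 8B.
utf8_bytes = zwsp.encode('utf-8')

# Misreading those bytes as cp1252 yields one character per byte.
mojibake = utf8_bytes.decode('cp1252')

print(utf8_bytes)
for ch in mojibake:
    # Print code point and name, which is safe on any console encoding.
    print('U+%04X' % ord(ch), unicodedata.name(ch))
```

Because the translated text contains two zero-width spaces, the misdecoded file shows this three-character sequence twice.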
This is a known bug in the Google Translate API. Pending a fix, you can use str.replace to create a new string that does not contain these characters:
>>> new_res = res.replace('\u200b', '')
>>> new_res
'Pascal and PHP are programming languages for software developers and engineers.'
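If the cleanup is needed in more than one place, it could be wrapped in a small helper. This is only a sketch: the function name strip_zero_width_spaces and its chars parameter are hypothetical, and it removes only U+200B by default, since that is the one character observed above.

```python
def strip_zero_width_spaces(text, chars='\u200b'):
    """Return text with every character listed in chars removed.

    Hypothetical helper; the default strips only U+200B (zero-width space).
    """
    # Mapping a code point to None makes str.translate delete it.
    table = dict.fromkeys(map(ord, chars))
    return text.translate(table)


sample = 'programming languages \u200b\u200bfor software developers'
cleaned = strip_zero_width_spaces(sample)
print(cleaned)
```

With googletrans, the helper would be applied to the .text attribute of the translation result, in the same way as the str.replace call above.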