
Python Googletrans Encoding Weird Chars

I have a UI that takes German text, among other things, and translates it into English sentences. # -*- coding: utf-8 -*- from googletrans import Translator def tr(s) transl

Solution 1:

The translated text contains two embedded zero-width space ('\u200b') characters.

>>> res = t.translate(wordDE, src='de', dest='en').text
>>> res
'Pascal and PHP are programming languages \u200b\u200bfor software developers and engineers.'
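Zero-width characters are invisible when printed, so it can help to list them explicitly. The following is a minimal sketch using only the standard library; the helper name describe_invisible is hypothetical, and it assumes that reporting characters in Unicode category Cf (format characters, which includes U+200B) is enough for this case.

```python
import unicodedata

def describe_invisible(text):
    """Return (index, code point, name) for every format-category (Cf) character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]

# A shortened stand-in for the translated text shown above.
sample = "languages \u200b\u200bfor software"
for hit in describe_invisible(sample):
    print(hit)
```

Each reported tuple gives the position of an invisible character, which makes it easy to confirm where the translation result picked them up.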

The text editor appears to be decoding the file as cp1252 (or a similar MS 8-bit encoding), hence the mojibake:

>>> res.encode('utf-8').decode('cp1252')
'Pascal and PHP are programming languages â€‹â€‹for software developers and engineers.'

This is a known bug in the Google Translate API. Pending a fix, you can use str.replace to create a new string that does not contain these characters:

>>> new_res = res.replace('\u200b', '')
>>> new_res
'Pascal and PHP are programming languages for software developers and engineers.'
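If the cleanup is needed in several places, the replacement can be wrapped in a small helper. This is only a sketch: the name strip_zero_width is hypothetical, and the choice to also drop U+FEFF (the byte order mark / zero-width no-break space) is an assumption. The zero-width joiner and non-joiner are deliberately left alone because they carry meaning in some scripts.

```python
# Map each code point to None so that str.translate deletes it.
_ZERO_WIDTH = dict.fromkeys([0x200B, 0xFEFF])

def strip_zero_width(text):
    """Return text with zero-width spaces and BOM characters removed."""
    return text.translate(_ZERO_WIDTH)

res = ('Pascal and PHP are programming languages '
       '\u200b\u200bfor software developers and engineers.')
new_res = strip_zero_width(res)
print(new_res)
```

In the question's setup, such a helper could be applied to the .text attribute of the translation result before the string is displayed or written to a file.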
