How To Output A Utf-8 String List As It Is In Python?
Solution 1:
When you print a list (or write it to a file), Python calls str() on the list, and the list's str() in turn calls repr() on each of its elements. For a unicode string, repr() returns the escaped representation that you are seeing.
Example of repr (on a terminal that can display the characters) -
>>> h = u'\u4f60\u597d'
>>> print h
你好
>>> print repr(h)
u'\u4f60\u597d'
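The same mechanism can be sketched in Python 3, which is an assumption here since the transcript above is Python 2. The Loud class is a hypothetical helper that only exists to make str() and repr() easy to tell apart, and ascii() is used to approximate the escaped form that Python 2's repr() produces (without the u prefix).

```python
# Python 3 sketch: str() of a list is built from repr() of each element.

class Loud:
    """Hypothetical helper whose str() and repr() differ visibly."""

    def __str__(self):
        return "str"

    def __repr__(self):
        return "repr"


items = [Loud(), Loud()]

# str() on the list calls repr() on every element.
as_list = str(items)

# Joining the elements yourself lets you choose str() instead.
joined = ", ".join(str(i) for i in items)

# ascii() escapes non-ASCII characters, similar to Python 2's repr().
h = "\u4f60\u597d"
escaped = ascii(h)

print(as_list)
print(joined)
print(escaped)
```

This is why the answer builds the output string by hand instead of printing the list directly.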
You would need to manually take the elements of the list and print them for them to print correctly.
Example -
>>> h1 = [h, u'\u4f77\u587f']
>>> print u'[' + u','.join([u"'" + unicode(i) + u"'" for i in h1]) + u']'
For lists containing sublists or tuples that may have unicode characters, you would need a recursive function. Example -
>>> h1 = [h, (u'\u4f77\u587f',)]
>>> def listprinter(l):
...     if isinstance(l, list):
...         return u'[' + u','.join([listprinter(i) for i in l]) + u']'
...     elif isinstance(l, tuple):
...         return u'(' + u','.join([listprinter(i) for i in l]) + u')'
...     elif isinstance(l, (str, unicode)):
...         return u"'" + unicode(l) + u"'"
...     else:
...         # numbers and other objects; without this branch join() gets None
...         return unicode(l)
...
>>> print listprinter(h1)
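To save them to a file, use the same list comprehension or recursive function. The result is a unicode string, so in Python 2 encode it before writing, otherwise f.write() tries the ascii codec and fails on these characters. Example -
with open('<filename>', 'w') as f:
    f.write(listprinter(h1).encode('utf-8'))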
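For comparison, here is a minimal sketch of the same idea in Python 3 (an assumption, since the answer targets Python 2). It redefines listprinter for Python 3's single str type, and the temporary file name out.txt is purely illustrative. Passing encoding="utf-8" to open() takes the place of the manual .encode() call.

```python
import os
import tempfile


def listprinter(obj):
    """Render nested lists/tuples without escaping non-ASCII text."""
    if isinstance(obj, list):
        return "[" + ",".join(listprinter(i) for i in obj) + "]"
    if isinstance(obj, tuple):
        return "(" + ",".join(listprinter(i) for i in obj) + ")"
    if isinstance(obj, str):
        return "'" + obj + "'"
    # Fallback for numbers and other objects.
    return str(obj)


h1 = ["\u4f60\u597d", ("\u4f77\u587f",), 3]
text = listprinter(h1)

# Illustrative output location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

# The file object encodes the text as UTF-8 on write.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read it back to confirm the round trip.
with open(path, encoding="utf-8") as f:
    read_back = f.read()

print(read_back)
```

Like the original function, this sketch writes a one-element tuple without the trailing comma, so the output is meant for reading, not for parsing back into Python objects.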
Solution 2:
You are misunderstanding. u'' in Python is not UTF-8, it is simply Unicode (except on Windows in Python <= 3.2, where it is stored as UTF-16 instead). UTF-8 is an encoding of Unicode, which is necessarily a sequence of bytes.
Additionally, u'你' and u'\u4f60' are exactly the same thing. It's simply that in Python 2 the repr of non-ASCII characters uses escapes instead of the raw characters.
Since Python 2 is heading for EOL very soon now, you should start to think seriously about switching to Python 3. It is a lot easier to keep track of all this in Python 3, since there's only one text string type and it's much clearer when you need to .encode and .decode.
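A short sketch of that distinction, assuming Python 3: the escaped and literal spellings are the same one-character string, .encode turns it into bytes, and .decode turns the bytes back into text.

```python
# Python 3 sketch: text (str) versus its UTF-8 encoding (bytes).

s = "\u4f60"

# The escape and the literal character are the same string.
same = (s == "你")

# Encoding produces a bytes object, three bytes long for this character.
data = s.encode("utf-8")

# Decoding with the same codec recovers the original text.
back = data.decode("utf-8")

# Text and bytes do not mix implicitly in Python 3.
try:
    s + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(same, len(s), len(data), back == s, mixed_ok)
```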
Solution 3:
with open("some_file.txt", "wb") as f:
    f.write(hellolist[0].encode("utf8"))
I think this will resolve your issue. Most text editors use utf8 encoding :)
While the other answers are correct, none of them actually resolved your issue.
>>> u'\u4f60\u597d'.encode("utf8")
'\xe4\xbd\xa0\xe5\xa5\xbd'
If you want the brackets:
>>> u'[u\u4f60\u597d,]'.encode("utf8")
Solution 4:
One thing is the Unicode character itself:
hellolist = u'\u4f60'
and another is how you represent it.
You can represent it in many ways, depending on where it is going to be displayed.
Web: UTF-8
Database: maybe UTF-16 or UTF-8
Web in Japan: EUC-JP or Shift JIS
For example, 本:
http://unicode.org/cgi-bin/GetUnihanData.pl?codepoint=672c
http://www.fileformat.info/info/unicode/char/672c/index.htm
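To make the "one character, many representations" point concrete, here is a small sketch, assuming Python 3 and its standard codec names (utf-8, utf-16-le, euc_jp, shift_jis). It encodes the single character 本 (U+672C) with each codec and prints the resulting bytes in hex.

```python
# Python 3 sketch: one Unicode character, several byte representations.

ch = "\u672c"  # 本

# Codec names as spelled in Python's standard library.
encodings = ["utf-8", "utf-16-le", "euc_jp", "shift_jis"]

# Map each codec name to the bytes it produces for the same character.
encoded = {name: ch.encode(name) for name in encodings}

for name, data in encoded.items():
    # Show the bytes in hex together with their length.
    print(name, data.hex(), len(data))

# Every representation decodes back to the same character.
round_trips = all(data.decode(name) == ch for name, data in encoded.items())
print(round_trips)
```

The byte sequences differ from codec to codec, but the character they stand for does not.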