Python Kludge To Read UCS-2 (UTF-16?) As ASCII
Solution 1:
codecs.open() will allow you to open a file using a specific encoding, and it will produce unicode objects. You can try a few encodings, going from most likely to least likely (or the tool could just always produce UTF-16LE, but ha ha, fat chance).
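A minimal sketch of the "try a few encodings" idea follows. The helper name read_text and the candidate list are assumptions for illustration, not part of the original answer. It uses io.open, which takes the same encoding argument as codecs.open and behaves the same on Python 2.6+ and Python 3. Order matters: ASCII is tried first because it rejects the UTF-16 byte order mark, whereas a UTF-16 decoder will often accept plain ASCII bytes and return garbage.

```python
import io

def read_text(path, encodings=("ascii", "utf-16")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    A clean decode does not prove the guess is right; it only means
    the bytes were valid in that encoding.
    """
    for enc in encodings:
        try:
            with io.open(path, "r", encoding=enc) as f:
                return f.read(), enc
        except UnicodeError:
            # Wrong guess for this file: move on to the next candidate.
            continue
    raise ValueError("none of %r could decode %s" % (encodings, path))
```

Calling read_text(sys.argv[1]) would then return text that can be split into lines regardless of which of the two encodings the tool produced.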
Solution 2:
works.log appears to be encoded in ASCII:
>>> data = open('works.log', 'rb').read()
>>> all(d < '\x80' for d in data)
True
breaks.log appears to be encoded in UTF-16LE; it starts with the 2 bytes '\xff\xfe'. None of the characters in breaks.log are outside the ASCII range:
>>> data = open('breaks.log', 'rb').read()
>>> data[:2]
'\xff\xfe'
>>> udata = data.decode('utf16')
>>> all(d < u'\x80' for d in udata)
True
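The two interactive checks above can be folded into one helper. This is only a sketch: the name sniff is hypothetical, and it distinguishes just the two cases discussed here (plain ASCII versus UTF-16 with a byte order mark), not encodings in general.

```python
import codecs

def sniff(data):
    """Return (encoding_guess, ascii_only) for a byte string.

    The guess is "utf-16" when the data starts with a UTF-16 byte order
    mark and "ascii" otherwise. ascii_only is True when every decoded
    character is below U+0080.
    """
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        encoding = "utf-16"
    else:
        encoding = "ascii"
    # "replace" turns undecodable bytes into U+FFFD, which is not ASCII,
    # so bad input shows up as ascii_only == False instead of an exception.
    text = data.decode(encoding, "replace")
    ascii_only = all(ord(ch) < 0x80 for ch in text)
    return encoding, ascii_only
```

For the files described above, sniff(open('works.log', 'rb').read()) would be expected to give ("ascii", True) and the same call on breaks.log ("utf-16", True).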
If these are the only two possibilities, you should be able to get away with the following hack. Change your mainline code from:
f = open(sys.argv[1])
mb_toc_urlpart = "%20".join(
    str(x) for x in calculate_mb_toc_numbers(filter_toc_entries(f)))
print mb_toc_urlpart
to this:
f = open(sys.argv[1], 'rb')
data = f.read()
f.close()
if data[:2] == '\xff\xfe':
    data = data.decode('utf16').encode('ascii')
# ilines is a generator which produces newline-terminated strings
ilines = (line + '\n' for line in data.splitlines())
mb_toc_urlpart = "%20".join(
    str(x) for x in calculate_mb_toc_numbers(filter_toc_entries(ilines)))
print mb_toc_urlpart
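For reuse, the same hack can be wrapped in a generator. The sketch below is an adaptation, not the code from the answer: the name ascii_lines is hypothetical, it compares the byte order mark through the codecs constants so that it also runs on Python 3, and it yields text (unicode) lines instead of byte strings.

```python
import codecs

def ascii_lines(path):
    """Yield newline-terminated text lines from an ASCII file or a
    UTF-16 file that starts with a byte order mark."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        # The "utf-16" codec consumes the byte order mark while decoding.
        text = data.decode("utf-16")
    else:
        text = data.decode("ascii")
    # Fail loudly if the content is not pure ASCII after all.
    text.encode("ascii")
    for line in text.splitlines():
        yield line + "\n"
```

The result of ascii_lines(sys.argv[1]) could be passed to filter_toc_entries in place of ilines, assuming that function accepts any iterable of lines.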
Solution 3:
Python 2.x expects normal strings (str) to be ASCII, or at least one byte per character. Try this:
Put this at the top of your Python source file:
from __future__ import unicode_literals
And change all the str calls to unicode.
[edit]
And as Ignacio Vazquez-Abrams wrote, try codecs.open() to open the input file.