UnicodeDecodeError in Python with codecs module

Question

I have a text file containing Unicode strings such as "aBiyukÙwa" and "varcasÙva". When I decode one of them in the Python interpreter using the following code, it works fine and returns u'aBiyuk\xd9wa':

"aBiyukÙwa".decode("utf-8")

But when I read the strings from the file in a Python program, using the codecs module as in the following code, it throws a UnicodeDecodeError.

import codecs

file = codecs.open('/home/abehl/TokenOutput.wx', 'r', 'utf-8')
for row in file:
    pass  # loop body not shown; the error is raised while iterating

Following is the error message:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xd9 in position 8: invalid continuation byte

Any ideas what is causing this strange behavior?

Ignacio Vazquez-Abrams · Accepted Answer · 2011-07-04 20:48:45Z

Your file is not encoded in UTF-8. Find out what it is encoded in, and then use that.
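
A minimal sketch of what that means in practice, written for Python 3 and assuming the file was saved as Latin-1 (ISO 8859-1), which the question does not confirm. The temporary file is a hypothetical stand-in for TokenOutput.wx. The interpreter test presumably worked because the terminal supplied UTF-8 bytes for the literal, whereas the file holds the single byte 0xD9.

```python
import os
import tempfile

text = "aBiyukÙwa"

# In UTF-8, Ù is the two bytes 0xC3 0x99; in Latin-1 it is the single byte 0xD9.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# Genuine UTF-8 bytes decode cleanly, like the interpreter test in the question.
assert utf8_bytes.decode("utf-8") == text

# Write a stand-in file as Latin-1 (assumption: not the asker's real file).
with tempfile.NamedTemporaryFile(delete=False, suffix=".wx") as tmp:
    tmp.write(latin1_bytes + b"\n")
    path = tmp.name

# Reading it as UTF-8 fails: 0xD9 announces a two-byte sequence,
# but the byte that follows ('w') is not a continuation byte.
error_message = None
try:
    with open(path, encoding="utf-8") as fh:
        for row in fh:
            pass
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Reading it with the encoding it was written in succeeds.
with open(path, encoding="latin-1") as fh:
    rows = [row.rstrip("\n") for row in fh]

os.remove(path)
print(error_message)
print(rows)
```

The built-in open() with an encoding argument is used here as the Python 3 counterpart of codecs.open(); in the original code the equivalent change would be to the third argument of codecs.open().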

1 Comment

Wooble Over a year ago

Ù is 0xD9 in ISO8859-[1,3,10,14-16].
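
One way to check candidates like these yourself is to decode the offending byte under each encoding and see what comes out. This is only a sketch in Python 3; the candidate list is taken from the comment above plus UTF-8 for contrast, and decode_byte is a hypothetical helper, not part of any library.

```python
# Candidate encodings to try (from the comment above, plus UTF-8 for contrast).
candidates = [
    "iso-8859-1", "iso-8859-3", "iso-8859-10",
    "iso-8859-14", "iso-8859-15", "iso-8859-16",
    "utf-8",
]

def decode_byte(raw, encoding):
    """Return the decoded character, or None if the bytes are invalid."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# Map each encoding name to what the single byte 0xD9 decodes to.
results = {enc: decode_byte(b"\xd9", enc) for enc in candidates}

for enc, char in results.items():
    print(enc, repr(char))
```

Any encoding that prints 'Ù' is a plausible guess for the file, though a single byte is not enough to identify the encoding with certainty.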
