I have a text file containing strings with non-ASCII characters, such as "aBiyukÙwa" and "varcasÙva". When I decode one of them in the Python interpreter with the following code, it works fine and returns u'aBiyuk\xd9wa':

"aBiyukÙwa".decode("utf-8")

But when I read the same strings from a file in a Python program, using the codecs module as in the following code, it throws a UnicodeDecodeError:

import codecs
file = codecs.open('/home/abehl/TokenOutput.wx', 'r', 'utf-8')
for row in file:
    pass  # loop body not shown in the question; the error is raised while iterating

Following is the error message:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xd9 in position 8: invalid continuation byte

Any ideas what is causing this strange behavior?

1 Answer

Your file is not encoded in UTF-8. Find out which encoding it actually uses, and pass that encoding to codecs.open instead of 'utf-8'.
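
As a rough illustration of "find out what it is encoded in", the sketch below tries a short list of candidate encodings against the raw bytes and reports the first one that decodes cleanly. The helper name decode_with_candidates and the candidate list are assumptions made for this example, not something from the question, and the sample bytes stand in for a line of the real file. Note that ISO 8859-1 accepts every byte value, so a successful decode there does not prove the guess is right; checking the decoded text by eye is still needed.

```python
def decode_with_candidates(raw, candidates=("utf-8", "iso-8859-1")):
    """Return (text, encoding) for the first candidate that decodes raw.

    Hypothetical helper for illustration; the candidate list is a guess.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this candidate does not fit, try the next one
    raise ValueError("none of the candidate encodings matched")


# Stand-in for one line of the file: "aBiyuk", then the single byte 0xD9, then "wa".
sample = b"aBiyuk\xd9wa"
text, used = decode_with_candidates(sample)
print(used)
```

Once the encoding is known, the same name can be passed as the third argument to codecs.open in place of 'utf-8'.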

1 Comment

Ù is 0xD9 in ISO 8859-1, -3, -9, and -14 through -16.
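
A small check of the byte values involved, assuming the file uses ISO 8859-1 (one of the encodings the comment names; the real encoding of the file is not known from the question). It shows that the lone byte 0xD9 is Ù in ISO 8859-1, that UTF-8 represents Ù as two bytes, and that 0xD9 by itself is rejected by the UTF-8 decoder. This would also be consistent with the interpreter test succeeding if the terminal supplied the literal as UTF-8 bytes, although that is a guess about the asker's setup.

```python
raw = b"\xd9"

# In ISO 8859-1 the single byte 0xD9 maps to U+00D9 (Ù).
latin1_text = raw.decode("iso-8859-1")

# In UTF-8 the same character takes two bytes.
utf8_bytes = u"\xd9".encode("utf-8")

# The lone byte 0xD9 is not valid UTF-8 on its own.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(utf8_bytes), utf8_ok)
```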