I'm processing a CSV file with pandas, so I open it:

df = pd.read_csv(my_file, low_memory=False)

I'm applying some sanitizing functions, changing some strings to numbers, and then when I want to save the dataframe into a file I do this:

df.to_csv(output_file, index=False)

In some cases this throws a UnicodeEncodeError, and I want to know how to avoid it. I know that read_csv and to_csv both accept an encoding parameter, but whenever I have used it, the error is thrown again.

I need code robust enough that it doesn't fail when the file contains non-ASCII characters. I know the str.encode method has an errors parameter that accepts 'ignore', and I would like to use something like that, but I'm not sure how to do it.
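For reference, here is a minimal sketch of what the errors handlers of str.encode do with a character that ASCII cannot represent. The sample string is made up for illustration; only the standard library is used.

```python
# Illustrative sample text (an assumption, not taken from the real file).
text = "café"

# 'strict' is the default handler and raises UnicodeEncodeError.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# The other handlers trade fidelity for never failing.
ignored = text.encode("ascii", errors="ignore")              # drops the character
replaced = text.encode("ascii", errors="replace")            # substitutes '?'
escaped = text.encode("ascii", errors="backslashreplace")    # keeps a readable escape
xml_ref = text.encode("ascii", errors="xmlcharrefreplace")   # numeric character reference

print(strict_failed, ignored, replaced, escaped, xml_ref)
```

Of these, 'ignore' silently loses data, while 'backslashreplace' keeps enough information to see what the original character was.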

EDIT:

I know I can use encodings such as latin1, iso-8859-1 or others to make it work, but I would like the output file to be encoded in either ASCII (preferably) or UTF-8.
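One way to get ASCII output without the exception, sketched under assumptions: open the output file yourself with an errors handler and pass the handle to to_csv. The helper name write_csv_ascii, the sample dataframe and the temporary path are all hypothetical. If UTF-8 output is acceptable, passing encoding='utf-8' to to_csv should normally be enough, since UTF-8 can represent every character.

```python
import os
import tempfile

import pandas as pd


def write_csv_ascii(df, path, errors="backslashreplace"):
    """Write df as an ASCII-only CSV (hypothetical helper).

    Characters outside ASCII are handled by the given errors handler
    instead of raising UnicodeEncodeError.
    """
    # newline="" lets the CSV writer control line endings itself.
    with open(path, "w", encoding="ascii", errors=errors, newline="") as handle:
        df.to_csv(handle, index=False)


# Made-up data containing non-ASCII characters.
df = pd.DataFrame({"name": ["café", "Москва"], "value": [1, 2]})
output_file = os.path.join(tempfile.mkdtemp(), "out.csv")
write_csv_ascii(df, output_file)

# Read the raw bytes back to check what was written.
with open(output_file, "rb") as handle:
    raw = handle.read()
```

With errors="ignore" the non-ASCII characters would be dropped instead of escaped, which matches the str.encode behaviour mentioned in the question but loses data.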

1 Answer

I had the same issue opening a Russian database. Try calling read_csv with encoding='latin1', encoding='iso-8859-1' or encoding='cp1252' (these are some of the encodings commonly found on Windows).

df = pd.read_csv('xxx.csv', encoding='latin1')

2 Comments

I would like to avoid that, because I would like the output file to be in ascii or utf-8 encodings. Isn't there a way to avoid using these encodings?
You can try using file = open('xxx.csv', encoding='utf-8', errors='backslashreplace') and then pd.read_csv(file).
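The suggestion in the last comment can be sketched as follows. The file contents here are made up and deliberately written as latin1 so that decoding them as UTF-8 hits an invalid byte; the temporary path is also an assumption. Recent pandas versions also appear to offer an encoding_errors argument on read_csv that may serve the same purpose; check the documentation for your version.

```python
import os
import tempfile

import pandas as pd

# Create a small CSV that is NOT valid UTF-8 (illustrative data).
path = os.path.join(tempfile.mkdtemp(), "xxx.csv")
with open(path, "wb") as handle:
    handle.write("name,value\ncafé,1\n".encode("latin1"))

# Decode as UTF-8, but escape undecodable bytes instead of raising.
with open(path, encoding="utf-8", errors="backslashreplace", newline="") as handle:
    df = pd.read_csv(handle)

print(df)
```

The undecodable byte survives as a literal backslash escape in the string column, so the dataframe can later be written out as ASCII or UTF-8 without errors.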
