Pandas to_csv correct way to handle UnicodeEncodeError

Question

I'm processing a CSV file with pandas, so I open it:

df = pd.read_csv(my_file, low_memory=False)

I'm applying some sanitizing functions, changing some strings to numbers, and then when I want to save the dataframe into a file I do this:

df.to_csv(output_file, index=False)

In some cases this throws a UnicodeEncodeError, so I want to know how to avoid this. I know there's an encoding parameter in the read_csv and the to_csv methods but whenever I have used it, it throws the error again.

I need code robust enough that it doesn't fail when the file has non-ASCII characters. I know the str.encode method has an errors parameter that accepts 'ignore', and I would like to use something like that, but I'm not sure how to do it.
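For reference, here is a minimal sketch of what the errors argument of str.encode does with characters the target encoding cannot represent. The sample string is made up for illustration; only the standard library is used.

```python
# Sample text with two non-ASCII characters (illustrative only).
text = "café naïve"

# 'ignore' silently drops characters that ASCII cannot represent.
ascii_ignored = text.encode("ascii", errors="ignore")

# 'replace' substitutes a question mark for each such character.
ascii_replaced = text.encode("ascii", errors="replace")

# 'backslashreplace' keeps the information as a backslash escape.
ascii_escaped = text.encode("ascii", errors="backslashreplace")

# UTF-8 can represent every character, so no error handler is needed.
utf8_bytes = text.encode("utf-8")

print(ascii_ignored, ascii_replaced, ascii_escaped, utf8_bytes)
```

Note that 'ignore' and 'replace' lose data, while 'backslashreplace' keeps it recoverable at the cost of less readable output.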

EDIT:

I know I can use encodings such as latin1, iso-8859-1 or some others to make it work, but I would like the output file to be encoded in either ASCII (preferably) or UTF-8.
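One possible way to get ASCII or UTF-8 output, sketched under assumptions: to_csv accepts an already-open file object, so the file can be opened with the built-in open() and its errors argument, which behaves like the one on str.encode. The helper name write_csv_ascii, the sample data and the temporary paths are hypothetical and not part of the original question.

```python
import os
import tempfile

import pandas as pd


def write_csv_ascii(df, path, errors="replace"):
    """Hypothetical helper: write df as ASCII; `errors` decides the fate of other characters."""
    # newline="" leaves line endings to the CSV writer.
    with open(path, "w", encoding="ascii", errors=errors, newline="") as handle:
        df.to_csv(handle, index=False)


# Illustrative data containing one non-ASCII character.
df = pd.DataFrame({"city": ["Málaga", "Oslo"], "n": [1, 2]})

out_dir = tempfile.mkdtemp()
ascii_path = os.path.join(out_dir, "output_ascii.csv")
utf8_path = os.path.join(out_dir, "output_utf8.csv")

# ASCII output: unencodable characters become '?' instead of raising.
write_csv_ascii(df, ascii_path)

# UTF-8 output: every character is representable, so nothing is lost.
df.to_csv(utf8_path, index=False, encoding="utf-8")
```

The ASCII variant loses information by design, so UTF-8 is probably the safer choice unless a downstream consumer really requires ASCII.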

lfkopp · Accepted Answer · 2018-10-24 12:44:14Z

2

I had the same issue opening a Russian database. Try calling read_csv with encoding='latin1', encoding='iso-8859-1' or encoding='cp1252' (these are some of the various encodings found on Windows).

df = pd.read_csv('xxx.csv', encoding='latin1')



2 Comments

Carlos A. Jimenez Holmquist Over a year ago

I would like to avoid that, because I would like the output file to be in ascii or utf-8 encodings. Isn't there a way to avoid using these encodings?

lfkopp Over a year ago

You can try using:

file = open('xxx.csv', encoding='utf-8', errors='backslashreplace')
pd.read_csv(file)
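A slightly fuller sketch of that suggestion, assuming the input is mostly UTF-8 with a few stray bytes. The helper name read_csv_lenient and the demo file are hypothetical; the demo writes a byte (0xE9) that is not valid UTF-8 so the effect is visible.

```python
import os
import tempfile

import pandas as pd


def read_csv_lenient(path):
    """Hypothetical helper: read a CSV as UTF-8, escaping undecodable bytes instead of raising."""
    with open(path, encoding="utf-8", errors="backslashreplace", newline="") as handle:
        return pd.read_csv(handle)


# Demo input: "café" encoded as Latin-1, so 0xE9 is an invalid UTF-8 byte.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "input.csv")
with open(src, "wb") as f:
    f.write(b"name\ncaf\xe9\n")

df = read_csv_lenient(src)
first_name = df["name"][0]  # the bad byte survives as a literal backslash escape

# Everything in the frame is now ordinary text, so UTF-8 output is safe.
dst = os.path.join(tmp_dir, "output.csv")
df.to_csv(dst, index=False, encoding="utf-8")
```

The trade-off is that mis-encoded characters end up in the data as escape sequences such as \xe9 rather than as the intended letters. If the real source encoding is known, passing it to read_csv and writing UTF-8 would preserve the text instead.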
