Python CSV can't encode character

Question

Using Python and Beautiful Soup, I have created a script that scrapes the name, address and phone number of businesses from a website and saves the output into three columns of a CSV file.

The script works fine but it stops when I get to a business name that is as follows:

u'\nLevel 12, 280 George Street SYDNEY\xa0 NSW\xa0 2000. . Sydney. NSW 2000\n'

The problem is the "xa0" part. The error message states:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 35: ordinal not in range(128)

I have a vague idea of what this error means but have no idea how to deal with it. Any ideas?

Thanks

Edit:

My script is as follows:

import csv

import bs4
import requests

page = requests.get('http://accountantlist.com.au/x123-Accountants-in-Sydney.aspx?Page=0')
soup = bs4.BeautifulSoup(page.content)

# Each business is rendered as a nested table inside the results table.
for company in soup.select('table#ctl00_ContentPlaceHolder1_dgLawyers tr > td > table'):
    name = company.a.text
    address = company.find_all('tr')[1].text
    phone = company.tr.find_all('td')[1].text
    # Append one row per business; this is where the UnicodeEncodeError is raised.
    with open('/home/kwal0203/Desktop/eggs.csv', 'a') as csvfile:
        s = csv.writer(csvfile)
        s.writerow([name, address, phone])

You'll need to show the code you're using that causes the error, along with the complete traceback.
– BrenBarn, Commented Nov 29, 2014 at 1:48
So you are getting this error while writing to your CSV file. Can you show us how you are writing your CSV file?
– Tanveer Alam, Commented Nov 29, 2014 at 1:49
I've added the script to the question, but I'm not sure how to get the traceback.
– Kane, Commented Nov 29, 2014 at 1:54

Tanveer Alam · Accepted Answer · 2014-11-29 03:16:44Z

4

You need to encode the text as UTF-8 when writing it to the CSV file, because the built-in csv module in Python 2 doesn't support Unicode.

def remove_non_ascii(text):
    # Keep only characters in the ASCII range (code points below 128).
    return ''.join(i for i in text if ord(i) < 128)


# Inside the loop over companies:
name = remove_non_ascii(company.a.text)
address = remove_non_ascii(company.find_all('tr')[1].text)
phone = remove_non_ascii(company.tr.find_all('td')[1].text)

with open('/home/kwal0203/Desktop/eggs.csv', 'a') as csvfile:
    s = csv.writer(csvfile)
    # Encode each unicode value to a UTF-8 byte string before writing.
    s.writerow([data.encode("utf-8") for data in [name, address, phone]])
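To see what the helper does to the address from the question, here is a small sketch written for Python 3 (an assumption; the code above targets Python 2). The replace_nbsp function is a hypothetical alternative, not part of the answer: it swaps each non-breaking space (\xa0) for a regular space instead of deleting it, which may be preferable when that character is the only separator between two words.

```python
# Python 3 sketch; the address is the problem string from the question.
address = '\nLevel 12, 280 George Street SYDNEY\xa0 NSW\xa0 2000. . Sydney. NSW 2000\n'


def remove_non_ascii(text):
    # Drop every character whose code point is outside the ASCII range.
    return ''.join(ch for ch in text if ord(ch) < 128)


def replace_nbsp(text):
    # Hypothetical alternative: turn non-breaking spaces into plain spaces.
    return text.replace('\xa0', ' ')


# Encoding the raw text as ASCII fails at the first non-breaking space.
try:
    address.encode('ascii')
    error_position = None
except UnicodeEncodeError as exc:
    error_position = exc.start  # index of the offending character

stripped = remove_non_ascii(address)
spaced = replace_nbsp(address)

# Both cleaned strings are pure ASCII, so these calls no longer raise.
stripped.encode('ascii')
spaced.encode('ascii')
```

Either cleanup loses the distinction between a non-breaking space and a regular one; if the original characters matter, writing the text as UTF-8 without stripping is the safer choice.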

Alternatively, you can use unicodecsv, which supports Unicode by default. Install it with:

pip install unicodecsv
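As an aside, on Python 3 the standard csv module accepts text strings directly, so neither workaround should be needed there. The sketch below assumes Python 3 and is not taken from the original answer; the temporary directory, the business name and the phone number are placeholders chosen for illustration, and only the address fragment comes from the question.

```python
import csv
import os
import tempfile

# Placeholder row: only the address fragment comes from the question.
rows = [
    [
        'Example Pty Ltd',  # hypothetical business name
        'Level 12, 280 George Street SYDNEY\xa0 NSW\xa0 2000',
        '00 0000 0000',  # hypothetical phone number
    ],
]

# Write to a temporary location so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), 'eggs.csv')

# newline='' lets the csv module control line endings;
# encoding='utf-8' makes the file able to hold any Unicode character.
with open(path, 'a', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(rows)

# Read the file back to confirm the non-breaking spaces survived.
with open(path, newline='', encoding='utf-8') as csvfile:
    read_back = list(csv.reader(csvfile))
```

Opening the file once and writing all rows inside a single with block also avoids reopening the file for every business.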

edited Nov 29, 2014 at 3:16

answered Nov 29, 2014 at 1:55

Tanveer Alam


11 Comments

Kane Over a year ago

Hi, when I use your code I get an error message saying: TypeError: decoding Unicode is not supported

Tanveer Alam Over a year ago

Can you include a sample of [name,address,phone] in your question so that I can try it with your data?

Kane Over a year ago

I don't have the data in a file. I just scrape it off this website http://accountantlist.com.au/x123-Accountants-in-Sydney.aspx using the script I wrote.

Kane Over a year ago

This is the problem text: u'\nLevel 12, 280 George Street SYDNEY\xa0 NSW\xa0 2000. . Sydney. NSW 2000\n'. The problem is the \xa0 characters on either side of NSW.

Tanveer Alam Over a year ago

@kane See, I have added another function, remove_non_ascii. It checks whether each character is ASCII (ord < 128) and returns the text with the non-ASCII characters removed.
