
Using Python and Beautiful Soup, I have created a script that scrapes the name, address and phone number of businesses from a website and saves the output into three columns of a CSV file.

The script works fine, but it stops when it reaches an address like the following:

u'\nLevel 12, 280 George Street SYDNEY\xa0 NSW\xa0 2000. . Sydney. NSW 2000\n'

The problem is the "\xa0" part. The error message states:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 35: ordinal not in range(128)

I have a vague idea of what this error means but have no idea how to deal with it. Any ideas?

Thanks

Edit:

My script is as follows:

import csv

import bs4
import requests

page = requests.get('http://accountantlist.com.au/x123-Accountants-in-Sydney.aspx?Page=0')
soup = bs4.BeautifulSoup(page.content)

for company in soup.select('table#ctl00_ContentPlaceHolder1_dgLawyers tr > td > table'):
    name = company.a.text
    address = company.find_all('tr')[1].text
    phone = company.tr.find_all('td')[1].text
    with open('/home/kwal0203/Desktop/eggs.csv', 'a') as csvfile:
        s = csv.writer(csvfile)
        s.writerow([name, address, phone])
  • You'll need to show the code you're using that causes the error, along with the complete traceback. Commented Nov 29, 2014 at 1:48
  • So you are getting this error while writing to your CSV file. Can you show us how you are writing the CSV file? Commented Nov 29, 2014 at 1:49
  • I've added the script to the question, but I'm not sure how to get the traceback. Commented Nov 29, 2014 at 1:54
  • Sorry I've left out part of the script. One second. Commented Nov 29, 2014 at 1:55

1 Answer

You need to encode the text as UTF-8 when writing it to the CSV file, because the built-in csv module in Python 2 doesn't support Unicode.

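def remove_non_ascii(text):
    # Keep only characters in the ASCII range (code points below 128).
    return ''.join(i for i in text if ord(i) < 128)


# Inside the loop over companies: strip non-ASCII characters from each field.
name = remove_non_ascii(company.a.text)
address = remove_non_ascii(company.find_all('tr')[1].text)
phone = remove_non_ascii(company.tr.find_all('td')[1].text)

with open('/home/kwal0203/Desktop/eggs.csv', 'a') as csvfile:
    s = csv.writer(csvfile)
    # Python 2's csv module expects byte strings, so encode each field.
    s.writerow([data.encode("utf-8") for data in [name, address, phone]])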

Or you can use unicodecsv, which supports Unicode by default.

Install it with pip:

pip install unicodecsv
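
If moving to Python 3 is an option, the built-in csv module works with Unicode text directly, so neither stripping characters nor manual encoding should be needed. The following is only a minimal sketch under that assumption: the helper names `clean` and `write_rows` and the sample business name and phone number are made up for illustration, and an in-memory buffer stands in for the real file. Rather than dropping the non-breaking space (\xa0), it replaces it with an ordinary space.

```python
import csv
import io

# Sample address from the question; \xa0 is a non-breaking space.
address = '\nLevel 12, 280 George Street SYDNEY\xa0 NSW\xa0 2000. . Sydney. NSW 2000\n'


def clean(text):
    """Replace non-breaking spaces with plain spaces and collapse whitespace runs."""
    return ' '.join(text.replace('\xa0', ' ').split())


def write_rows(csvfile, rows):
    """Write rows of text fields to an already-open text-mode file object."""
    writer = csv.writer(csvfile)
    for row in rows:
        writer.writerow([clean(field) for field in row])


# In a real script the buffer would be a file opened with
# open(path, 'a', encoding='utf-8', newline='').
buffer = io.StringIO(newline='')
# The name and phone number below are placeholders, not scraped data.
write_rows(buffer, [['Example Accountants', address, '(02) 0000 0000']])
output = buffer.getvalue()
```

Opening the file once, outside the loop, and passing it to a helper like this also avoids reopening the CSV file for every business.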

11 Comments

Hi when I use your code I get an error message saying: TypeError: decoding Unicode is not supported
Can you include a sample of [name,address,phone] in your question so that I can try it out with your data?
I don't have the data in a file. I just scrape it off this website http://accountantlist.com.au/x123-Accountants-in-Sydney.aspx using the script I wrote.
This is the problem text: u'\nLevel 12, 280 George Street SYDNEY\xa0 NSW\xa0 2000. . Sydney. NSW 2000\n'. It's the \xa0 on either side of NSW.
@kane See, I have added another function, remove_non_ascii. It takes each character, checks whether it is ASCII (ord < 128), and returns the text with the non-ASCII characters removed.