I need to convert XLS files to CSV in order to load the data they contain into a PostgreSQL database. I used the following code to do the conversion:

import xlrd
import unicodecsv

def xls2csv (xls_filename, csv_filename):
    # Converts an Excel file to a CSV file.
    # If the excel file has multiple worksheets, only the first worksheet is converted.
    # Uses unicodecsv, so it will handle Unicode characters.
    # Uses a recent version of xlrd, so it should handle old .xls and new .xlsx equally well.

    wb = xlrd.open_workbook(xls_filename)
    sh = wb.sheet_by_index(0)

    fh = open(csv_filename,"wb")
    csv_out = unicodecsv.writer(fh, encoding='utf-8')

    for row_number in xrange (sh.nrows):
        csv_out.writerow(sh.row_values(row_number))

    fh.close()

The XLS files I'm using contain 212 columns and at least 100 rows. When I test the code with just 4 rows it works fine, but when nrows > 5 the interpreter raises the following errors:

xls2csv ('e:/t.xls', 'e:/wh.csv')
WARNING *** file size (353829) not 512 + multiple of sector size (512)
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
*** No CODEPAGE record, no encoding_override: will use 'ascii'
*** No CODEPAGE record, no encoding_override: will use 'ascii'
Traceback (most recent call last):

  File "<ipython-input-14-ccae93f2d633>", line 1, in <module>
    xls2csv ('e:/t.xls', 'e:/wh.csv')

  File "C:/Users/hey/.spyder/temp.py", line 10, in xls2csv
    wb = xlrd.open_workbook(xls_filename)

  File "C:\Users\hey\Anaconda2\lib\site-packages\xlrd\__init__.py", line 441, in open_workbook
    ragged_rows=ragged_rows,

  File "C:\Users\hey\Anaconda2\lib\site-packages\xlrd\book.py", line 119, in open_workbook_xls
    bk.get_sheets()

  File "C:\Users\hey\Anaconda2\lib\site-packages\xlrd\book.py", line 678, in get_sheets
    self.get_sheet(sheetno)

  File "C:\Users\hey\Anaconda2\lib\site-packages\xlrd\book.py", line 669, in get_sheet
    sh.read(self)

  File "C:\Users\hey\Anaconda2\lib\site-packages\xlrd\sheet.py", line 804, in read
    strg = unpack_string(data, 6, bk.encoding or bk.derive_encoding(), lenlen=2)

  File "C:\Users\hey\Anaconda2\lib\site-packages\xlrd\biffh.py", line 269, in unpack_string
    return unicode(data[pos:pos+nchars], encoding)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xb2 in position 2: ordinal not in range(128)

2 Answers

There is a decoding issue when you open the xls file; I suspect the 5th row of the file contains a special character. Based on the xlrd documentation, you can use encoding_override="cp1251" to translate it to Unicode:

wb = xlrd.open_workbook(xls_filename, encoding_override="cp1251")
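For context on why the override helps: the byte 0xb2 that the ascii codec rejects in your traceback is a perfectly valid character in cp1251. A minimal sketch of the difference (only the 0xb2 byte is taken from your traceback; the surrounding bytes are made up for illustration):

```python
raw = b"AB\xb2"  # 0xb2 is the byte from the UnicodeDecodeError above

# Decoding as ASCII fails, exactly as in the traceback
try:
    raw.decode("ascii")
except UnicodeDecodeError as e:
    print(e)

# Decoding as cp1251 succeeds: 0xb2 maps to a Cyrillic letter
text = raw.decode("cp1251")
print(text)
```

If cp1251 is not the right codepage for your data, the same encoding_override parameter accepts any codec name Python knows (e.g. "latin-1", "cp1252").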

2 Comments

Do you have any idea how to set the separator of the generated CSV to ;?
Simply try: csv_out = unicodecsv.writer(fh, delimiter=';', encoding='utf-8')
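The delimiter option in the comment above is the same one the standard-library csv module exposes, so you can sanity-check the output format without the unicodecsv dependency (the row values here are illustrative):

```python
import csv
import io

# Write one row into an in-memory buffer using ';' as the separator
buf = io.StringIO()
writer = csv.writer(buf, delimiter=';')
writer.writerow(["a", "b", "c"])

print(buf.getvalue())  # a;b;c
```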
It looks like the error isn't caused by the number of rows, but by a problem handling Unicode characters in your source file.

I'd recommend trying Pandas:

import pandas as pd

df = pd.read_excel('input.xls')
df.to_csv('output.csv', encoding='utf-8')

Note that (while you don't expand on the Postgres part) if this is a first step to getting your data into Postgres, once your data is loaded into a Pandas dataframe, you can send it straight to Postgres.
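As a hedged sketch of that last step (the table name and connection are made up for illustration): pandas.DataFrame.to_sql writes a dataframe to a database; an in-memory SQLite database stands in for Postgres below, and in practice you would pass a SQLAlchemy engine created with something like create_engine('postgresql://...') instead.

```python
import sqlite3
import pandas as pd

# Stand-in for the frame you'd get from pd.read_excel('input.xls')
df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})

# SQLite stands in for Postgres here; swap the connection for a
# SQLAlchemy Postgres engine when loading into your real database.
conn = sqlite3.connect(":memory:")
df.to_sql("my_table", conn, index=False, if_exists="replace")

count = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
print(count)  # 2
```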

2 Comments

I've already tested it, but it didn't work. Here are the errors:
It requires installing pandas and xlrd as top-level packages.
