First two columns wrong in old xls files. #161

@marvinbernhardt

I have some old xls files that I am parsing. For some of them, python-calamine returns wrong values in the first two columns.

from python_calamine import CalamineWorkbook

workbook = CalamineWorkbook.from_path(str(path))
for sheet_name in workbook.sheet_names:
    rows_2d = workbook.get_sheet_by_name(sheet_name).to_python(
        skip_empty_area=True,
    )
    print(rows_2d[0][0:3])

outputs
[0.001, 22.48, ''].
This is not the right content. Looking at all cells shows that the values in the first two columns are always wrong: they should be mixed types, but calamine returns floats that look like they come from elsewhere in the file.

In comparison with xlrd:

import xlrd

book = xlrd.open_workbook(str(path))
for sh in book.sheets():
    for rx in range(sh.nrows):
        print(sh.row(rx)[0:3])
        break

outputs
[text:'<correct text>', empty:'', empty:'']
which is correct (I replaced the string).
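
To narrow down which columns are affected without sharing the file, the two parsers' outputs could be compared cell by cell. The following is only a minimal sketch: `mismatched_columns` is a hypothetical helper, not part of either library, and the sample rows are placeholders shaped like the outputs above. With real files, the two grids would come from calamine's `to_python()` and from xlrd (for example via `sh.row_values(rx)`), and I assume both are read without skipping empty areas so that the rows and columns line up.

```python
from collections import Counter
from itertools import zip_longest


def mismatched_columns(rows_a, rows_b):
    """Count, per column index, the cells where the two grids disagree.

    Hypothetical helper for illustration; missing rows or cells are
    treated as empty strings.
    """
    counts = Counter()
    for row_a, row_b in zip_longest(rows_a, rows_b, fillvalue=[]):
        for col, (a, b) in enumerate(zip_longest(row_a, row_b, fillvalue="")):
            if a != b:
                counts[col] += 1
    return dict(sorted(counts.items()))


# Placeholder rows mirroring the outputs shown above (not real file data).
calamine_rows = [[0.001, 22.48, ""]]
xlrd_rows = [["<correct text>", "", ""]]

print(mismatched_columns(calamine_rows, xlrd_rows))
```

If the counts concentrate on columns 0 and 1 across all sheets, that would support the observation that only the first two columns are affected. Cells that the two libraries legitimately represent differently (dates, for instance) would also show up as mismatches, so the result needs a manual look.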

When I open the xls in Excel and save it again, this behavior goes away (the files also become smaller for some reason). Therefore I cannot produce a "censored" copy that still shows the problem. These are company files, so I cannot upload an uncensored one here.

Is there something we can do about this?
