
I have a CSV saved as data.csv that looks like this, with two columns:

Column1|Column2
Titleone|1.5
Title|two|2.5
Title3|3.6

The third line of the CSV (the second data row) contains a pipe character, |, inside the Column1 value, and that is causing the error. I need a way to read that pipe as part of the Column1 value for that row. When I run pd.read_csv("data.csv", sep = "|") I get the error: ParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 3

I cannot use on_bad_lines='skip' since I'm on an old version of pandas. This is a workaround I found that seems to be a partial solution:

col_names = ["col1", "col2", "col3"]
df = pd.read_csv("data.csv", sep = "|", names = col_names)
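
One way this workaround could be finished, sketched under the assumption that a row contains at most one extra pipe and that it always falls inside the first column: rows that spilled into col3 get col1 and col2 joined back together, and col3 supplies the numeric value. The inline sample stands in for data.csv, and the names `shifted`, `title`, `value` and `fixed` are only illustrative.

```python
import io

import pandas as pd

# Inline copy of the sample file; pass "data.csv" to read_csv in practice.
sample = io.StringIO("Column1|Column2\nTitleone|1.5\nTitle|two|2.5\nTitle3|3.6\n")

col_names = ["col1", "col2", "col3"]
# skiprows=1 drops the original header line, since names= supplies the column names.
df = pd.read_csv(sample, sep="|", names=col_names, skiprows=1)

# Rows with a value in col3 are the ones that contained an extra pipe.
shifted = df["col3"].notna()

# For shifted rows, rejoin the two halves of the title; otherwise keep col1 as is.
title = df["col1"].where(~shifted, df["col1"] + "|" + df["col2"].astype(str))
# For shifted rows the number sits in col3; otherwise it is the (text) value in col2.
value = df["col3"].where(shifted, pd.to_numeric(df["col2"], errors="coerce"))

fixed = pd.DataFrame({"Column1": title, "Column2": value})
print(fixed)
```

This would not cope with two or more extra pipes in one row; more names (col4, col5, ...) and a more general rejoin would be needed for that.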
  • Your error looks like it's coming from Title|two|2.5. "Expected 2 fields, saw 3" because it reads field1 = "Title", field2 = "two", field3 = 2.5. Commented Jan 14, 2022 at 4:04
  • Why don't you read the data as a single column, and then do your split later -> pd.read_csv(squeeze=True) Commented Jan 14, 2022 at 4:04
  • Yes, the third row in the CSV, or the second row not counting the header. Commented Jan 14, 2022 at 4:07

1 Answer

on_bad_lines replaced error_bad_lines (which was deprecated in pandas 1.3.0), so if you're on an older version of pandas, you can just use error_bad_lines instead:

pd.read_csv("data.csv", sep = "|", error_bad_lines = False)

If you want to keep the bad lines, you can also use warn_bad_lines, extract the bad line numbers from the warnings, and read those lines separately into a single column:

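import contextlib
import re

import pandas as pd

# Skip the malformed rows, but capture the "Skipping line N" warnings pandas writes to stderr
with open('log.txt', 'w') as log:
    with contextlib.redirect_stderr(log):
        df = pd.read_csv('data.csv', sep = '|', error_bad_lines = False, warn_bad_lines = True)

# Collect every reported line number (1-based, any number of digits) as a 0-based row index
with open('log.txt') as f:
    bad_lines = [int(n) - 1 for n in re.findall(r'line (\d+)', f.read())]

# Re-read only the bad rows; with the default "," separator each row stays in one column,
# as long as it contains no commas
df_bad_lines = pd.read_csv('data.csv', skiprows = lambda x: x not in bad_lines, squeeze = True, header = None)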
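
If the bad rows should end up in the same frame with the extra pipe kept inside the first column, one possible alternative is to leave the splitting to plain Python and split each line on its last pipe only. This is a minimal sketch, assuming the file has exactly two columns and only the first one can contain the delimiter; `read_last_delim` is a hypothetical helper name, not a pandas function, and the inline sample stands in for data.csv.

```python
import io

import pandas as pd

# Inline copy of the sample file; use open("data.csv") in practice.
raw = io.StringIO("Column1|Column2\nTitleone|1.5\nTitle|two|2.5\nTitle3|3.6\n")


def read_last_delim(handle, sep="|"):
    """Hypothetical helper: split every line on the LAST separator only."""
    # Drop line endings and ignore blank lines.
    lines = [line.rstrip("\r\n") for line in handle if line.strip()]
    # maxsplit=1 from the right keeps any earlier separators inside the first field.
    rows = [line.rsplit(sep, 1) for line in lines]
    header, data = rows[0], rows[1:]
    frame = pd.DataFrame(data, columns=header)
    # Everything was read as text, so convert the last column back to numbers.
    frame[header[-1]] = pd.to_numeric(frame[header[-1]])
    return frame


df_all = read_last_delim(raw)
print(df_all)
```

Since this avoids error_bad_lines, on_bad_lines and squeeze entirely, it should not depend on the pandas version, at the cost of losing read_csv's quoting and type inference.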

2 Comments

This works, but I'd like to keep the row causing the error and retain "Title|two" as a single value in one column.
Added an option to read bad lines separately
