Good afternoon!
I have a .csv file like this (when opened with Notepad):
"2,"" Lorem ipsum dolor sit amet, consectetur adipiscing elit.
"""
"2,"" Proin a tortor leo. Morbi dictum laoreet nulla sit amet luctus. Donec euismod egestas velit, eget consequat ex porttitor vitae. Sed venenatis ornare enim sed rutrum. Aenean congue purus vitae congue rutrum. Ut ex felis, viverra imperdiet est vel, hendrerit luctus ligula.
"""
"2,"" estibulum consequat lorem enim, ut semper erat fringilla id.
"""
"2,"" Praesent a lobortis justo. Cras in sapien enim.
"""
...
I use this to get data from a file:
import pandas as pd

train = pd.read_csv('yelp_review_polarity_csv/train.csv',
                    header=None,
                    names=['Class', 'Review'],
                    encoding="cp1251",
                    sep=",")
Here is what I get: the second column is filled with "Null" values. I need it to look something like this:
Class Review
2 Lorem ipsum dolor sit amet...
In other words, the data should be split into two columns on the "," delimiter. How can I fix this?
Note: I am using encoding cp1251 so that there are no problems with some characters from another language.
The quotes in "2," escape the comma, so it's not considered a column separator. You could change the quote character, but then the quotes would be part of the value. The fundamental problem is that it isn't a CSV file, so some other type of parsing is needed. You may be able to hack it by simply removing the quotes (likely read line by line, remove " and save to a temp file) and then reading into pandas.
What about the """? I think you'd want to remove them completely. If you have a line 2, Lorem ipsum dolor sit amet, consectetur adipiscing elit., that's a 2 column CSV (with some left side padding on column 2) and I got there just by removing quotes.
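One way to read the sample above, offered as a sketch rather than a definitive fix: each record looks like a complete CSV row that has been quoted a second time, so a standard parser sees a single field (which would explain the empty second column). Instead of stripping quotes, which would leave the commas inside the review text unprotected, the code below parses each record twice with the standard csv module. The function name parse_double_encoded and the in-memory sample are made up for illustration, and the approach assumes every record has the same shape as the lines shown in the question.

```python
import csv
import io

import pandas as pd

# Two records in the same shape as the sample in the question (assumed layout).
raw = (
    '"2,"" Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n'
    '"""\n'
    '"2,"" Praesent a lobortis justo. Cras in sapien enim.\n'
    '"""\n'
)


def parse_double_encoded(handle):
    """Parse records that are CSV rows wrapped in one extra layer of quoting."""
    rows = []
    for outer in csv.reader(handle):
        if not outer:
            continue  # skip blank lines
        # First pass: the whole record comes back as a single field,
        # e.g. '2," Lorem ipsum ...\n"'. Parse that field as CSV again.
        inner = next(csv.reader(io.StringIO(outer[0], newline="")))
        # Second pass: inner[0] is the class, inner[1] the review text.
        rows.append((int(inner[0]), inner[1].strip()))
    return pd.DataFrame(rows, columns=["Class", "Review"])


train = parse_double_encoded(io.StringIO(raw, newline=""))
print(train)
```

For the real file, the same function could presumably be called on open('yelp_review_polarity_csv/train.csv', encoding='cp1251', newline='') in place of the in-memory sample. Lines that do not follow the double-quoted pattern would need separate handling.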