0


I have a csv file that i'm trying to clean up with python.
it has lines separated by << \n >> or empty lines.
I would like each line that does not end with << " >> to be cut/pasted to the previous line.
here is a concrete example to be more explicit!\

CSV FILE I HAVE

*"id","name","age","city","remark"\
"1","kevin","27","paris","This is too bad"\
"8","angel","18","london","Incredible !!!"\
"14","maria","33","madrid","i can't believe it."\
"16","john","28","new york","hey men,\
\nhow do you did this"\
"22","naima","35","istanbul","i'm sure it's false,\
\
\nit can't be real"
"35","marco","26","roma","you'r my hero!"\
"39","lili","37","tokyo","all you need to knows.\
\n\nthe best way to upgrade easely"\
...*

CSV FILE I WOULD LIKE TO HAVE

*"id","name","age","city","remark"\
"1","kevin","27","paris","This is too bad"\
"8","angel","18","london","Incredible !!!"\
"14","maria","33","madrid","i can't believe it."\
"16","john","28","new york","hey men,how do you did this"\
"22","naima","35","istanbul","i'm sure it's false, it can't be real"\
"35","marco","26","roma","you'r my hero!"\
"39","lili","37","tokyo","all you need to knows. the best way to upgrade easely"\
...*

someone would be how to do?
thank you in advance for your help !

i'm actually try this python code -->

text = open("input.csv", "r", encoding='utf-8') 
  
text = ''.join([i for i in text])  
  
text = text.replace("\\n", "")
 
x = open("output.csv","w") 
  
x.writelines(text) 
x.close()
4
  • 1
    What have you tried so far? If you post a focused code example, people can help you with it. Commented Apr 19, 2022 at 13:27
  • Are those backslash characters at the end or each line really there or did you add them? Commented Apr 19, 2022 at 14:41
  • no, i put the backslash to write the csv... thats an error Commented Apr 19, 2022 at 15:09
  • no, i put the backslash to write the csv... thats an error Commented Apr 19, 2022 at 15:09

3 Answers 3

1
for this_row in read_file.readlines():
    if not this_row.startswith('"'):
        prev_row = prev_row.rstrip('\n') + this_row
    else:
        write_file.write(prev_row)
        prev_row = this_row

Just a draft. You can use str.join with list-cache to gain enhancement

Sign up to request clarification or add additional context in comments.

Comments

1

There are a few points to make here:

  1. Your CSV files contains , characters inside your remarks. This means the field must be enclosed in quotes (which it is).

  2. A CSV file is allowed to contain a newline within a single field. This does not result in an extra data row, but it does make the file weird to read for a human.

  3. Python's CSV reader will automatically handle newlines in fields.

  4. Finally, your data appears to be encoded strangely and you wish to remove all the extra newlines. Each row also has a trailing backslash character that should not be there.

I suggest the following approach:

  1. Use Python's CSV reader to correctly read one row at a time (you have 7 rows + a header).
  2. Remove any newlines from the remark field.

For example:

import csv

with open('input.csv') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)

    for row in csv_input:
        if len(row) == 5:       # skip blank lines
            row[4] = row[4].replace('\n', '').replace('\\n', ' ').replace('\\', '')
            csv_output.writerow(row)

This would give you:

id,name,age,city,remark\
1,kevin,27,paris,This is too bad
8,angel,18,london,Incredible !!!
14,maria,33,madrid,i can't believe it.
16,john,28,new york,"hey men, how do you did this"
22,naima,35,istanbul,"i'm sure it's false, it can't be real"
35,marco,26,roma,you'r my hero!
39,lili,37,tokyo,all you need to knows.  the best way to upgrade easely

Comments

0

input.csv file content:

"id","name","age","city","remark"
"1","kevin","27","paris","This is too bad"
"8","angel","18","london","Incredible !!!"
"14","maria","33","madrid","i can't believe it."
"16","john","28","new york","hey men,
how do you did this"
"22","naima","35","istanbul","i'm sure it's false,

nit can't be real"
"35","marco","26","roma","you'r my hero!"
"39","lili","37","tokyo","all you need to knows.

the best way to upgrade easely"

Possible (quick and simple) solution is the following:

with open('input.csv', 'r', encoding='utf-8') as file:
    data = file.read()
    
clean_data = data.replace('"\n"', '"||"').replace("\n", "").replace('"||"', '"\n"')
    
with open('output.csv', 'w', encoding='utf-8') as file:
    file.write(clean_data)

Returns output.csv content:

"id","name","age","city","remark"
"1","kevin","27","paris","This is too bad"
"8","angel","18","london","Incredible !!!"
"14","maria","33","madrid","i can't believe it."
"16","john","28","new york","hey men,how do you did this"
"22","naima","35","istanbul","i'm sure it's false,nit can't be real"
"35","marco","26","roma","you'r my hero!"
"39","lili","37","tokyo","all you need to knows.the best way to upgrade easely"

1 Comment

it works !!! THANKS YOU A LOT !!!! it's exactly what i need !!!!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.