1

I'm trying to read a log file using pandas in python with the following format:

TimeStamp, Label, VarName, Data
2017-03-09 13:03:20, USER_INPUT, Name, Test 1
2017-03-09 13:03:20, USER_INPUT, ProjectNumber, 04BGA
2017-03-09 13:03:20, USER_INPUT, PasteType, Senju F100
.
.

which works fine until i reach the following type of line:

2017-03-09 12:03:20, CASETTE_INFO, CassetteHSPGetEjectorFeedAdjust, 2,2,2,2.4,3.5,1.15,1.1

Is it possible to tell pandas somehow that if there are more than 3 separators, then squeeze the rest together as a string? like

data = "2,2,2,2.4,3.5,1.15,1.1" 

Or is there any other solutions?

I'm just skipping these lines at the moment to go around the problem

pd.read_csv(filename, error_bad_lines=False)) 

It would be nice to solve this type of problem in my log-file

1 Answer 1

3

I think you need change separator to ,\s+ - it means comma with one or more whitespaces:

df = pd.read_csv(filename, sep=",\s+", engine='python')

Sample:

import pandas as pd
from pandas.compat import StringIO

temp=u"""imeStamp, Label, VarName, Data
2017-03-09 13:03:20, USER_INPUT, Name, Test 1
2017-03-09 13:03:20, USER_INPUT, ProjectNumber, 04BGA
2017-03-09 13:03:20, USER_INPUT, PasteType, Senju F100
2017-03-09 12:03:20, CASETTE_INFO, CassetteHSPGetEjectorFeedAdjust, 2,2,2,2.4,3.5,1.15,1.1"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
df = pd.read_csv(StringIO(temp), sep=",\s+", engine='python')
print (df)
              imeStamp         Label                          VarName  \
0  2017-03-09 13:03:20    USER_INPUT                             Name   
1  2017-03-09 13:03:20    USER_INPUT                    ProjectNumber   
2  2017-03-09 13:03:20    USER_INPUT                        PasteType   
3  2017-03-09 12:03:20  CASETTE_INFO  CassetteHSPGetEjectorFeedAdjust   

                     Data  
0                  Test 1  
1                   04BGA  
2              Senju F100  
3  2,2,2,2.4,3.5,1.15,1.1  

If 1.solution does not work use separator which is not in data - ¥ or | and read all data to one column data. Then use str.split:

import pandas as pd
from pandas.compat import StringIO

temp=u"""TimeStamp, Label, VarName, Data
2017-03-09 13:03:20, USER_INPUT, Name, Test 1
2017-03-09 13:03:20, USER_INPUT, ProjectNumber, 04BGA
2017-03-09 13:03:20, USER_INPUT, PasteType, Senju F100
2017-03-09 12:03:20, CASETTE_INFO, CassetteHSPGetEjectorFeedAdjust, 2,2,2,2.4,3.5,1.15,1.1"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
df = pd.read_csv(StringIO(temp), sep = '¥', names=['data'], skiprows=1)
print (df)
                                                data
0      2017-03-09 13:03:20, USER_INPUT, Name, Test 1
1  2017-03-09 13:03:20, USER_INPUT, ProjectNumber...
2  2017-03-09 13:03:20, USER_INPUT, PasteType, Se...
3  2017-03-09 12:03:20, CASETTE_INFO, CassetteHSP...

df = df.data.str.split(',', n=3, expand=True)
df.columns = ['TimeStamp','Label','VarName','Data']
print (df)
             TimeStamp          Label                           VarName  \
0  2017-03-09 13:03:20     USER_INPUT                              Name   
1  2017-03-09 13:03:20     USER_INPUT                     ProjectNumber   
2  2017-03-09 13:03:20     USER_INPUT                         PasteType   
3  2017-03-09 12:03:20   CASETTE_INFO   CassetteHSPGetEjectorFeedAdjust   

                      Data  
0                   Test 1  
1                    04BGA  
2               Senju F100  
3   2,2,2,2.4,3.5,1.15,1.1  
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.