Customizing the separator in pandas read_csv

Question

I am reading many different data files into various pandas dataframes. The columns in these data files are separated by spaces. However, the number of spaces differs from file to file (some use a single space, others use two spaces, and so on). Thus, every time I import a file, I have to open it manually, count the spaces that were used, and pass that many spaces to sep:

import pandas as pd
df = pd.read_csv('myfile.dat', sep = '    ')

Is there any way I can tell pandas to assume "any number of spaces" as the separator? Also, is there any way I can tell pandas to use either tab (\t) or spaces as the separator?

piRSquared · Accepted Answer · 2016-12-20 05:00:32Z

40

Yes, you can use a simple regular expression such as sep='\s+', which matches one or more whitespace characters, so any number of spaces works as a single separator.
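As a minimal sketch of this, the snippet below parses an in-memory sample instead of a file. The sample text and the column names x, y, z are made up for illustration; the separator is written as a raw string so that Python does not try to interpret the backslash itself.

```python
import io

import pandas as pd

# Hypothetical sample: the number of spaces between columns varies per line.
raw = "x  y    z\n1 2   3\n4    5 6\n"

# r"\s+" treats any run of whitespace as a single separator.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```

The same call should work with a file path such as 'myfile.dat' in place of the StringIO object.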

edited Dec 20, 2016 at 5:00

piRSquared


answered Dec 20, 2016 at 4:59

Ted Petrou


2 Comments

Peaceful Over a year ago

That worked! Thanks. Is there any way I can tell pandas to use either space or tab as the separator?

Ted Petrou Over a year ago

The whitespace class \s should match tabs as well, but I believe you can also add an explicit or condition to the regular expression: sep='\s+|\t+'
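To check this, here is a small sketch with made-up data in which tabs and spaces are mixed freely. It parses the text once with plain \s+ and once with the explicit alternation; the latter is passed engine='python' because it is a general regular expression. The names and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample mixing tabs and runs of spaces between columns.
raw = "name\tscore  rank\nalice \t 10\t1\nbob\t\t7   2\n"

# \s already covers tabs, so this alone handles both kinds of whitespace.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# The explicit alternation gives the same table; it is a general regex,
# so the Python engine is requested explicitly.
df_alt = pd.read_csv(io.StringIO(raw), sep=r"\s+|\t+", engine="python")

print(df)
print(df.equals(df_alt))
```

Since \s includes the tab character, the extra |\t+ branch is redundant, and sep=r'\s+' on its own is enough for the "tabs or spaces" case.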

normanius · Accepted Answer · 2019-11-05 11:09:01Z

5

You can directly use delim_whitespace:

import pandas as pd
df = pd.read_csv('myfile.dat', delim_whitespace=True)

The argument delim_whitespace controls whether whitespace (e.g. ' ' or '\t') is used as the separator; setting it to True is equivalent to sep='\s+'. See the pandas.read_csv documentation for details.
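One caveat, as far as I know: newer pandas releases mark delim_whitespace as deprecated in favour of sep='\s+', so the keyword may warn or be rejected depending on the installed version. The sketch below wraps both spellings in a helper; read_whitespace_delimited is a hypothetical name, not a pandas function, and the sample data is made up.

```python
import io
import warnings

import pandas as pd


def read_whitespace_delimited(source):
    """Read a whitespace-separated table (hypothetical helper, not pandas API)."""
    try:
        with warnings.catch_warnings():
            # Silence the deprecation warning some pandas versions emit.
            warnings.simplefilter("ignore", FutureWarning)
            return pd.read_csv(source, delim_whitespace=True)
    except TypeError:
        # Assumption: the keyword is no longer accepted in this version.
        return pd.read_csv(source, sep=r"\s+")


df = read_whitespace_delimited(io.StringIO("a b\tc\n1  2 3\n"))
print(df)
```

For new code it is probably simpler to write sep=r'\s+' directly and skip the helper.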

edited Nov 5, 2019 at 11:09

normanius


answered Jul 3, 2017 at 12:00

nlahri


piRSquared · Accepted Answer · 2016-12-20 05:04:47Z

4

You can also use the parameter skipinitialspace=True which skips the leading spaces after any delimiter.
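A short sketch of what this does, using made-up data. The first sample is comma-separated with padding after each comma; the second uses a single space as the separator, so the extra spaces that follow it are skipped.

```python
import io

import pandas as pd

# Hypothetical comma-separated sample with spaces after each comma.
comma_raw = "a, b,  c\n1, 2,  3\n"

plain = pd.read_csv(io.StringIO(comma_raw))
trimmed = pd.read_csv(io.StringIO(comma_raw), skipinitialspace=True)

print(list(plain.columns))    # column names keep their leading spaces
print(list(trimmed.columns))  # leading spaces removed

# Hypothetical space-separated sample with a varying number of spaces.
space_raw = "a  b   c\n1  2   3\n"
spaced = pd.read_csv(io.StringIO(space_raw), sep=" ", skipinitialspace=True)
print(spaced)
```

Note that this only skips spaces; it does not treat tabs as separators, so sep=r'\s+' remains the more general choice for mixed whitespace.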

answered Dec 20, 2016 at 5:04

piRSquared


Dustin Williams · Accepted Answer · 2018-04-10 17:21:17Z

2

One thing I found: if you use a separator that the C engine does not support (for example, a regular expression other than '\s+'), pandas/Dask has to fall back to the Python engine instead of the C engine, which is a good deal slower.
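The fallback can be observed directly. In this sketch (sample data made up), '\s+' is parsed with the C engine, while a different regular expression triggers a ParserWarning unless engine='python' is requested explicitly.

```python
import io
import warnings

import pandas as pd
from pandas.errors import ParserWarning

# Hypothetical sample with spaces and a tab between columns.
raw = "a  b\tc\n1 2\t3\n"

# r"\s+" is special-cased, so the C engine can be requested explicitly.
fast = pd.read_csv(io.StringIO(raw), sep=r"\s+", engine="c")

# Any other regex separator makes pandas fall back to the Python engine
# and emit a ParserWarning when no engine is specified.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    slow = pd.read_csv(io.StringIO(raw), sep=r"[ \t]+")
fell_back = any(issubclass(w.category, ParserWarning) for w in caught)

# Asking for the Python engine up front avoids the warning.
explicit = pd.read_csv(io.StringIO(raw), sep=r"[ \t]+", engine="python")

print(fell_back)
```

So for whitespace-separated files, sep=r'\s+' keeps the faster engine, and other regex separators are best paired with an explicit engine='python'.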

answered Apr 10, 2018 at 17:21

Dustin Williams
