Preserving column order in Python Pandas DataFrame

Question

Is there a way to preserve the order of the columns in a CSV file when it is read and then written with Python Pandas? For example, in this code:

import pandas as pd

data = pd.read_csv(filename)
data.to_csv(filename)

the output file might differ from the input because the column order is not preserved.

Wishing OP had added a "when the column names are not known in advance" qualifier to this question. All the answers posted here assume that all the columns are already known, even though OP never said so.
– Nikhil VJ, Commented Jul 5, 2018 at 4:44

CnrL · Accepted Answer · 2017-05-14 19:44:26Z

40

There appears to be a bug in the current version of Pandas ('0.11.0'), which means that Matti John's answer will not work. If you specify columns for writing to file, they are written in alphabetical order, but simply relabelled according to the list in cols. For example, this code:

import pandas
dfdict={}
dfdict["a"]=[1,2,3,4]
dfdict["b"]=[5,6,7,8]
dfdict["c"]=[9,10,11,12]
df=pandas.DataFrame(dfdict)
df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"])

results in this (incorrect) output:

    b   a   c
0   1   5   9
1   2   6   10
2   3   7   11
3   4   8   12

You can check which version of pandas you have installed by executing:

pandas.__version__

Documentation for to_csv is here

Actually, it seems that this is a known bug and will be fixed in an upcoming release (0.11.1):

https://github.com/pydata/pandas/issues/3489

UPDATE: There still hasn't been a new release of pandas, but there is a workaround described here, which doesn't require using a different version of pandas:

github.com/pydata/pandas/issues/3454

So changing the last line in the block of code above to the following will work correctly:

df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"], engine='python')

UPDATE: It seems that the argument "cols" has been renamed to "columns" and that the argument "engine" is no longer available in recent versions of pandas. Also, this bug is fixed in version 0.19.0.
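
For anyone on a recent pandas release, here is a minimal sketch of the same example using the renamed "columns" argument. Passing the separator as the sep keyword and writing to an in-memory io.StringIO buffer instead of "dfTest.txt" are my own choices for illustration, not part of the original answer.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]})

# Write to an in-memory buffer so the sketch leaves no file behind.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", header=True, columns=["b", "a", "c"])

lines = buffer.getvalue().splitlines()
print(lines[0].split("\t"))  # ['', 'b', 'a', 'c']  (the empty entry is the index column)
print(lines[1].split("\t"))  # ['0', '5', '1', '9']
```

Here the values follow their labels: column b keeps 5, 6, 7, 8 rather than being relabelled over the data of column a.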

edited May 14, 2017 at 19:44

answered Jun 6, 2013 at 9:28

CnrL


5 Comments

arielf Over a year ago

Trying this solution with recent pandas (0.19.2) gives: TypeError: to_csv() got an unexpected keyword argument 'cols'. Did the API change?

CnrL Over a year ago

I believe this option has been deprecated as no longer necessary.

arielf Over a year ago

Seems like it was renamed to columns. Changing cols to columns works for me now.

Nikhil VJ Over a year ago

What to do when the column names are not known in advance?

CnrL Over a year ago

Just get the column names? df.columns
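
A rough sketch of that idea, assuming the names are only known once the file has been read: remember list(df.columns) right after reading and pass it back when writing. The CSV text and the sort_index step (a stand-in for any processing that reorders columns) are hypothetical.

```python
import io

import pandas as pd

csv_text = "zeta,alpha,mid\n1,2,3\n4,5,6\n"  # hypothetical input file content
df = pd.read_csv(io.StringIO(csv_text))

# Remember the order exactly as it was read, without knowing the names up front.
original_order = list(df.columns)

# Stand-in for a processing step that happens to reorder the columns.
processed = df.sort_index(axis=1)

out = io.StringIO()
processed.to_csv(out, columns=original_order, index=False)
header = out.getvalue().splitlines()[0]
print(header)  # zeta,alpha,mid
```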

Matti John · Accepted Answer · 2017-05-14 19:42:43Z

26

The column order should generally be preserved when reading and then writing a csv file like that, but if for some reason the columns are not in the order you want, you can use the columns keyword argument in to_csv.

For example, if you have a csv with columns a, b, c, d:

data = pd.read_csv(filename)
data.to_csv(filename, columns=['a', 'b', 'c', 'd'])
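
To check the round trip yourself, something like the sketch below may help. It uses in-memory text instead of a real file, and the sample content is made up. One thing worth noting: to_csv writes the row index as an extra first column by default, so a file can come back different even when the column order is intact. Passing index=False (my addition here) avoids that.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the file called `filename`.
original = "d,b,a,c\n1,2,3,4\n5,6,7,8\n"

data = pd.read_csv(io.StringIO(original))
print(list(data.columns))  # ['d', 'b', 'a', 'c']

out = io.StringIO()
# index=False stops pandas from adding the row index as an unnamed first column.
data.to_csv(out, index=False)
round_tripped = out.getvalue()

# Compare line by line so the platform's line terminator does not matter.
print(round_tripped.splitlines() == original.splitlines())  # True
```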

edited May 14, 2017 at 19:42

user2285236

answered Mar 27, 2013 at 12:24

Matti John


Maximilian Kohl · Accepted Answer · 2016-07-14 12:42:02Z

6

Another workaround is to do this:

import pandas as pd
data = pd.read_csv(filename)
data2 = data[['A', 'B', 'C']]  # put 'A', 'B', 'C' in the desired order
data2.to_csv(filename)

edited Jul 14, 2016 at 12:42

Maximilian Kohl


answered Jan 28, 2016 at 2:22

Lawrence Chernin


1 Comment

Mtap1 Over a year ago

This was the only solution that worked for me. You could reduce a line of code by reordering and creating the CSV all in one step.
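
The one-step version might look roughly like this. The sample data and the in-memory buffers are placeholders for a real file, and index=False is my own addition to keep the row index out of the output.

```python
import io

import pandas as pd

csv_text = "C,A,B\n3,1,2\n6,4,5\n"  # hypothetical file content

out = io.StringIO()
# Read, reorder the columns, and write in a single chained expression.
pd.read_csv(io.StringIO(csv_text))[["A", "B", "C"]].to_csv(out, index=False)

result_lines = out.getvalue().splitlines()
print(result_lines)  # ['A,B,C', '1,2,3', '4,5,6']
```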

Vojta F · Accepted Answer · 2022-05-13 12:40:06Z

0

When the column names are not known in advance

... you can obtain them by reading the first line of your CSV file, which contains the headers, converting the column names to a list, and, as others pointed out, using that list in read_csv():

import pandas as pd

path_to_table = 'path/to/table.csv'

# read the column names in the order they appear in the CSV:
with open(path_to_table) as f:
    first_line = f.readline()
# note: a plain split assumes no header contains a quoted comma
cols = first_line.strip().split(',')

# use it:
df = pd.read_csv(path_to_table, names=cols, header=0)[cols]
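
If a header might contain a quoted comma, splitting on ',' will cut it in two. A possible variation is to let Python's csv module parse the first row instead. This is only a sketch: the sample text is invented, and io.StringIO stands in for open(path_to_table, newline='').

```python
import csv
import io

import pandas as pd

# Hypothetical content; the first header contains a comma inside quotes.
csv_text = '"last, first",age,city\n"Doe, Jane",30,Oslo\n'

# For a real file, use: with open(path_to_table, newline='') as f:
with io.StringIO(csv_text) as f:
    cols = next(csv.reader(f))  # the header row, parsed with CSV quoting rules

naive_cols = csv_text.splitlines()[0].split(",")  # what a plain split would give

df = pd.read_csv(io.StringIO(csv_text), usecols=cols)[cols]

print(cols)             # ['last, first', 'age', 'city']
print(len(naive_cols))  # 4
```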

answered May 13, 2022 at 12:40

Vojta F

