46

Is there a way to preserve the order of the columns in a csv file when reading and then writing it with Python Pandas? For example, in this code

import pandas as pd

data = pd.read_csv(filename)
data.to_csv(filename)

the output file might differ from the input because the column order is not preserved.

2
  • Can you provide an example of your csv? Commented Mar 27, 2013 at 8:09
  • 4
    wishing OP had added a "when the column names are not known in advance" qualifier to this question. All the answers posted here assume that all the columns are already known, even though OP never said so. Commented Jul 5, 2018 at 4:44

4 Answers

40

There appears to be a bug in the current version of Pandas ('0.11.0'), which means that Matti John's answer will not work. If you specify columns when writing to file, the data are still written in alphabetical column order, and only the header is relabelled according to the list in cols. For example, this code:

import pandas
dfdict={}
dfdict["a"]=[1,2,3,4]
dfdict["b"]=[5,6,7,8]
dfdict["c"]=[9,10,11,12]
df=pandas.DataFrame(dfdict)
df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"])

results in this (incorrect) output:

    b   a   c
0   1   5   9
1   2   6   10
2   3   7   11
3   4   8   12

You can check which version of pandas you have installed by executing:

pandas.__version__

Documentation for to_csv is here

Actually, it seems that this is a known bug and will be fixed in an upcoming release (0.11.1):

https://github.com/pydata/pandas/issues/3489

UPDATE: There still hasn't been a new release of pandas, but there is a workaround described here, which doesn't require using a different version of pandas:

https://github.com/pydata/pandas/issues/3454

So changing the last line in the block of code above to the following will work correctly:

df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"], engine='python')

UPDATE: It seems that the argument "cols" has been renamed to "columns" and that the argument "engine" has been deprecated and is no longer available in recent versions of pandas. The bug itself is fixed in version 0.19.0.
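For readers on a recent pandas, here is a minimal sketch of the same call with the renamed `columns` argument and without `engine`. It writes to an in-memory buffer rather than "dfTest.txt" (an assumption made only to keep the example free of side effects) and checks that the data now follow the header:

```python
import io
import pandas as pd

# Same frame as in the example above.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]})

# In-memory buffer used here instead of a file on disk (illustrative choice).
buf = io.StringIO()
df.to_csv(buf, sep="\t", header=True, columns=["b", "a", "c"])

lines = buf.getvalue().splitlines()
header = lines[0].split("\t")     # leading empty field is the unnamed index column
first_row = lines[1].split("\t")  # index value, then b, a, c for the first record
print(header)
print(first_row)
```

With a fixed version, the first data row should carry the value of "b" under the "b" header, rather than the value of "a" as in the incorrect output shown earlier.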


5 Comments

Trying this solution with recent pandas (0.19.2) gives: TypeError: to_csv() got an unexpected keyword argument 'cols'. Did the API change?
I believe this option has been deprecated as no longer necessary.
Seems like it was renamed to columns. Changing cols to columns works for me now.
What should you do when the column names are not known in advance?
Just get the column names? df.columns
26

The column order should generally be preserved when reading and then writing a csv file like that, but if for some reason the columns are not in the order you want, you can use the columns keyword argument in to_csv.

For example, if you have a csv with columns a, b, c, d:

data = pd.read_csv(filename)
data.to_csv(filename, columns=['a', 'b', 'c', 'd'])
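As a quick check of the claim that order is normally preserved, the sketch below round-trips a small CSV through in-memory buffers (the sample data and variable names are made up for illustration). It also passes `index=False`, on the assumption that the extra index column `to_csv` writes by default is a common reason the output differs from the input:

```python
import io
import pandas as pd

# Hypothetical input with columns deliberately out of alphabetical order.
original = "d,b,a,c\n1,2,3,4\n5,6,7,8\n"

data = pd.read_csv(io.StringIO(original))

out = io.StringIO()
# index=False stops pandas from adding the row index as an extra first column.
data.to_csv(out, index=False)
round_tripped = out.getvalue()

print(list(data.columns))
print(round_tripped)
```

If the round trip still reorders columns on your installation, the explicit `columns=` list shown above remains the fallback.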


6

Another workaround is to do this:

import pandas as pd
data = pd.read_csv(filename)
data2 = data[['A','B','C']]  # list 'A', 'B', 'C' in the desired order
data2.to_csv(filename)

1 Comment

This was the only solution that worked for me. You could reduce a line of code by reordering and creating the CSV all in one step.
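The one-step variant mentioned in the comment could look roughly like this. It is a sketch with made-up sample data and in-memory buffers standing in for `filename`; `index=False` is an added assumption to keep the row index out of the output:

```python
import io
import pandas as pd

# Hypothetical input whose columns arrive as C, A, B.
data = pd.read_csv(io.StringIO("C,A,B\n3,1,2\n6,4,5\n"))

out = io.StringIO()
# Select the columns in the desired order and write in a single expression.
data[["A", "B", "C"]].to_csv(out, index=False)

result_lines = out.getvalue().splitlines()
print(result_lines)
```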
0

When the column names are not known in advance

... you can easily specify them by reading the first line of your CSV file, which contains the headers, then converting the column names to a list and, as others pointed out, using that list in read_csv():

import pandas as pd

path_to_table = 'path/to/table.csv'

# read the columns in the order as in CSV:
with open(path_to_table) as f:
    first_line = f.readline()
cols = first_line.strip().split(',')
    
# use it:
df = pd.read_csv(path_to_table, names=cols, header=0)[cols]
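One caveat with splitting the first line on commas is that it can mis-parse quoted header names that themselves contain a comma. A possible alternative, sketched here with made-up data and an in-memory buffer in place of `path_to_table`, is to let the standard-library `csv` module parse the header line, or simply to rely on `df.columns`, which keeps the file's order:

```python
import csv
import io
import pandas as pd

# Hypothetical CSV whose first header name contains a comma inside quotes.
raw = '"last, first",age,city\n"Doe, Jane",30,Oslo\n'

# csv.reader honours the quoting, so the header is split into three names.
header = next(csv.reader(io.StringIO(raw)))

# The plain split treats the quoted comma as a separator and yields four pieces.
naive = raw.splitlines()[0].strip().split(",")

# pandas parses the header itself; df.columns follows the order in the file.
df = pd.read_csv(io.StringIO(raw))

print(header)
print(naive)
print(list(df.columns))
```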

