
I have a script that currently reads raw data from a .csv file and performs some pandas data analysis on the data. The .csv file is hardcoded and is read in like this:

data = pd.read_csv('test.csv',sep="|", names=col)

I want to change 2 things:

  1. I want to turn this into a loop that goes through a directory of .csv files and runs the pandas analysis in the script on each one.

  2. I want to take each .csv file name, strip the '.csv' and store the result in another list variable, let's call it 'new_table_list'.

I think I need something like the code below, at least for the 1st point (though I know it isn't completely correct). I am not sure how to address the 2nd point.

Any help is appreciated.

import os 

path = '\test\test\csvfiles'
table_list = []

for filename in os.listdir(path):
    if filename.endswith('.csv'):
        table_list.append(file)
data = pd.read_csv(table_list,sep="|", names=col)
  • You need to use os.path.join(path, filename) to get the full name of the file to read from. (Commented May 14, 2018 at 19:42)
  • The first argument to read_csv needs to be a filename, not a list of filenames. (Commented May 14, 2018 at 19:43)
  • @Barmar Ah, ok. So read_csv can't take a parameter? Has to be a single filename? Ok, I will need to change my approach I think. Thanks. (Commented May 14, 2018 at 19:45)
  • Yes, it takes a parameter. That parameter must be a filename or an already open file object that it can read from. (Commented May 14, 2018 at 19:48)
  • read_csv can only read one CSV file at a time, not all the files in table_list. (Commented May 14, 2018 at 19:51)
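To make the two comment points concrete: os.listdir returns bare names, so each one must be joined back onto the directory, and each resulting path is then handed to read_csv on its own. A minimal sketch, assuming a throwaway temporary directory with made-up file names rather than the question's real folder:

```python
import os
import tempfile


def csv_paths(path):
    """Return the full path of every .csv file directly inside `path`."""
    # os.listdir() yields bare names like 'a.csv', not 'dir/a.csv',
    # so each name is joined back onto the directory.
    return [
        os.path.join(path, filename)
        for filename in sorted(os.listdir(path))
        if filename.endswith('.csv')
    ]


# Demo with a temporary directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.csv', 'b.csv', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()
    paths = csv_paths(tmp)
    names = [os.path.basename(p) for p in paths]

print(names)  # ['a.csv', 'b.csv']
```

Each entry in paths can then be passed to pd.read_csv one at a time inside the loop.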

6 Answers


There are many ways to do it:

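table_list = []      # one DataFrame per file
new_table_list = []  # file names without .csv

for filename in os.listdir(path):
    if filename.endswith('.csv'):
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|", names=col))
        new_table_list.append(filename.split(".")[0])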

Another one:

for filename in os.listdir(path):
    if filename.endswith('.csv'):
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|", names=col))
        new_table_list.append(filename[:-4])

And there are many more.

As @Barmar pointed out, it is better to join the directory path onto each filename (os.path.join(path, filename)) so the files are found regardless of which directory the script runs from.
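The two stripping one-liners above are not equivalent when a file name contains more than one dot. A quick comparison on made-up names, with os.path.splitext (a standard-library function not used in the answer) shown as a third option:

```python
import os

# Hypothetical file names; all of them pass the endswith('.csv') filter.
names = ['sales.csv', 'sales.2018.csv', 'q1.backup.csv']

# split(".")[0] keeps only the text before the FIRST dot.
split_first = [n.split(".")[0] for n in names]
# [:-4] drops the last four characters, i.e. exactly '.csv'.
slice_last4 = [n[:-4] for n in names]
# os.path.splitext() removes only the final extension, whatever its length.
splitext = [os.path.splitext(n)[0] for n in names]

print(split_first)  # ['sales', 'sales', 'q1']
print(slice_last4)  # ['sales', 'sales.2018', 'q1.backup']
print(splitext)     # ['sales', 'sales.2018', 'q1.backup']
```

Note how split(".")[0] maps two different files to the same name 'sales', which would be a problem if the names are used as table identifiers.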


2 Comments

What is file in table_list.append(file)? Surely you want to append something useful like `pd.read_csv(pathname, sep='|', names=col)` somewhere, rather than just two different versions of the filename to two different lists?
@abarnert Yup, that makes much more sense, though I was only trying to solve the second part. Will edit. Thanks a ton.

You can try something like this:

import glob
import pandas as pd

data = {}
for filename in glob.glob('/path/to/csvfiles/*.csv'):
    data[filename[:-4]] = pd.read_csv(filename, sep="|", names=col)

Then list(data) gives the file paths without the ".csv" extension (glob returns the directory part as well), and list(data.values()) gives one pandas DataFrame per file.
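Because glob returns full paths, filename[:-4] leaves the directory in each key. If you want bare table names like the question's new_table_list, one option (not part of the original answer) is to combine os.path.basename and os.path.splitext. The sketch below assumes hypothetical column names and writes two throwaway pipe-separated files to a temporary directory so it runs on its own:

```python
import glob
import os
import tempfile

import pandas as pd

col = ['id', 'value']  # hypothetical stand-in for the question's column names

with tempfile.TemporaryDirectory() as tmp:
    # Write two small pipe-separated files to stand in for the real data.
    for name, rows in {'alpha': '1|10\n2|20\n', 'beta': '3|30\n'}.items():
        with open(os.path.join(tmp, name + '.csv'), 'w') as fh:
            fh.write(rows)

    data = {}
    for filename in glob.glob(os.path.join(tmp, '*.csv')):
        # basename drops the directory, splitext drops the '.csv'
        table = os.path.splitext(os.path.basename(filename))[0]
        data[table] = pd.read_csv(filename, sep="|", names=col)

new_table_list = sorted(data)
print(new_table_list)  # ['alpha', 'beta']
```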


I'd start by using pathlib.

from pathlib import Path

Then use its stem attribute and glob method.

Let's make an import function.

def read_csv(f):
    # read a single file; f is a Path object yielded by glob
    return pd.read_csv(f, sep="|", names=col)

The most generic approach is to store the DataFrames in a dictionary keyed by file stem (the file name without .csv).

p = Path(r'\test\test\csvfiles')  # raw string, so '\t' is not read as a tab
dod = {f.stem: read_csv(f) for f in p.glob('*.csv')}

You can then use pd.concat to combine them into a single DataFrame, with the dictionary keys (the file stems) as the outer level of the index.

df = pd.concat(dod)
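To see what that concatenated result looks like, here is a small sketch with two in-memory DataFrames standing in for the files read through glob (the stems 'alpha' and 'beta' are made up). The optional names argument of pd.concat labels the index levels:

```python
import pandas as pd

# Two small frames standing in for the per-file DataFrames in `dod`.
dod = {
    'alpha': pd.DataFrame({'value': [10, 20]}),
    'beta': pd.DataFrame({'value': [30]}),
}

# Dictionary keys become the outer index level; `names` labels both levels.
df = pd.concat(dod, names=['table', 'row'])
print(df)

# Select one file's rows back out by its stem.
print(df.loc['beta'])
```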


To get the list of CSV files in the directory, use glob; it is easier than os.

import pandas as pd
from glob import glob

# csvs is a list of all file paths in the folder that end with .csv
csvs = glob('your\\dir\\to\\csvs_folder\\*.csv')

# remove the trailing .csv (4 characters) from each path
new_table_list = [csv[:-4] for csv in csvs]

# read each CSV as a DataFrame
dfs = [pd.read_csv(csv, sep="|", names=col) for csv in csvs]

# concatenate all DataFrames into a single DataFrame
df = pd.concat(dfs, ignore_index=True)
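With ignore_index=True the combined DataFrame no longer records which file each row came from. If you need that, one option (not from the original answer) is to tag each frame with its table name via DataFrame.assign before concatenating. A sketch with two made-up in-memory frames:

```python
import pandas as pd

# Hypothetical stand-ins for the per-file frames and their table names.
new_table_list = ['alpha', 'beta']
dfs = [pd.DataFrame({'value': [10, 20]}), pd.DataFrame({'value': [30]})]

# Add a 'table' column to every row, then concatenate with a fresh 0..n-1 index.
df = pd.concat(
    [frame.assign(table=name) for frame, name in zip(dfs, new_table_list)],
    ignore_index=True,
)
print(df)
```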

2 Comments

You may want to double the backslashes, use a raw string, or replace them with forward slashes (most libraries work with forward-slash paths even if the path separator for the current OS is a backslash). In your example, the \t in \to will be interpreted as a tab character (0x09) followed by o.
yeah I forgot them. Thanks!
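The escape problem described in that comment is easy to check in an interpreter. The path fragment below is made up; it just contains one \t sequence:

```python
plain = 'dir\to'     # '\t' is an escape sequence: a single tab character
doubled = 'dir\\to'  # '\\' is one literal backslash
raw = r'dir\to'      # raw string: backslashes are kept as written

print(len(plain), len(doubled), len(raw))  # 5 6 6
print(ord(plain[3]))                       # 9, i.e. 0x09 (tab)
print(doubled == raw)                      # True
```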

You can try something like this:

import os
import pandas as pd

path = 'your path'
all_csv_files = [f for f in os.listdir(path) if f.endswith('.csv')]
for f in all_csv_files:
    data = pd.read_csv(os.path.join(path, f), sep="|", names=col)
    # run the pandas analysis on data here, inside the loop

# list of file names without .csv
files = [f[:-4] for f in all_csv_files]
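Note that data is overwritten on every pass through the loop, so the question's analysis has to run inside the loop (or each DataFrame has to be kept). A self-contained sketch of that pattern, where analyze, the column names, and the temporary files are all hypothetical stand-ins:

```python
import os
import tempfile

import pandas as pd

col = ['id', 'value']  # hypothetical stand-in for the question's column list


def analyze(df):
    """Hypothetical placeholder for the question's pandas analysis."""
    return int(df['value'].sum())


with tempfile.TemporaryDirectory() as path:
    # Throwaway input files; notes.txt is skipped by the .csv filter.
    files_to_write = {'alpha.csv': '1|10\n2|20\n', 'beta.csv': '3|30\n', 'notes.txt': 'x'}
    for name, rows in files_to_write.items():
        with open(os.path.join(path, name), 'w') as fh:
            fh.write(rows)

    all_csv_files = sorted(f for f in os.listdir(path) if f.endswith('.csv'))
    results = {}
    for f in all_csv_files:
        data = pd.read_csv(os.path.join(path, f), sep="|", names=col)
        # Analyze now, before the next iteration replaces `data`.
        results[f[:-4]] = analyze(data)

print(results)  # {'alpha': 30, 'beta': 30}
```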


When you read each file, you can store its filename in the DataFrame's attrs dictionary, as follows:

ds.attrs['filename'] = 'filename.csv'

You can later query the DataFrame for the name:

>>> ds.attrs['filename']
'filename.csv'
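In the question's loop, that could look like the sketch below (the file names are made up, and a small DataFrame stands in for pd.read_csv). DataFrame.attrs was added in pandas 1.0 and is documented as experimental, so it is safer not to rely on it surviving every operation.

```python
import pandas as pd

frames = []
for filename in ['alpha.csv', 'beta.csv']:  # hypothetical file names
    # In the real loop this would be pd.read_csv(...); a small frame stands in here.
    ds = pd.DataFrame({'value': [1, 2]})
    ds.attrs['filename'] = filename
    frames.append(ds)

print([ds.attrs['filename'] for ds in frames])  # ['alpha.csv', 'beta.csv']
```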
