
Can someone help me understand how to read multiple Excel files in Dask? In pandas, I would use glob and do this:

import glob

import pandas as pd

files = glob.glob('Working Files/*.xlsx')
df = pd.concat([pd.read_excel(i, skiprows=2) for i in files], ignore_index=True)

I need help doing the same in Dask.

Thanks,

Jac

2 Answers


The easiest solution is to wrap the reading function, pd.read_excel, with dask.delayed:

import glob

import dask
import pandas as pd

files = glob.glob('Working Files/*.xlsx')

# note we are wrapping in delayed only the function, not the arguments
delayeds = [dask.delayed(pd.read_excel)(i, skiprows=2) for i in files]

# the line below launches the actual computations;
# unpack the list with * so compute returns one DataFrame per file
# (dask.compute(delayeds) would return a 1-tuple wrapping the list)
results = dask.compute(*delayeds)

# after computation is over, results is a tuple of pandas DataFrames
df = pd.concat(results, ignore_index=True)
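A minimal sketch of why the `*` unpacking matters, runnable without any Excel files. It assumes a hypothetical `load_file` function standing in for `pd.read_excel(i, skiprows=2)` and made-up file names; only the `dask.compute` behaviour is the point here.

```python
import dask
import pandas as pd


def load_file(name):
    # Hypothetical stand-in for pd.read_excel(name, skiprows=2)
    return pd.DataFrame({"file": [name, name], "value": [1, 2]})


files = ["a.xlsx", "b.xlsx", "c.xlsx"]  # hypothetical file names
delayeds = [dask.delayed(load_file)(f) for f in files]

# Passing the list as a single argument returns a 1-tuple wrapping the list...
wrapped = dask.compute(delayeds)
# ...while unpacking it returns one pandas DataFrame per delayed object.
frames = dask.compute(*delayeds)

df = pd.concat(frames, ignore_index=True)
print(type(wrapped), len(wrapped))  # <class 'tuple'> 1
print(df.shape)  # (6, 2)
```

Calling `pd.concat(wrapped, ...)` on the 1-tuple would fail, because its only element is a list rather than a DataFrame.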

2 Comments

Thanks a lot for your quick response. One question: by doing this pd.concat, I will be moving from Dask back to pandas, right? I'm trying to switch to Dask because I read that Dask is faster. Is this method still the optimal way to do it? Can I concatenate in Dask itself, and won't that be faster? PS: pardon me if it's a dumb thought :P, I'm fairly new to this.
Depending on your use case, data size, etc., it might be faster to keep the data in pandas; if the combined data fits comfortably in memory, Dask's overhead may not pay off.

Following this approach, I had some issues with pd.concat, so instead of concatenating I built a Dask DataFrame with dd.from_delayed. Hope it works!

import glob

import dask
import dask.dataframe as dd
import pandas as pd

files = glob.glob(r"D:\XX\XX\XX\XX\XXX\*.xlsx")

# note we are wrapping in delayed only the function, not the arguments
delayeds = [dask.delayed(pd.read_excel)(i, skiprows=0) for i in files]

# here instead of dask.compute + pd.concat: a lazy Dask DataFrame
# with one partition per Excel file
ddf = dd.from_delayed(delayeds)

# Dask replaces "*" with the partition number, writing one CSV per file.
# Please be aware of the dtypes in your Excel files: all files must share
# the same columns and dtypes.
ddf.to_csv(r"D:\XX\XX\XX\XX\XXX\*.csv")
