Pandas rows containing numpy ndarrays of various shapes

Question

I'd like to create a Pandas DataFrame in which each (index, column) location can hold a numpy ndarray of arbitrary shape, or even a simple number.

This works:

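```python
import numpy as np
import pandas as pd

# One cell holds a 4-D array (400,000 floats); every other cell holds a plain int
x = pd.DataFrame([[np.random.rand(100, 100, 20, 2), 3], [2, 2], [3, 3], [4, 4]],
                 index=['A1', 'B2', 'C3', 'D4'], columns=['data', 'data2'])
print(x)
```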

but it takes 50 seconds to create on my computer! Why?

On its own, np.random.rand(100, 100, 20, 2) is super fast (less than 1 second to create).

How can I speed up the creation of Pandas DataFrames containing ndarrays of various shapes?

When a pandas DataFrame has a single homogeneous type, the whole thing can be stored as one numpy array. When you create it from a list like this, where the columns are heterogeneous, pandas has to do a lot of bookkeeping and reformatting to keep track of the different data types. – Tim Roberts, commented Jun 23, 2022 at 23:25
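To make the homogeneous-vs-heterogeneous point concrete, here is a small sketch that compares the column dtypes of an all-float frame with the frame from the question. The names `homogeneous` and `x` are just for illustration. Note that the later comments find construction itself to be fast; this only shows how the data ends up stored.

```python
import numpy as np
import pandas as pd

# Homogeneous: every cell is a float, so the frame can be backed by one float64 array.
homogeneous = pd.DataFrame(np.random.rand(4, 2), columns=["a", "b"])
print(homogeneous.dtypes)
print(homogeneous.to_numpy().dtype)

# Heterogeneous: a whole ndarray sits inside one cell, so the "data" column
# falls back to object dtype (an array of references to Python objects),
# while "data2", which only holds ints, can still get an integer dtype.
x = pd.DataFrame(
    [[np.random.rand(100, 100, 20, 2), 3], [2, 2], [3, 3], [4, 4]],
    index=["A1", "B2", "C3", "D4"],
    columns=["data", "data2"],
)
print(x.dtypes)
```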
Yes, probably, @TimRoberts, but here I only have ~400,000 coefficients (100 × 100 × 20 × 2) to store in the DataFrame. 50 seconds for this is really problematic! Is there an easy fix? – Basj, commented Jun 23, 2022 at 23:27
It's not the creation that takes time, it's the print. The creation is pretty much instantaneous on my computer, as is print(x['data2']). But print(x['data']) takes about 15 seconds. – Nick, commented Jun 23, 2022 at 23:41
In fact, print(x['data']['A1']) and print(x['data']['B2']) are likewise super fast. So I guess print is just having trouble putting together elements of vastly different sizes. A bug perhaps? – Nick, commented Jun 23, 2022 at 23:43
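One plausible reason the single-cell print is fast: the cell holds the original ndarray, and printing it directly goes through NumPy's own formatter, which summarizes arrays larger than its `threshold` print option (default 1000 elements) and only formats the corners. A minimal sketch to check this, using `DataFrame.at` to fetch the cell:

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(
    [[np.random.rand(100, 100, 20, 2), 3], [2, 2], [3, 3], [4, 4]],
    index=["A1", "B2", "C3", "D4"],
    columns=["data", "data2"],
)

cell = x.at["A1", "data"]  # the ndarray stored in that cell
print(type(cell), cell.shape)

# NumPy summarizes arrays with more elements than `threshold`,
# replacing the middle of each axis with "..." instead of formatting all values.
print(np.get_printoptions()["threshold"])
text = repr(cell)
print(len(text))
```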

Nick · Accepted Answer · 2022-06-23 23:48:23Z


It's not actually the creation that is the issue, it's the print statement. 1000 loops of the creation take 2.8 seconds on my computer. But one iteration of the print takes about 26 seconds.
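A minimal sketch for timing the two steps separately. It assumes a reduced array shape of (20, 20, 20, 2) so that the slow step finishes quickly; swap in the question's (100, 100, 20, 2) to reproduce the long wait. `repr(x)` builds the text that gets displayed without the cost of writing it to the terminal. Absolute timings will differ from machine to machine.

```python
import time

import numpy as np
import pandas as pd

# Reduced shape (an assumption for this sketch); the question uses (100, 100, 20, 2).
shape = (20, 20, 20, 2)

t0 = time.perf_counter()
x = pd.DataFrame(
    [[np.random.rand(*shape), 3], [2, 2], [3, 3], [4, 4]],
    index=["A1", "B2", "C3", "D4"],
    columns=["data", "data2"],
)
t1 = time.perf_counter()
text = repr(x)  # formatting step only, no terminal output
t2 = time.perf_counter()

print(f"create: {t1 - t0:.4f} s")
print(f"format: {t2 - t1:.4f} s")
```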

Interestingly, print(x['data2']), print(x['data']['A1']) and print(x['data']['B2']) are all basically instantaneous. So it seems print is having an issue figuring out how to display items of vastly different sizes. Perhaps a bug?
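A possible explanation, hedged because it is based on reading pandas' formatting behaviour rather than on anything in this thread: when pandas renders an object column, sequence-like cells such as ndarrays appear to be pretty-printed item by item, recursing a few levels deep (governed by the `display.max_seq_items` and `display.pprint_nest_depth` options, defaults 100 and 3), and the result is only cut down to `display.max_colwidth` afterwards. With axes of length 100, 100 and 20, nearly every value gets formatted. If that is right, it is a performance pitfall rather than a bug. The sketch below tries two workarounds under that assumption: temporarily lowering `display.max_seq_items` with `pd.option_context`, and displaying each cell's shape instead of its contents.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(
    [[np.random.rand(100, 100, 20, 2), 3], [2, 2], [3, 3], [4, 4]],
    index=["A1", "B2", "C3", "D4"],
    columns=["data", "data2"],
)

# Workaround 1: cap how many items the pretty-printer formats per nesting
# level, only inside this block (the previous setting is restored on exit).
with pd.option_context("display.max_seq_items", 2):
    text = repr(x)
print(text)

# Workaround 2: show each cell's shape instead of its contents.
# np.shape returns () for plain numbers such as the ints in rows B2..D4.
shapes = x["data"].map(np.shape)
print(shapes)
```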
