Keras input Pandas dataframe

Question

I'm new to Keras and want to fit a model on training data stored in an Excel file. My data has shape (1000, 5, 5): 1000 batches of data saved as 1000 spreadsheets, each sheet containing 5 columns and 5 rows:

A	B	C	D	E
-	-	-	-	label
-	-	-	-	label
-	-	-	-	label
-	-	-	-	label
-	-	-	-	label

I want columns A, B, and C to be the training features and column E to be the label.

import pandas as pd
import tensorflow as tf
import multiprocessing

df = pd.read_excel('File.xlsx', sheet_name=None)
data_list = list(df.values())

def input_parser(x):
    Y = x.pop('E')
    features = ['A','B','C']
    X = x[features]
    return X, Y

dataset = tf.data.Dataset.from_tensor_slices(data_list)
dataset = dataset.map(lambda x: tuple(tf.py_function(func=input_parser,
                                                     inp=[x],
                                                     Tout=[tf.float32,tf.int64])),
                      num_parallel_calls=multiprocessing.cpu_count())

and then I got an error:

ValueError: Can't convert non-rectangular Python sequence to Tensor.

Why do I get this error? How can I fit this data to my model?

AloneTogether · Accepted Answer · 2021-11-04 13:59:54Z


Maybe try omitting your map function altogether: stack the features and labels into arrays first, then pass them directly to tf.data.Dataset.from_tensor_slices:

import pandas as pd
import tensorflow as tf
import numpy as np

spread_sheet1 = {'A': [1, 2, 1, 2, 9], 'B': [3, 4, 6, 1, 4], 'C': [3, 4, 3, 1, 4], 'D': [1, 2, 6, 1, 4], 'E': [0, 1, 1, 0, 1]}
df1 = pd.DataFrame(data=spread_sheet1)

spread_sheet2 = {'A': [1, 2, 1, 2, 4], 'B': [3, 5, 2, 1, 4], 'C': [9, 4, 1, 1, 4], 'D': [1, 5, 6, 1, 7], 'E': [1, 1, 1, 0, 1]}
df2 = pd.DataFrame(data=spread_sheet2)

features = ['A','B','C']
Y = np.stack([df1['E'].to_numpy(), df2['E'].to_numpy()])
Y = tf.convert_to_tensor(Y, dtype=tf.int32)
X = np.stack([df1[features].to_numpy(), df2[features].to_numpy()])
X = tf.convert_to_tensor(X, dtype=tf.float32)


dataset = tf.data.Dataset.from_tensor_slices((X, Y))
print('Shape of X --> ', X.shape)
for x, y in dataset:
  print(x, y)

Shape of X -->  (2, 5, 3)
tf.Tensor(
[[1. 3. 3.]
 [2. 4. 4.]
 [1. 6. 3.]
 [2. 1. 1.]
 [9. 4. 4.]], shape=(5, 3), dtype=float32) tf.Tensor([0 1 1 0 1], shape=(5,), dtype=int32)
tf.Tensor(
[[1. 3. 9.]
 [2. 5. 4.]
 [1. 2. 1.]
 [2. 1. 1.]
 [4. 4. 4.]], shape=(5, 3), dtype=float32) tf.Tensor([1 1 1 0 1], shape=(5,), dtype=int32)

Reading an Excel file (file.xlsx) with multiple sheets can be done like this:

import numpy as np
import pandas as pd
import tensorflow as tf

# sheet_name=None loads every sheet into a dict: {sheet name: DataFrame}
df = pd.read_excel('file.xlsx', sheet_name=None)
sheet_names = list(df.keys())

columns = ['A','B','C']
features = []
labels = []
for n in sheet_names:
  temp_df = df[n]
  features.append(temp_df[columns].to_numpy())  # one (5, 3) array per sheet
  labels.append(temp_df['E'].to_numpy())        # one (5,) array per sheet

# Stack the per-sheet arrays into (num_sheets, 5, 3) and (num_sheets, 5)
Y = tf.convert_to_tensor(np.stack(labels), dtype=tf.int32)
X = tf.convert_to_tensor(np.stack(features), dtype=tf.float32)
dataset = tf.data.Dataset.from_tensor_slices((X, Y))

print('Shape of X --> ', X.shape)
for x, y in dataset:
  print(x, y)
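
As for the "how can I fit this data to my model" part of the question, the following is only a minimal sketch and not part of the original answer. It assumes the labels in column E are binary (as in the sample frames above) and uses synthetic random arrays in place of the spreadsheets. The model architecture, layer sizes, batch size, and loss are illustrative assumptions, not a recommendation.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for the stacked sheets (assumption):
# 8 sheets, each with 5 rows and 3 feature columns (A, B, C).
rng = np.random.default_rng(0)
X = rng.random((8, 5, 3)).astype("float32")
# Assumed binary labels, one per row of each sheet (column E).
Y = rng.integers(0, 2, size=(8, 5)).astype("float32")

# Each dataset element is one sheet: features (5, 3), labels (5,).
# batch(4) groups 4 sheets per training step.
dataset = tf.data.Dataset.from_tensor_slices((X, Y)).batch(4)

# Hypothetical small model: one (5, 3) sheet in, 5 probabilities out.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(5, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(5, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

history = model.fit(dataset, epochs=2, verbose=0)
preds = model.predict(X, verbose=0)
print(preds.shape)
```

Because the dataset already yields (features, labels) pairs, it can be passed to model.fit without a separate label argument. If the labels are not binary, the output layer and loss would need to change accordingly.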

edited Nov 4, 2021 at 13:59

answered Nov 1, 2021 at 14:06

AloneTogether


10 Comments

Cheung Over a year ago

Thank you so much for answering my question, but I want to feed a tensor of shape (5, 3), together with the whole column E, as one input each time. That way I have features of shape (1000, 5, 3) from columns A, B, C and labels of shape (1000, 5), for a total of 1000 batches of data.

AloneTogether Over a year ago

So each of your batches should contain 5 samples with 3 values from columns A, B, and C, for example (32, 5, 3), where 32 is the batch size? And (32, 5) for your labels?

Cheung Over a year ago

Yes, you are right. I want to use 5 samples as one input to predict the 5 labels each time.

AloneTogether Over a year ago

Updated answer. Is that what you want?

Cheung Over a year ago

Yes, the result is exactly what I want. So after I call pd.read_excel with sheet_name=None, is the result a dictionary of DataFrames? How do I iterate over those 1000 spreadsheets?
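
On the last comment: with sheet_name=None, pd.read_excel returns a dict that maps each sheet name to a DataFrame, so it can be iterated like any other dict. A minimal sketch follows, assuming a hand-built dict in place of the real file so it runs without file.xlsx. The sheet names and the row-count check are hypothetical additions; the check is there because np.stack fails when the per-sheet arrays differ in shape.

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_excel('file.xlsx', sheet_name=None) (assumption):
# a dict of {sheet name: DataFrame}. Sheet names here are made up.
sheets = {
    "Sheet1": pd.DataFrame({"A": [1, 2, 1, 2, 9], "B": [3, 4, 6, 1, 4],
                            "C": [3, 4, 3, 1, 4], "D": [1, 2, 6, 1, 4],
                            "E": [0, 1, 1, 0, 1]}),
    "Sheet2": pd.DataFrame({"A": [1, 2, 1, 2, 4], "B": [3, 5, 2, 1, 4],
                            "C": [9, 4, 1, 1, 4], "D": [1, 5, 6, 1, 7],
                            "E": [1, 1, 1, 0, 1]}),
}

feature_columns = ["A", "B", "C"]
expected_rows = 5  # assumption: every sheet holds exactly 5 rows

features, labels = [], []
for sheet_name, sheet_df in sheets.items():
    # Fail early with a readable message instead of inside np.stack.
    if len(sheet_df) != expected_rows:
        raise ValueError(
            f"Sheet {sheet_name!r} has {len(sheet_df)} rows, "
            f"expected {expected_rows}"
        )
    features.append(sheet_df[feature_columns].to_numpy())
    labels.append(sheet_df["E"].to_numpy())

X = np.stack(features)  # shape (number of sheets, 5, 3)
Y = np.stack(labels)    # shape (number of sheets, 5)
print(X.shape, Y.shape)
```

With the real file, replacing the hand-built dict with the result of pd.read_excel should leave the loop unchanged.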
