I'm new to Keras and want to fit a model on training data stored in an Excel file. The data has shape (1000, 5, 5): 1000 batches of data saved in 1000 spreadsheets, each sheet containing 5 columns and 5 rows:

A B C D E
- - - - label
- - - - label
- - - - label
- - - - label
- - - - label

I want columns A, B, and C to be the training features and column E to be the label.

import pandas as pd
import tensorflow as tf
import multiprocessing

df = pd.read_excel('File.xlsx', sheet_name=None)
data_list = list(df.values())

def input_parser(x):
    Y = x.pop('E')
    features = ['A','B','C']
    X = x[features]
    return X, Y

dataset = tf.data.Dataset.from_tensor_slices(data_list)
dataset = dataset.map(lambda x: tuple(tf.py_function(func=input_parser,
                                                     inp=[x],
                                                     Tout=[tf.float32,tf.int64])),
                      num_parallel_calls=multiprocessing.cpu_count())

Running this gives an error:

ValueError: Can't convert non-rectangular Python sequence to Tensor.

Why do I get this error? How can I fit this data to my model?

1 Answer

Try omitting the map function altogether: stack each sheet's features and labels into arrays and pass them directly to tf.data.Dataset.from_tensor_slices:

import pandas as pd
import tensorflow as tf
import numpy as np

spread_sheet1 = {'A': [1, 2, 1, 2, 9], 'B': [3, 4, 6, 1, 4], 'C': [3, 4, 3, 1, 4], 'D': [1, 2, 6, 1, 4], 'E': [0, 1, 1, 0, 1]}
df1 = pd.DataFrame(data=spread_sheet1)

spread_sheet2 = {'A': [1, 2, 1, 2, 4], 'B': [3, 5, 2, 1, 4], 'C': [9, 4, 1, 1, 4], 'D': [1, 5, 6, 1, 7], 'E': [1, 1, 1, 0, 1]}
df2 = pd.DataFrame(data=spread_sheet2)

features = ['A','B','C']
Y = np.stack([df1['E'].to_numpy(), df2['E'].to_numpy()])
Y = tf.convert_to_tensor(Y, dtype=tf.int32)
X = np.stack([df1[features].to_numpy(), df2[features].to_numpy()])
X = tf.convert_to_tensor(X, dtype=tf.float32)


dataset = tf.data.Dataset.from_tensor_slices((X, Y))
print('Shape of X --> ', X.shape)
for x, y in dataset:
  print(x, y)

Output:

Shape of X -->  (2, 5, 3)
tf.Tensor(
[[1. 3. 3.]
 [2. 4. 4.]
 [1. 6. 3.]
 [2. 1. 1.]
 [9. 4. 4.]], shape=(5, 3), dtype=float32) tf.Tensor([0 1 1 0 1], shape=(5,), dtype=int32)
tf.Tensor(
[[1. 3. 9.]
 [2. 5. 4.]
 [1. 2. 1.]
 [2. 1. 1.]
 [4. 4. 4.]], shape=(5, 3), dtype=float32) tf.Tensor([1 1 1 0 1], shape=(5,), dtype=int32)

Reading from an Excel file file.xlsx with multiple sheets can be done like this:

import pandas as pd
import tensorflow as tf
import numpy as np

# sheet_name=None loads every sheet and returns a dict of {sheet name: DataFrame}
df = pd.read_excel('file.xlsx', sheet_name=None)
sheet_names = list(df.keys())

columns = ['A','B','C']
features = []
labels = []
for n in sheet_names:
  temp_df = df[n]
  features.append(temp_df[columns].to_numpy())  # (5, 3) per sheet
  labels.append(temp_df['E'].to_numpy())        # (5,) per sheet

# np.stack requires every sheet to have the same number of rows
Y = tf.convert_to_tensor(np.stack(labels), dtype=tf.int32)
X = tf.convert_to_tensor(np.stack(features), dtype=tf.float32)
dataset = tf.data.Dataset.from_tensor_slices((X, Y))

print('Shape of X --> ', X.shape)
for x, y in dataset:
  print(x, y)
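
To then fit a model on a dataset of this shape, something along these lines might work. This is only a rough sketch: the random arrays stand in for the stacked sheets, the labels are assumed to be binary (0/1) as in the toy example above, and the layer sizes, batch size of 32, and loss are placeholder choices rather than anything taken from the question.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for the stacked sheets: 1000 sheets, 5 rows, 3 feature columns.
rng = np.random.default_rng(0)
X = rng.random((1000, 5, 3)).astype("float32")
# Assumption: binary labels, one per row of each sheet.
Y = rng.integers(0, 2, size=(1000, 5)).astype("float32")

# Each element is one sheet: features (5, 3) and labels (5,).
# batch(32) groups sheets into (32, 5, 3) and (32, 5).
dataset = tf.data.Dataset.from_tensor_slices((X, Y)).shuffle(1000).batch(32)

# Hypothetical model: flatten the 5x3 block and predict 5 labels at once.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(5, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(5, activation="sigmoid"),
])
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.BinaryAccuracy()],
)

history = model.fit(dataset, epochs=1, verbose=0)
```

If the labels in column E are not binary, the last layer and the loss would need to change accordingly.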

10 Comments

Thank you so much for answering my question, but I want to feed a tensor of shape (5, 3), together with the whole column E, as one input each time. So the features would have shape (1000, 5, 3) from columns A, B, C and the labels (1000, 5), for a total of 1000 batches of data.
So each batch should contain 5 samples with 3 values from columns A, B, and C, for example (32, 5, 3), where 32 is the batch size? And (32, 5) for the labels?
Yes, you are right. I want to use 5 samples as one input to predict the 5 labels each time.
Updated answer. Is that what you want?
Yes, the result is exactly what I want. So after I use pd.read_excel with sheet_name=None, does it turn into a dictionary of DataFrames? How do I iterate over those 1000 spreadsheets?
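On that last point: with sheet_name=None, pd.read_excel returns a dict that maps each sheet name to a DataFrame, so the sheets can be iterated with .items(). A small sketch, using an in-memory dict as a stand-in for the loaded workbook (the sheet names and values here are made up, and the row check is just one way to catch a sheet that would break np.stack):

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_excel('file.xlsx', sheet_name=None), which returns
# a dict of {sheet_name: DataFrame}. Sheet names and values are invented.
rng = np.random.default_rng(0)
sheets = {
    f"Sheet{i}": pd.DataFrame(rng.integers(0, 10, size=(5, 5)), columns=list("ABCDE"))
    for i in range(1, 4)
}

columns = ["A", "B", "C"]
features, labels = [], []
for name, sheet in sheets.items():
    # np.stack needs identical shapes, so flag any sheet that is not 5 rows.
    if sheet.shape[0] != 5:
        raise ValueError(f"{name} has {sheet.shape[0]} rows, expected 5")
    features.append(sheet[columns].to_numpy())
    labels.append(sheet["E"].to_numpy())

X = np.stack(features).astype("float32")
Y = np.stack(labels).astype("int32")
print(X.shape, Y.shape)  # (3, 5, 3) (3, 5)
```

With the real workbook, the same loop would run over all 1000 sheets and give X of shape (1000, 5, 3) and Y of shape (1000, 5), provided every sheet has 5 rows.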