
I'm trying to make an ANN in Python to predict something from a dataset (in this case diabetes), and I'm struggling to figure out how to solve this error.

Here is the full code:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn import preprocessing
from keras import Sequential
from keras.layers import Dense
from sklearn.metrics import confusion_matrix, accuracy_score

data = pd.read_csv('C:/Users/<<>>/Downloads/Dataset of Diabetes.csv')

# drop irrelevant columns
dropcols = ['ID', 'No_Pation']
data = data.drop(dropcols, axis=1)
data.info()

X = data.values
Y = data['CLASS'].values

label_encoder = preprocessing.LabelEncoder()
data['CLASS'] = label_encoder.fit_transform(data['CLASS'])
data['Gender'] = label_encoder.fit_transform(data['Gender'])
data['CLASS'].unique()
data['Gender'].unique()
data.info()


X = np.delete(X, 1, axis=1)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)

X_train = np.asarray(X_train).astype(np.float32)
Y_train = np.asarray(Y_train).astype(np.float32)

classifier = Sequential()
classifier.add(Dense(units=10, activation='relu', input_dim=X.shape[1]))
classifier.add(Dense(units=10, activation='relu'))
classifier.add(Dense(units=1, activation='sigmoid'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
classifier.fit(X_train, Y_train, epochs=100, batch_size=10)

Y_pred = classifier.predict(X_test)
Y_pred_int = (Y_pred > 0.5).astype(int)
cm = confusion_matrix(Y_test, Y_pred_int)
acc = accuracy_score(Y_test, Y_pred_int)
print("Accuracy:", acc)
print("Confusion Matrix:\n", cm)

This is what the last "data.info()" line returns:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 12 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Gender  1000 non-null   int32  
 1   AGE     1000 non-null   int64  
 2   Urea    1000 non-null   float64
 3   Cr      1000 non-null   int64  
 4   HbA1c   1000 non-null   float64
 5   Chol    1000 non-null   float64
 6   TG      1000 non-null   float64
 7   HDL     1000 non-null   float64
 8   LDL     1000 non-null   float64
 9   VLDL    1000 non-null   float64
 10  BMI     1000 non-null   float64
 11  CLASS   1000 non-null   int32  
dtypes: float64(8), int32(2), int64(2)
memory usage: 86.1 KB

Here is the error message that I am getting:

Traceback (most recent call last):
  File "C:\Users\<<>>\PycharmProjects\AI2\NeuralNetwork.py", line 32, in <module>
    X_train = np.asarray(X_train).astype(np.float32)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: could not convert string to float: 'M'

I also just noticed a warning (not strictly an error):

UserWarning: Do not pass an input_shape/input_dim argument to a layer. When using Sequential models, prefer using an Input(shape) object as the first layer in the model instead.
  super().__init__(activity_regularizer=activity_regularizer, **kwargs)

What does this mean?

I have also been getting the recurring error "ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float)".

If there are also any other issues with what I have done so far, please let me know!

Many Thanks

Link to the dataset: https://data.mendeley.com/datasets/wj9rwkp9c2/1

I've already tried converting the X and Y trains to np arrays, but I'm not sure what else I need to do.

  • Any way you can share a link to your data or post the first few lines from your csv file here? Commented Apr 23, 2024 at 8:08
  • One thing you can try is assigning X and Y after the label_encoder lines Commented Apr 23, 2024 at 8:09
  • Please edit your question to add information. Without formatting and in a comment, the data is virtually unreadable. Commented Apr 23, 2024 at 9:29
  • Just rehashing what FlyingTeller has said, since you seem to have missed it: you set X to data.values BEFORE encoding the gender and class columns. Your X therefore still contains the values M and F for gender, which is why it cannot be converted to float (also, there is no need to store gender as a float). Commented Apr 23, 2024 at 9:32
  • I have added the link to the CSV in the question Commented Apr 23, 2024 at 9:54
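The ordering fix suggested in the comments can be sketched roughly as follows. This is only an illustration, not a tested rewrite of the full script: the four-row DataFrame is a made-up stand-in for the real CSV, and the names gender_encoder and class_encoder are hypothetical. It also builds X by dropping CLASS by name, which is an assumption about the intent. In the question, X = data.values keeps CLASS among the features, and np.delete(X, 1, axis=1) removes column index 1, which is AGE according to the data.info() output. An array taken from data.values before encoding has object dtype, which is a typical trigger for the "Unsupported object type" tensor error as well.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in rows; the real CSV has more columns and 1000 rows.
data = pd.DataFrame({
    "Gender": ["M", "F", "F", "M"],
    "AGE": [50, 26, 33, 45],
    "HbA1c": [4.9, 4.9, 7.1, 8.0],
    "CLASS": ["N", "N", "Y", "Y"],
})

# Encode the string columns FIRST. One encoder per column, so each
# mapping can still be inverted later with inverse_transform.
gender_encoder = LabelEncoder()
class_encoder = LabelEncoder()
data["Gender"] = gender_encoder.fit_transform(data["Gender"])
data["CLASS"] = class_encoder.fit_transform(data["CLASS"])

# Only now build the arrays. Dropping CLASS by name keeps the target
# out of the features and avoids guessing a positional index.
X = data.drop(columns=["CLASS"]).to_numpy(dtype=np.float32)
Y = data["CLASS"].to_numpy(dtype=np.float32)

print(X.dtype, X.shape)
print(Y)
```

With X and Y built this way, the later astype(np.float32) calls on the training arrays should no longer hit the 'M' string. The test arrays would need the same dtype before being passed to predict.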
