
I'm trying to convert an array of one-hot vectors of integers into an array of one-hot vectors that Keras can use to fit my model. Here's the relevant part of the code:

Y_train = np.hstack(np.asarray(dataframe.output_vector)).reshape(len(dataframe),len(output_cols))
dummy_y = np_utils.to_categorical(Y_train)

Below is an image showing what Y_train and dummy_y actually are.

I couldn't find any documentation for to_categorical that could help me.

Thanks in advance.

    Y_train is already a one-hot vector, so you can use it directly; there is no need for to_categorical. What is the actual problem? Commented Jan 5, 2017 at 22:10
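To make the comment above concrete, here is a minimal sketch, assuming Y_train is a 2-D NumPy array of 0s and 1s. The sample values and the helper name is_one_hot are made up for illustration; they are not from the question. It checks whether the array is already one-hot and, if integer labels are needed, recovers them with argmax.

```python
import numpy as np

def is_one_hot(a):
    """True if `a` is 2-D, contains only 0s and 1s, and has exactly one 1 per row."""
    a = np.asarray(a)
    return bool(
        a.ndim == 2
        and np.isin(a, (0, 1)).all()
        and (a.sum(axis=1) == 1).all()
    )

# Hypothetical stand-in for the Y_train from the question
Y_train = np.array([[0, 1, 0],
                    [1, 0, 0],
                    [0, 0, 1]])

already_one_hot = is_one_hot(Y_train)

# Position of the 1 in each row, i.e. the integer class label
labels = Y_train.argmax(axis=1)
```

If the check passes, Y_train can be passed to the model as it is; labels is only needed if some other step expects integer classes.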

2 Answers


np_utils.to_categorical converts an array of integer class labels (from 0 to num_classes - 1) into a one-hot matrix with one row per label.

The official doc with an example.

In [1]: from keras.utils import np_utils # from keras import utils as np_utils
Using Theano backend.

In [2]: np_utils.to_categorical?
Signature: np_utils.to_categorical(y, num_classes=None)
Docstring:
Convert class vector (integers from 0 to nb_classes) to binary class matrix, for use with categorical_crossentropy.

# Arguments
    y: class vector to be converted into a matrix
    nb_classes: total number of classes

# Returns
    A binary matrix representation of the input.
File:      /usr/local/lib/python3.5/dist-packages/keras/utils/np_utils.py
Type:      function

In [3]: y_train = [1, 0, 3, 4, 5, 0, 2, 1]

In [4]: """ Assuming the labeled dataset has total six classes (0 to 5), y_train is the true label array """

In [5]: np_utils.to_categorical(y_train, num_classes=6)
Out[5]:
array([[ 0.,  1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.],
       [ 1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.]])
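For intuition, the same encoding can be sketched in plain NumPy. This is not Keras's actual implementation, only an assumed equivalent for a 1-D list of integer labels; the function name one_hot is made up for this example.

```python
import numpy as np

def one_hot(y, num_classes=None):
    """One-hot encode integer labels in the range 0 .. num_classes - 1."""
    y = np.asarray(y, dtype=int).ravel()
    if num_classes is None:
        # Same default idea as to_categorical: largest label plus one
        num_classes = int(y.max()) + 1
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes)[y]

y_train = [1, 0, 3, 4, 5, 0, 2, 1]
encoded = one_hot(y_train, num_classes=6)

# argmax along each row gives back the original labels
recovered = encoded.argmax(axis=1)
```

Indexing the identity matrix with the label array selects one row per label, which is why each output row has a single 1 at the column equal to the label.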

3 Comments

nb_classes should be num_classes according to documentation here: keras.io/utils
In the public API, you cannot access np_utils (it's internal). You are supposed to access utilities via the utils module, e.g. from keras import utils, then utils.to_categorical(...). See this issue: github.com/keras-team/keras/issues/8838
@ArayanSingh Thanks for pointing that out; I added the import as a comment.
from keras.utils.np_utils import to_categorical

UPDATED --- keras.utils.np_utils doesn't work in newer versions; if so use:

from tensorflow.keras.utils import to_categorical

In both cases the call has the form

to_categorical(y, num_classes=max_value_of_array + 1)

It assumes the class labels are integers starting at 0, as you would get from label-encoding string classes, so the columns always run from 0 to n_classes - 1.

For example, consider the array [1, 2, 3, 4, 2].

The output has one column per value from 0 to 4 (zero, one, two, three, four):

array([[ 0.,  1.,  0., 0., 0.],
       [ 0.,  0.,  1., 0., 0.],
       [ 0.,  0.,  0., 1., 0.],
       [ 0.,  0.,  0., 0., 1.],
       [ 0.,  0.,  1., 0., 0.]])

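Let's look at another example.

This time the array has only 3 distinct classes, Y = [4, 8, 9, 4, 9].

to_categorical(Y) will output 10 columns, because the number of columns is max(Y) + 1, not the number of distinct values: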

array([[0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0. ],
       [0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0. ],
       [0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1. ],
       [0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0. ],
       [0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1. ]])
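If three columns are wanted for these three classes, the labels can first be remapped to 0, 1, 2. A minimal sketch using np.unique, assuming the labels fit in a 1-D NumPy array (the variable names are made up for this example; scikit-learn's LabelEncoder would be another option):

```python
import numpy as np

Y = np.array([4, 8, 9, 4, 9])

# classes holds the sorted distinct labels; encoded holds, for each
# element of Y, the index of its label within classes
classes, encoded = np.unique(Y, return_inverse=True)

# One column per distinct class instead of one per integer up to max(Y)
one_hot = np.eye(len(classes))[encoded]

# Map a one-hot row back to the original label
decoded = classes[one_hot.argmax(axis=1)]
```

The remapped array encoded could also be passed to to_categorical instead of using np.eye; the point is only that the labels handed to it should be contiguous and start at 0.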
