I am looking at the Keras GloVe word embedding example, and it is not clear to me why the first row of the embedding matrix is populated with zeros.
First, the embeddings index is created, mapping each word to its vector of coefficients:
import os
import numpy as np

embeddings_index = {}
with open(os.path.join(GLOVE_DIR, 'glove.6B.100d.txt')) as f:
    for line in f:
        # each line holds a word followed by its space-separated coefficients
        word, coefs = line.split(maxsplit=1)
        coefs = np.fromstring(coefs, 'f', sep=' ')
        embeddings_index[word] = coefs
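For context, a quick sanity check on the loaded index (a small sketch; I'm assuming 'the' is in the GloVe 6B vocabulary, which has 400,000 entries):

print(len(embeddings_index))          # 400000 for glove.6B
print(embeddings_index['the'].shape)  # (100,), matching glove.6B.100d.txt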
Then the embedding matrix is created by looking up each word from the index built by the tokenizer:
# prepare embedding matrix
num_words = min(MAX_NUM_WORDS, len(word_index) + 1)
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
for word, i in word_index.items():
    if i >= MAX_NUM_WORDS:
        continue
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # words not found in embedding index will be all-zeros.
        embedding_matrix[i] = embedding_vector
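As a quick check against the matrix built above (a minimal sketch), row 0 is indeed never assigned by the loop:

print(np.count_nonzero(embedding_matrix[0]))  # 0, so row 0 stays all zeros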
Since the word indices produced by the Tokenizer start at i = 1 (see the quick check below), the loop never writes row 0, so the first row contains only zeros (or random numbers, if the matrix were initialized differently). Is there a reason for skipping the first row?
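Here is the indexing behavior I am referring to (a toy sketch; I'm using tensorflow.keras, but the standalone Keras Tokenizer behaves the same):

from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()
tokenizer.fit_on_texts(['the cat sat on the mat'])
print(tokenizer.word_index)
# {'the': 1, 'cat': 2, 'sat': 3, 'on': 4, 'mat': 5} -- no word gets index 0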