Empty hypothesis when periods are included in dataset #62

@vijayantajain

Hello Uri,

I am trying to train the Code2Seq model on the Funcom dataset. I tokenized the dataset by removing all special characters except periods and commas. When I train a Code2Seq model on this dataset, I get the error shown below. I have tried this a couple of times with different model configurations and batch sizes, but I still get this error whenever the comments contain periods.

Saved after 1 epochs in: models/funcom-test/model_iter1
Finished 1 epochs
Done testing, epoch reached
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: 2 root error(s) found.
  (0) Out of range: End of sequence
	 [[{{node IteratorGetNext}}]]
  (1) Out of range: End of sequence
	 [[{{node IteratorGetNext}}]]
	 [[IteratorGetNext/_27]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vijayantajain/code/experiments/code2seq/model.py", line 96, in train
    _, batch_loss = self.sess.run([optimizer, train_loss])
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: 2 root error(s) found.
  (0) Out of range: End of sequence
	 [[node IteratorGetNext (defined at /opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Out of range: End of sequence
	 [[node IteratorGetNext (defined at /opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
	 [[IteratorGetNext/_27]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'IteratorGetNext':
  File "code2seq.py", line 39, in <module>
    model.train()
  File "/home/vijayantajain/code/experiments/code2seq/model.py", line 77, in train
    config=self.config)
  File "/home/vijayantajain/code/experiments/code2seq/reader.py", line 43, in __init__
    self.output_tensors = self.compute_output()
  File "/home/vijayantajain/code/experiments/code2seq/reader.py", line 192, in compute_output
    return self.iterator.get_next()
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/data/ops/iterator_ops.py", line 426, in get_next
    name=name)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_dataset_ops.py", line 2518, in iterator_get_next
    output_shapes=output_shapes, name=name)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "code2seq.py", line 39, in <module>
    model.train()
  File "/home/vijayantajain/code/experiments/code2seq/model.py", line 108, in train
    results, precision, recall, f1, rouge = self.evaluate()
  File "/home/vijayantajain/code/experiments/code2seq/model.py", line 230, in evaluate
    hyp_path=predicted_file_name, ref_path=ref_file_name, avg=True, ignore_empty=True)
  File "/opt/conda/lib/python3.7/site-packages/rouge/rouge.py", line 47, in get_scores
    ignore_empty=ignore_empty)
  File "/opt/conda/lib/python3.7/site-packages/rouge/rouge.py", line 105, in get_scores
    return self._get_avg_scores(hyps, refs)
  File "/opt/conda/lib/python3.7/site-packages/rouge/rouge.py", line 145, in _get_avg_scores
    sc = fn(hyp, ref, exclusive=self.exclusive)
  File "/opt/conda/lib/python3.7/site-packages/rouge/rouge.py", line 53, in <lambda>
    "rouge-1": lambda hyp, ref, **k: rouge_score.rouge_n(hyp, ref, 1, **k),
  File "/opt/conda/lib/python3.7/site-packages/rouge/rouge_score.py", line 253, in rouge_n
    raise ValueError("Hypothesis is empty.")
ValueError: Hypothesis is empty.

When I check pred.txt in the models directory, I see that some lines are empty, which is most likely what causes the error.
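
A minimal sketch of one way to locate the affected lines, assuming pred.txt holds one prediction per line. The function names here are hypothetical. Treating punctuation-only lines as empty is an assumption about how the scorer might see them, not something confirmed by the traceback.

```python
from pathlib import Path
from typing import List


def find_empty_predictions(lines: List[str]) -> List[int]:
    """Return 1-based line numbers whose text has no letters or digits.

    Assumption: a line made only of whitespace or punctuation is counted
    as empty, in addition to lines that are literally blank.
    """
    return [
        number
        for number, line in enumerate(lines, start=1)
        if not any(ch.isalnum() for ch in line)
    ]


def check_prediction_file(path: str) -> List[int]:
    """Read a prediction file and report its empty or punctuation-only lines."""
    text = Path(path).read_text(encoding="utf-8")
    return find_empty_predictions(text.splitlines())


if __name__ == "__main__":
    # Hypothetical sample standing in for the contents of pred.txt.
    sample = ["returns the name", "", " . ", "sets the value ."]
    print(find_empty_predictions(sample))
```

Calling `check_prediction_file` on the real prediction file would give the line numbers to compare against the reference file, which may help show whether the empty predictions line up with particular kinds of comments.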

When I remove all special characters from the Funcom dataset, including periods and commas, and train again, I do not get this error.

Any idea why the model would predict nothing for some examples when the dataset contains periods and commas?

Thanks!
VJ
