Skip to content
This repository was archived by the owner on Nov 3, 2023. It is now read-only.

Eliminate dummy batches and init_cuda_buffer.#3732

Merged
stephenroller merged 1 commit into
masterfrom
noinitcuda
Jun 21, 2021
Merged

Eliminate dummy batches and init_cuda_buffer.#3732
stephenroller merged 1 commit into
masterfrom
noinitcuda

Conversation

@stephenroller

Copy link
Copy Markdown
Contributor

Patch description
Eliminate the dreaded _dummy_batch and _init_cuda_buffer.

Users have complained about the presence of these methods. When debugging, the first batch being a dummy causes some confusion (why am I getting gibberish data?) and the manual implementation of a _dummy_batch is annoying (why is my model breaking on the first pass?)

Adopt Fairseq's newer approach:

  • Cache the very first batch we ever seen. If it's invalid, we should be raising an exception anyway. If it's valid, it will be good enough
  • Do NOT initialize the cuda buffer. Just keep that batch around
  • If we need to recover from an OOM, used the cached dummy batch as the data.

Testing steps
CI.

@klshuster klshuster left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for making this change. I'm approving but did have one perhaps pertinent question

Cache a batch to be used as a dummy during _fake_forward_pass.
"""
if not hasattr(self, '_dummy_batch'):
self._dummy_batch = batch

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this unnecessarily use up memory? what if the first batch is huge for example?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a worse problem for images. For text, imagine it's a 2x2048x1024 LongTensor (2x for content and label; bs 2048; string length 1024). A long is 8 bytes. So we'll be wasting 32MiB of memory.

@stephenroller stephenroller merged commit 36a63c7 into master Jun 21, 2021
@stephenroller stephenroller deleted the noinitcuda branch June 21, 2021 22:21
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants