
I'm training a CNN model on images. Initially, I was training on image patches of size (256, 256) and everything was fine. Then I changed my dataloader to load full HD images (1080, 1920) and to crop them after some processing. Since that change, the GPU memory keeps increasing with every batch. Why is this happening?

PS: While tracking losses, I'm calling loss.detach().item() so that the loss tensor is not retained in the computation graph.
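For context, here is a minimal, self-contained sketch of the kind of loop I mean; the tiny model, random tensors, and batch size are just placeholders for my real CNN and HD dataloader:

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder model, standing in for the real CNN
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

running_loss = 0.0
for step in range(4):  # stand-in for iterating over the HD dataloader
    images = torch.randn(2, 3, 1080, 1920, device=device)   # full HD batch
    targets = torch.randint(0, 10, (2,), device=device)

    optimizer.zero_grad()
    outputs = model(images)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()

    running_loss += loss.detach().item()  # track only a Python float, not the graph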

2 Answers


As suggested here, explicitly deleting the input, output, and loss tensors helped.

Additionally, my data was stored in a dictionary. Deleting the dictionary alone wasn't sufficient; I had to iterate over its entries and delete each of them, as in the sketch below.
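For illustration only, here is a minimal sketch of that cleanup step at the end of an iteration; the dict keys, the fake batch, and the placeholder computations are not the actual pipeline:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for one batch dict produced by the dataloader (the real one holds HD images etc.)
batch = {
    "image": torch.randn(2, 3, 1080, 1920, device=device),
    "target": torch.randint(0, 10, (2,), device=device),
}

# ... forward pass, loss computation, backward, and optimizer step would happen here ...
outputs = batch["image"].mean(dim=(1, 2, 3))               # placeholder for model(batch["image"])
loss = (outputs - batch["target"].float()).pow(2).mean()   # placeholder for the real loss

# Drop the per-iteration tensors so nothing keeps the graph or the HD images alive
del outputs, loss

# Deleting the dict by itself wasn't enough in my case: delete each entry, then the dict
for key in list(batch.keys()):
    del batch[key]
del batch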



I had a similar issue, but the memory accumulated much more slowly: after millions of iterations a lot of memory was in use (hard to debug, as you can imagine). I think it was because I had run export CUDA_LAUNCH_BLOCKING=1 and export TORCH_USE_CUDA_DSA=1 to turn on the debugging flags before starting my run.

Another thing worth trying for anyone hitting this issue is to clear memory at the end of each epoch:

import gc
import torch

gc.collect()              # collect unreachable Python objects, including reference cycles
torch.cuda.empty_cache()  # release unused cached blocks back to the GPU driver
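For example, placed at the end of each epoch (the epoch count and the omitted loop body are placeholders):

import gc
import torch

num_epochs = 10  # placeholder

for epoch in range(num_epochs):
    # ... training iterations for this epoch would run here ...

    gc.collect()              # collect unreachable Python objects at the epoch boundary
    torch.cuda.empty_cache()  # release unused cached blocks back to the GPU driver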
