Skip to content

Conversation

@nouiz
Copy link
Contributor

@nouiz nouiz commented Jul 26, 2021

Some of the refactoring caused crash when using TF_CUDA_MALLOC_ASYNC_SUPPORTED_PREALLOC.
This PR fix them and add a test to be sure it continue to work.

This PR is on top of #50961 as otherwise there would be code conflict.

Now, GpuCudaMallocAsyncAllocator doesn't create its own compute stream. But it use one stream set after construction when GpuCudaMallocAsyncAllocator::SetStream is called. So it is now impossible to preallocate all memory during the object construction.
So this PR postpone the memory preallocation to when the stream is available, in SetStream.

This PR also check that the stream isn't set when SetStream is called.

@sanjoy

@tensorflow-bot tensorflow-bot bot added the size:M CL Change Size: Medium label Jul 26, 2021
@google-cla google-cla bot added the cla: yes label Jul 26, 2021
@gbaned gbaned self-assigned this Jul 27, 2021
@gbaned gbaned added the comp:core issues related to core part of tensorflow label Jul 27, 2021
@gbaned gbaned requested a review from sanjoy July 27, 2021 04:16
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Jul 27, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 27, 2021
@sanjoy
Copy link
Contributor

sanjoy commented Jul 27, 2021

Please make the PR description a bit more descriptive (what was the bug?).

CC @hanbinyoon

@nouiz
Copy link
Contributor Author

nouiz commented Jul 27, 2021

I updated the description.

@nouiz nouiz force-pushed the upstream-cudaMallocAsync-Preallocate branch from 1cb58ca to 17d3b28 Compare July 27, 2021 21:31
@google-ml-butler google-ml-butler bot removed the ready to pull PR ready for merge process label Jul 27, 2021
@gbaned gbaned requested review from sanjoy and removed request for sanjoy July 28, 2021 13:46
@google-ml-butler google-ml-butler bot added the awaiting review Pull request awaiting review label Aug 2, 2021
@tensorflowbutler tensorflowbutler removed the awaiting review Pull request awaiting review label Aug 5, 2021
@gbaned
Copy link
Contributor

gbaned commented Aug 9, 2021

@nouiz Can you please check @sanjoy's comments and keep us posted ? Thanks!

@gbaned gbaned added the stat:awaiting response Status - Awaiting response from author label Aug 9, 2021
@nouiz nouiz force-pushed the upstream-cudaMallocAsync-Preallocate branch from 17d3b28 to e41fc39 Compare August 19, 2021 13:45
@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Aug 21, 2021
@nouiz nouiz force-pushed the upstream-cudaMallocAsync-Preallocate branch from e41fc39 to f95e862 Compare August 30, 2021 16:58
@nouiz
Copy link
Contributor Author

nouiz commented Aug 30, 2021

I rebased this PR since the dependent PR is merged.
I also did the review comment. So it should be ready to merge now.

@gbaned gbaned requested a review from sanjoy September 1, 2021 14:44
@google-ml-butler google-ml-butler bot added the awaiting review Pull request awaiting review label Sep 1, 2021
@nouiz
Copy link
Contributor Author

nouiz commented Sep 7, 2021

Any update?

Copy link
Contributor

@sanjoy sanjoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a minor comment.

@gbaned gbaned removed the awaiting review Pull request awaiting review label Sep 8, 2021
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Sep 8, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Sep 8, 2021
@gbaned gbaned added ready to pull PR ready for merge process and removed ready to pull PR ready for merge process labels Sep 9, 2021
@nouiz
Copy link
Contributor Author

nouiz commented Sep 10, 2021

The CI seems to have failed:

feedback/copybara — Google internal checks FAILED for runs with create time 2021-09-09T07:29:11.802148007Z.

Is this related to this PR?

@copybara-service copybara-service bot merged commit 5a44e35 into tensorflow:master Sep 11, 2021
@google-ml-butler google-ml-butler bot removed the ready to pull PR ready for merge process label Sep 11, 2021
@nouiz nouiz deleted the upstream-cudaMallocAsync-Preallocate branch January 26, 2022 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes comp:core issues related to core part of tensorflow size:M CL Change Size: Medium

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants