
Conversation

@kshitij12345 (Collaborator) commented Sep 2, 2020

Fixes #43723

Use weight.contiguous() on the fast path, as it expects a contiguous tensor.

TODO:

  • Add tests
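
A minimal repro sketch of the underlying bug (illustrative only, not taken from the PR; the shapes, indices, and mode='sum' are assumptions): before this change, the CPU fast path read the weight's storage directly as if it were contiguous, so a non-contiguous weight such as a transposed view could produce wrong results.

import torch
import torch.nn.functional as F

weight = torch.randn(4, 3).t()  # (3, 4) transposed view, non-contiguous
assert not weight.is_contiguous()

input = torch.tensor([0, 1, 2])   # indices into the 3 embedding rows
offsets = torch.tensor([0])       # a single bag

out = F.embedding_bag(input, weight, offsets, mode='sum')
ref = F.embedding_bag(input, weight.contiguous(), offsets, mode='sum')
print(torch.allclose(out, ref))  # could be False before the fix; True after

The fix simply calls weight.contiguous() before the fast path takes the raw data pointer.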

dr-ci bot commented Sep 2, 2020

💊 CI failures summary and remediations

As of commit fe6961b (more details on the Dr. CI page):


  • 2/2 failures possibly introduced in this PR
    • 2/2 non-CircleCI failure(s)

ci.pytorch.org: 2 failed



codecov bot commented Sep 2, 2020

Codecov Report

Merging #44032 into master will increase coverage by 0.00%.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master   #44032   +/-   ##
=======================================
  Coverage   69.29%   69.29%           
=======================================
  Files         381      381           
  Lines       47214    47214           
=======================================
+ Hits        32717    32718    +1     
+ Misses      14497    14496    -1     
Impacted Files                          Coverage         Δ
torch/testing/_internal/expecttest.py   78.57% <0.00%>   (+1.02%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update df8da5c...fe6961b.

@smessmer smessmer requested a review from glaringlee September 2, 2020 21:15
@smessmer smessmer added the module: cuda, triaged, and module: operators labels and removed the module: cuda label Sep 2, 2020
@glaringlee (Contributor) left a comment

Thanks for adding tests.
I have one nit comment; please take a look.

  • Use method chaining.
@glaringlee (Contributor)

@kshitij12345 Thanks a lot for fixing this. I will approve once the CI tests pass without issues.

@facebook-github-bot (Contributor) left a comment

@glaringlee has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@glaringlee (Contributor) left a comment

LGTM now.

@kshitij12345 (Collaborator, Author)

@glaringlee Gentle Ping :)

@glaringlee (Contributor)

@kshitij12345
Hi, your code will be landed today.
It is in the landing process. Here is what happened: after the approval on GitHub, I imported the code into our FB internal system, where there is another round of approval before landing. We were on time off since last Friday due to Labor Day, so the other approval will happen today and your code will be landed then. Thanks for reaching out.

@kshitij12345 (Collaborator, Author)

@glaringlee Oops, I forgot it was a federal holiday. Thanks!

@facebook-github-bot (Contributor)

@glaringlee merged this pull request in 6dd53fb.

@kshitij12345 kshitij12345 deleted the fix/embedding-bag/fast-path branch September 9, 2020 03:40
def test_embedding_bag_non_contiguous_weight(self, device, dtype):
    weight_tensor = torch.randn(4, 3, dtype=dtype, device=device)

    weight_tensor_non_contig = weight_tensor[:, :3]  # This is non-contiguous strided.
A Collaborator commented:

this is a contiguous tensor

@kshitij12345 (Collaborator, Author) replied:

Right, great catch! It was supposed to be:

weight_tensor = torch.randn(3, 4, dtype=dtype, device=device)
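
To see why the corrected shape matters (a quick sketch, not from the thread): slicing the first three columns of a (4, 3) tensor selects every column, so the view keeps the contiguous layout, while the same slice of a (3, 4) tensor skips the last element of each row and becomes non-contiguous.

import torch

a = torch.randn(4, 3)
print(a[:, :3].is_contiguous())  # True: the slice spans all columns

b = torch.randn(3, 4)
print(b[:, :3].is_contiguous())  # False: rows still stride by 4 elements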

@glaringlee (Contributor) replied:

@kshitij12345 I'll put up a fix for you.

@kshitij12345 (Collaborator, Author) commented Sep 9, 2020:

Sure. Thanks! Btw I am free this evening so I can put it up as well. Let me know if I should.

@glaringlee (Contributor) replied:

Ah, no worries. I have a clean pytorch repo on hand, will patch this shortly.

@glaringlee (Contributor)

#44382 has been opened to patch this.

facebook-github-bot pushed a commit that referenced this pull request Sep 9, 2020
Summary:
Pull Request resolved: #44382

This fixes a typo that was introduced in #44032.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23601316

Pulled By: glaringlee

fbshipit-source-id: 17d6de5900443ea46c7a6ee9c7614fe6f2d92890
auto* output_data = output.data_ptr<float>();

if (isFastPathIndexSelect(src, output)) {
  auto* src_data = src.contiguous().data_ptr<float>();
@dzhulgakov (Collaborator) commented:

wait, this is still a bug, right? If the tensor is non-contiguous then contiguous() will return a new tensor and it will be immediately destroyed (because we don't keep a reference to it around). So src_data will point to the deallocated memory :(

I wonder why ASAN doesn't catch it.

It should be

auto src_contig = src.contiguous();
auto* src_data = src_contig.data_ptr<float>();

cc @glaringlee @ngimel

@glaringlee (Contributor) replied:

@dzhulgakov oh, shoot... my bad, will fix soon.

glaringlee pushed a commit that referenced this pull request Sep 11, 2020
fix dangling ptr in #44032

Differential Revision: [D23661007](https://our.internmc.facebook.com/intern/diff/D23661007)

[ghstack-poisoned]

Labels

Merged, open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)


Development

Successfully merging this pull request may close these issues.

embedding_bag running on CPU produces wrong result when weight tensor is non-contiguous.
