serialization: validate sparse tensors after loading #34059
Conversation
Baranowski left a comment
I won't be surprised if I did something wrong working with the Python-C++ interface.
@ezyang, could you please suggest a reviewer?
💊 CI failures summary and remediations
As of commit 5b84ca0 (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚
This comment was automatically generated by Dr. CI and has been revised 86 times.
@Baranowski to start, can you get one of your teammates working on sparse to do a first review of this PR? I can take a look after that.

btw it looks like there are some test failures
Yes, thank you. I'm looking.
For the record, (at least some) CI failures are due to #34126; I have put up a PR for that.
Force-pushed from b780c06 to 85f3ad4
@hameerabbasi, this is ready for a review.
I think you need a TORCH_API here, since this is the symbol that is "not found".
There is TORCH_API in the header file. Besides, I believe TORCH_API doesn't do anything when building with gcc, as is the case on our dev machine.
Anyway, this was taking way more time than I considered worth it, so I abandoned this attempt and just left _validate_sparse_coo_tensor_args as a method on torch. I can try some more if you or another reviewer feel strongly about it.
I don't feel too strongly; it seems like the methods these are related to naturally go in torch. I was just hoping to give some pointers on how to make it work.
Force-pushed from 3b5d71d to 421b985
@ezyang this is ready for a review. No rush.
It's a bit misleading that this returns a bool but will never actually return false (instead it just raises an error). I think raising errors is right, but then maybe the return type should just be void in that case.
Neither void nor None works as a return type in native_functions.yaml. Or at least I couldn't find a way to make them work.
lol, it's because it's spelled (). So blah_blah(...) -> ()
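For reference, a minimal sketch of what that looks like from the Python side once the schema's return type is spelled (): the binding returns None on valid input and raises otherwise. The argument order here (indices, values, size) is an assumption based on the usual sparse COO constructor, not something spelled out in this thread.

```python
import torch

# Sketch only: with a "-> ()" schema, the generated binding returns None.
indices = torch.tensor([[0, 1]])   # one sparse dim, two non-zero entries
values = torch.tensor([1.0, 2.0])
result = torch._validate_sparse_coo_tensor_args(indices, values, (2,))
assert result is None              # valid input: nothing returned, no error raised
```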
I don't think this is necessary here OR you should remove the expand_values_if_needed from the validate/unsafe calls below.
torch/tensor.py (Outdated)
nit: The logic here seems inverted. IF it is sparse, then I should validate sparse coo tensor args (not the other way around!)
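In other words, something along these lines; a hedged sketch only, with an illustrative helper name rather than the PR's actual code:

```python
import torch

def _check_rebuilt_tensor(tensor):
    # If it is sparse, validate the sparse COO tensor args...
    if tensor.is_sparse:
        torch._validate_sparse_coo_tensor_args(
            tensor._indices(), tensor._values(), tensor.size()
        )
    # ...otherwise there is nothing sparse-specific to check.
    return tensor
```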
torch/serialization.py (Outdated)
What if the sparse tensor is recursively stored inside of another struct, e.g., a dict?
You need to actually make sure the hook is called for every object you unpickle, and no, I don't want you to write some sort of function to recursively traverse arbitrary structs looking for sparse tensors; that way leads to madness. Maybe we can maintain some global variable (ugh!) of "deserialized sparse tensors that need validating" and then check them at the end, in load.
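A minimal sketch of that pattern, assuming a module-level list in torch/_utils.py; the names below are illustrative and not necessarily what the PR landed with:

```python
import torch

# Sparse tensors rebuilt during the current torch.load() call, pending validation.
_sparse_tensors_to_validate = []

def _rebuild_sparse_coo(indices, values, size):
    # Rebuild without validation so unpickling itself stays cheap...
    t = torch._sparse_coo_tensor_unsafe(indices, values, size)
    # ...but remember the tensor so load() can validate it once unpickling is done.
    _sparse_tensors_to_validate.append(t)
    return t

def _validate_loaded_sparse_tensors():
    # Called by load() after the unpickler has produced the full result.
    for t in _sparse_tensors_to_validate:
        torch._validate_sparse_coo_tensor_args(t._indices(), t._values(), t.size())
    _sparse_tensors_to_validate.clear()
```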
Force-pushed from 95ef830 to eb912e5
Done. The CI failures don't look related.
Force-pushed from eb912e5 to 27eab3f
Getting a second opinion on this.
Force-pushed from 27eab3f to c2af0b8
@ezyang, any updates? There is no rush. I'm just making sure this doesn't fall off the radar.
cc @gchanan
@hameerabbasi Do you have time for another review? I've changed it substantially in response to Ed's review, and it seems that FB folks have been too busy recently.
gchanan is aware but tied up with the 1.5 release right now.
hameerabbasi left a comment
A small question.
torch/_utils.py (Outdated)
Am I correct in thinking that this would skip all Tensors if one raised? If so, is that the intention?
That's correct. That was my intention. Do you think that I should leave the ones that haven't been verified yet?
Actually, that should be fine because the exception will go straight through _legacy_load() or _load() without being caught, which won't return any result so all the unvalidated sparse tensors will be discarded anyway.
torch/_utils.py (Outdated)
Is it possible to get rid of the global state here? Could cause issues when threading or leave things in an invalid state.
Not that I can think of. I don't know enough about Python threading and GIL to understand the dangers. Do you think I should guard this with some mutex in _legacy_load() in serialization.py? That would get really ugly.
Can you somehow put it inside _legacy_load and pass it around?
I cannot think of a way to do that.
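One way to soften the "invalid state" half of this concern, continuing the sketch above (still illustrative, not the PR's exact code): clear the list even when validation raises, so a failed torch.load() cannot leave stale entries behind for the next call. Concurrent load() calls from multiple threads would still share the list, so this alone does not make the design fully thread-safe.

```python
import torch

_sparse_tensors_to_validate = []  # the module-level list from the sketch above

def _validate_loaded_sparse_tensors():
    try:
        for t in _sparse_tensors_to_validate:
            torch._validate_sparse_coo_tensor_args(t._indices(), t._values(), t.size())
    finally:
        # Reset even if validation raised, so the next load() starts clean.
        _sparse_tensors_to_validate.clear()
```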
@gchanan No rush, just a friendly periodic ping to make sure you are still aware. I will send another ping in a week or so, unless instructed otherwise.
@gchanan, this is another friendly low-pri ping for a review.
@gchanan, another friendly ping for a review, but I'm in no rush. Next ping coming in about a month or so.
@gchanan, another friendly ping for a review. I won't send any more of those.
ezyang left a comment
I'm going to stick my neck out and approve this. Can we get a merge to master?
facebook-github-bot left a comment
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Force-pushed from c2af0b8 to 0f8f570
Fixes pytorch#33439. This introduces torch._sparse_coo_tensor_unsafe(...) and torch._validate_sparse_coo_tensor_args(...).
Force-pushed from 0f8f570 to 5b84ca0
@ezyang Done. Sorry for the delay. I was chasing a spurious CI failure that looked like it was my fault.
facebook-github-bot left a comment
@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Fixes #33439. This introduces torch._sparse_coo_tensor_unsafe(...) and torch._validate_sparse_coo_tensor_args(...).
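A short illustration of how the two new helpers divide the work (an assumed example, not taken from the PR's tests; the argument order is assumed to mirror torch.sparse_coo_tensor):

```python
import torch

indices = torch.tensor([[0, 5]])   # index 5 is out of bounds for a dimension of size 3
values = torch.tensor([1.0, 2.0])

# The unsafe constructor skips the consistency checks and builds the tensor as-is...
t = torch._sparse_coo_tensor_unsafe(indices, values, (3,))

# ...while the validator raises (RuntimeError) instead of letting bad data through.
torch._validate_sparse_coo_tensor_args(indices, values, (3,))
```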