Conversation

@Baranowski
Contributor

Fixes #33439

This introduces torch._sparse_coo_tensor_unsafe(...) and
torch._validate_sparse_coo_tensor_args(...)
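For context, here is a minimal pure-Python sketch of the kind of COO invariants such a validator enforces. This is illustrative only, not the actual torch implementation: the function name and the nested-list stand-in for tensors are assumptions.

```python
def validate_sparse_coo_args(indices, values, size):
    """Illustrative check of basic COO invariants (not the real torch code).

    indices: list of sparse_dim rows, each holding nnz column entries
    values:  list of nnz values
    size:    declared shape of the dense result
    """
    sparse_dim = len(indices)
    if sparse_dim > len(size):
        raise ValueError("more index dimensions than declared size")
    nnz = len(values)
    if any(len(row) != nnz for row in indices):
        raise ValueError("indices and values disagree on nnz")
    for dim, row in enumerate(indices):
        for idx in row:
            if not (0 <= idx < size[dim]):
                raise ValueError(
                    f"index {idx} out of range for dim {dim} of size {size[dim]}")

# A well-formed 2x3 COO tensor with two nonzeros passes silently:
validate_sparse_coo_args([[0, 1], [2, 0]], [1.0, 2.0], (2, 3))
```

The point of a separate validator is that an "unsafe" constructor can skip these checks during deserialization and run them once afterwards.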

@Baranowski (Contributor Author) left a comment

I won't be surprised if I did something wrong working with the Python-C++ interface.

@ezyang, could you please suggest a reviewer?

@dr-ci

dr-ci bot commented Mar 2, 2020

💊 CI failures summary and remediations

As of commit 5b84ca0 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@ezyang
Contributor

ezyang commented Mar 2, 2020

@Baranowski to start, can you get one of your teammates working on sparse to do a first review of this PR? I can take a look after that.

@ezyang ezyang self-requested a review March 2, 2020 17:21
@Baranowski Baranowski requested a review from hameerabbasi March 2, 2020 17:36
@ezyang
Contributor

ezyang commented Mar 2, 2020

btw it looks like there are some test failures

@ezyang ezyang added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Mar 2, 2020
@Baranowski
Contributor Author

Baranowski commented Mar 2, 2020 via email

@Baranowski
Contributor Author

For the record, (at least some) CI failures are due to #34126; I have put up a PR for that.

@Baranowski Baranowski force-pushed the wbaranowski-sparse_load-33439 branch from b780c06 to 85f3ad4 Compare March 12, 2020 22:11
@Baranowski
Contributor Author

@hameerabbasi, this is ready for a review

@Baranowski Baranowski changed the title serialization: validate sparse tensors after loading [WiP] serialization: validate sparse tensors after loading Mar 16, 2020
Collaborator

I think you need a TORCH_API here, since this is the symbol that is "not found".

Contributor Author

There is TORCH_API in the header file. Besides, I believe TORCH_API doesn't do anything when building with gcc, as is the case on our dev machine.

Anyway, this was taking way more time than I considered worth it, so I abandoned this attempt and just left _validate_sparse_coo_tensor_args as a method on torch. I can try some more if you or another reviewer feel strongly about it.

Collaborator

I don't feel too strongly; it seems like the methods these are related to naturally belong in torch. I was just hoping to give some pointers on how to make it work.

@Baranowski Baranowski force-pushed the wbaranowski-sparse_load-33439 branch from 3b5d71d to 421b985 Compare March 18, 2020 15:08
@Baranowski Baranowski changed the title [WiP] serialization: validate sparse tensors after loading serialization: validate sparse tensors after loading Mar 18, 2020
@Baranowski
Contributor Author

@ezyang this is ready for a review. No rush.

Contributor

It's a bit misleading that this returns a bool but will actually never return false (instead it raises an error). I think raising errors is right, but then the return type should probably just be void.

Contributor Author

Neither void nor None works as a return type in native_functions.yaml. Or at least I couldn't find a way to make them work.

Contributor

lol, it's because it's spelled (). So blah_blah(...) -> ()
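Concretely, a void-returning native function is declared with an empty tuple in the dispatch schema. The following native_functions.yaml entry is a plausible sketch of what that looks like, not necessarily the exact line in this PR:

```yaml
# Hypothetical entry in aten/src/ATen/native/native_functions.yaml;
# "-> ()" declares a function with no return value.
- func: _validate_sparse_coo_tensor_args(Tensor indices, Tensor values, int[] size) -> ()
```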

Contributor

I don't think this is necessary here, or else you should remove the expand_values_if_needed from the validate/unsafe calls below.

torch/tensor.py Outdated
Contributor

nit: The logic here seems inverted. IF it is sparse, then I should validate sparse coo tensor args (not the other way around!)

Contributor

What if the sparse tensor is recursively stored inside of another struct, e.g., a dict?

@ezyang
Contributor

ezyang commented Mar 19, 2020

You need to actually make sure the hook is called for every object you unpickle, and no, I don't want you to write some sort of function to recursively traverse arbitrary structs looking for sparse tensors, that way leads to madness. Maybe we can maintain some global variable (ugh!) of "deserialized sparse tensors that need validating" and then check them at the end with load.

@Baranowski Baranowski force-pushed the wbaranowski-sparse_load-33439 branch from 95ef830 to eb912e5 Compare March 21, 2020 10:23
@Baranowski
Contributor Author

Done. The CI failures don't look related.

@Baranowski Baranowski force-pushed the wbaranowski-sparse_load-33439 branch from eb912e5 to 27eab3f Compare March 21, 2020 19:04
@ezyang ezyang requested a review from gchanan March 23, 2020 19:02
@ezyang
Contributor

ezyang commented Mar 23, 2020

Getting a second opinion on this

@Baranowski Baranowski force-pushed the wbaranowski-sparse_load-33439 branch from 27eab3f to c2af0b8 Compare March 31, 2020 15:04
@Baranowski
Contributor Author

@ezyang, any updates? There is no rush. I'm just making sure this doesn't fall off the radar.

@ezyang
Contributor

ezyang commented Mar 31, 2020

cc @gchanan

@Baranowski
Contributor Author

@hameerabbasi Do you have time for another review? I've changed it substantially in response to Ed's review, and it seems that FB folks have been too busy recently.

@hameerabbasi hameerabbasi self-requested a review April 9, 2020 20:22
@ezyang
Contributor

ezyang commented Apr 10, 2020

gchanan is aware but tied up with 1.5 release right now.

@hameerabbasi (Collaborator) left a comment

A small question.

torch/_utils.py Outdated
Comment on lines +157 to +162
Collaborator

Am I correct in thinking that this would skip all Tensors if one raised? If so, is that the intention?

Contributor Author

That's correct. That was my intention. Do you think that I should leave the ones that haven't been verified yet?

@Baranowski (Contributor Author) commented Apr 10, 2020

Actually, that should be fine: the exception will propagate straight through _legacy_load() or _load() without being caught, so no result is returned and all the unvalidated sparse tensors are discarded anyway.

torch/_utils.py Outdated
Collaborator

Is it possible to get rid of the global state here? It could cause issues with threading or leave things in an invalid state.

Contributor Author

Not that I can think of. I don't know enough about Python threading and GIL to understand the dangers. Do you think I should guard this with some mutex in _legacy_load() in serialization.py? That would get really ugly.

Collaborator

Can you somehow put it inside _legacy_load and pass it around?

Contributor Author

I cannot think of a way to do that.

@Baranowski
Contributor Author

@gchanan No rush, just a friendly periodic ping to make sure you are still aware. I will send another ping in a week or so, unless instructed otherwise.

@Baranowski
Contributor Author

@gchanan, this is another friendly low-pri ping for a review

@Baranowski
Contributor Author

@gchanan, another friendly ping for a review but I'm in no rush. Next ping coming in about a month or so.

@Baranowski
Contributor Author

@gchanan, another friendly ping for a review. I won't send any more of those.

@ezyang (Contributor) left a comment

I'm going to stick my neck out and approve this. Can we get a merge to master?

@facebook-github-bot (Contributor) left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@Baranowski Baranowski force-pushed the wbaranowski-sparse_load-33439 branch from c2af0b8 to 0f8f570 Compare June 23, 2020 05:57
@Baranowski Baranowski force-pushed the wbaranowski-sparse_load-33439 branch from 0f8f570 to 5b84ca0 Compare June 24, 2020 14:18
@Baranowski
Contributor Author

@ezyang Done. Sorry for the delay. I was chasing a spurious CI failure that looked like it was my fault.

@facebook-github-bot (Contributor) left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ezyang merged this pull request in fcadca1.


Labels

Merged, open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)


Development

Successfully merging this pull request may close these issues.

Sparse tensor persistency - error when loading

6 participants