Conversation

@XuehaiPan
Collaborator

@XuehaiPan XuehaiPan commented Mar 29, 2023

Revisit `torch._six.string_classes` (which is `(str, bytes)`) removal: `isinstance(obj, string_classes)` -> `isinstance(obj, str)`.

Both `str` and `bytes` are `Sequence` classes.

```python
In [1]: from typing import Sequence

In [2]: issubclass(bytes, Sequence)
Out[2]: True

In [3]: issubclass(str, Sequence)
Out[3]: True
```

Re-add `bytes` to type guards like:

```python
def is_seq(obj):
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes))
```
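As a quick illustration of why the `bytes` exclusion matters (hypothetical usage of the `is_seq` guard above):

```python
from typing import Sequence

def is_seq(obj):
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes))

# With the (str, bytes) guard, byte strings are treated as scalars:
assert not is_seq(b"data")
assert not is_seq("data")
assert is_seq([1, 2, 3])

# Guarding only against `str` would misclassify bytes as a sequence,
# sending it down element-wise code paths one byte at a time:
assert isinstance(b"data", Sequence) and not isinstance(b"data", str)
```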

Ref:

- #94709 (comment)
- #97737
- #97789

@pytorch-bot

pytorch-bot bot commented Mar 29, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/97863

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 042ed02:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the `release notes: onnx` label (torch.onnx related changes that should show up in the release notes) Mar 29, 2023
Co-authored-by: Aaron Gokaslan <skylion.aaron@gmail.com>
@XuehaiPan
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the `ciflow/trunk` label (trigger trunk jobs on your pull request) Mar 29, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job.

Failing merge rule: Core Maintainers

@XuehaiPan
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Collaborator

@albanD albanD left a comment


This is not covering the change to:

  • `torch/__init__.py`
  • `distributed_c10d.py`
  • `rendezvous.py`
  • `_serialization.py`
  • `serialization.py`
  • `module.py`
  • `common_utils.py`

Any reason why not?

@XuehaiPan
Collaborator Author

XuehaiPan commented Mar 29, 2023

@albanD I have checked every line change in the original PR. I think `bytes` is hardly ever used in these cases; users pass `str` instead:

```python
__import__("some_module.some_member")
net.register_buffer("buf", buf)
```

rather than

```python
__import__(b"some_module.some_member")
net.register_buffer(b"buf", buf)
```

@XuehaiPan XuehaiPan closed this Mar 29, 2023
@XuehaiPan
Collaborator Author

Closed by mistake.

@XuehaiPan XuehaiPan requested review from albanD and removed request for H-Huang, awgu, ejguan, mrshenli and zhaojuanmao March 30, 2023 07:31
@albanD
Collaborator

albanD commented Mar 30, 2023

Thanks for the detailed study!
Some high-level comments:

  • "url are mostly passed as str." This is not a justification within PT, I'm afraid! We have enough users and dependencies that we cannot assume people won't do bad stuff! Maybe someone had to re-encode their URL and ended up with bytes instead of a str without knowing it.
  • "If the argument map_location is a file path, it should be a str rather than bytes." This kind of thing is based on type hints, but they are not really enforced. More generally, our type hints are lacking/inaccurate enough that I don't think our users expect them to be really accurate. So we shouldn't use type hints as a justification.
  • "In docstrings, the arguments should be file-like or string." Well, given six, it looks like "string" is interpreted as both "str" and "bytes".
  • I definitely agree with you that no one should be passing bytes to these functions. But there are so many uses of these APIs that, in my experience, if you can do something weird, someone will always rely on it: https://xkcd.com/1172/

The ones that raise errors when passed bytes are fine to leave out, but I think we should revert the others that are user facing.

  • rendezvous.py should be changed back. urllib.parse.urlparse used there supports bytes properly (see the sketch below).
  • serialization.py has too many call sites for me to easily figure out whether bytes can be OK there. I would revert.
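For reference, `urllib.parse.urlparse` really does accept `bytes` as well as `str` (a minimal check, not part of the PR):

```python
from urllib.parse import urlparse

# The result's fields mirror the input type:
print(urlparse("tcp://localhost:23456"))
# ParseResult(scheme='tcp', netloc='localhost:23456', path='', ...)
print(urlparse(b"tcp://localhost:23456"))
# ParseResultBytes(scheme=b'tcp', netloc=b'localhost:23456', path=b'', ...)
```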

Contributor

@NivekT NivekT left a comment


The DataLoader part makes sense to me. Might want to add this to the 2.0.1 milestone?

@albanD albanD added this to the 2.0.1 milestone Mar 30, 2023
@albanD albanD added the `release notes: python_frontend` (python frontend release notes category) and `topic: bug fixes` (topic category) labels, and removed the `release notes: onnx` (torch.onnx related changes that should show up in the release notes) label Mar 30, 2023
Collaborator

@albanD albanD left a comment


Thanks!

@albanD
Collaborator

albanD commented Mar 30, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

```diff
 def rendezvous(url: str, rank: int = -1, world_size: int = -1, **kwargs):
-    if not isinstance(url, str):
+    if not isinstance(url, (str, bytes)):
         raise RuntimeError("`url` must be a string. {}: {}".format(type(url), url))
```
Collaborator

@Skylion007 Skylion007 Mar 30, 2023


The function signature of `rendezvous` is now wrong (`url` should be a `Union[str, bytes]`, right?)

Collaborator


Yes, can you open an issue so that we can ask the distributed team what they want to do about that?
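A hedged sketch of what the signature might look like if the annotation were brought in line with the runtime check (illustrative only; not what the distributed team ultimately decided):

```python
from typing import Union

def rendezvous(url: Union[str, bytes], rank: int = -1, world_size: int = -1, **kwargs):
    # The annotation now matches the restored runtime guard below.
    if not isinstance(url, (str, bytes)):
        raise RuntimeError("`url` must be a string. {}: {}".format(type(url), url))
    # ... rendezvous handler lookup elided ...
```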

@XuehaiPan XuehaiPan deleted the revisit-string-classes branch March 31, 2023 01:54
XuehaiPan added a commit to XuehaiPan/pytorch that referenced this pull request Mar 31, 2023
Revisit `torch._six.string_classes` removal (pytorch#94709) (pytorch#97863)

Revisit `torch._six.string_classes` (which is `(str, bytes)`) removal: `isinstance(obj, string_classes) -> isinstance(obj, str)`.

Both `str` and `bytes` are `Sequence` classes.

```python
In [1]: from typing import Sequence

In [2]: issubclass(bytes, Sequence)
Out[2]: True

In [3]: issubclass(str, Sequence)
Out[3]: True
```

Re-add `bytes` to type guards like:

```python
def is_seq(obj):
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes))
```

Ref:

- pytorch#94709 (comment)
- pytorch#97737
- pytorch#97789
Pull Request resolved: pytorch#97863
Approved by: https://github.com/Skylion007, https://github.com/albanD
atalman pushed a commit that referenced this pull request Apr 5, 2023
…97789, #97863) (#98055)

* [DataLoader] Short circuit pin_memory recursion when operating on bytes (#97737)

Slack thread: https://pytorch.slack.com/archives/GEEQ2K4MD/p1679962409906099

I was seeing some massive (~2x) slowdowns on a job after running it on PyTorch 2.0. From some profiling in `py-spy`, it looked like the pin_memory thread was doing a lot more work than before. Looking at a trace in `nsys`, I saw the thread doing the forward pass having a bunch of `pthread_cond_timedwait` calls (with GIL reacquires) in its call stack, and it seemed like the thread doing the forward pass was getting blocked (waiting for the GIL) by the pin_memory thread (which was holding the GIL).

After some debugging I found out the issue. If a `bytes` was passed into `pin_memory`, previously in 1.13 (before #94709) it would short-circuit and return here
https://github.com/pytorch/pytorch/blob/d922c29a22e4bf0fba49526f7536395eb8cd66f4/torch/utils/data/_utils/pin_memory.py#L54-L55
since `bytes` was in `torch._six.string_classes`:
```
>>> from torch._six import string_classes
>>> string_classes
(<class 'str'>, <class 'bytes'>)
>>>
```

However after #94709, if a `bytes` was passed into `pin_memory` it would fall into here instead
https://github.com/pytorch/pytorch/blob/c263bd43e8e8502d4726643bc6fd046f0130ac0e/torch/utils/data/_utils/pin_memory.py#L68-L73
because the previous check is now doing `isinstance(data, str)` instead of `isinstance(data, (str, bytes))`!
https://github.com/pytorch/pytorch/blob/c263bd43e8e8502d4726643bc6fd046f0130ac0e/torch/utils/data/_utils/pin_memory.py#L56-L57

As a result, `pin_memory` gets called recursively for each element of the `bytes`, leading to a ton of wasted recursion. This also explains the slowdown / GIL contention I was seeing.

This PR simply changes `isinstance(data, str)` to `isinstance(data, (str, bytes))` to match the behavior before #94709.
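To make the failure mode concrete, here is a simplified sketch of the dispatch described above (condensed and renamed; see the linked `pin_memory.py` for the actual code):

```python
import collections.abc

import torch

def pin_memory(data):
    if isinstance(data, torch.Tensor):
        return data.pin_memory()
    elif isinstance(data, (str, bytes)):
        # Short-circuit: byte strings are scalars, as in 1.13.
        return data
    elif isinstance(data, collections.abc.Sequence):
        # Without the short-circuit above, a bytes object lands here and
        # pin_memory recurses once per byte (each element is an int).
        return [pin_memory(sample) for sample in data]
    else:
        return data
```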

Pull Request resolved: #97737
Approved by: https://github.com/albanD, https://github.com/NivekT

* [DataLoader] Fix collation logic (#97789)

Similar to #97737, a previous auto-refactor changed how `bytes` are handled during collation, which can potentially lead to performance regression. This PR undoes that.
Pull Request resolved: #97789
Approved by: https://github.com/albanD
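For context, the collate path has a guard of the same shape; a simplified sketch of the restored behavior (names condensed, not the actual `default_collate` source):

```python
def collate(batch):
    elem = batch[0]
    if isinstance(elem, (str, bytes)):
        # A batch of str/bytes is returned as-is; checking only `str`
        # would instead collate a bytes batch byte-by-byte.
        return batch
    # ... tensor/sequence/mapping cases elided ...
    raise TypeError(f"sketch does not handle {type(elem)}")
```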

* Revisit `torch._six.string_classes` removal (#94709) (#97863)

Revisit `torch._six.string_classes` (which is `(str, bytes)`) removal: `isinstance(obj, string_classes) -> isinstance(obj, str)`.

Both `str` and `bytes` are `Sequence` classes.

```python
In [1]: from typing import Sequence

In [2]: issubclass(bytes, Sequence)
Out[2]: True

In [3]: issubclass(str, Sequence)
Out[3]: True
```

Re-add `bytes` to type guards like:

```python
def is_seq(obj):
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes))
```

Ref:

- #94709 (comment)
- #97737
- #97789
Pull Request resolved: #97863
Approved by: https://github.com/Skylion007, https://github.com/albanD

---------

Co-authored-by: Eric Zhang <ezhang887@gmail.com>
Co-authored-by: Kevin Tse <ktse@fb.com>
ghabault added a commit to ghabault/ddim that referenced this pull request Jul 26, 2024
According to pytorch/pytorch#97863, `torch._six` has been removed. I propose the following modification to avoid the error "module 'torch' has no attribute '_six'". This solution is also suggested in other projects.
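The usual drop-in fix follows from `torch._six.string_classes` having been exactly `(str, bytes)` (a sketch of the common pattern, not the specific ddim patch):

```python
# Before (fails on PyTorch versions where torch._six was removed):
#   from torch._six import string_classes
#   isinstance(obj, string_classes)

# After: inline the tuple that string_classes used to be.
def is_string_like(obj):
    return isinstance(obj, (str, bytes))
```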