Conversation

@XuehaiPan
Collaborator

@XuehaiPan XuehaiPan commented Mar 29, 2023

Revisit `torch._six.string_classes` (which is `(str, bytes)`) removal: `isinstance(obj, string_classes)` -> `isinstance(obj, str)`.

Both `str` and `bytes` are `Sequence` classes.

```python
In [1]: from typing import Sequence

In [2]: issubclass(bytes, Sequence)
Out[2]: True

In [3]: issubclass(str, Sequence)
Out[3]: True
```

Re-add `bytes` to type guards like:

```python
def is_seq(obj):
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes))
```
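As a quick illustration of why the `bytes` exclusion matters (hypothetical usage of the `is_seq` guard above):

```python
from typing import Sequence

def is_seq(obj):
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes))

# With the (str, bytes) guard, byte strings are treated as scalars:
assert not is_seq(b"data")
assert not is_seq("data")
assert is_seq([1, 2, 3])

# Guarding only against `str` would misclassify bytes as a sequence,
# sending it down element-wise code paths one byte at a time:
assert isinstance(b"data", Sequence) and not isinstance(b"data", str)
```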

Ref:

- #94709 (comment)
- #97737
- #97789

@pytorch-bot

pytorch-bot bot commented Mar 29, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/97863

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 042ed02:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the `release notes: onnx` label (torch.onnx related changes that should show up in the release notes) Mar 29, 2023
Co-authored-by: Aaron Gokaslan <skylion.aaron@gmail.com>
@XuehaiPan
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the `ciflow/trunk` label (trigger trunk jobs on your pull request) Mar 29, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job.

Failing merge rule: Core Maintainers

@XuehaiPan
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Collaborator

@albanD albanD left a comment


This is not covering the change to:

  • `torch/__init__.py`
  • `distributed_c10d.py`
  • `rendezvous.py`
  • `_serialization.py`
  • `serialization.py`
  • `module.py`
  • `common_utils.py`

Any reason why not?

@XuehaiPan
Collaborator Author

XuehaiPan commented Mar 29, 2023

@albanD I have checked every line change in the original PR. I think `bytes` is hardly ever used in these cases; users pass `str` instead:

```python
__import__("some_module.some_member")
net.register_buffer("buf", buf)
```

rather than

```python
__import__(b"some_module.some_member")
net.register_buffer(b"buf", buf)
```

@XuehaiPan XuehaiPan closed this Mar 29, 2023
@XuehaiPan
Collaborator Author

Closed by mistake.

@XuehaiPan XuehaiPan requested review from albanD and removed request for H-Huang, awgu, ejguan, mrshenli and zhaojuanmao March 30, 2023 07:31
@albanD
Collaborator

albanD commented Mar 30, 2023

Thanks for the detailed study!
Some high-level comments:

  • "url are mostly passed as str." This is not a justification within PT, I'm afraid! We have enough users and dependencies that we cannot assume people won't do bad stuff! Maybe someone had to re-encode their URL and ended up with bytes instead of a str without knowing it.
  • "If the argument map_location is a file path, it should be a str rather than bytes." This kind of thing is based on type hints, but they are not really enforced. More generally, our type hints are lacking/inaccurate enough that I don't think our users expect them to be really accurate. So we shouldn't use type hints as a justification.
  • "In docstrings, the arguments should be file-like or string." Well, given six, it looks like "string" is interpreted as both "str" and "bytes".
  • I definitely agree with you that no one should be passing bytes to these functions. But there are so many uses of these APIs that, in my experience, if you can do something weird, someone will always rely on it: https://xkcd.com/1172/

The ones that raise errors when passed bytes are fine to leave out, but I think we should revert the others that are user facing.

  • rendezvous.py should be changed back. urllib.parse.urlparse used there supports bytes properly (see the sketch below).
  • serialization.py has too many call sites for me to easily figure out whether bytes can be OK there. I would revert.
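For reference, `urllib.parse.urlparse` really does accept `bytes` as well as `str` (a minimal check, not part of the PR):

```python
from urllib.parse import urlparse

# The result's fields mirror the input type:
print(urlparse("tcp://localhost:23456"))
# ParseResult(scheme='tcp', netloc='localhost:23456', path='', ...)
print(urlparse(b"tcp://localhost:23456"))
# ParseResultBytes(scheme=b'tcp', netloc=b'localhost:23456', path=b'', ...)
```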

Contributor

@NivekT NivekT left a comment


The DataLoader part makes sense to me. Might want to add this to the 2.0.1 milestone?

@albanD albanD added this to the 2.0.1 milestone Mar 30, 2023
@albanD albanD added the `release notes: python_frontend` (python frontend release notes category) and `topic: bug fixes` (topic category) labels, and removed the `release notes: onnx` (torch.onnx related changes that should show up in the release notes) label Mar 30, 2023
Collaborator

@albanD albanD left a comment


Thanks!

@albanD
Collaborator

albanD commented Mar 30, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

```diff
 def rendezvous(url: str, rank: int = -1, world_size: int = -1, **kwargs):
-    if not isinstance(url, str):
+    if not isinstance(url, (str, bytes)):
         raise RuntimeError("`url` must be a string. {}: {}".format(type(url), url))
```
Collaborator

@Skylion007 Skylion007 Mar 30, 2023


The function signature of `rendezvous` is now wrong (`url` should be a `Union[str, bytes]`, right?)

Collaborator


Yes, can you open an issue so that we can ask the distributed team what they want to do about that?
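A hedged sketch of what the signature might look like if the annotation were brought in line with the runtime check (illustrative only; not what the distributed team ultimately decided):

```python
from typing import Union

def rendezvous(url: Union[str, bytes], rank: int = -1, world_size: int = -1, **kwargs):
    # The annotation now matches the restored runtime guard below.
    if not isinstance(url, (str, bytes)):
        raise RuntimeError("`url` must be a string. {}: {}".format(type(url), url))
    # ... rendezvous handler lookup elided ...
```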

@XuehaiPan XuehaiPan deleted the revisit-string-classes branch March 31, 2023 01:54
XuehaiPan added a commit to XuehaiPan/pytorch that referenced this pull request Mar 31, 2023
Revisit `torch._six.string_classes` removal (pytorch#94709) (pytorch#97863)

Revisit `torch._six.string_classes` (which is `(str, bytes)`) removal: `isinstance(obj, string_classes) -> isinstance(obj, str)`.

Both `str` and `bytes` are `Sequence` classes.

```python
In [1]: from typing import Sequence

In [2]: issubclass(bytes, Sequence)
Out[2]: True

In [3]: issubclass(str, Sequence)
Out[3]: True
```

Re-add `bytes` to type guards like:

```python
def is_seq(obj):
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes))
```

Ref:

- pytorch#94709 (comment)
- pytorch#97737
- pytorch#97789
Pull Request resolved: pytorch#97863
Approved by: https://github.com/Skylion007, https://github.com/albanD
atalman pushed a commit that referenced this pull request Apr 5, 2023
…97789, #97863) (#98055)

* [DataLoader] Short circuit pin_memory recursion when operating on bytes (#97737)

Slack thread: https://pytorch.slack.com/archives/GEEQ2K4MD/p1679962409906099

I was seeing some massive (~2x) slowdowns on a job after running it on PyTorch 2.0. From some profiling in `py-spy`, it looked like the pin_memory thread was doing a lot more work than before. Looking at a trace in `nsys`, I saw the thread doing the forward pass having a bunch of `pthread_cond_timedwait` calls (with GIL reacquires) in its call stack, and it seemed like the thread doing the forward pass was getting blocked (waiting for the GIL) by the pin_memory thread (which was holding the GIL).

After some debugging I found out the issue. If a `bytes` was passed into `pin_memory`, previously in 1.13 (before #94709) it would short-circuit and return here
https://github.com/pytorch/pytorch/blob/d922c29a22e4bf0fba49526f7536395eb8cd66f4/torch/utils/data/_utils/pin_memory.py#L54-L55
since `bytes` was in `torch._six.string_classes`:
```
>>> from torch._six import string_classes
>>> string_classes
(<class 'str'>, <class 'bytes'>)
>>>
```

However after #94709, if a `bytes` was passed into `pin_memory` it would fall into here instead
https://github.com/pytorch/pytorch/blob/c263bd43e8e8502d4726643bc6fd046f0130ac0e/torch/utils/data/_utils/pin_memory.py#L68-L73
because the previous check is now doing `isinstance(data, str)` instead of `isinstance(data, (str, bytes))`!
https://github.com/pytorch/pytorch/blob/c263bd43e8e8502d4726643bc6fd046f0130ac0e/torch/utils/data/_utils/pin_memory.py#L56-L57

As a result, `pin_memory` gets called recursively for each element of the `bytes`, leading to a ton of wasted recursion. This also explains the slowdown / GIL contention I was seeing.

This PR simply changes `isinstance(data, str)` to `isinstance(data, (str, bytes))` to match the behavior before #94709.
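To make the failure mode concrete, here is a simplified sketch of the dispatch described above (condensed and renamed; see the linked `pin_memory.py` for the actual code):

```python
import collections.abc

import torch

def pin_memory(data):
    if isinstance(data, torch.Tensor):
        return data.pin_memory()
    elif isinstance(data, (str, bytes)):
        # Short-circuit: byte strings are scalars, as in 1.13.
        return data
    elif isinstance(data, collections.abc.Sequence):
        # Without the short-circuit above, a bytes object lands here and
        # pin_memory recurses once per byte (each element is an int).
        return [pin_memory(sample) for sample in data]
    else:
        return data
```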

Pull Request resolved: #97737
Approved by: https://github.com/albanD, https://github.com/NivekT

* [DataLoader] Fix collation logic (#97789)

Similar to #97737, a previous auto-refactor changed how `bytes` are handled during collation, which can potentially lead to performance regression. This PR undoes that.
Pull Request resolved: #97789
Approved by: https://github.com/albanD
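For context, the collate path has a guard of the same shape; a simplified sketch of the restored behavior (names condensed, not the actual `default_collate` source):

```python
def collate(batch):
    elem = batch[0]
    if isinstance(elem, (str, bytes)):
        # A batch of str/bytes is returned as-is; checking only `str`
        # would instead collate a bytes batch byte-by-byte.
        return batch
    # ... tensor/sequence/mapping cases elided ...
    raise TypeError(f"sketch does not handle {type(elem)}")
```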

* Revisit `torch._six.string_classes` removal (#94709) (#97863)

Revisit `torch._six.string_classes` (which is `(str, bytes)`) removal: `isinstance(obj, string_classes) -> isinstance(obj, str)`.

Both `str` and `bytes` are `Sequence` classes.

```python
In [1]: from typing import Sequence

In [2]: issubclass(bytes, Sequence)
Out[2]: True

In [3]: issubclass(str, Sequence)
Out[3]: True
```

Re-add `bytes` to type guards like:

```python
def is_seq(obj):
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes))
```

Ref:

- #94709 (comment)
- #97737
- #97789
Pull Request resolved: #97863
Approved by: https://github.com/Skylion007, https://github.com/albanD

---------

Co-authored-by: Eric Zhang <ezhang887@gmail.com>
Co-authored-by: Kevin Tse <ktse@fb.com>
ghabault added a commit to ghabault/ddim that referenced this pull request Jul 26, 2024
According to pytorch/pytorch#97863, `torch._six` has been removed. I propose the following modification to avoid the error "module 'torch' has no attribute '_six'". This solution is also suggested in other projects.
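The usual drop-in fix follows from `torch._six.string_classes` having been exactly `(str, bytes)` (a sketch of the common pattern, not the specific ddim patch):

```python
# Before (fails on PyTorch versions where torch._six was removed):
#   from torch._six import string_classes
#   isinstance(obj, string_classes)

# After: inline the tuple that string_classes used to be.
def is_string_like(obj):
    return isinstance(obj, (str, bytes))
```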