Skip to content

Conversation

@Baranowski
Copy link
Contributor

Fixes #38913

@dr-ci
Copy link

dr-ci bot commented Jun 2, 2020

💊 CI failures summary and remediations

As of commit 590810a (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 73 times.

@Baranowski Baranowski force-pushed the wbaranowski-dataset_typing-38913 branch 2 times, most recently from d119a8f to 85efdfa Compare June 8, 2020 06:22
@Baranowski Baranowski changed the title [WiP] type annotations for dataloader, dataset, sampler type annotations for dataloader, dataset, sampler Jun 10, 2020
@Baranowski Baranowski marked this pull request as ready for review June 10, 2020 05:21
@Baranowski Baranowski requested a review from apaszke as a code owner June 10, 2020 05:21
@ngimel ngimel requested a review from zou3519 June 10, 2020 19:21
@ngimel ngimel added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 10, 2020
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the note at

# NOTE [ Lack of Default `__len__` in Python Abstract Base Classes ]
#
# Many times we have an abstract class representing a collection/iterable of
# data, e.g., `torch.utils.data.Sampler`, with its subclasses optionally
# implementing a `__len__` method. In such cases, we must make sure to not
# provide a default implementation, because both straightforward default
# implementations have their issues:
#
# + `return NotImplemented`:
# Calling `len(subclass_instance)` raises:
# TypeError: 'NotImplementedType' object cannot be interpreted as an integer
#
# + `raise NotImplementedError()`:
# This prevents triggering some fallback behavior. E.g., the built-in
# `list(X)` tries to call `len(X)` first, and executes a different code
# path if the method is not found or `NotImplemented` is returned, while
# raising an `NotImplementedError` will propagate and and make the call
# fail where it could have use `__iter__` to complete the call.
#
# Thus, the only two sensible things to do are
#
# + **not** provide a default `__len__`.
#
# + raise a `TypeError` instead, which is what Python uses when users call
# a method that is not defined on an object.
# (@ssnl verifies that this works on at least Python 3.7.)
, we should not be providing a default implementation of __len__. Do we need SizedSampler for all the types to check out? Scrolling through this PR it looks like SizedSampler is only used to construct the other Samplers

@Baranowski Baranowski force-pushed the wbaranowski-dataset_typing-38913 branch from 09b3792 to be10252 Compare June 16, 2020 07:53
Copy link
Contributor

@zou3519 zou3519 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some questions/suggestions. The PR generally looks fine to me otherwise.

I'm not very familiar with how type hints for these files are tested in our CI. Is the test_typing in test_dataloader.py sufficient to check everything, or are there other tests somewhere?

@Baranowski
Copy link
Contributor Author

Thanks for the review @zou3519. Re: tests, there is also test/test_type_hints.py which runs mypy.

Copy link
Contributor

@zou3519 zou3519 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of the # type: ignores, but it doesn't look like we can do anything about it and this PR also makes the current state better than before so let's ship it

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Collaborator

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM too, thanks @Baranowski.

One thing I noticed is that this is still present in mypy.ini:

[mypy-torch.utils.data.dataset]
ignore_errors = True

I suspect you've run mypy directly on the file to test your changes, in which case the ignore doesn't matter. In CI the ignore does matter though, so I'd recommend to remove it in this PR (or as a follow up) if tests test_type_hints.py passes locally.

@Baranowski Baranowski marked this pull request as draft June 18, 2020 11:58
@Baranowski
Copy link
Contributor Author

I was relying on test_type_hints.py, so I will need to fix up dataset.py as well

Copy link
Collaborator

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo the couple of minor open comments.

The PR is still draft, I assume it's not anymore - looks about ready.

@Baranowski Baranowski force-pushed the wbaranowski-dataset_typing-38913 branch from 2118617 to 678e5cf Compare June 19, 2020 07:29
@Baranowski Baranowski marked this pull request as ready for review June 19, 2020 13:07
@Baranowski
Copy link
Contributor Author

The CI failure looks unrelated so this PR should be ready now.

Copy link
Contributor

@zou3519 zou3519 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm as well

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@zou3519 zou3519 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some other internal failures that I am reading, will report back

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This triggered in one of the internal tests. index doesn't have to be an int and indeed in the documentation we have a note that we support non-integral indices/keys with custom samplers. So let's type index as Any. (Ideally we would also add a test to pytorch to type-check a custom sampler with non-integral key, but that sounds potentially annoying)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Baranowski Baranowski force-pushed the wbaranowski-dataset_typing-38913 branch from 91eb2c6 to c672dda Compare June 24, 2020 08:12
Copy link
Contributor

@zou3519 zou3519 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the last external change we need. I have some code ready to change some (previously incorrectly typed) internal code snippets that are using Dataloader / Dataset

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to list sampler: Sampler here? Some internal code attempts to access dataloader.sampler and the type appears was inferred to be Optional[Sampler]. However, after __init__, we're sure that sampler is a Sampler

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I verified that adding sampler: Sampler makes the problem go away, but I'm not sure if there was a reason why it wasn't here)

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zou3519
Copy link
Contributor

zou3519 commented Jun 25, 2020

@Baranowski, @ssnl, how do you feel about adding an abstract __len__ method to Sampler?

# NOTE [ Lack of Default `__len__` in Python Abstract Base Classes ]

The rationale behind this is that I'm sure there is a lot of user code out there that checks len(loader.sampler) and it feels wrong for that to not type-check if most of the Samplers include with PyTorch support __len__. Making Sampler inherit from typing.Sized gives Sampler a __len__ function that returns a TypeError.

@ssnl I read

# + raise a `TypeError` instead, which is what Python uses when users call
, and I think if we verify that for all versions of Python that we support, if the following gives something sensible, then it should be an acceptable solution.

class Foo(typing.Sized):
    pass
len(Foo())

@Baranowski
Copy link
Contributor Author

@zou3519 I don't have enough experience with Python to have a valuable opinion. I'm happy to do whatever you guys think is best.

@ssnl
Copy link
Collaborator

ssnl commented Jun 25, 2020

@zou3519 However the sampler doesn't necessarily have to be Sized (i.e., have a working __len__ though). Really it is just required to be an Iterable.

If we make Sampler inherit from Sized and Iterable and not implement __len__ then abc complains:

In [4]: class A(typing.Sized, typing.Iterable):
   ...:     def __iter__(self): yield from [1,2,3]
   ...:

In [5]: A()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-6234893e030b> in <module>
----> 1 A()

~/miniconda3/lib/python3.7/typing.py in __new__(cls, *args, **kwds)
    816             obj = super().__new__(cls)
    817         else:
--> 818             obj = super().__new__(cls, *args, **kwds)
    819         return obj
    820

TypeError: Can't instantiate abstract class A with abstract methods __len__

I don't know if we should be adding a __len__ that just raises TypeError to resolve this... Maybe we should?

@zou3519
Copy link
Contributor

zou3519 commented Jun 30, 2020

@ssnl thanks for the context and testing that. I'm not sure if we should raise a TypeError either. It sounds like we should keep things as is-right now (Sampler without __len__ method), and mark this PR as potentially bc-breaking (it can break type-checked user code that tries to access dataloader.sampler; the workaround would be to add a # type: ignore or to use typing.cast to cast the sampler to the expected sampler type).

@Baranowski
Copy link
Contributor Author

Baranowski commented Jul 4, 2020

@zou3519 @ssnl so as I understand, there is nothing more for me to do here? Is this ready to import and merge?

@zou3519
Copy link
Contributor

zou3519 commented Jul 6, 2020

so as I understand, there is nothing more for me to do here? Is this ready to import and merge?

Yeah that's correct. I'm working on the import and merge

@zou3519
Copy link
Contributor

zou3519 commented Jul 6, 2020

@Baranowski actually, could you please rebase this PR and resolve the merge conflict? I think it should just be accepting the changes to tools/pyi/gen_pyi.py

@Baranowski to provide some more transparency, we have some internal code that previously wasn't type checked but now is type checked as a result of adding the annotations. I've been working on correcting the annotations for that code and am done with that, so this should be good to go very soon after we resolve the merge conflict and wait for tests to pass

@Baranowski Baranowski force-pushed the wbaranowski-dataset_typing-38913 branch from 28bd2f2 to 590810a Compare July 6, 2020 19:44
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Typing Error in torch.utils.data.DataLoader and Dataset

8 participants