Skip to content

On Lack of Default __len__ in Python Abstract Base Classes #122410

@kuraga

Description

@kuraga

Thoughts about # NOTE [ Lack of Default __len__ in Python Abstract Base Classes ].

TL;DR:

  1. We don't have the __len__ ("abstract") methods but seems like they could be (via a TypeError or an abstract method implementations).
  2. We don't have the __len__ docstrings but they could be (without the method itself).
  3. We don't point in the documentation datasets have the size. (Or they don't?)

We have https://github.com/pytorch/pytorch/blob/v2.2.1/torch/utils/data/sampler.py#L70-L95 (introduced in #19228):

# NOTE [ Lack of Default `__len__` in Python Abstract Base Classes ]
#
# Many times we have an abstract class representing a collection/iterable of
# data, e.g., `torch.utils.data.Sampler`, with its subclasses optionally
# implementing a `__len__` method. In such cases, we must make sure to not
# provide a default implementation, because both straightforward default
# implementations have their issues:
#
#   + `return NotImplemented`:
#     Calling `len(subclass_instance)` raises:
#       TypeError: 'NotImplementedType' object cannot be interpreted as an integer

But what should return len(subclass_instance) without an implemented length value?
No matter, on other hand, NotImplemented is for binary operators only.
Let's remove this part of the comment?

#
#   + `raise NotImplementedError()`:
#     This prevents triggering some fallback behavior. E.g., the built-in
#     `list(X)` tries to call `len(X)` first, and executes a different code
#     path if the method is not found or `NotImplemented` is returned, while
#     raising a `NotImplementedError` will propagate and make the call fail
#     where it could have used `__iter__` to complete the call.

This comment says we should write list(iter(X)) instead of list(X). (See the list cconstructor documentation.)

#
# Thus, the only two sensible things to do are
#
#   + **not** provide a default `__len__`.
#
#   + raise a `TypeError` instead, which is what Python uses when users call
#     a method that is not defined on an object.
#     (@ssnl verifies that this works on at least Python 3.7.)

It is a good idea, isn't it? This comment says it is the correct one.

More options:

This comment suggest to make an abstract method.

Are we able to add the documentation stub (without the method itself)?

Thanks!

cc @svekars @brycebortree @VitalyFedyunin @ejguan @dzhulgakov @ssnl

Metadata

Metadata

Assignees

No one assigned

    Labels

    module: datatorch.utils.datamodule: docsRelated to our documentation, both in docs/ and docblockstriagedThis issue has been looked at a team member, and triaged and prioritized into an appropriate module

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions