Sampler for IterableDataset #28743

@daniel-j-h

Description

🚀 Feature

IterableDataset is too restrictive in that it cannot be combined with samplers. Sampling from a stream is well understood and can be done on the fly; IterableDataset should support these use cases.

Motivation

The IterableDataset abstraction is a great fit for a stream of data that we iterate over in a forward-only fashion. Right now, however, it is not compatible with samplers. From the docs:

Neither sampler nor batch_sampler is compatible with iterable-style datasets, since such datasets have no notion of a key or an index.

Here are two use cases where sampling from an IterableDataset is necessary:

  1. The user knows the total size in advance

For example, I have one IterableDataset per video (each yielding clips), and I know the number of frames for each video and the total number of videos in advance. I can sample k random clips with

import itertools
import random

# pick k distinct positions out of the known total and build a mask
pick = set(random.sample(range(self.total), k))
mask = [i in pick for i in range(self.total)]

# chain the per-video iterators into one stream; keep only the picked positions
it = itertools.chain(*self.videos)
it = itertools.compress(it, mask)

and wrap this in an IterableDataset so that all videos are walked through only once, as in the sketch below.
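Putting this together, here is a minimal sketch of what such a dataset could look like. Everything here is illustrative: SampledClips, videos (a list of per-video clip iterables), total, and k are assumed names for this example, not an existing API.

import itertools
import random

from torch.utils.data import IterableDataset


class SampledClips(IterableDataset):
    # Hypothetical sketch: yield k clips sampled uniformly at random from
    # a fixed set of videos whose total clip count is known in advance.

    def __init__(self, videos, total, k):
        self.videos = videos  # list of per-video clip iterables
        self.total = total    # known total number of clips across all videos
        self.k = k

    def __iter__(self):
        # Pick k distinct positions up front, then walk the stream once,
        # keeping only the clips at the picked positions.
        pick = set(random.sample(range(self.total), self.k))
        mask = (i in pick for i in range(self.total))
        return itertools.compress(itertools.chain(*self.videos), mask)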

  2. The user does not know the total size in advance

For example, I have videos with clips but I don't want to (or cannot) determine the number of frames per video, and therefore don't know the total size in advance. I can still sample k random clips out of an unknown n total clips, e.g. via reservoir sampling, and still walk through all videos only once; see the sketch below.
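Here is a minimal sketch of the unknown-size variant using reservoir sampling (Algorithm R). Again, the class name and constructor arguments are assumptions for illustration:

import itertools
import random

from torch.utils.data import IterableDataset


class ReservoirSampledClips(IterableDataset):
    # Hypothetical sketch: yield k clips sampled uniformly at random from
    # a stream of unknown total length, in a single pass.

    def __init__(self, videos, k):
        self.videos = videos  # list of per-video clip iterables
        self.k = k

    def __iter__(self):
        reservoir = []

        for i, clip in enumerate(itertools.chain(*self.videos)):
            if i < self.k:
                # Fill the reservoir with the first k clips.
                reservoir.append(clip)
            else:
                # Replace a random reservoir slot with probability k / (i + 1),
                # which keeps every clip seen so far equally likely to end up
                # in the final sample.
                j = random.randrange(i + 1)
                if j < self.k:
                    reservoir[j] = clip

        return iter(reservoir)

One caveat with this approach: the reservoir keeps all k sampled clips in memory, and the order in which they are yielded is not itself random.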

What are your thoughts on this?

cc @ssnl

Metadata

Labels: module: dataloader (Related to torch.utils.data.DataLoader and Sampler), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
