[feature request] Stratified splits in random_split function #5231

@rasbt

Stratification is probably less of an issue on "biggish" datasets, but a parameter for specifying it in random_split (which was merged in #4435) would be nice. If the targets are class labels, the split could be stratified on the integer labels. For regression problems, there are some interesting ideas posted here: http://scottclowe.com/2016-03-19-stratified-regression-partitions/

If this is of interest, we could discuss how to implement it efficiently; I am happy to help.
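As a starting point for discussion, here is a minimal sketch of the class-label case. It is not an existing PyTorch API: the function name `stratified_split_indices` and its `fractions` and `seed` parameters are hypothetical, and it uses only the standard library so that the idea is easy to inspect. It assumes the integer labels are available as a plain sequence.

```python
import random
from collections import defaultdict


def stratified_split_indices(labels, fractions, seed=0):
    """Hypothetical helper: split range(len(labels)) into len(fractions)
    index lists so each list keeps roughly the same class proportions."""
    if abs(sum(fractions) - 1.0) > 1e-8:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    splits = [[] for _ in fractions]
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        start, cumulative = 0, 0.0
        for k, frac in enumerate(fractions):
            cumulative += frac
            # The last split takes whatever remains, so no index is dropped.
            if k == len(fractions) - 1:
                end = len(idxs)
            else:
                end = round(cumulative * len(idxs))
            splits[k].extend(idxs[start:end])
            start = end

    # Shuffle within each split so samples are not ordered by class.
    for split in splits:
        rng.shuffle(split)
    return splits


labels = [0] * 80 + [1] * 20
train_idx, val_idx = stratified_split_indices(labels, [0.8, 0.2], seed=0)
```

The resulting index lists could then be wrapped with torch.utils.data.Subset(dataset, train_idx) to get dataset objects comparable to what random_split returns. Rounding per class means very small classes can end up entirely in one split, which is one of the details worth discussing.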

cc @ssnl @VitalyFedyunin

Labels

- module: dataloader (Related to torch.utils.data.DataLoader and Sampler)
- triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
