
Conversation

@thuyen
Contributor

@thuyen thuyen commented May 27, 2018

Fix #7882

The random module is seeded here. The problem with that line is that seed is not an int but a Tensor. Python then uses hash(seed) to seed the random module (see the docs for random.seed). Since Tensor does not define a value-based hash function, the actual seed ends up being the tensor's memory address, which changes on every run!

The line should be changed to: random.seed(int(seed))
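The failure mode can be reproduced without torch at all. Here is a minimal sketch (the Opaque class is a hypothetical stand-in for a Tensor, which at the time had no value-based __hash__): for non-int arguments random.seed falls back to hash(seed), and the default hash is derived from the object's address, so it varies between runs, whereas seeding with a plain int is reproducible.

```python
import random

class Opaque:
    # Stand-in for a Tensor without a value-based __hash__:
    # the default hash is derived from id(), i.e. the address,
    # which differs from run to run.
    pass

# random.seed(Opaque()) would effectively use this value -- unstable:
addr_seed = hash(Opaque())

# Seeding with a plain int gives the same stream every time:
random.seed(int(2018))
first = random.uniform(0, 1)
random.seed(2018)
second = random.uniform(0, 1)
assert first == second
```

This is exactly why random.seed(int(seed)) fixes the nondeterminism: the int value itself, not an address, determines the stream.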

Tested with the following script from #7882:

import torch
import random

from torch.utils.data import Dataset, DataLoader

class Data(Dataset):
    def __len__(self):
        return 10000
    def __getitem__(self, index):
        print(index, torch.rand(2, 2).sum().item(), random.uniform(0, 1))
        return 1

seed = 2018
torch.manual_seed(seed)
loader = DataLoader(Data(), num_workers=4, shuffle=True)

for x in loader:
    print('-'*10)
    break
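The scheme the fix enables can be sketched in pure Python, without torch: each worker seeds the random module from an integer base seed plus its worker id (seed_worker here is a hypothetical helper for illustration, not DataLoader API), so results are reproducible across runs while each worker still gets its own stream.

```python
import random

def seed_worker(base_seed, worker_id):
    # Each worker seeds the random module with an int derived from
    # a common base seed plus its worker id, giving every worker
    # its own deterministic stream.
    random.seed(int(base_seed) + worker_id)
    return random.uniform(0, 1)

run1 = [seed_worker(2018, w) for w in range(4)]
run2 = [seed_worker(2018, w) for w in range(4)]
assert run1 == run2  # same base seed => identical streams across runs
```

With the pre-fix behavior, the base seed was effectively a fresh address each run, so run1 and run2 would differ.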

Before

First run

4717 2.202341079711914 0.9952153654478976
4607 2.3166141510009766 0.6813692345925851
4194 1.9806793928146362 0.6281118075687344
2595 2.95841383934021 0.8414756141240453
4691 0.9809015393257141 0.7622458327788627
9868 2.521920680999756 0.5253262288522356
7367 2.333574056625366 0.35079311205192487
9490 3.02830171585083 0.16235006783937567
----------
6759 3.1252167224884033 0.4424384676992986

Next run

4607 2.3166141510009766 0.15198273935290807
4194 1.9806793928146362 0.36414129463658884
4691 0.9809015393257141 0.027569260048619926
4717 2.202341079711914 0.5512619092026773
7367 2.333574056625366 0.7932627754589792
9490 3.02830171585083 0.19395324967791994
9868 2.521920680999756 0.5497794735158222
2595 2.95841383934021 0.782779934368899
----------
6759 3.1252167224884033 0.7098308465010348

After

First run

4194 1.9806793928146362 0.28176797222610817
4717 2.202341079711914 0.9839100412778289
4607 2.3166141510009766 0.6780782905745018
4691 0.9809015393257141 0.3976280065444453
9868 2.521920680999756 0.7470951277406424
2595 2.95841383934021 0.3961659556772974
7367 2.333574056625366 0.021153138172570696
9490 3.02830171585083 0.07198172828873439
----------
6759 3.1252167224884033 0.5803779056735119

Next run

4717 2.202341079711914 0.9839100412778289
4194 1.9806793928146362 0.28176797222610817
4607 2.3166141510009766 0.6780782905745018
2595 2.95841383934021 0.3961659556772974
4691 0.9809015393257141 0.3976280065444453
9868 2.521920680999756 0.7470951277406424
7367 2.333574056625366 0.021153138172570696
9490 3.02830171585083 0.07198172828873439
----------
6759 3.1252167224884033 0.5803779056735119

@ssnl
Collaborator

ssnl commented May 27, 2018

Good find, but can we instead change the code on line 249?

  self.sample_iter = iter(self.batch_sampler)

- base_seed = torch.LongTensor(1).random_()[0]
+ base_seed = int(torch.LongTensor(1).random_()[0])


@ssnl
Collaborator

ssnl commented May 28, 2018

@pytorchbot retest this please

@apaszke
Contributor

apaszke commented May 28, 2018

Can we also add a test for this?

Collaborator

@ssnl ssnl left a comment

test_distributed seems broken on master.

@ssnl
Collaborator

ssnl commented May 29, 2018

@pytorchbot retest this please

1 similar comment

@soumith soumith merged commit 146b951 into pytorch:master May 29, 2018
weiyangfb pushed a commit to weiyangfb/pytorch that referenced this pull request Jun 11, 2018
* fix seeding random module

* make base seed int

* follow 0.4 idiom

* add a test for random seeding
@isalirezag

is this bug fixed?

Successfully merging this pull request may close these issues.

[Pytorch] DataLoader and python random module