Conversation

@z-a-f commented Jun 22, 2020

Stack from ghstack:

Differential Revision: D22158979

[ghstack-poisoned]
dr-ci bot commented Jun 22, 2020

💊 CI failures summary and remediations

As of commit 22e676a (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_backward_compatibility_check_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Sep 13 02:30:30 The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not.
Sep 13 02:30:30 processing existing schema:  add_constant(__torch__.torch.classes._TorchScriptTesting._ElementwiseInterpreter _0, str _1, Tensor _2) -> (None _0) 
Sep 13 02:30:30 processing existing schema:  set_input_names(__torch__.torch.classes._TorchScriptTesting._ElementwiseInterpreter _0, str[] _1) -> (None _0) 
Sep 13 02:30:30 processing existing schema:  set_output_name(__torch__.torch.classes._TorchScriptTesting._ElementwiseInterpreter _0, str _1) -> (None _0) 
Sep 13 02:30:30 processing existing schema:  __call__(__torch__.torch.classes._TorchScriptTesting._ElementwiseInterpreter _0, Tensor[] _1) -> (Tensor _0) 
Sep 13 02:30:30 processing existing schema:  __getstate__(__torch__.torch.classes._TorchScriptTesting._ElementwiseInterpreter _0) -> ((str[], str?, Dict(str, Tensor), (str, str[], str)[]) _0) 
Sep 13 02:30:30 processing existing schema:  __setstate__(__torch__.torch.classes._TorchScriptTesting._ElementwiseInterpreter _0, (str[], str?, Dict(str, Tensor), (str, str[], str)[]) _1) -> (None _0) 
Sep 13 02:30:30 processing existing schema:  get(__torch__.torch.classes._TorchScriptTesting._LiteInterpreterTest _0, Tensor _1) -> (str _0) 
Sep 13 02:30:30 processing existing schema:  __getstate__(__torch__.torch.classes._TorchScriptTesting._LiteInterpreterTest _0) -> (int _0) 
Sep 13 02:30:30 processing existing schema:  __setstate__(__torch__.torch.classes._TorchScriptTesting._LiteInterpreterTest _0, int _1) -> (None _0) 
Sep 13 02:30:30 processing existing schema:  __init__(__torch__.torch.classes.dist_rpc.WorkerInfo _0, str _1, int _2) -> (None _0) 
Sep 13 02:30:30 The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not.  
Sep 13 02:30:30  
Sep 13 02:30:30 Broken ops: [ 
m2=0.5, float? n2=0, float? m3=0.5, float? start_warmup_multiplier=0.10000000000000001, int? constant_warmup_num_iter=10000000, int? linear_warmup_num_iter=10000000, float? cyclical_max_lr=0.050000000000000003, int? cyclical_step_size=1000000, float? cyclical_decay=0.999, float? cosine_min_lr=0.01, float? cosine_max_lr=0.050000000000000003, int? cosine_period=50, float? cosine_t_mult=1., float? cosine_lr_shrink=0.98999999999999999, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor output) 
Sep 13 02:30:30 ] 
Sep 13 02:30:30 + cleanup 
Sep 13 02:30:30 + retcode=1 
Sep 13 02:30:30 + set +x 
Sep 13 02:30:30 =================== sccache compilation log =================== 
Sep 13 02:30:30 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Sep 13 02:30:30 Compile requests                 0 

This comment was automatically generated by Dr. CI. Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 245 times.
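For context: when a schema change like this one is intentional, the usual remediation (an assumption on my part -- the log above does not say which fix this PR used) is to exempt the changed operators in the allow list of PyTorch's backward-compatibility check script, roughly:

# Hypothetical allow-list entry for an intentional schema change; the exact
# location and entry format in
# test/backward_compatibility/check_backward_compatibility.py may differ
# from what this PR actually needed.
import datetime

ALLOW_LIST = [
    # (schema name pattern, expiration date of the exemption)
    ("quantized::conv_transpose1d", datetime.date(2020, 10, 1)),
    ("quantized::conv_transpose2d", datetime.date(2020, 10, 1)),
]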

@z-a-f z-a-f changed the title [quant] conv_transpose2d [quant] conv_transpose1d / conv_transpose2d Jun 22, 2020
@z-a-f z-a-f requested a review from jerryzh168 June 22, 2020 08:59
z-a-f pushed a commit that referenced this pull request Jun 22, 2020
ghstack-source-id: b46e3f3
Pull Request resolved: #40370
@z-a-f z-a-f requested a review from jamesr66a June 22, 2020 08:59
@z-a-f (Author) commented Jun 22, 2020

UPDATE: Resolved!

There is a weird bug that I have been tracking the whole weekend -- I would appreciate it if someone could tell me what's wrong.

Here is the repro code:

import torch
from torch import nn

torch.backends.quantized.engine = 'qnnpack'

B = 2
L = 24

iC = 1
oC = 2
kL = 1

strides = (1,)
pads = (0,)
o_pads = (0,)
dilations = (1,)
groups = 1

X = torch.randn(B, iC, L).to(torch.float)
W = torch.randn(iC, oC // groups, kL)  # ConvTranspose1d weight layout: (iC, oC // groups, kL)

X_s = 1.2
X_zp = 0
X_dtype = torch.quint8

W_s = 0.2
W_zp = 0
W_dtype = torch.qint8

b = None

y_s = 4.2
y_zp = 0

################# Quantize the input and the weight

X_q = torch.quantize_per_tensor(X, X_s, X_zp, X_dtype)
W_q = torch.quantize_per_tensor(W, W_s, W_zp, W_dtype)

################# Float reference ConvTranspose1d

kernel_size = (kL,)  # 1-element tuple

conv_ref = nn.ConvTranspose1d(iC, oC, kernel_size, stride=strides, padding=pads,
                              output_padding=o_pads, groups=groups,
                              bias=(b is not None), dilation=dilations)

conv_ref.weight = nn.Parameter(W)
if b is not None:
  conv_ref.bias = nn.Parameter(b)

y_ref = conv_ref(X)
y_ref_dq = torch.quantize_per_tensor(y_ref, y_s, y_zp, X_dtype).int_repr()  # integer representation of the quantized reference

################# Prepack/unpack round trip for the quantized weight
W_prepack = torch.ops.quantized.conv_transpose1d_prepack(W_q, b, strides, pads,
                                                         o_pads, dilations,
                                                         groups)
W_unpacked, bias = torch.ops.quantized.conv_transpose1d_unpack(W_prepack)

print(W_q.dequantize() - W_unpacked.dequantize())


################# Run the quantized op

y_q = torch.ops.quantized.conv_transpose1d(X_q, W_prepack, y_s, y_zp).int_repr()

################# Compare the quantized reference against the quantized op

print(y_ref_dq.to(torch.float) - y_q.to(torch.float))

print(
  f'''
    iC: {W.shape[0]}
    oC: {W.shape[1] * groups}
    k: {W.shape[2:]}
  '''
)

The difference is expected to be at most 1 off (actually more than 1 off is possible due to the deconv), but it looks like only the first iC channels in the output are correct. That means that if oC > iC, the rest of the channels are off :'(

Output:

tensor([[[0.],
         [0.]]])
tensor([[[  0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
            0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
            0.,   0.],
         [  0., -36.,   0.,   0.,   0., -36., -36.,   0., -36.,   0.,   0.,
            0.,   0.,   0., -36.,   0., -73., -73.,   0.,   0., -36., -36.,
            0.,   0.]],

        [[  0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
            0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
            0.,   0.],
         [  0., -36.,   0.,   0.,   0., -36.,   0.,   0.,   0.,   0.,   0.,
            0.,   0., -36.,   0.,   0., -36.,   0., -36.,   0.,   0.,   0.,
          -36.,   0.]]])

    iC: 1
    oC: 2
    k: torch.Size([1])
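For context on the "1-off" expectation: the reference path quantizes the float output in a single step, while the quantized kernel requantizes an integer accumulator, so the two paths round independently and values near a rounding boundary can land one quantum apart. A tiny self-contained illustration (not part of the repro):

import torch

# Two float values just either side of a rounding boundary (10.5 * 0.1):
y = torch.tensor([1.0499, 1.0501])
q = torch.quantize_per_tensor(y, scale=0.1, zero_point=0, dtype=torch.quint8)
print(q.int_repr())  # tensor([10, 11], dtype=torch.uint8) -- one quantum apart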

cc @jerryzh168 @jamesr66a @kimishpatel @supriyar

@kimishpatel (Contributor) commented

@z-a-f, I can't say without knowing the details of the op. I am assuming this does not fail with the fbgemm backend.
One thing I can suggest: can you print the raw y_q result and see what is in the second channel? I am wondering whether the second channel is populated at all. Additionally, in your implementation you could zero out the output tensor beforehand, to confirm whether the second channel is left unpopulated versus populated with wrong data.
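A minimal sketch of that check, reusing the names from the repro above (the zeroing experiment would go inside the op's implementation itself):

# Print the raw quantized output channel by channel; if channel 1 holds only
# the zero_point everywhere, it was likely never written to at all.
y_q_raw = torch.ops.quantized.conv_transpose1d(X_q, W_prepack, y_s, y_zp)
for c in range(oC):
    print(f'channel {c}:', y_q_raw.int_repr()[0, c])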

Zafar added 10 commits June 23, 2020 14:53
@vkuzo (Contributor) left a comment

would it make sense to add a numerical correctness test in this PR?

@vkuzo (Contributor) left a comment

looks good. Should we add a numerical correctness test in this PR?

@z-a-f (Author) commented Sep 2, 2020

> looks good. Should we add a numerical correctness test in this PR?

Yes -- I had the tests, but they were accidentally submitted as part of a different PR -- I will migrate them here.
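For reference, a minimal sketch of what such a numerical correctness test could look like (illustrative only, not the migrated test -- the shapes, scales, and the 1-off tolerance are assumptions):

import torch
from torch import nn

def test_conv_transpose1d_numerics():
    torch.backends.quantized.engine = 'qnnpack'
    iC, oC, k = 4, 4, 3
    X = torch.randn(2, iC, 8)
    W = torch.randn(iC, oC, k)  # ConvTranspose1d weight: (iC, oC // groups, k)
    X_q = torch.quantize_per_tensor(X, 0.5, 0, torch.quint8)
    W_q = torch.quantize_per_tensor(W, 0.1, 0, torch.qint8)

    # Float reference on the dequantized tensors, quantized to integers.
    ref = nn.ConvTranspose1d(iC, oC, k, bias=False)
    ref.weight = nn.Parameter(W_q.dequantize())
    y_ref = torch.quantize_per_tensor(ref(X_q.dequantize()), 1.0, 0,
                                      torch.quint8)

    # Quantized op under test.
    packed = torch.ops.quantized.conv_transpose1d_prepack(
        W_q, None, (1,), (0,), (0,), (1,), 1)
    y_q = torch.ops.quantized.conv_transpose1d(X_q, packed, 1.0, 0)

    # Allow an off-by-one from the two independent rounding paths.
    diff = (y_ref.int_repr().int() - y_q.int_repr().int()).abs()
    assert diff.max().item() <= 1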

@z-a-f z-a-f requested a review from vkuzo September 3, 2020 00:22
@vkuzo (Contributor) left a comment

Looks good as long as CI passes. Can we include the context and the test plan in the PR summary before landing, since this is non-trivial?

@facebook-github-bot (Contributor) commented

@z-a-f merged this pull request in 9d4943d.

@facebook-github-bot facebook-github-bot deleted the gh/z-a-f/41/head branch September 18, 2020 14:17
xuzhao9 pushed a commit that referenced this pull request Sep 18, 2020
Summary: Pull Request resolved: #40370

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D22158979

Pulled By: z-a-f

fbshipit-source-id: f5cb812c9953efa7608f06cf0188de447f73f358