Labels: enhancement, module: TensorIterator, module: vectorization, triaged
Description
🐛 Bug
TensorIterator expects all inputs and outputs to have the same type. This prevents us from using TensorIterator for operations like quantized batch norm, where the input is quantized (quint8) but the alpha (scale) and beta (shift) values are float.
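For contrast, here is a minimal sketch of the pattern that compiles today, where every operand shares one dtype (the op itself is made up for illustration; cpu_kernel_vec and Vec256 are the real names from aten/src/ATen/native/cpu/Loops.h and the Vec256 headers):
AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "axpby_like", [&]() {
  using Vec = Vec256<scalar_t>;
  cpu_kernel_vec(
      iter,
      // scalar fallback: all arguments and the result are scalar_t
      [](scalar_t a, scalar_t b) -> scalar_t { return a * 2 + b; },
      // vectorized path: all arguments and the result are Vec256<scalar_t>
      [](Vec a, Vec b) -> Vec { return a * Vec(2) + b; });
});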
To Reproduce
Steps to reproduce the behavior:
- Create a TensorIterator op whose inputs and outputs have different dtypes
- Build PyTorch
Example:
AT_DISPATCH_QINT_TYPES(input.scalar_type(), "qbatch_norm", [&]() {
  using Vec = Vec256<quint8>;
  cpu_kernel_vec(
      iter,
      // scalar path: quantized input, float scale/shift, quantized output
      [&](uint8_t in, float a, float b) -> quint8 {
        long quantized_down = out_zero_point +
            std::lrintf(a * (in - in_zero_point) + b);
        if (ReluFused) { // static if
          quantized_down = std::max<long>(quantized_down, out_zero_point);
        }
        return quint8(std::min<long>(
            std::max<long>(quantized_down, std::numeric_limits<uint8_t>::min()),
            std::numeric_limits<uint8_t>::max()));
      },
      // vectorized path: Vec256<quint8> input, Vec256<float> scale/shift
      [&](Vec in, Vec256<float> a, Vec256<float> b) -> Vec {
        ...
      });
});
You should see a compile error like the following:
In file included from aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.AVX2.cpp:5:
../aten/src/ATen/native/cpu/Loops.h:70:10: error: no viable conversion from returned value of type 'tuple<[...], Vec256<c10::quint8>, Vec256<c10::quint8>>' to function return type 'tuple<[...], Vec256<float>, Vec256<float>>'
return std::make_tuple(
^~~~~~~~~~~~~~~~
../aten/src/ATen/native/cpu/Loops.h:80:10: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::dereference_vec_impl<function_traits<(lambda at aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.AVX2.cpp:992:5)>, 0, 1, 2>' requested here
return dereference_vec_impl<traits>(data, opt_scalar, S, i, Indices{});
^
../aten/src/ATen/native/cpu/Loops.h:149:18: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::dereference_vec<function_traits<(lambda at aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.AVX2.cpp:992:5)> >' requested here
auto args1 = dereference_vec<traits>(&data[1], opt_scalar, S, i);
^
../aten/src/ATen/native/cpu/Loops.h:211:14: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::vectorized_loop<(lambda at aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.AVX2.cpp:992:5), (lambda at aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.AVX2.cpp:992:5)>' requested here
return vectorized_loop(data, n, 0, std::forward<func_t>(op), std::forward<vec_func_t>(vop));
^
Expected behavior
Allow different types for input and output tensors.
Specifically, don't derive every operand's type from the return type of the lambda (see https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/Loops.h#L137).
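One possible direction (a rough sketch, not a drop-in patch): take each operand's vector type from the corresponding argument of the vectorized lambda instead of from its return type. Assuming ATen's function_traits exposes per-argument types via arg<N>::type (as in aten/src/ATen/detail/FunctionTraits.h), the loads could look roughly like this; the opt_scalar handling from the real dereference_vec_impl is omitted for brevity:
// Sketch only: load each argument with its own vector type, taken from
// the vec lambda's signature, rather than forcing the return type onto
// every operand. `traits` is function_traits over the vec lambda.
template <typename traits, std::size_t... INDEX>
auto dereference_vec_impl(char* C10_RESTRICT data[],
                          int64_t i,
                          std::index_sequence<INDEX...>) {
  return std::make_tuple([&] {
    // e.g. Vec256<c10::quint8> for the quantized input and
    // Vec256<float> for the scale/shift operands
    using Vec = typename traits::template arg<INDEX>::type;
    using scalar_t = typename Vec::value_type;
    return Vec::loadu(data[INDEX] + i * sizeof(scalar_t));
  }()...);
}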