TensorIterator does not work with different input/output types #33166

Description

@supriyar

🐛 Bug

TensorIterator expects all inputs and outputs to have the same dtype. This prevents us from using TensorIterator for operations like quantized batch norm, where the input and output are quantized (quint8) but the alpha (scale) and beta (shift) values are float.
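
For context, the iterator setup quantized batch norm needs would look something like the sketch below. This is a minimal sketch, assuming the TensorIteratorConfig builder and its check_all_same_dtype(false) option from newer ATen versions; even with mixed-dtype operands allowed at construction time, cpu_kernel_vec still derives every operand's type from the kernel's return type, which is the failure reported here.

// Sketch only: a mixed-dtype iterator for quantized batch norm.
// TensorIteratorConfig / check_all_same_dtype(false) are assumed from
// newer ATen versions; the cpu_kernel_vec restriction is unchanged.
#include <ATen/ATen.h>
#include <ATen/native/TensorIterator.h>

at::TensorIterator make_qbatch_norm_iter(
    const at::Tensor& out,    // quint8
    const at::Tensor& input,  // quint8
    const at::Tensor& alpha,  // float (scale)
    const at::Tensor& beta) { // float (shift)
  return at::TensorIteratorConfig()
      .check_all_same_dtype(false)  // allow quint8 and float operands
      .add_output(out)
      .add_input(input)
      .add_input(alpha)
      .add_input(beta)
      .build();
}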

To Reproduce

Steps to reproduce the behavior:

  1. Create a TensorIterator op that has different input/output dtypes
  2. Build PyTorch

Example:

AT_DISPATCH_QINT_TYPES(input.scalar_type(), "qbatch_norm", [&]() {
      using Vec = Vec256<quint8>;
      cpu_kernel_vec(
        iter,
        // Scalar kernel: mixed uint8_t/float inputs, quint8 output.
        [&] (uint8_t in, float a, float b) -> quint8 {
          long quantized_down = out_zero_point +
              std::lrintf(a * (in - in_zero_point) + b);
          if (ReluFused) { // static if
            quantized_down = std::max<long>(quantized_down, out_zero_point);
          }
          // Clamp to the uint8 range before re-quantizing.
          return quint8(std::min<long>(
              std::max<long>(quantized_down, std::numeric_limits<uint8_t>::min()),
              std::numeric_limits<uint8_t>::max()));
        },
        // Vector kernel: Vec256<quint8> input, Vec256<float> a and b.
        [&] (Vec in, Vec256<float> a, Vec256<float> b) -> Vec {
          ...
        });
    });

You should see a compile error like the following:

In file included from aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.AVX2.cpp:5:
../aten/src/ATen/native/cpu/Loops.h:70:10: error: no viable conversion from returned value of type 'tuple<[...], Vec256<c10::quint8>, Vec256<c10::quint8>>' to function return type 'tuple<[...], Vec256<float>, Vec256<float>>'
  return std::make_tuple(
         ^~~~~~~~~~~~~~~~
../aten/src/ATen/native/cpu/Loops.h:80:10: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::dereference_vec_impl<function_traits<(lambda at aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.AVX2.cpp:992:5)>, 0, 1, 2>' requested here
  return dereference_vec_impl<traits>(data, opt_scalar, S, i, Indices{});
         ^
../aten/src/ATen/native/cpu/Loops.h:149:18: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::dereference_vec<function_traits<(lambda at aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.AVX2.cpp:992:5)> >' requested here
    auto args1 = dereference_vec<traits>(&data[1], opt_scalar, S, i);
                 ^
../aten/src/ATen/native/cpu/Loops.h:211:14: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::vectorized_loop<(lambda at aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.AVX2.cpp:992:5), (lambda at aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.AVX2.cpp:992:5)>' requested here
      return vectorized_loop(data, n, 0, std::forward<func_t>(op), std::forward<vec_func_t>(vop));
             ^
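
The trace points at the root cause: dereference_vec_impl builds the argument tuple as N copies of Vec256<scalar_t>, where scalar_t comes from the kernel's return type (here quint8), while the vector lambda expects Vec256<float> for a and b. The mismatch can be reproduced standalone with a minimal analogue (hypothetical names, not the real ATen code):

// Standalone analogue of the "no viable conversion" error above.
// Vec, quint8, loaded_t, and expected_t are hypothetical stand-ins.
#include <tuple>
#include <type_traits>

template <typename T> struct Vec {};  // stand-in for Vec256<T>
struct quint8 {};

// What the loader effectively produces for a 3-argument kernel whose
// return type is quint8: the return type is used for every argument.
using loaded_t   = std::tuple<Vec<quint8>, Vec<quint8>, Vec<quint8>>;
// What the qbatch_norm vector lambda actually expects:
using expected_t = std::tuple<Vec<quint8>, Vec<float>, Vec<float>>;

// The tuples are unrelated types, so returning loaded_t from a function
// declared to return expected_t is ill-formed, as in the error above.
static_assert(!std::is_convertible<loaded_t, expected_t>::value,
              "mixed-dtype args cannot all be loaded as the return type");

int main() {}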

Expected behavior

Allow different types for input and output tensors.

Specifically, don't derive every operand's type from the kernel's return type - https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/Loops.h#L137
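
A possible direction, sketched with a simplified function_traits (hypothetical, not the real ATen implementation): load each argument as a vector of that argument's own type, taken from the lambda's signature, instead of a vector of the return type.

// Sketch: derive per-argument element types from the kernel signature.
// function_traits, Vec, and dereference_per_arg here are simplified
// hypothetical stand-ins for the ATen machinery.
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <utility>

template <typename T> struct Vec {};  // stand-in for Vec256<T>

template <typename T> struct function_traits;
template <typename R, typename... Args>
struct function_traits<R (*)(Args...)> {
  using result_type = R;
  template <std::size_t I>
  using arg_t = std::tuple_element_t<I, std::tuple<Args...>>;
};

// Load argument I as Vec<arg_t<I>> rather than Vec<result_type>.
template <typename traits, std::size_t... I>
auto dereference_per_arg(std::index_sequence<I...>) {
  return std::make_tuple(Vec<typename traits::template arg_t<I>>{}...);
}

int main() {
  // A kernel shaped like the scalar lambda above: (uint8_t, float, float).
  using kernel_t = std::uint8_t (*)(std::uint8_t, float, float);
  using traits = function_traits<kernel_t>;
  // args is tuple<Vec<uint8_t>, Vec<float>, Vec<float>>: each element
  // matches its parameter instead of three copies of the return type.
  auto args = dereference_per_arg<traits>(std::make_index_sequence<3>{});
  (void)args;
}

cpu_kernel_vec could then verify each operand's dtype against the matching argument type rather than against a single common dtype.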

cc @jamesr66a, @raghuramank100

Metadata

Labels

- enhancement - Not as big of a feature, but technically not a bug. Should be easy to fix
- module: TensorIterator
- module: vectorization - Related to SIMD vectorization, e.g., Vec256
- triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
