Labels: enhancement, module: TensorIterator, module: vectorization, triaged
Description
🐛 Bug
TensorIterator expects all inputs and outputs to have the same type. This prevents us from using TensorIterator for operations like quantized batch norm, where the input is quantized (quint8) but the alpha (scale) and beta (shift) values are float.
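For contrast, here is a minimal sketch of the pattern that compiles today, where every operand shares one dtype (the op itself is made up for illustration; cpu_kernel_vec and Vec256 are the real names from aten/src/ATen/native/cpu/Loops.h and the Vec256 headers):
AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "axpby_like", [&]() {
  using Vec = Vec256<scalar_t>;
  cpu_kernel_vec(
      iter,
      // scalar fallback: all arguments and the result are scalar_t
      [](scalar_t a, scalar_t b) -> scalar_t { return a * 2 + b; },
      // vectorized path: all arguments and the result are Vec256<scalar_t>
      [](Vec a, Vec b) -> Vec { return a * Vec(2) + b; });
});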
To Reproduce
Steps to reproduce the behavior:
- Create a TensorIterator op whose inputs and outputs have different dtypes
- Build PyTorch
Example:
AT_DISPATCH_QINT_TYPES(input.scalar_type(), "qbatch_norm", [&]() {
  using Vec = Vec256<quint8>;
  cpu_kernel_vec(
      iter,
      // scalar path: quantized input, float scale/shift, quantized output
      [&](uint8_t in, float a, float b) -> quint8 {
        long quantized_down = out_zero_point +
            std::lrintf(a * (in - in_zero_point) + b);
        if (ReluFused) { // static if
          quantized_down = std::max<long>(quantized_down, out_zero_point);
        }
        return quint8(std::min<long>(
            std::max<long>(quantized_down, std::numeric_limits<uint8_t>::min()),
            std::numeric_limits<uint8_t>::max()));
      },
      // vectorized path: Vec256<quint8> input, Vec256<float> scale/shift
      [&](Vec in, Vec256<float> a, Vec256<float> b) -> Vec {
        ...
      });
});
You should see a compile error like the following:
In file included from aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.AVX2.cpp:5:
../aten/src/ATen/native/cpu/Loops.h:70:10: error: no viable conversion from returned value of type 'tuple<[...], Vec256<c10::quint8>, Vec256<c10::quint8>>' to function return type 'tuple<[...], Vec256<float>, Vec256<float>>'
return std::make_tuple(
^~~~~~~~~~~~~~~~
../aten/src/ATen/native/cpu/Loops.h:80:10: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::dereference_vec_impl<function_traits<(lambda at aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.AVX2.cpp:992:5)>, 0, 1, 2>' requested here
return dereference_vec_impl<traits>(data, opt_scalar, S, i, Indices{});
^
../aten/src/ATen/native/cpu/Loops.h:149:18: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::dereference_vec<function_traits<(lambda at aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.AVX2.cpp:992:5)> >' requested here
auto args1 = dereference_vec<traits>(&data[1], opt_scalar, S, i);
^
../aten/src/ATen/native/cpu/Loops.h:211:14: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::vectorized_loop<(lambda at aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.AVX2.cpp:992:5), (lambda at aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.AVX2.cpp:992:5)>' requested here
return vectorized_loop(data, n, 0, std::forward<func_t>(op), std::forward<vec_func_t>(vop));
^
Expected behavior
Allow different types for input and output tensors.
Specifically, don't derive every operand's type from the return type of the lambda (see https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/Loops.h#L137).
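One possible direction (a rough sketch, not a drop-in patch): take each operand's vector type from the corresponding argument of the vectorized lambda instead of from its return type. Assuming ATen's function_traits exposes per-argument types via arg<N>::type (as in aten/src/ATen/detail/FunctionTraits.h), the loads could look roughly like this; the opt_scalar handling from the real dereference_vec_impl is omitted for brevity:
// Sketch only: load each argument with its own vector type, taken from
// the vec lambda's signature, rather than forcing the return type onto
// every operand. `traits` is function_traits over the vec lambda.
template <typename traits, std::size_t... INDEX>
auto dereference_vec_impl(char* C10_RESTRICT data[],
                          int64_t i,
                          std::index_sequence<INDEX...>) {
  return std::make_tuple([&] {
    // e.g. Vec256<c10::quint8> for the quantized input and
    // Vec256<float> for the scale/shift operands
    using Vec = typename traits::template arg<INDEX>::type;
    using scalar_t = typename Vec::value_type;
    return Vec::loadu(data[INDEX] + i * sizeof(scalar_t));
  }()...);
}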