Fix AVX2 implementation of IQ4_K, IQ4_KS, IQ5_K, IQ6_K #427
I have made the exact same mistake a number of times.

On AVX2, the instruction that performs dot products of `int8_t` vectors (as needed in quantized matrix multiplications) is `_mm256_maddubs_epi16(x, y)`, where `x` must be unsigned and `y` signed; the result is a SIMD vector of signed `int16_t` values $z_i = x_{2i} y_{2i} + x_{2i+1} y_{2i+1}$. The quant values `x` and the quantized activations `y` are both signed, so one way to deal with the strangeness of this instruction is to add a suitable constant $c$ to `x` so that it becomes unsigned, use `_mm256_maddubs_epi16(c+x, y)` to accumulate the dot product, and at the end subtract $c \cdot b$, where $b = \sum_i y_i$ has been pre-computed when quantizing the activations. The issue arises when the `x` values span the full `int8_t` range, as is the case for the non-linear quants `IQ4_NL`, `IQ4_XS`, `IQ4_K`, `IQ4_KS`, `IQ5_K`, `IQ5_KS`, `IQ6_K`. In that case $c = 128$, the $c + x$ values span the full `uint8_t` range, and it becomes possible to overflow (saturate) the signed `int16_t` range.
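To make the failure mode concrete, here is a minimal self-contained sketch (illustrative only, not code from this PR; the helper names are made up). It contrasts the offset approach described above with a sign-transfer trick of the kind ggml uses elsewhere, which keeps every pair of products within `int16_t` because $|x| \le 128$ and $128 \cdot 127 \cdot 2 = 32512 < 32767$. Compile with `-mavx2`.

```c
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

// Offset approach: shift signed quants by c = 128 to make them unsigned.
// For x = y = 127 each pair sums to 255*127 + 255*127 = 64770, which
// saturates the signed int16_t lanes produced by _mm256_maddubs_epi16.
static inline __m256i dot_offset_unsafe(__m256i x, __m256i y) {
    const __m256i c  = _mm256_set1_epi8((char)0x80);   // +128 as unsigned
    const __m256i xu = _mm256_add_epi8(x, c);          // x + 128, in [0, 255]
    return _mm256_maddubs_epi16(xu, y);                // may saturate at 32767
    // the caller still needs to subtract c * sum(y); omitted here
}

// Sign-transfer approach: feed |x| (at most 128) as the unsigned operand
// and move the sign of x onto y, so the pairwise sums cannot saturate.
static inline __m256i dot_sign_safe(__m256i x, __m256i y) {
    const __m256i ax = _mm256_sign_epi8(x, x);         // |x|, in [0, 128]
    const __m256i sy = _mm256_sign_epi8(y, x);         // y with the sign of x
    return _mm256_maddubs_epi16(ax, sy);               // exact, no overflow
}

int main(void) {
    // Worst case for the offset approach: x = 127, y = 127
    const __m256i x = _mm256_set1_epi8(127);
    const __m256i y = _mm256_set1_epi8(127);

    int16_t off[16], sgn[16];
    _mm256_storeu_si256((__m256i *)off, dot_offset_unsafe(x, y));
    _mm256_storeu_si256((__m256i *)sgn, dot_sign_safe(x, y));

    const int b = 127 + 127;  // sum of the two y values in one pair
    // 64770 was clamped to 32767, so the c*b correction gives 255, not 32258
    printf("offset approach: %d - 128*b = %d (wrong, saturated)\n",
           off[0], off[0] - 128 * b);
    printf("sign approach:   %d (correct: 127*127*2 = %d)\n",
           sgn[0], 127 * 127 * 2);
    return 0;
}
```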
I had thought that I had fixed this mistake, but while working on the `IQ5_KS` type added in PR #422 I noticed that the issue still exists for `IQ4_K`, `IQ4_KS`, `IQ5_K`, `IQ6_K`, and was only fixed for the corresponding repacked variants.

This PR corrects the problem. There will be a slight (a few percent) PP performance degradation on AVX2 for these quantization types.