Enhancing Utility Functions in ops.h#105
Conversation
jan-wassenberg
left a comment
There was a problem hiding this comment.
Nice, great to see the Transform* make this quite compact. Thanks for adding!
jan-wassenberg
left a comment
There was a problem hiding this comment.
Super, the other spots are much nicer now, thank you :)
|
Nice! Did you see any performance differences (positive or negative) @enum-class ? |
| auto vmax = vmin; | ||
| Foreach(d, x, mask_pos, vmin, | ||
| [&vmax](const auto d, const auto value) | ||
| HWY_ATTR { vmax = hn::Max(vmax, value); }); |
There was a problem hiding this comment.
@pchx noticed we are missing the reduction set here - after Foreach, broadcasting the max to all lanes:
vmax = hn::MaxOfLanes(d, vmax); // broadcast
There was a problem hiding this comment.
Thanks @pchx and @jan-wassenberg.
In my local test I have noticed ReduceMax is missing.
I will fix it.
ops.h
Outdated
|
|
||
| hn::Transform(d, top_k.data(), top_k.size(), | ||
| [&temperature_inv](D d, hn::Vec<D> v) HWY_ATTR { | ||
| return hn::Mul(hn::Exp(d, hn::Log(d, v)), temperature_inv); |
There was a problem hiding this comment.
Looks like we have just v/temp here - I think we want log/temp, then only exp.
We are also missing the second half of the original code?
There was a problem hiding this comment.
I am confused about vtemp things, could you please explain a bit more?
About the second half, I think it already exists inside the std::discrete_distribution.
I mean it will divide each weight by the sum of all n weights. am I missing something ?
There was a problem hiding this comment.
Sure! What I see now is exp(log(v)), which is just v, then we mul by 1/temp.
I think we intended exp(log*inv).
For the second half, I am not familiar with discrete_distribution. As a sanity check, how about we temporarily run both the old and the new code, and ensure their results are the same (which is generally a helpful methodology for porting to SIMD :) ), then we can remove the old before merging?
There was a problem hiding this comment.
Ah, discrete_distribution does make the second half of the original code redundant -- we don't need to divide by the sum of weights.
Thanks!
@austinvhuang Du to hardware limitations, I couldn't do the performance test on the entire pipeline. |
jan-wassenberg
left a comment
There was a problem hiding this comment.
Impressive work! Thanks for adding the test :)
Also fix copybara. Refs #105 PiperOrigin-RevId: 619157071
I have tried to use highway library in the AddFrom, MulBy, MulByConst, MulByConstAndAdd, and create_distribution functions.