Skip to content

Enhancing Utility Functions in ops.h#105

Merged
copybara-service[bot] merged 4 commits intogoogle:devfrom
enum-class:improve_ops_utility
Mar 25, 2024
Merged

Enhancing Utility Functions in ops.h#105
copybara-service[bot] merged 4 commits intogoogle:devfrom
enum-class:improve_ops_utility

Conversation

@enum-class
Copy link
Contributor

I have tried to use highway library in the AddFrom, MulBy, MulByConst, MulByConstAndAdd, and create_distribution functions.

Copy link
Member

@jan-wassenberg jan-wassenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, great to see the Transform* make this quite compact. Thanks for adding!

Copy link
Member

@jan-wassenberg jan-wassenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super, the other spots are much nicer now, thank you :)

@jan-wassenberg jan-wassenberg added the copybara-import Trigger Copybara for merging pull requests label Mar 19, 2024
@austinvhuang
Copy link
Contributor

Nice! Did you see any performance differences (positive or negative) @enum-class ?

auto vmax = vmin;
Foreach(d, x, mask_pos, vmin,
[&vmax](const auto d, const auto value)
HWY_ATTR { vmax = hn::Max(vmax, value); });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pchx noticed we are missing the reduction set here - after Foreach, broadcasting the max to all lanes:
vmax = hn::MaxOfLanes(d, vmax); // broadcast

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pchx and @jan-wassenberg.
In my local test I have noticed ReduceMax is missing.
I will fix it.

ops.h Outdated

hn::Transform(d, top_k.data(), top_k.size(),
[&temperature_inv](D d, hn::Vec<D> v) HWY_ATTR {
return hn::Mul(hn::Exp(d, hn::Log(d, v)), temperature_inv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we have just v/temp here - I think we want log/temp, then only exp.
We are also missing the second half of the original code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about vtemp things, could you please explain a bit more?

About the second half, I think it already exists inside the std::discrete_distribution.
I mean it will divide each weight by the sum of all n weights. am I missing something ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! What I see now is exp(log(v)), which is just v, then we mul by 1/temp.
I think we intended exp(log*inv).

For the second half, I am not familiar with discrete_distribution. As a sanity check, how about we temporarily run both the old and the new code, and ensure their results are the same (which is generally a helpful methodology for porting to SIMD :) ), then we can remove the old before merging?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, discrete_distribution does make the second half of the original code redundant -- we don't need to divide by the sum of weights.

Thanks!

@enum-class
Copy link
Contributor Author

Nice! Did you see any performance differences (positive or negative) @enum-class ?

@austinvhuang Du to hardware limitations, I couldn't do the performance test on the entire pipeline.
I did check individual functions like MulBy, MulByConst and MulByConstAndAdd as much as fair I could.
Anyway, On 12th Gen Intel Core i7-1265U with SSE, -O3, I saw roughly 2x speedup.
I can share the details if you are interested:)

@enum-class enum-class changed the base branch from main to dev March 23, 2024 13:11
Copy link
Member

@jan-wassenberg jan-wassenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impressive work! Thanks for adding the test :)

@copybara-service copybara-service bot merged commit 9f1595c into google:dev Mar 25, 2024
copybara-service bot pushed a commit that referenced this pull request Mar 26, 2024
Also fix copybara. Refs #105

PiperOrigin-RevId: 619157071
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

copybara-import Trigger Copybara for merging pull requests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants