base64 length generic [Illustration] #904
Draft
lemire wants to merge 17 commits into yagiz/add-binary-length-base64 from
Co-authored-by: Erik Corry <erik@arbat.com>
This avoids a zero-extend in the inner loop. Since we are accumulating the result in a 64-bit register, we want to keep everything 64-bit clean.
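To make the 64-bit point concrete, here is a minimal scalar sketch, not the code from this commit. The helper name `count_above_space` is hypothetical, and whether a compiler would really emit a zero-extend for a narrower intermediate depends on the target and the optimization level. The idea is simply that the value added to the accumulator already has the accumulator's width.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the PR): counts the bytes that are
// strictly greater than 0x20 (ASCII space).
inline uint64_t count_above_space(const char *data, size_t length) {
  uint64_t count = 0; // 64-bit accumulator
  for (size_t i = 0; i < length; i++) {
    // Widen the comparison result to 64 bits up front, so the addition
    // below is a plain 64-bit add with no narrower intermediate.
    uint64_t is_counted = static_cast<unsigned char>(data[i]) > 0x20 ? 1 : 0;
    count += is_counted;
  }
  return count;
}
```

For example, `count_above_space("TW Fu\n", 6)` counts the four letters and skips the space and the newline.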
Port the AVX2 binary_length_from_base64 function to use AVX-512 instructions for the icelake implementation.

Key differences from AVX2:
- Process 64 bytes per iteration instead of 32
- Use _mm512_cmpgt_epi8_mask which returns __mmask64 directly
- Use _mm_popcnt_u64 for popcount
- Guard against overshoot=0 case to avoid UB from shifting by 64

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
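The shift-by-64 hazard mentioned above can be shown without any intrinsics. The sketch below is a general illustration and not the icelake code: the helper `count_valid_lanes` and its `valid` parameter are hypothetical, and `std::bitset` stands in for a hardware popcount. In C++, shifting a 64-bit value by 64 or more is undefined behavior, so the case where every lane is valid needs its own branch.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical helper (not part of the PR): given a 64-bit lane mask for a
// partially filled final block, count the set bits among the first `valid`
// lanes only, where 0 <= valid <= 64.
inline uint64_t count_valid_lanes(uint64_t mask, unsigned valid) {
  if (valid >= 64) {
    // All lanes are valid. Building the keep-mask below would require
    // shifting by 64, which is undefined behavior, so skip the masking.
    return std::bitset<64>(mask).count();
  }
  // Here valid < 64, so the shift is well defined.
  uint64_t keep = (uint64_t(1) << valid) - 1;
  return std::bitset<64>(mask & keep).count();
}
```

With a mask of `0xF0` (lanes 4 to 7 set) and `valid = 6`, only lanes 4 and 5 are counted.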
This is not meant to be merged. It illustrates how we can support all of our kernels when computing the base64 length without necessarily crafting a custom implementation for each one.
We don't want to code everything down to the metal with intrinsics, and this approach here gives decent results.
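As a rough idea of what an intrinsics-free version can look like, here is a minimal portable sketch. It is not the code in this PR. The function name `binary_length_from_base64_sketch` is hypothetical, and it rests on two assumptions: every byte less than or equal to 0x20 is ignorable whitespace, and at most two trailing `=` characters are padding. It does not validate the input.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch (not the PR's implementation): estimates the decoded
// size of a base64 input without decoding it.
inline size_t binary_length_from_base64_sketch(const char *input,
                                               size_t length) {
  // Step 1: walk back from the end, dropping whitespace and up to two '='.
  size_t end = length;
  size_t padding = 0;
  while (end > 0) {
    unsigned char c = static_cast<unsigned char>(input[end - 1]);
    if (c <= 0x20) {
      end--; // trailing whitespace
    } else if (c == '=' && padding < 2) {
      padding++;
      end--; // trailing padding
    } else {
      break;
    }
  }
  // Step 2: count the remaining non-whitespace characters. This simple
  // loop is the part a compiler may be able to vectorize on its own.
  uint64_t count = 0;
  for (size_t i = 0; i < end; i++) {
    count += static_cast<unsigned char>(input[i]) > 0x20 ? 1 : 0;
  }
  // Step 3: each base64 character carries 6 bits, so n characters decode
  // to floor(n * 6 / 8) = floor(n * 3 / 4) bytes.
  return static_cast<size_t>(count * 3 / 4);
}
```

For instance, `"TWFu"` gives 3, `"TWE="` gives 2, and `"TQ=="` gives 1. Interleaved whitespace, as in `"TW Fu\n"`, does not change the result.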
cc @anonrig @erikcorry
It reaches 72 GB/s on my Mac laptop.
THIS IS FOR ILLUSTRATION PURPOSES.