
Add clang-tidy, fix narrowing issues, fix constness#65

Merged
copybara-service[bot] merged 3 commits into google:dev from enum-class:narrowing-issues
Mar 4, 2024
Conversation

@enum-class
Contributor

  • Introduce clang-tidy into the codebase.
  • Fix narrowing issues by adjusting type conversions, preventing potential data loss or undefined behavior.
  • Improve const correctness where the codebase allows it.

Unfortunately I did not find tests in the repo to check against.
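For reference, introducing clang-tidy to catch narrowing issues is typically done with a `.clang-tidy` file at the repo root. The PR's actual configuration is not shown in this conversation, so the checks below are illustrative only:

```yaml
# Illustrative .clang-tidy sketch; the checks actually enabled by this PR
# may differ.
Checks: 'bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions'
WarningsAsErrors: ''
```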

ops.h Outdated
  constexpr float eps = 1e-6f;
  float ss = SquaredL2(x, size);
- ss = 1.0f / sqrtf(ss / static_cast<int>(size) + eps);
+ ss = 1.0f / sqrtf(ss / static_cast<float>(size) + eps);
Member

FYI this is actually a slight pessimization: it is cheaper to convert int32 to float than u64 -> float. If we want to be explicit, static_cast<float>(static_cast<int>(size)) would be ideal.

Contributor Author

I agree, that is true

Member

What do you think of adding a function like FloatFromSmallInt? Could be a template supporting types such as size_t, and would avoid replicating these two casts in various places.

Thanks BTW for adding const and fixing these issues :)

Contributor Author
@enum-class enum-class Feb 28, 2024

I was thinking of creating a safe_cast function to ensure casting without narrowing. But in the meantime I will go with the FloatFromSmallInt approach.

Member

Sounds good. How about we only do range checks (if any) in debug builds, to avoid overhead? You can use HWY_DASSERT for such things.

@austinvhuang
Contributor

This looks pretty good. @jan-wassenberg if this looks alright to you we can probably go ahead and merge.

ops.h Outdated
std::is_arithmetic_v<To> && std::is_arithmetic_v<From>, To>
StaticCast(From from) noexcept {
  if constexpr (std::is_unsigned_v<From> && std::is_floating_point_v<To>)
    return static_cast<To>(static_cast<int64_t>(from));
Member

int64_t -> float is also somewhat expensive. How about we replace int64_t with hwy::SignedFromSize<sizeof(from)>? This evaluates to int for From=float.

Contributor Author

Ah, it seems I've created traits similar to hwy::SignedFromSize. No need to reinvent the wheel; I'll just use that.
I think it would be hwy::SignedFromSize<2 * sizeof(from)>, right?

Contributor Author

Just out of curiosity, I've experimented with this function on Compiler Explorer and noticed that on 64-bit machines, int64 can sometimes yield faster performance. What's your take on this?

float func1(size_t size)
{
    return static_cast<float>(static_cast<int64_t>(size));
}

float func2(size_t size)
{
    return static_cast<float>(static_cast<int>(size));
}

On x86-64:

**clang-14**
func1(unsigned long):                              # @func1(unsigned long)
        push    rbp
        mov     rbp, rsp
        mov     qword ptr [rbp - 8], rdi
        cvtsi2ss        xmm0, qword ptr [rbp - 8]
        pop     rbp
        ret
func2(unsigned long):                              # @func2(unsigned long)
        push    rbp
        mov     rbp, rsp
        mov     qword ptr [rbp - 8], rdi
        mov     rax, qword ptr [rbp - 8]
        cvtsi2ss        xmm0, eax
        pop     rbp
        ret
**clang-17**
func1(unsigned long):                              # @func1(unsigned long)
        push    rbp
        mov     rbp, rsp
        mov     qword ptr [rbp - 8], rdi
        cvtsi2ss        xmm0, qword ptr [rbp - 8]
        pop     rbp
        ret
func2(unsigned long):                              # @func2(unsigned long)
        push    rbp
        mov     rbp, rsp
        mov     qword ptr [rbp - 8], rdi
        mov     rax, qword ptr [rbp - 8]
        cvtsi2ss        xmm0, eax
        pop     rbp
        ret
**gcc-13**
func1(unsigned long):
        push    rbp
        mov     rbp, rsp
        mov     QWORD PTR [rbp-8], rdi
        mov     rax, QWORD PTR [rbp-8]
        pxor    xmm0, xmm0
        cvtsi2ss        xmm0, rax
        pop     rbp
        ret
func2(unsigned long):
        push    rbp
        mov     rbp, rsp
        mov     QWORD PTR [rbp-8], rdi
        mov     rax, QWORD PTR [rbp-8]
        pxor    xmm0, xmm0
        cvtsi2ss        xmm0, eax
        pop     rbp
        ret

On Arm:

**clang-17**
func1(unsigned long):                              // @func1(unsigned long)
        sub     sp, sp, #16
        str     x0, [sp, #8]
        ldr     d0, [sp, #8]
        fmov    x8, d0
        scvtf   s0, x8
        add     sp, sp, #16
        ret
func2(unsigned long):                              // @func2(unsigned long)
        sub     sp, sp, #16
        str     x0, [sp, #8]
        ldr     x8, [sp, #8]
        scvtf   s0, w8
        add     sp, sp, #16
        ret
**gcc-13**
func1(unsigned long):
        sub     sp, sp, #16
        str     x0, [sp, 8]
        ldr     x0, [sp, 8]
        scvtf   s0, x0
        add     sp, sp, 16
        ret
func2(unsigned long):
        sub     sp, sp, #16
        str     x0, [sp, 8]
        ldr     x0, [sp, 8]
        fmov    s0, w0
        scvtf   s0, s0
        add     sp, sp, 16
        ret

Member

Interesting, thanks for showing the exact codegen. It's not 100% obvious which is better because 64-bit cvtsi2ss has lower throughput, and we can afford an extra reg. Looks like same-size is indeed better on Arm, though, and it's not super time-critical, so let's go with your current code 👍

@jan-wassenberg
Member

Looks good to me as well, thanks for adding the StaticCast :) Just one more small suggestion, then we're good to go. Or let me know if you'd prefer we make this change.

@jan-wassenberg jan-wassenberg added the copybara-import Trigger Copybara for merging pull requests label Mar 4, 2024
@copybara-service copybara-service bot merged commit cd74681 into google:dev Mar 4, 2024