Skip to content

make detect encoding functions constexpr#920

Open
shikharish wants to merge 5 commits intosimdutf:masterfrom
shikharish:encoding-constexpr
Open

make detect encoding functions constexpr#920
shikharish wants to merge 5 commits intosimdutf:masterfrom
shikharish:encoding-constexpr

Conversation

@shikharish
Copy link
Contributor

Closes #891

Signed-off-by: Shikhar <shikharish05@gmail.com>
Signed-off-by: Shikhar <shikharish05@gmail.com>
@shikharish
Copy link
Contributor Author

cc: @lemire @pauldreik

@lemire
Copy link
Member

lemire commented Jan 13, 2026

@shikharish thanks. This will nicely close our gap. Let us include this in our next release.

Copy link
Collaborator

@pauldreik pauldreik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I am in a hurry so sorry for not doing a thorough review.
I think it is mostly fine but I have some questions.

simdutf_warn_unused encoding_type check_bom(const uint8_t *byte, size_t length);
simdutf_warn_unused encoding_type check_bom(const char *byte, size_t length);
template <typename BytePtr>
simdutf_warn_unused inline simdutf_constexpr14 encoding_type
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use a macro guarded c++20 concept here to restrict the bytepr.

return encoding_type::unspecified;
}

simdutf_warn_unused inline encoding_type check_bom(const char *byte,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be really_inline

#include <simdutf/scalar/utf32_to_utf8/utf32_to_utf8.h>
#include <simdutf/scalar/utf32_to_utf8/valid_utf32_to_utf8.h>
#include <simdutf/scalar/utf8.h>
#include <simdutf/scalar/detect.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this included twice?

Comment on lines +132 to +135
#include <simdutf/scalar/utf8.h>
#include <simdutf/scalar/utf16.h>
#include <simdutf/scalar/utf32.h>
#include <simdutf/scalar/detect.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is ok to move around the includes, but remove those that are duplicates

#include <simdutf/scalar/utf32.h>
#include <simdutf/scalar/detect.h>

#if SIMDUTF_SPAN
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed?

* of units processed if successful.
*/
simdutf_warn_unused result
simdutf_warn_unused simdutf_constexpr23 result
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this changed?

}

template <endianness big_endian>
template <endianness big_endian, typename InputPtr>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a macro guarded c++20 concept check.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and reused the already existing concept and style)

Signed-off-by: Shikhar <shikharish05@gmail.com>
Signed-off-by: Shikhar <shikharish05@gmail.com>
@lemire
Copy link
Member

lemire commented Jan 16, 2026

@shikharish Could you have a look at @pauldreik's comments ? There are a few that are apparently outstanding (meaning that it is unclear whether you answered or changed the code accordingly).

@shikharish
Copy link
Contributor Author

@lemire I made all the requested changes. There were a few things I misunderstood before but I've fixed them in the last few commits.

@lemire
Copy link
Member

lemire commented Jan 16, 2026

@pauldreik Can you check that your concerns were addressed?

#if SIMDUTF_CPLUSPLUS20
#include <concepts>
template <typename BytePtr>
concept check_bom_byteptr = requires(BytePtr p, size_t i) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please look at the other constexpr changes recently merged and reuse the already existing concepts.

}

template <endianness big_endian>
template <endianness big_endian, typename InputPtr>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and reused the already existing concept and style)

Signed-off-by: Shikhar <shikharish05@gmail.com>
@shikharish shikharish requested a review from pauldreik January 21, 2026 11:39
@shikharish
Copy link
Contributor Author

@pauldreik Please review.

@pauldreik
Copy link
Collaborator

@shikharish when you get review comments, please go through them one by one and fix/answer them. otherwise I have to go through everything again. there are still issues left from earlier comments.

what is also left to do:

  • update the README which says these two functions miss constexpr functionality.
  • a function template is already implicitly inline, don't mark as inline.
  • the changes in base64, why are they there? surprised to see them in this PR.

Comment on lines +2 to +3
#include <iostream>
#include <array>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these don't seem to be used.

Comment on lines +20 to +24
#if SIMDUTF_SPAN
template <endianness big_endian, simdutf::detail::indexes_into_utf16 InputPtr>
#else
template <endianness big_endian, typename InputPtr>
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

template <endianness big_endian, typename InputPtr>
#if SIMDUTF_CPLUSPLUS20
  requires simdutf::detail::indexes_into_utf16<InputPtr>
#endif
simdutf_warn_unused simdutf_constexpr23 bool

}

template <endianness big_endian>
inline simdutf_warn_unused simdutf_constexpr23 bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a simdutf really inline

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make immediate versions of autodetect_encoding and detect_encodings (constexpr)

3 participants