First, I must say that I hate templates: they make code complex and unreadable, not to mention the 5,000,000-line error messages listing 10,000 possibilities, which take ages to understand for something that turns out to be trivial and would produce a single error message in plain C++.
But I'm facing a problem that I'm puzzled how to solve:
I'm tired of the poor Unicode support in C++. They created char8_t, but it's only a kind of uint8_t: a single code unit that can't hold a multibyte sequence. And that's without mentioning the broken u8string, u8string_view and the other basic_string<> types of the STL: they don't enforce UTF-8 boundaries, so you can take a substring that ends in the middle of a UTF-8 sequence, starts in the middle of another, or both.
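To make the boundary problem concrete, here is a minimal sketch using plain std::string. The helper names (isContinuationByte, cutSplitsSequence, sample, brokenPrefix) are made up for this example and are not part of any library:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// True when the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
inline bool isContinuationByte(char c) {
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// True when cutting `text` at byte offset `pos` would land inside a
// multibyte sequence, i.e. the byte at `pos` is a continuation byte.
inline bool cutSplitsSequence(std::string_view text, std::size_t pos) {
    return pos < text.size() && isContinuationByte(text[pos]);
}

// "h", then U+00E9 (encoded as the two bytes C3 A9), then "llo".
inline std::string sample() { return "h\xC3\xA9llo"; }

// substr() counts bytes, so this returns "h\xC3": a lead byte whose
// continuation byte has been chopped off, which is not valid UTF-8.
inline std::string brokenPrefix() { return sample().substr(0, 2); }

inline void demoBrokenSubstr() {
    assert(brokenPrefix().size() == 2);      // two bytes, but not two characters
    assert(cutSplitsSequence(sample(), 2));  // the cut fell inside the sequence
}
```

Nothing in the standard string types rejects or even flags that second byte offset.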
So I'm writing a small library that handles this correctly by respecting the "codepoint" concept. I have u8CodepointView, u16CodepointView and u32CodepointView, which give access to codepoints without any risk of breaking a multibyte sequence, and which can validate them. They are read-only and never copy the data. All of them derive from an abstract base class, utfCodepointView.
Then I have the same thing in modifiable form: u8Codepoint... u8string... All read access is forced to go through the *View classes, to avoid the broken design of the current STL string and string_view, which duplicate most of their accessors (that's understandable, since the view concept came later, because of the inefficiency of the string concept). I've also implemented ways to use the regular C++ APIs (cout, cin...), because a string that can't be displayed is useless. So my types can safely be explicitly cast to std::string and the other STL types.
The problem I have is that I want utfStringView to force derived classes to implement some API, and I also want to write generic functions that take a pointer or reference to this base class and work on u8string, u16string and u32string without caring about what's behind it: string comparison (REAL string comparison, of a u8string with a u16string for example), changing string content, etc. In most cases, apart from type conversion, I don't have to care whether it's UTF-8, UTF-16 or UTF-32 (UCS) behind; in the end they are all a sequence of codepoints.
Some of these functions are:
class utfStringView {
public:
    virtual utfCodepointView operator[](const cpidx& aIdx) = 0;
    virtual utfStringView substr(const cpidx& aStart, size_t aCount) = 0;
};
The cpidx type is just an int wrapper that lets me overload functions, nothing interesting here. It's only a workaround so that both byte-level and codepoint-level access can take an unsigned int index.
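For readers who haven't seen this trick, a rough sketch of the idea. This is a simplified stand-in, not the real code: the explicit constructor, the Utf8Bytes class and its members are assumptions made for the example, and it assumes valid UTF-8 and in-range indices:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for cpidx: a distinct type wrapping an index, so
// overload resolution can tell "codepoint index" from "byte index".
struct cpidx {
    std::size_t value;
    constexpr explicit cpidx(std::size_t v) : value(v) {}
};

// Toy UTF-8 view (illustrative only): same operator, two meanings.
class Utf8Bytes {
public:
    explicit Utf8Bytes(std::string_view bytes) : mBytes(bytes) {}

    // Byte level: the raw code unit at byte offset i.
    unsigned char operator[](std::size_t i) const {
        return static_cast<unsigned char>(mBytes[i]);
    }

    // Codepoint level: the bytes of the idx'th codepoint.
    std::string_view operator[](cpidx idx) const {
        std::size_t pos = 0;
        for (std::size_t n = 0; n < idx.value; ++n) pos = next(pos);
        return mBytes.substr(pos, next(pos) - pos);
    }

private:
    // Offset of the codepoint following the one that starts at pos:
    // step over the lead byte, then over any continuation bytes (10xxxxxx).
    std::size_t next(std::size_t pos) const {
        ++pos;
        while (pos < mBytes.size() &&
               (static_cast<unsigned char>(mBytes[pos]) & 0xC0) == 0x80) {
            ++pos;
        }
        return pos;
    }

    std::string_view mBytes;
};

inline void demoCpidx() {
    Utf8Bytes text("h\xC3\xA9llo");           // h, U+00E9, l, l, o
    assert(text[1] == 0xC3);                  // byte 1 is a lead byte
    assert(text[cpidx(1)] == "\xC3\xA9");     // codepoint 1 is both bytes
}
```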
Take for example:
virtual utfCodepointView operator[] (const cpidx &aIdx) = 0;
It returns the aIdx'th codepoint view, but that view can be a u8CodepointView, a u16CodepointView or a u32CodepointView instance, and each of them has its own virtual validate() and size() functions.
So if I want the size of the 3-byte UTF-8 codepoint at position 10 of the string by writing lString[10].size(), I need a covariant return type. Returning the base class by value doesn't work: if utfCodepointView were not a pure abstract class, the returned object would be sliced, and the call would go to utfCodepointView::size() and not to u8CodepointView::size(). And since I made it abstract, an object of that type can't be instantiated, so it can't be returned by value at all. On top of that, covariant return types only apply to pointers and references.
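The slicing part can be reproduced with a reduced example. The class and function names below are stand-ins invented for the illustration, and the base is deliberately not abstract so that returning it by value compiles:

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for the base class (NOT abstract on purpose).
struct CodepointViewBase {
    virtual ~CodepointViewBase() = default;
    virtual std::size_t size() const { return 0; }  // placeholder base answer
};

// Simplified stand-in for the UTF-8 flavour.
struct Utf8CodepointView : CodepointViewBase {
    std::size_t size() const override { return 3; }  // pretend: 3-byte sequence
};

// Returns the base BY VALUE: only the base subobject is copied (slicing),
// so the dynamic type of the result is CodepointViewBase.
inline CodepointViewBase getByValue() {
    Utf8CodepointView cp;
    return cp;
}

// Returns BY REFERENCE: the dynamic type is preserved, but the referenced
// object has to outlive the call (here the caller owns it).
inline const CodepointViewBase& getByReference(const Utf8CodepointView& cp) {
    return cp;
}

inline void demoSlicing() {
    Utf8CodepointView cp;
    assert(getByValue().size() == 0);        // base version: derived part is gone
    assert(getByReference(cp).size() == 3);  // derived version, as wanted
}
```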
So I have to move to one of these:
virtual utfCodepointView* operator[] (const cpidx &aIdx) = 0;   // either return a pointer...
virtual utfCodepointView& operator[] (const cpidx &aIdx) = 0;   // ...or return a reference
But this brings me to a lifetime issue. I don't want to allocate a new object, because the utfCodepointView objects are designed to be lightweight and cheap to copy (they have only 2 members: a pointer to the data, and a size). Returning a pointer would destroy all that efficiency with allocations and deletions, for what is otherwise a two-register operation, and for something that will most of the time be accessed once and then dropped to access the next one. That's nonsense! I would just spend my time allocating and deleting tiny objects.
And returning a reference... to what? The object can only be a local variable, so it is destroyed when the operator returns. I can't make it a member either, because that wouldn't be thread safe, or I'd have to protect it with a mutex, and again I'd lose all efficiency.
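To show what the pointer variant costs, here is a reduced sketch. Everything in it is an assumption made for the illustration: the class layout, the free function at(), the allocation counter, and std::unique_ptr standing in for a raw new/delete pair:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Simplified stand-ins for the real classes.
struct CodepointViewBase {
    virtual ~CodepointViewBase() = default;
    virtual std::size_t size() const = 0;
};

struct Utf8CodepointView : CodepointViewBase {
    const char* data;    // pointer to the first byte of the codepoint
    std::size_t length;  // number of bytes in the sequence
    Utf8CodepointView(const char* d, std::size_t n) : data(d), length(n) {}
    std::size_t size() const override { return length; }
};

// Counts the heap allocations made by at(), only to make the cost visible.
inline int& allocationCount() {
    static int count = 0;
    return count;
}

// Pointer-returning accessor: virtual dispatch works on the result,
// but every single access pays for a heap allocation and a deletion.
inline std::unique_ptr<CodepointViewBase> at(const char* text,
                                             std::size_t offset,
                                             std::size_t length) {
    ++allocationCount();
    return std::make_unique<Utf8CodepointView>(text + offset, length);
}

inline void demoAllocations() {
    const int before = allocationCount();
    const char* text = "h\xC3\xA9llo";
    assert(at(text, 1, 2)->size() == 2);       // correct answer...
    assert(allocationCount() == before + 1);   // ...at one allocation per access
}
```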
The same happens with the substr function.
The only solution I see is to make utfStringView a template, and I don't like it:
template<class BaseCodepoint, class BaseSTring>
class utfStringView
{
public:
    constexpr virtual BaseCodepoint operator[] (const cpidx &aIdx) = 0;
    constexpr virtual BaseSTring substr(const cpidx& aStart, size_t aCount) = 0;
};
And this also won't let me fully use polymorphism on classes derived from utfStringView to work on u8string, u16string and u32string in the same way, because each instantiation of the template is a separate, unrelated base class. So I'd have to write all the other generic functions as templates too.
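A reduced sketch of why the template spreads. The names utfStringViewT, u8View, u16View and unitCount are hypothetical, and the class is simplified to a single template parameter with no virtual functions:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <type_traits>

// Simplified, hypothetical templated view: one class per code unit type.
template <class CodeUnit>
class utfStringViewT {
public:
    explicit utfStringViewT(std::basic_string_view<CodeUnit> units)
        : mUnits(units) {}
    std::size_t unitCount() const { return mUnits.size(); }

private:
    std::basic_string_view<CodeUnit> mUnits;
};

using u8View = utfStringViewT<char>;
using u16View = utfStringViewT<char16_t>;

// The two instantiations are distinct types with no common base class,
// so there is nothing to take a base pointer or reference to.
static_assert(!std::is_same<u8View, u16View>::value, "unrelated types");

// A function that should accept both therefore has to be a template itself.
template <class CodeUnit>
std::size_t unitCount(const utfStringViewT<CodeUnit>& s) {
    return s.unitCount();
}

inline void demoTemplateSpread() {
    assert(unitCount(u8View("abc")) == 3);    // instantiated for char
    assert(unitCount(u16View(u"ab")) == 2);   // instantiated again for char16_t
}
```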
By the way, I moved to C++23 because I love constexpr, and some really nice constexpr features are only in C++23. So if there is something in the standard that would save me and that I don't know about, I'm not limited by the version of the standard.
So my question: is there a way to do this without templates, keeping polymorphism and letting me work on strings without caring about what's behind them?
Thanks and regards