Suppose I wanted to detect unicode characters and encode them using \u notation. If I had to use a byte array, are there simple rules I can follow to detect groups of bytes that belong to a single character?
I am referring to UTF-8 bytes that need to be encoded for an ASCII-only receiver. At the moment, non-ASCII-Printable characters are stripped. s/[^\x20-\x7e\r\n\t]//g.
I want to improve this functionality to write \u0000 notation.
\uhas nothing whatsoever to do with UTF8.