unicode.md

Unicode

Most letters and symbols that are common in the English-speaking world fit into a single char, so pretending that a char is always "a single letter or symbol" is generally a good enough mental model.
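
A quick way to check this: `Character.charCount` reports how many chars a given symbol needs. The sketch below is only an illustration; the class name and the sample symbols are arbitrary choices, and `é` and `€` are written as escapes so the example does not depend on the source file's encoding.

```java
class SingleCharSymbols {
    // Letters, digits, punctuation, and many accented letters and currency signs
    // each fit in one char: A, z, 7, ?, é (U+00E9), € (U+20AC)
    static final char[] EXAMPLES = {'A', 'z', '7', '?', '\u00E9', '\u20AC'};

    public static void main(String[] args) {
        for (char c : EXAMPLES) {
            // Character.charCount(int) returns 1 when a code point fits in a single char
            System.out.printf("%c  U+%04X  chars needed: %d%n", c, (int) c, Character.charCount(c));
        }
    }
}
```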

Where this falls apart is with things like emoji (👨‍🍳), which are generally considered to be one symbol but cannot be represented in a single char.

```java
// Does not compile: the chef emoji takes more than one char
char chef = '👨‍🍳';
```
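
To hold the chef emoji you need a `String` rather than a `char`. Below is a minimal sketch. The escapes spell out the standard "man cook" sequence (U+1F468, U+200D, U+1F373) so the example does not depend on the source file's encoding. The class name is just for illustration.

```java
class ChefAsString {
    // Same text as "👨‍🍳": man (U+1F468) + zero-width joiner (U+200D) + cooking (U+1F373)
    static final String CHEF = "\uD83D\uDC68\u200D\uD83C\uDF73";

    public static void main(String[] args) {
        // length() counts chars, not visible symbols
        System.out.println("chars: " + CHEF.length()); // chars: 5

        // The first char is only the first half of the "man" emoji, not a symbol by itself
        char first = CHEF.charAt(0);
        System.out.println("first char is a high surrogate: " + Character.isHighSurrogate(first)); // true
    }
}
```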

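A char is actually a UTF-16 "code unit." Many symbols need more than one code unit. Anything outside the first 65,536 code points takes two chars (a "surrogate pair"), and some emoji, like the chef, combine several code points into what is displayed as one symbol.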
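
Taking the chef emoji apart makes the difference between chars (code units) and code points visible. This is a minimal sketch using only standard `String` and `Character` methods. The class and the `codePoints` helper are hypothetical names. The three code points are still drawn as one symbol, because the zero-width joiner in the middle tells the renderer to combine its neighbors.

```java
class CodeUnitsVsCodePoints {
    static final String CHEF = "\uD83D\uDC68\u200D\uD83C\uDF73"; // 👨‍🍳

    // Counts Unicode code points; a surrogate pair counts as one
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println("code units (chars): " + CHEF.length()); // 5
        System.out.println("code points: " + codePoints(CHEF));     // 3

        // String.codePoints() joins each surrogate pair back into one code point
        CHEF.codePoints().forEach(cp ->
                System.out.printf("U+%X needs %d char(s)%n", cp, Character.charCount(cp)));
    }
}
```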

For a full explanation, refer to this old Computerphile video.

It describes UTF-8, which uses 8 bits per code unit. Java's char is a 16-bit UTF-16 code unit. The exact encoding rules differ, but the core idea is the same: a single symbol can take more than one code unit.
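
To compare the two encodings side by side, the following sketch counts code units for a few sample strings in each. It relies on `StandardCharsets.UTF_8` from the standard library. The class and helper names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

class Utf8VsUtf16 {
    // Number of 8-bit code units (bytes) in the UTF-8 encoding
    static int utf8Units(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of 16-bit code units, which is what String.length() counts
    static int utf16Units(String s) {
        return s.length();
    }

    public static void main(String[] args) {
        // A, é, €, 👨‍🍳
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDC68\u200D\uD83C\uDF73"};
        for (String s : samples) {
            System.out.printf("%s  UTF-8: %d  UTF-16: %d%n", s, utf8Units(s), utf16Units(s));
        }
    }
}
```

Both encodings need several code units for the chef. UTF-8 uses 11 bytes and UTF-16 uses 5 chars. The symbol is the same, but each encoding splits it differently.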
