```cpp
char32_t s1[] = U"\u0073tring";
char16_t s2[] = u"\u0073tring";
char8_t  s3[] = u8"\u0073tring";
char32_t c1 = U'\u0073';
char16_t c2 = u'\u0073';
char8_t  c3 = u8'\u0073';
```

In C++20 there are two kinds and six forms of Unicode literals: character literals and string literals, in UTF-8, UTF-16, and UTF-32 encodings. Each uses a distinct char type to signal in the type system what the encoding of the data is. For plain char literals, the encoding is the execution character set, which is implementation defined. It might be something useful, like UTF-8, or old, like UCS-2, or something distinctly unhelpful, like EBCDIC. Whatever it is, it was fixed at compile time, and is not affected by things like locale settings. The source character set is the encoding the compiler believes your source code is written in, including the octets that happen to be between " and ' characters.

Unicode code point U+0073 is ‘s’, so all of the strings are “string” and all of the characters are ‘s’; however, the representation of each in memory is different. For the string types, each code unit in the string is 32, 16, or 8 bits respectively, and each character literal is a single code unit, again of 32, 16, or 8 bits.