

Author: Yikai Cui.


A universal character name in an identifier does not designate a character whose encoding falls into one of the specified ranges (



A universal character name has the form \uXXXX or \UXXXXXXXX, where each X is a hexadecimal digit. It can be used to represent a character that is not in the basic source character set.


The standard specifies that a universal character name shall not designate a code point where the hexadecimal value is:

  • less than 00A0 other than 0024($), 0040(@), or 0060(`)
  • in the range D800 through DFFF inclusive
  • greater than 10FFFF


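
The three rules above can be sketched as a small predicate. This is only an illustration: the name ucn_is_permitted is hypothetical, and the function checks nothing beyond the three listed restrictions (it does not decide whether a character is otherwise acceptable inside an identifier).

```c
#include <stdbool.h>

/* Hypothetical helper: returns true when the code point cp may be
   designated by a universal character name under the three rules
   listed above. It does not check any other identifier rules. */
bool ucn_is_permitted(unsigned long cp)
{
    if (cp < 0x00A0UL) {
        /* Below 00A0, only $ (0024), @ (0040) and ` (0060) are exempt. */
        return cp == 0x0024UL || cp == 0x0040UL || cp == 0x0060UL;
    }
    if (cp >= 0xD800UL && cp <= 0xDFFFUL) {
        /* D800 through DFFF inclusive is excluded. */
        return false;
    }
    /* Anything above 10FFFF is excluded. */
    return cp <= 0x10FFFFUL;
}
```

With this sketch, ucn_is_permitted(0x0024) is true, while ucn_is_permitted(0x0022) and ucn_is_permitted(0xD800) are false.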

The code points less than 00A0 cover the basic source character set (along with control characters), so they are not allowed to be represented by universal character names. Here is an example of why allowing them would be confusing: suppose a lexer meets an identifier containing the universal character name \u0022, which is the code point of the double quote ("). It may treat the universal character name as a literal ", which may cause compilation errors.


The code points in the range D800 through DFFF are reserved for surrogate pairs in UTF-16, so they are not allowed to be represented by universal character names.
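
To illustrate why this range is set aside, here is a minimal sketch of how UTF-16 encodes a code point above FFFF as two 16-bit units, both of which land inside D800 through DFFF. The function name utf16_surrogates is hypothetical, and the input is assumed to lie in the range 10000 through 10FFFF.

```c
#include <stdint.h>

/* Hypothetical helper: splits cp (assumed to be in 0x10000..0x10FFFF)
   into a UTF-16 surrogate pair. */
void utf16_surrogates(uint32_t cp, uint16_t *high, uint16_t *low)
{
    uint32_t v = cp - 0x10000u;                 /* 20-bit offset */
    *high = (uint16_t)(0xD800u + (v >> 10));    /* upper 10 bits: D800..DBFF */
    *low  = (uint16_t)(0xDC00u + (v & 0x3FFu)); /* lower 10 bits: DC00..DFFF */
}
```

For example, the code point 1F600 is split into the pair D83D, DE00. Because every value in D800 through DFFF is one half of such a pair, none of them stands for a character on its own.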


The code points greater than 10FFFF are not valid Unicode code points (they lie beyond the last Unicode plane), so they are not allowed to be represented by universal character names.



#include <stdio.h>

int main(void) {
    int a\u0024 = 1; // not undefined behavior (1)
    int b\u0022 = 2; // undefined behavior! (2)
    return a$;
}
  1. Not undefined behavior. \u0024 ($) is one of the three code points below 00A0 that a universal character name is allowed to designate, so this is not undefined behavior. The variable can even be referred to as a$ (see the return statement).
  2. Undefined behavior! \u0022 (") is below 00A0 and is not one of the permitted exceptions, so this is undefined behavior.
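
For contrast, here is a minimal sketch of an identifier that uses a valid universal character name. It assumes a compiler that supports universal character names in identifiers (recent GCC and Clang do), and the function name valid_ucn_identifier_demo is hypothetical. \u00E9 (é) is at or above 00A0, outside the surrogate range, and not greater than 10FFFF.

```c
/* Hypothetical demo: declares and reads a local variable whose name
   contains the universal character name \u00E9 (the letter e with an
   acute accent). */
int valid_ucn_identifier_demo(void)
{
    int caf\u00E9 = 3;  /* same identifier every time it is spelled this way */
    return caf\u00E9;
}
```

Calling valid_ucn_identifier_demo() returns 3 on a compiler that accepts the identifier.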



Nearly all standard-conforming implementations report an error for an invalid universal character name at compile time.



Using universal character names in identifiers is not recommended, not only because the source file may fail to compile with older compilers, especially those that predate C99 (which introduced universal character names), but also because using characters that are not in the basic character set might reduce the readability of the source code.


The advice is simple: just use English. If you must use universal character names, make sure they are valid.
