

Author: Yikai Cui.


The initial character of an identifier is a universal character name designating a digit (



The C standard states that an identifier always starts with one of the following: a nondigit, an XID_Start character, or a universal-character-name of class XID_Start.


An XID_Start character is an implementation-defined character whose corresponding code point in ISO/IEC 10646 has the XID_Start property. The author would like to emphasize that the XID_Start property is not defined by the C standard but by the Unicode Standard, in Unicode Standard Annex #31. The Unicode Standard has a lot to do with this undefined behavior, but the author will stop here and focus on the C standard.


If the initial character of an identifier is a universal character name designating a digit, for example \u0030 (which is the universal character name of the digit 0), confusion may occur. The specific behavior of the implementation is therefore undefined.


An identifier whose initial character is a universal character name designating a digit also falls under undefined behavior No. 28 ("A universal character name in an identifier does not designate a character whose encoding falls into one of the specified ranges").
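To make the rule concrete, here is a minimal sketch of a lint-style check that inspects the spelling of an identifier (as a string) and reports whether it begins with a universal character name designating a digit. The helper names `parse_ucn` and `starts_with_digit_ucn` are hypothetical and not part of any standard library. As a simplifying assumption, only the basic digits 0 to 9 (U+0030 to U+0039) are treated as digits; a real tool would need the full set of disallowed code points.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: parse a universal character name (\uXXXX or
 * \UXXXXXXXX) at the start of `s`. On success, store the code point in *cp
 * and return the number of characters consumed. Return 0 if `s` does not
 * start with a complete universal character name. */
size_t parse_ucn(const char *s, unsigned long *cp)
{
    if (s[0] != '\\' || (s[1] != 'u' && s[1] != 'U'))
        return 0;

    size_t ndigits = (s[1] == 'u') ? 4 : 8;
    unsigned long value = 0;

    for (size_t i = 0; i < ndigits; i++) {
        unsigned char c = (unsigned char)s[2 + i];
        if (!isxdigit(c))
            return 0; /* also stops safely at the terminating '\0' */
        /* Assumption: the letters a-f are contiguous (true for ASCII). */
        unsigned long digit = isdigit(c) ? (unsigned long)(c - '0')
                                         : (unsigned long)(tolower(c) - 'a' + 10);
        value = value * 16 + digit;
    }

    *cp = value;
    return 2 + ndigits;
}

/* Returns true if the identifier spelling starts with a universal character
 * name that designates one of the basic digits 0-9 (U+0030..U+0039).
 * Simplification: other Unicode digits are not checked. */
bool starts_with_digit_ucn(const char *spelling)
{
    unsigned long cp;
    if (parse_ucn(spelling, &cp) == 0)
        return false;
    return cp >= 0x30 && cp <= 0x39;
}
```

Note that the spelling is passed as ordinary text, so in a C string literal the backslash must be escaped: `starts_with_digit_ucn("\\u0030A")` inspects the six characters `\u0030` followed by `A`.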



#include <stdio.h>

int main() {
    int \u0030A = 1; // undefined behavior! (1)
    return 0;
}

1. Undefined behavior! Here \u0030 is the universal character name of the digit 0. The initial character of the identifier is a universal character name designating a digit, which is undefined behavior. This also falls under undefined behavior No. 28 ("A universal character name in an identifier does not designate a character whose encoding falls into one of the specified ranges").



Nearly all standard-conforming implementations report an error for invalid universal character names at compile time.



Using universal character names in identifiers is not recommended, not only because the source file may fail to compile with older compilers that conform to C99 or earlier, but also because using characters outside the basic character set may reduce the readability of the source code.


The advice is simple: just use English. If you must use universal character names in identifiers, make sure they are valid.
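One way to enforce the "just use English" advice mechanically is to check identifier spellings against the basic character set. The following is only a sketch under that assumption: the function name `is_basic_identifier` is hypothetical, and it accepts exactly the spellings made of the letters a-z and A-Z, the underscore, and the digits 0-9, with a non-digit first.

```c
#include <stdbool.h>
#include <string.h>

/* Characters allowed as the first character of a basic identifier. */
static const char NONDIGITS[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_";
/* The ten basic digits, allowed after the first character. */
static const char DIGITS[] = "0123456789";

/* Hypothetical helper: returns true if `s` is a non-empty identifier
 * spelling that uses only basic letters, '_' and digits, and does not
 * start with a digit. Any backslash (and therefore any universal
 * character name) makes the spelling fail the check. */
bool is_basic_identifier(const char *s)
{
    /* Guard against '\0': strchr would otherwise match the terminator. */
    if (*s == '\0' || strchr(NONDIGITS, *s) == NULL)
        return false;

    for (s++; *s != '\0'; s++) {
        if (strchr(NONDIGITS, *s) == NULL && strchr(DIGITS, *s) == NULL)
            return false;
    }
    return true;
}
```

A check like this is stricter than the standard requires, since it rejects every valid universal character name as well; that is the intended trade-off when following the advice above.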
