Skip to content

UB29⚓︎

Author: Yikai Cui.

Definition⚓︎

The initial character of an identifier is a universal character name designating a digit (6.4.2.1)

标识符的首个字符是一个指代数字的通用字符名。

Description⚓︎

In 6.4.2.1(Identifiers), it is stated that the identifiers always start with one of the following: nondigit, XID_Start character, universal-character-name of class XID_Start.

由C语言对标识符的定义,标识符的首个字符必须是以下三者之一:非数字字符、XID_Start字符、XID_Start类别的通用字符名。

An XID_Start character is an implementation-defined character whose corresponding code point in ISO/IEC 10646 has the XID_Start property. The author would like to emphasize that the XID_Start property is not a part of the C standard, but a part of the Unicode standard. The XID_Start property is defined in the Unicode Standard Annex #31, which is a part of the Unicode standard. The Unicode Standard has a lot to do with this undefined behavior, but the author would like to put a stop and focus on the C standard.

XID_Start字符是一个由实现定义的字符,其在ISO/IEC 10646中对应的代码点具有XID_Start属性。需要强调的是,XID_Start属性不是C标准的一部分,而是Unicode标准的一部分。XID_Start属性在Unicode标准附录#31中定义,该附录是Unicode标准的一部分。Unicode标准与此未定义行为有很大关系,但笔者不打算拘泥于这一部分而是专注于C标准。

If the initial character of an identifier is a universal character name designating a digit, for example \u0030 (which is the universal character name of the digit 0), confusion may occur. The specific behavior of the implementation is therefore undefined.

如果标识符的首个字符是一个指代数字的通用字符名,例如\u0030(它是数字0的通用字符名),则可能会产生混淆。因此,实现的具体行为是未定义的。

The situation where a initial character of an identifier is a universal character name designating a digit also violates undefined behavior No. 28 ("A universal character name in an identifier does not designate a character whose encoding falls into one of the specified ranges").

标识符的首个字符是一个指代数字的通用字符名的情况也违反了未定义行为28("标识符中的通用字符名不指代其编码落入指定范围之一的字符")。

Code⚓︎

UB29.c
#include <stdio.h>

int main() {
    int \u0030A = 1; // undefined behavior! (1)
}
  1. undefined behavior! Here \u0030 is the universal character name of the digit 0. The initial character of the identifier is a universal character name designating a digit, which is undefined behavior. Also, this violates undefined behavior No. 28 ("A universal character name in an identifier does not designate a character whose encoding falls into one of the specified ranges").

View source

Note⚓︎

Nearly all (Standard conforming) implementations raise errors on invalid universal character names during compilation.

几乎所有(符合标准的)编译器在编译期间对无效的通用字符名都会报错。

Advice⚓︎

Using universal character names in identifiers is not recommended, not only because the source file may not be successfully compiled in earlier compilers conforming C99 or earlier, but also because using characters that are not in the basic character set might reduce the readablity of the source code.

不推荐在标识符中使用通用字符名。不仅因为源文件可能无法在符合C99或更早版本的编译器中成功编译,还因为因为使用不在基本字符集中的字符可能会降低源代码的可读性。

The advice is simple: just use English. If you must use them, make sure they are valid.

建议很简单:只使用英文编写代码。如果必须使用通用字符名,请确保它们是有效的。