
C Language: Personal Reference Manual


A C language reference manual that I compiled for my own use.

C versions

Use the -std option at compile time to select the C version:

  • Standard C versions: c89 c90 c99 c11 c17 c18 c23 (older compilers spell c23 as c2x)
  • C versions with GNU extensions: gnu89 gnu90 gnu99 gnu11 gnu17 gnu18 gnu23

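When -std is not given, GCC 8 through 14 default to gnu17 for C. Since GCC 11 they default to gnu++17 for C++. Clang 11 and later also default to gnu17 for C. The default changes between releases (GCC 15, for example, moved to gnu23), so check the documentation for your compiler version.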

In 2022 (Linux 5.18), the Linux kernel switched its C version from gnu89 to gnu11. See the kernel documentation page "Programming Language".

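When you select a standard C version with -std, also pass -pedantic or -pedantic-errors. The compiler then prints a warning (or an error) whenever the code uses a non-standard feature such as a GNU C extension. Code that compiles cleanly this way is as portable as possible.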

# -Wall enables the most common warnings (despite its name, not all of them); -Wextra adds more
gcc -std=c11 -pedantic foo.c
gcc -Wall -Wextra -std=c2x -pedantic foo.c

-ansi is equivalent to -std=c90 for C code.

In source code, the macro __STDC_VERSION__ (a constant of type long) identifies the C version in use, which allows conditional compilation:

  • C89/C90: the macro is not defined.
  • C95: 199409L
  • C99: 199901L
  • C11: 201112L
  • C17/C18: 201710L
  • C23: 202311L
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdbool.h>  /* C99 or later */
#endif

Reference: https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/C-Dialect-Options.html

Source files

Lexical analysis splits a source file into tokens of the following kinds:

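  1. Identifiers: they name variables, functions, struct/union/enumeration tags and their members, typedef names, labels, macros, and so on.
  2. Keywords.
  3. Literals (constants): numeric constants, character constants and string literals. C99 compound literals are built from several tokens rather than being a single token.
  4. Operators: they act on operands to form expressions. An operator can be unary, binary or ternary (?:), and prefix or postfix.
  5. Separators (punctuators): they delimit tokens, e.g. ( ) [ ] { } ; , . : . Whitespace also separates tokens but is not a token itself.
  6. Whitespace and comments: spaces, tabs, newlines, \f and \v. Apart from separating tokens they are ignored (each comment is replaced by one space).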
#include <stdio.h>

int main() {
	printf( "hello, world\n" );
	return 0;
}

// is equivalent to (except that a preprocessing directive such as #include must stay on its own line)
#include <stdio.h>
int main(){printf("hello, world\n");return 0;}

After tokenization, syntax analysis (parsing) combines the tokens into statements such as expression statements, if statements and for statements.

Types

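The core idea of the C type system is abstraction and composition, for example:

  1. struct/union/array/pointer types are all composed from basic types;
  2. functions are composed of statements;
  3. statements are composed of expressions, or of if/while/for and similar constructs;
  4. expressions are composed of operators and operands.

Aggregate types such as arrays, structs and unions can be initialized in two ways:

  1. A brace-enclosed initializer list: usable only in a declaration, never in a later assignment. For objects with static storage duration every value must be a constant expression. Since C99, automatic objects may also use non-constant values.
  2. Compound literals (C99): usable both for initialization and for later assignment, and they may contain variables.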

Basic types

Types of constants (literals):

// char is an integer type (8 bits on all mainstream platforms; CHAR_BIT is at least 8)
printf("%d %d\n", 5, '5'); // 5 53

char c = '6';
int x = c;  // x is 54, the ASCII code of '6'
int y = c - '0'; // y is 6

long long int x;
// is equivalent to
long long x;

short int x;
// is equivalent to
short x;

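int a = 0x1A2B;
int b = 0x1a2b; // hexadecimal (letter case does not matter)
int c = 012; // octal: 10
int x = 0b101010;  // binary: 42 (standard since C23, a GNU extension before that)
printf("%x\n", a);  // 1a2b
printf("%o\n", c);  // 12
printf("%d\n", x);  // 42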

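Integer literals have type int by default, or a wider type if the value does not fit in int. Floating literals have type double. The U, L and F suffixes change the type of a literal: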

int           x = 1234; // int
long int      x = 1234L; // long
long long int x = 1234LL; // long long

unsigned int           x = 1234U;
unsigned long int      x = 1234UL;
unsigned long long int x = 1234ULL;

float x       = 3.14f; // 3.14F
double x      = 3.14;
long double x = 3.14L;
printf("%e\n", 123456.0);  // Prints 1.234560e+05
Type                     Suffix
int                      none (default)
long int                 L
long long int            LL
unsigned int             U
unsigned long int        UL
unsigned long long int   ULL
float                    F
double                   none (default)
long double              L

There is no long long double type!

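64-bit Unix/Linux/macOS systems use the I32LP64 (LP64) data model:

  1. int/float: 32 bits;
  2. long/long long/double/pointer: 64 bits;
  3. long double: platform dependent. It takes 128 bits of storage on x86-64 Linux (80-bit extended precision plus padding) and on arm64 Linux (IEEE quad precision), but only 64 bits (the same as double) on Apple arm64.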

On 64-bit arm64 systems:

Type                 Bytes  Minimum value          Maximum value
char                 1      -128                   127
signed char          1      -128                   127
short                2      -32768                 32767
int                  4      -2147483648            2147483647
long                 8      -9223372036854775808   9223372036854775807
long long            8      -9223372036854775808   9223372036854775807
unsigned char        1      0                      255
unsigned short       2      0                      65535
unsigned int         4      0                      4294967295
unsigned long        8      0                      18446744073709551615
unsigned long long   8      0                      18446744073709551615

Note: plain char is signed on Apple arm64 (as in the table) but unsigned (0 to 255) on Linux arm64.

The limits.h header defines the ranges of these types:

Type                 Min macro    Max macro
char                 CHAR_MIN     CHAR_MAX
signed char          SCHAR_MIN    SCHAR_MAX
short                SHRT_MIN     SHRT_MAX
int                  INT_MIN      INT_MAX
long                 LONG_MIN     LONG_MAX
long long            LLONG_MIN    LLONG_MAX
unsigned char        0            UCHAR_MAX
unsigned short       0            USHRT_MAX
unsigned int         0            UINT_MAX
unsigned long        0            ULONG_MAX
unsigned long long   0            ULLONG_MAX

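On 64-bit systems size_t is usually an alias of unsigned long and occupies 8 bytes. Print it with %zu.

Typical floating-point type sizes in bytes (long double varies by platform, see above):

Type          sizeof
float         4
double        8
long double   16

float.h gives the number of significant decimal digits each type guarantees. The table shows the standard's minimum and the values on IEEE 754 platforms:

Type          Macro      Minimum   Typical value   Approximate precision
float         FLT_DIG    6         6               about 7 digits
double        DBL_DIG    10        15              about 16 digits
long double   LDBL_DIG   10        15, 18 or 33    depends on the format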

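Significant digits (precision) are the number of digits a floating-point number can hold exactly in representation and computation. For single and double precision this is determined by the number of bits in the significand (also called the mantissa or fraction):

  1. Single precision (32 bits):
    • sign: 1 bit
    • exponent: 8 bits
    • fraction: 23 bits
  2. Double precision (64 bits):
    • sign: 1 bit
    • exponent: 11 bits
    • fraction: 52 bits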

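Computing the number of significant digits

  1. Single precision: the fraction has 23 bits, but normalized numbers carry an implicit leading 1, so the significand has 24 bits in total. That is about 7 decimal digits: one decimal digit needs log2(10) ≈ 3.32 bits, and 24 / 3.32 ≈ 7.2.
  2. Double precision: the 52 fraction bits plus the implicit leading 1 give 53 bits, about 16 decimal digits (53 × 0.301 ≈ 15.95).

The number of significant decimal digits can be estimated with the following formula, where n is the total number of significand bits (including the implicit 1):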

$$\text{Decimal digits} \approx \frac{\log_{10}(2^n)}{\log_{10}(10)} = \frac{n \cdot \log_{10}(2)}{\log_{10}(10)} \approx n \cdot 0.3010$$
/*
  0.12345
  0.123456
  0.1234567
  0.12345678
  0.123456791  <-- Things start going wrong
  0.1234567910
*/

#include <stdio.h>
#include <float.h>

int main(void)
{
	// Both these numbers have 6 significant digits, so they can be stored
	// accurately in a float:

	float f = 3.14159f;
	float g = 0.00000265358f;

	printf("%.5f\n", f);   // 3.14159       -- correct!
	printf("%.11f\n", g);  // 0.00000265358 -- correct!

	// Now add them up
	f += g;                // 3.14159265358 is what f _should_ be

	printf("%.11f\n", f);  // 3.14159274101 -- wrong!
}

C99 bool and fixed-width integer types

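C99's stdbool.h provides the bool type (a macro for _Bool, usually 1 byte) and the constants true/false. In C23, bool, true and false are keywords, so the header is no longer needed. A typical stdbool.h: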

#ifndef __STDBOOL_H
#define __STDBOOL_H

#define bool _Bool
#define true 1
#define false 0

#endif /* __STDBOOL_H */

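C99's stdint.h adds the following types, because the standard does not fix the sizes of the basic integer types:

  • int8_t, int16_t, int32_t, int64_t (exact width; optional, but available on all common platforms)
  • uint8_t, uint16_t, uint32_t, uint64_t
  • int_least8_t, int_least16_t, int_least32_t, int_least64_t (and the uint_leastN_t variants)
  • int_fast8_t, int_fast16_t, int_fast32_t, int_fast64_t (and the uint_fastN_t variants)
  • intmax_t, uintmax_t, intptr_t, uintptr_t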

It also defines macros for the limits of these types:

INT8_MAX           INT8_MIN           UINT8_MAX
INT16_MAX          INT16_MIN          UINT16_MAX
INT32_MAX          INT32_MIN          UINT32_MAX
INT64_MAX          INT64_MIN          UINT64_MAX

INT_LEAST8_MAX     INT_LEAST8_MIN     UINT_LEAST8_MAX
INT_LEAST16_MAX    INT_LEAST16_MIN    UINT_LEAST16_MAX
INT_LEAST32_MAX    INT_LEAST32_MIN    UINT_LEAST32_MAX
INT_LEAST64_MAX    INT_LEAST64_MIN    UINT_LEAST64_MAX

INT_FAST8_MAX      INT_FAST8_MIN      UINT_FAST8_MAX
INT_FAST16_MAX     INT_FAST16_MIN     UINT_FAST16_MAX
INT_FAST32_MAX     INT_FAST32_MIN     UINT_FAST32_MAX
INT_FAST64_MAX     INT_FAST64_MIN     UINT_FAST64_MAX

INTMAX_MAX         INTMAX_MIN         UINTMAX_MAX

For constants of these types, use the following macros:

INT8_C(x)     UINT8_C(x)
INT16_C(x)    UINT16_C(x)
INT32_C(x)    UINT32_C(x)
INT64_C(x)    UINT64_C(x)
INTMAX_C(x)   UINTMAX_C(x)

// Example:
uint16_t x = UINT16_C(12);
intmax_t y = INTMAX_C(3490);

array

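Like other automatic variables, an array defined inside a function must be initialized explicitly. Otherwise its elements have indeterminate (garbage) values: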

#include <stdio.h>

int main(void)
{
	// Define the array: its elements are indeterminate until assigned.
	float f[4];

	f[0] = 3.14159;
	f[1] = 1.41421;
	f[2] = 1.61803;
	f[3] = 2.71828;

	for (int i = 0; i < 4; i++) {
		printf("%f\n", f[i]);
	}
}

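Array initialization

  • Brace initialization: elements that are not listed are initialized to 0. For arrays with static storage duration the values must be constant expressions, because they are evaluated at compile time. Since C99 an automatic array may also use non-constant values.
  • Designated initializers [index] = value (C99); index ranges [first ... last] = value are a GNU C extension. Unspecified elements are initialized to 0.
  • Compound literals (C99).
// Uninitialized: indeterminate values inside a function (all 0 at file scope or when static)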
int my_array[5];

// Declare first, then assign (struct point is defined below)
struct point point_array [3];
point_array[0].x = 2;
point_array[0].y = 3;

// Full initialization
int my_array[5] = { 0, 1, 2, 3, 4 };

// Partial initialization: the remaining elements are 0
int my_array[5] = { 0, 1, 2 };
int my_array[5] = { 0 }; // all elements are 0
// int my_array[5] = {}; // empty braces: standard only since C23 (GCC accepted them earlier as an extension)

// GNU C extension: index ranges; here [0..9] = 1, [10..98] = 2, and the trailing 3 initializes [99]
int new_array[100] = { [0 ... 9] = 1, [10 ... 98] = 2, 3 };

// Length omitted: it is computed from the initializer (5 elements here, 100 in the next line)
int my_array[] = { 0, 1, 2, 3, 4 };
int my_array[] = { 0, 1, 2, [99] = 99 };

// A macro constant expression as the array length; the result is { 0, 0, 3, 2, 1 }
#define COUNT 5
int a[COUNT] = {[COUNT-3]=3, 2, 1};

// Initializing an array of structs
struct point {
	int x, y;
};
// Brace initialization
struct point point_array [3] = { {2, 3}, {4, 5}, {6, 7} };
// Designated index initialization
struct point point_array [3] = { [0]={2}, [1]={4, 5}, [2]={6, 7}};
// Only some fields given (the missing y is 0)
struct point point_array [3] = { {2}, {4, 5}, {6, 7} };

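// GNU C extension: initializing an array from a compound literal (int[]){...}; ISO C does not allow this
int globalArray[] = (int[]){ 1, 2, 3, 4, 5 };
// Standard C: point to the array created by the compound literal instead
int *globalPtr = (int[]){ 1, 2, 3, 4, 5 };

Array names and function arguments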

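An array name designates the start address of a block of memory. It is not a pointer variable: the compiler allocates no separate storage for the name, so it cannot appear on the left side of an assignment:

  • A function cannot return an array type, but it can return a pointer to an array or a struct that contains an array.
  • Because an array name cannot be assigned to, one array cannot be assigned to another. Objects of the same struct type, by contrast, can be assigned directly.

When an array name is used as a value (for example, passed as an argument), it decays to a pointer to its first element. The exceptions are the operand of sizeof, the operand of unary &, and a string literal initializing a char array: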

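// An array parameter is adjusted to a pointer: the length of a one-dimensional array parameter is ignored
// (for multidimensional array parameters the inner dimensions must be given)
int foo(const int sz[10]);
// is equivalent to
int foo(const int sz[]);
// is equivalent to
int foo(const int *sz);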

int a[2] = {0, 1};
int b[2] = {1, 2};
// Arrays cannot be assigned: a is an array, not a modifiable lvalue (b would decay to int *)
a = b;  // compile error

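// Equivalent to int *arr
void printArray(int arr[])
{
	// arr is a pointer, so sizeof gives the pointer size (8 on 64-bit systems), not the array size
	printf("Size of Array in Functions: %zu\n", sizeof(arr));
	printf("Array Elements: ");
	for (int i = 0; i < 5; i++) {  // the length 5 has to be known some other way
		printf("%d ", arr[i]); // arr[i] and *(arr + i) both work
	}
}

// Pass the array length explicitly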
void double_array(int *a, int len)
{
	for (int i = 0; i < len; i++)
		a[i] *= 2;
}

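extern declarations of arrays: an array must also be declared extern as an array, not as a pointer. A pointer is a variable for which the compiler allocates storage. An array name designates the first address of a contiguous block of memory and has no storage of its own. With a pointer declaration, the other file would read the array's first bytes as an address.

// Array definition (e.g. in a.c)
int array[5] = {1, 2, 3};

extern int array[]; // correct (e.g. in b.c)
extern int *array; // wrong: a "conflicting types" error in the same file; in another file it usually compiles and links, but is undefined behavior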

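Array indexing

A[i] is defined as the pointer expression (*((A)+(i))). This works for arrays because the array name in the expression decays to a pointer.

The meaning of ptr + N depends on the type ptr points to: + N skips N objects of that type, i.e. N * sizeof(*ptr) bytes.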

int array[10];
int *ptr = array; // as a value, the array decays to a pointer

// array[0] == *ptr
// array[N] == *(ptr+N)
// ptr == array == &array[0]

int arr[5] = { 10, 20, 30, 40, 50 };
int *ptr = &arr[0];
for (int i = 0; i < 5; i++) {
        printf("%d ", *ptr++); // postfix ++ binds tighter than *: print *ptr, then advance ptr
}

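Array size and element count

The sizeof operator yields the size of a type or of an expression:

  • The operand can be a type or an expression. A type name must be parenthesized; an expression does not need parentheses.
  • The result has type size_t; print it with %zu.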
int x[12];
printf("%zu\n", sizeof x);     // 48: expression
printf("%zu\n", sizeof(int));  // 4: type
printf("%zu\n", sizeof x / sizeof(int)); // 12

void foo(int x[12])
{
	printf("%zu\n", sizeof x);  // 8: x is really an int * here (64-bit system)
	printf("%zu\n", sizeof(int)); // 4
	printf("%zu\n", sizeof x / sizeof(int)); // 2, not 12
}

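Multidimensional arrays

When defining, declaring or passing a multidimensional array to a function, every dimension except the first must be specified: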

// Strings are char arrays: name is an array of pointers, aname is a two-dimensional char array
char *name[]={"Illegal month", "Jan", "Feb", "Mar"};
char aname[][15] = { "Illegal month", "Jan", "Feb", "Mar" };

Array             Function parameter                                Element access          Call
int a[5];         func(int a[]);        or  func(int *a);           a[i], *(a+i)            func(a);
int a[5];         func(int (*a)[5]);                                a[0][i], *(*a+i)        func(&a);
int a[5][5];      func(int a[][5]);     or  func(int (*a)[5]);      a[i][j], *(*(a+i)+j)    func(a);
int a[5][5][5];   func(int a[][5][5]);  or  func(int (*a)[5][5]);   a[i][j][k]              func(a);
int *a[5];        func(int *a[]);       or  func(int **a);                                  func(a);

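When the name a of a two-dimensional array int a[2][2] is passed to a function, what is actually passed is &a[0]. Since a[0] is a one-dimensional array, &a[0] is a pointer to an array, of type int (*)[2]. The function parameter can be declared in either of these forms: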

  1. int a[][2];
  2. int (*a)[2];
#include <stdio.h>

// Multidimensional array parameter: the first dimension may be omitted, but the following ones must be given
void print_2D_array(int a[2][3]) { // equivalent to: int a[][3] or int (*a)[3]
	for (int row = 0; row < 2; row++) {
		for (int col = 0; col < 3; col++)
			printf("%d ", a[row][col]);
		printf("\n");
	}
}

int main(void) {
	int x[2][3] = {
		{1, 2, 3},
		{4, 5, 6}
	};
	print_2D_array(x);
}

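Similarly, for a three-dimensional array int c[2][2][2], a function parameter can be declared in either of these forms:

  1. int cc[][2][2];
  2. int (*cc)[2][2];

How the parts of c relate to each other:

  • c is the name of a three-dimensional array. As a value it is a pointer to its first 2×2 sub-array: int (*cp)[2][2] = c;
  • cc (the parameter) is a pointer to a 2×2 int array. After cc = c it points to the first 2×2 sub-array of c.
  • c+1 points to the second 2×2 sub-array: int (*cp)[2][2] = c+1;
  • *(c+1), i.e. c[1], is the second 2×2 array itself, not its first element. As a value it decays to a pointer to its first row: int (*rp)[2] = c[1];
  • *(*(c+1) + 1), i.e. c[1][1], is an array of 2 ints, not its first element. As a value it decays to int *: int *ip = c[1][1];
  • Only c[1][1][0] is an actual int element.
// Two-dimensional array:
int a[3][3];

// a is the name of a two-dimensional array; as a value it is a pointer to its first row (a one-dimensional array)
int (*p0)[3] = a;

// a+1 points to the second row
int (*p1)[3] = a+1;

// a[1] is the name of a one-dimensional array (the second row); as a value it is a pointer to that row's first element
int *q = a[1];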

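Summary:

int (*a)[5][5] is used just like the three-dimensional array int a[][5][5]. In the former, a is a pointer variable; in the latter, a is an array name (not a variable). As a function parameter type, the latter is adjusted to the former.

Recommendation: for function parameters, prefer the int a[][5][5] form over int (*a)[5][5].

Initializing multidimensional arrays:

// Initializer for a multidimensional array: one level of braces per dimension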
int c[2][2] = {{0,0}, {1,1}};
int b[][3][3] = 	  { // b
	{  //b[0]
		{1, 2, 3},  // b[0][0]
		{1, 2, 3}, // b[0][1]
		{1, 2, 3},
	},
	{
		{4, 5, 6}, // b[1][0]
		{4, 5, 6},
		{4, 5, 6},
	}
};

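// The elements can also be listed flat, and the first dimension is then computed from the count.
// Keep the count a multiple of the size of the remaining dimensions (2*2 = 4); otherwise the last sub-array is padded with 0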
int c[][2][2] = {
	0,0,0,0, // c[0]
	1,1,1,1, // c[1]
	2,2,2,2, // c[2]
};

int a[3][2] = {
	{1, 2},
	{3},  // unlisted elements are 0
	{5, 6}
};
/* 1 2 */
/* 3 0 */
/* 5 6 */

int a[3][2] = {
        {1, 2},
        // {3, 4},
        {5, 6}
};

/* 1 2 */
/* 5 6 */
/* 0 0 */

// Multidimensional arrays can also be initialized flat
int a[3][2] = { 1, 2, 3, 4, 5, 6 };
/* 1 2 */
/* 3 4 */
/* 5 6 */

// The whole array is 0
int a[3][2] = {0};
int a[3][2] = {}; // error before C23: at least one value is required (C23 allows empty braces)

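Zero-length arrays

A zero-length array (a GNU C extension) occupies no storage.

The last member of a struct can be a flexible array member: either the GNU C zero-length array or the flexible array member standardized in C99. The difference is that the former is declared with length 0, while the latter is declared without a length.

int buffer[0];
printf("%zu\n", sizeof(buffer)); // 0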

struct buffer{
    int len;
    int a[0];
};
printf("%zu\n", sizeof(struct buffer)); // 4

// Compiler-extension (GNU C) zero-length array
struct len_string {
	int length;
	char data[0];
};

struct len_string *s = malloc(sizeof *s + 40);
s->length = 40;
strcpy(s->data, "Hello, world!");

// C99 flexible array member: must be the last member of the struct and is declared without a size
struct len_string {
	int length;
	char data[];
};

struct len_string *len_string_from_c_string(char *s)
{
	int len = strlen(s);

	// Allocate "len" more bytes than we'd normally need
	struct len_string *ls = malloc(sizeof *ls + len);

	ls->length = len;

	// Copy the string into those extra bytes
	memcpy(ls->data, s, len);

	return ls;
}

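Why not use a pointer instead of a zero-length array?

An array name stands for the address of a contiguous block of storage, whereas a pointer is a variable. The compiler allocates separate storage for the pointer itself, and that storage holds the address of another object; this is what we mean when we say the pointer points to that object. An array name gets no storage of its own: like a function name, it is just a symbol that represents an address. As a result, a pointer member adds its own size (plus any padding) to the struct and needs a second allocation for the data. A zero-length or flexible array member adds nothing, and its elements live in the same allocation as the struct.

#include <stdio.h>

struct buffer1{
	int len;
	int a[0];
};
struct buffer2{
	int len;
	int *a;
};
int main(void)
{
	printf("buffer1: %zu\n", sizeof(struct buffer1));
	printf("buffer2: %zu\n", sizeof(struct buffer2));
	return 0;
}

/* buffer1: 4 */
/* buffer2: 16 on a 64-bit system (4 + 4 bytes of padding + 8); 8 on a 32-bit system */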

VLA

https://en.cppreference.com/w/c/language/array#Variable-length_arrays

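C99 supports variable-length arrays (VLAs); the Linux kernel forbids them. GNU C already supported VLAs as an extension before C99, and since C11 VLA support is optional (a compiler without it defines __STDC_NO_VLA__):

  • The length is determined at run time rather than at compile time and can be a variable.
  • A VLA is allocated on the stack (in practice). Compared with malloc(), it needs no free(), and sizeof returns its size in bytes. The drawback is that a very large length overflows the stack, with no way to detect the failure.
  • VLAs are allowed only at block scope (automatic variables) and as function parameters, never at file scope.
    • There are no static or extern VLAs, no initializers for a VLA (C23 allows only empty {} braces), and no VLA members in a struct/union.
  • typedef can declare a VLA type, including multidimensional ones, at block scope.
  • sizeof works as usual, except on a VLA function parameter. The parameter is adjusted to a pointer, so sizeof returns the pointer size.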
#if __STDC_NO_VLA__ == 1
#error Sorry, need VLAs for this program!
#endif

#include <stdio.h>

int main(void)
{
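	int n;

	printf("Enter a number: ");
	fflush(stdout);
	if (scanf(" %d", &n) != 1 || n <= 0)
		return 1;  // a VLA length must be greater than zero

	// VLA: the length is a variable and the storage is (in practice) on the stack,
	// similar to malloc(n * sizeof(int)) but released automatically at the end of the block
	int v[n];

	// sizeof yields the total size of the VLA, computed at run time
	size_t num_elems = sizeof v / sizeof v[0];
	printf("num_elems = %zu\n", num_elems);

	for (int i = 0; i < n; i++)
		v[i] = i * 10;

	for (int i = 0; i < n; i++)
		printf("v[%d] = %d\n", i, v[i]);
}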

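// Array parameters can also be VLAs (adapted from the cppreference example linked above)
int B[100];  // file-scope array of constant size (VLAs are not allowed here)

void fvla(int m, int C[m][m])  // OK: C is adjusted to int (*C)[m], a pointer to a VLA
{
	typedef int VLA[m][m];        // OK: block-scope VLA type
	int D[m];                     // OK: automatic VLA
	int (*s)[m];                  // OK: automatic pointer to a VLA (variably modified type)
	s = malloc(m * sizeof(int));  // OK: s points to a VLA in allocated storage
	static int (*q)[m] = &B;      // OK: block-scope static pointer to a VLA

	//  static int E[m];     // Error: static duration VLA
	//  extern int F[m];     // Error: VLA with linkage
	//  extern int (*r)[m];  // Error: VM with linkage
	free(s);
}

VLA as a function parameter: inside the function it is a pointer, so sizeof returns the pointer size.

// In a function declaration with a VLA parameter, the length can be * or a variable
void foo(size_t x, int a[*]); // declaration
void foo(size_t x, int a[x]); // declaration

// In the function definition the length must be an expression (here the parameter x)
void foo(size_t x, int a[x]) // x is the length of a; no array storage is allocated here, a is a pointer
{
	printf("%zu\n", sizeof a); // same as sizeof(int *): inside the function a is a pointer, not an array
}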

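typedef also works with VLA types (typedef is a storage-class specifier, not an operator). However, the array size is fixed to the value of the variable at the moment the typedef is executed: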

#include <stdio.h>

int main(void)
{
	int w = 10;

	typedef int goat[w]; // goat is now the array type int[10]

	// goat is an array of 10 ints
	goat x;   // still a VLA type, so x cannot have an initializer

	// Init with squares of numbers
	for (int i = 0; i < w; i++)
		x[i] = i*i;

	// Print them
	for (int i = 0; i < w; i++)
		printf("%d\n", x[i]);

	// Now let's change w...

	w = 20;

	// But goat is STILL an array of 10 ints, because that was the value of w when the typedef executed.
}

Multidimensional VLAs:

int w = 10;
int h = 20;

int x[h][w];
int y[5][w];
int z[10][w][20];


// Passing a multidimensional VLA to a function
#include <stdio.h>

void print_matrix(int h, int w, int m[h][w])
{
	for (int row = 0; row < h; row++) {
		for (int col = 0; col < w; col++)
			printf("%2d ", m[row][col]);
		printf("\n");
	}
}

int main(void)
{
	int rows = 4;
	int cols = 7;

	int matrix[rows][cols];

	for (int row = 0; row < rows; row++)
		for (int col = 0; col < cols; col++)
			matrix[row][col] = row * col;

	print_matrix(rows, cols, matrix);
}

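Array parameter qualifiers

For array-typed function parameters (C99), type qualifiers (const, volatile, restrict) and the static keyword can be written inside the brackets:

  • int p[static 4]: the caller guarantees that p points to the first of at least 4 elements (so p cannot be NULL);
  • int a[const 20]: equivalent to int *const a; the 20 is ignored;
  • const int a[const 20]: equivalent to const int *const a; the 20 is ignored;
  • double a[static restrict 10]: a has at least 10 elements, and within the function the memory it points to is accessed only through a, which lets the compiler optimize.
// Type qualifiers on pointer variables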
int *const p;
int *volatile p;
int *const volatile p;
// etc.

// For array parameters the qualifiers go inside the brackets; these three declare the same parameter type
int func(int *const volatile p) {...}
int func(int p[const volatile]) {...}
int func(int p[const volatile 10]) {...}

// static N: p must point to at least 4 elements
int func(int p[static 4]) {...}

int main(void)
{
	int a[] = {11, 22, 33, 44};
	int b[] = {11, 22, 33, 44, 55};
	int c[] = {11, 22};

	func(a);  // OK!
	func(b);  // OK!
	func(c);  // Undefined behavior! c is under 4 elements!
}

int f(const int a[20])
{
	// in this function, a has type const int* (pointer to const int)
}

int g(const int a[const 20])
{
	// in this function, a has type const int* const (const pointer to const int)
}

// restrict: within this function the objects are accessed only through these pointers, so the compiler can optimize accordingly.
void fadd(double a[static restrict 10], const double b[static restrict 10])
{
	for (int i = 0; i < 10; i++) // loop can be unrolled and reordered
	{
		if (a[i] < 0.0)
			break;
		a[i] += b[i];
	}
}

string

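C has no string type: a string is a char array terminated by \0.

strlen() returns the length of a string without the terminating \0. Its return type is size_t, printed with %zu.

char *string = "abcd";
char string[] = "abcd";
// The array length must include the terminating '\0'
char string[5] = "abcd";

// A string literal must not be modified through a pointer (undefined behavior, often a crash)
char *string = "abcd";
string[0] = 'z';  // wrong

// A char array holds its own copy, so its characters can be modified
char string[] = {'a', 'b', 'c', 'd', '\0'};
string[0] = 'z';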

Escape sequences:

  1. Character escapes: \n \' \" \\ \a \b \f \r \t \v \?
  2. Numeric escapes: octal \1, \123; hexadecimal \x4D; universal character names \u2620, \U00002620
#include <stdio.h>
#include <threads.h>

int main(void)
{
	printf("Use \\n for newline\n");  // Use \n for newline
	printf("Say \"hello\"!\n");       // Say "hello"!
	printf("%c\n", '\'');             // '

	for (int i = 10; i >= 0; i--) {
		printf("\rT minus %d second%s... \b", i, i != 1? "s": "");
		fflush(stdout);  // Force output to update
		thrd_sleep(&(struct timespec){.tv_sec=1}, NULL);
	}

	printf("\rLiftoff!             \n");
}

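// \123: 1 to 3 octal digits, e.g. \0
// \x4D: hexadecimal digits (any number; the escape continues until a non-hex character)
// \u2620: exactly 4 hexadecimal digits
// \U0001243F: exactly 8 hexadecimal digits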
printf("A\102C\n");  // 102 is `B` in ASCII/UTF-8
printf("\xE2\x80\xA2 Bullet 1\n");
printf("\xE2\x80\xA2 Bullet 2\n");
printf("\xE2\x80\xA2 Bullet 3\n");

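Adjacent string literals separated only by whitespace (spaces, \t, newlines) are concatenated automatically, so a very long string can be split across several lines.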


String length

strlen() from string.h returns the length in bytes of a \0-terminated string:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char *s = "Hello, world!";

    printf("The string is %zu bytes long.\n", strlen(s));
}

int my_strlen(char *s)
{
    int count = 0;

    while (s[count] != '\0')  // Single quotes for single char
        count++;

    return count;
}

String operations

#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[] = "Hello, world!";
    char t[100];  // Each char is one byte, so plenty of room

    // This makes a copy of the string!
    strcpy(t, s);

    // We modify t
    t[0] = 'z';

    // And s remains unaffected because it's a different string
    printf("%s\n", s);  // "Hello, world!"

    // But t has been changed
    printf("%s\n", t);  // "zello, world!"
}

pointer

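In a declaration, the * belongs to each individual declarator, not to the type, so it is best written next to the identifier:

int *foo, *bar;
int *baz, quux; // baz is a pointer, quux is an int

char *name; // name is a pointer
char *args[4]; // args is an array of 4 elements of type char *
char (*args)[4]; // args is a pointer to an array of 4 chars

// max is a function pointer variable that can only point to functions of type int (int, int)
int (*max)(int, int);

// With typedef, max instead names the type "pointer to a function int (int, int)"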
typedef int (*max)(int, int);

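For an array int a[2]:

  • a: as a value, the address of its first element, of type int *;
  • &a: a pointer to the whole array of 2 ints, of type int (*)[2] (same address as a, different type);
  • a[i]: equivalent to *(a+i), i.e. add the offset, then dereference.

a[b] is equivalent to *(a + b). Since a and b can be arbitrary expressions, the precise form is (*((a) + (b))).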

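NULL pointer

C guarantees that a null pointer compares unequal to a pointer to any object or function, so it never points to valid data. Functions that return pointers therefore commonly return NULL to report an error.

The following are all null pointer constants when used where a pointer is expected: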

  • NULL
  • 0
  • '\0'
  • (void *)0
int *x;
if ((x = malloc(sizeof(int) * 10)) == NULL) {
	printf("Error allocating 10 ints\n");
}

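Pointer conversions

The following conversions are safe:

  1. An object pointer converted to void *, then to intptr_t or uintptr_t (optional types from stdint.h), and back yields a pointer equal to the original. These integer values can be handled like other integers;
  2. Any object pointer converted to void * and back;
  3. Any object pointer converted to char * (or signed char *, unsigned char *) to access the object's bytes;
  4. A pointer to a struct converted to a pointer to its first member, and vice versa.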
#include <stdio.h>

int main(void)
{
    int i = 10;

    printf("The value of i is %d\n", i);

    // %p expects a void *, so cast the pointer (this also avoids a compiler warning).
    printf("And its address is %p\n", (void *)&i);
}

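A pointer variable holds a memory address, which is essentially an integer value. On I32LP64 systems int/float are 32 bits, while double, long and pointers are 64 bits. An integer can therefore be cast to a pointer (the result is implementation-defined):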

int *foo = (int *)(0x11111111);
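// A common use is an offsetof-style macro: cast 0 to a pointer to type, take the member's address,
// and convert that address to an integer via intptr_t. (Formally undefined behavior; real code uses offsetof from stddef.h.)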
#define OFFSETOF(type, member) ((int)(intptr_t)&(((type *)(void*)0)->member) )

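Suppose a pointer to type A is converted to a pointer to type B, and the object is then accessed through the B pointer. A and B must be compatible: a typedef alias is compatible with its type, and character types may access any object. This is the strict aliasing rule, and breaking it is undefined behavior. The compiler may warn (GCC: -Wstrict-aliasing, enabled by -Wall when optimizing with -fstrict-aliasing), but it is not a compile error.

  • The rule is about accessing objects through pointers, so the integer-to-pointer cast above is not affected by it.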
// OK
int a = 1;
int *p = &a;

// OK: any object pointer converts to void * and back
int *ap = (int *)(void *) 0x12345678;

// Incompatible types: reading *p is undefined behavior (may warn)
int a = 1;
float *p = (float *)&a;

// Incompatible types: undefined behavior (may warn)
int a = 0x12345678;
short b = *((short *)&a);

int main(void)
{
	int32_t v = 0x12345678;
	struct words *pw = (struct words *)&v;  // incompatible types: accessing through pw may warn and is undefined behavior
	fun(&v, pw);
}

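Pointer arithmetic

The size of the pointed-to type determines the step of pointer arithmetic:

  1. Subtraction: the result is not the raw difference of the two addresses but the number of elements between them. Its type is ptrdiff_t, defined in <stddef.h>, printed with %td or %tX (similarly, size_t is printed with %zu or %zX). Both pointers must point into the same array.
  2. Addition: p++ or p += n advances the address in p by sizeof(*p) or n * sizeof(*p) bytes.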
int cats[100];
// As a value the array name is a pointer, so cats + 20 is the address of cats[20]
int *f = cats + 20;
int *g = cats + 60;
// 40: the pointers are 40 int elements apart
ptrdiff_t d = g - f;

int my_strlen(char *s) {
	char *p = s;
	while (*p != '\0')
		p++;
	return p - s;
}

// Use the t length modifier to print ptrdiff_t:
printf("%td\n", d);  // Print decimal: 40
printf("%tX\n", d);  // Print hex:     28


int a[] = {11, 22, 33, 44, 55, 999};
int *p = &a[0];
while (*p != 999) {
	printf("%d\n", *p);
	p++;
}

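Pointer comparison

  1. Relational comparisons (<, <=, >, >=) are defined only for pointers into the same array or object, or one past its end. For different objects the behavior is undefined. Equality (==, !=) is always defined.
  2. Comparing pointers of different types produces a diagnostic (a warning in GCC, not an error) unless both are converted to a common type such as void *.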
int arr[5] = {1, 2, 3, 4, 5};
int *p1 = &arr[1];
int *p2 = &arr[3];
if (p1 < p2) {
	// OK: p1 and p2 point into the same array
}

int x = 10;
int y = 20;
int *p3 = &x;
int *p4 = &y;
if (p3 == p4) {
	// Well defined and false: x and y are different objects.
	// p3 < p4, however, would be undefined behavior.
}

int *p1;
float *p2;
if (p1 == p2) {
	// Warning: comparison of distinct pointer types lacks a cast
}

if ((void*)p1 == (void*)p2) {
	// OK: both are converted to void * before comparing
}

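// ({ ... }): GNU C statement-expression extension.
// &_max1 == &_max2: makes the compiler check that x and y have the same type; if they do not, it warns:
// warning: comparison of distinct pointer types lacks a cast
// The (void) in (void) (&_max1 == &_max2); suppresses the "value computed is not used" warning.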
#define max(x, y) ({				\
	typeof(x) _max1 = (x);			\
	typeof(y) _max2 = (y);			\
	(void) (&_max1 == &_max2);      \
_max1 > _max2 ? _max1 : _max2; })

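Multi-level pointers

  • As a function parameter, int **p is equivalent to int *p[]: p points to an object of type int *.
  • A typical use of a double pointer is to let a function modify a pointer that belongs to the caller.
int *modify(int **p)
{
	static int state = 42;
	*p = &state; // modify the caller's pointer
	return *p;
}

void caller(void)
{
	int *p = NULL;
	modify(&p); // after the call, p points to state inside modify
}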

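const and pointers

When const is combined with a pointer, its position determines the meaning:

  1. p can be modified, but the value it points to cannot: const int *p; or int const *p;
  2. p itself cannot be modified: int *const p; (p++ is an error), but the value it points to can be.
  3. Neither p nor the value it points to can be modified: const int *const p;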
char a[] = "abcd";
const char *p = a;
p++;  // OK: p itself may change
p[0] = 'A'; // Compiler error! Can't change what it points to

int *const p;   // We can't modify "p" with pointer arithmetic
p++;  // Compiler error!

int x = 10;
int *const p = &x;
*p = 20;   // Set "x" to 20, no problem

char **p;
p++;     // OK!
(*p)++;  // OK!

char **const p;
p++;     // Error!
(*p)++;  // OK!

char *const *p;
p++;     // OK!
(*p)++;  // Error!

char *const *const p;
p++;     // Error!
(*p)++;  // Error!

Assigning the address of a const object to a pointer to non-const makes the compiler warn:

const int x = 20;
int *p = &x;
//    ^       ^
//    |       |
//  int*    const int*

// initialization discards 'const' qualifier from pointer type target

*p = 40;  // undefined behavior: x is const

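void pointers

void *p can hold a pointer to any object type. It is typically used for function parameters and return values, with these restrictions:

  1. No pointer arithmetic (GCC allows it as an extension, treating sizeof(void) as 1);
  2. It cannot be dereferenced with *;
  3. The -> operator cannot be used on it;
  4. p[N] cannot be used, because it is also a dereference.

So before using p, convert it to a pointer to a concrete type:

  • For example, malloc() returns void *, which can be assigned to any object pointer type without a cast:
// s1 and s2 can point to objects of any type
void *memcpy(void *restrict s1, const void *restrict s2, size_t n);
void *my_memcpy(void *dest, void *src, int byte_count)
{
	// Convert src and dest to concrete pointer types before using them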
	char *s = src, *d = dest;
	while (byte_count--) {
		*d++ = *s++;
	}
	return dest;
}

struct animal {
	char *name;
	int leg_count;
};

// malloc() returns void *, which converts implicitly to int *
int *p = malloc(sizeof(int));
*p = 12;
printf("%d\n", *p);  // 12
free(p);

Casting to void

A cast to void suppresses "unused variable/parameter" warnings (for example with -Wall -Wextra):

#include <stdio.h>
#include <threads.h>
#include <stdatomic.h>

atomic_int x;

int thread1(void *arg)
{
	// arg is not used in the body; casting it to void avoids the compiler warning.
	(void)arg;
	return 0;
}

struct

Defining struct types and variables:

struct point
{
	int x, y;
} first_point, second_point;

struct point
{
	int x, y;
};

// The struct keyword is required here (typedef can shorten this)
struct point first_point, second_point;

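Initializing structs

A brace initializer can list members in order or name them with designators. Members that are not mentioned are initialized to 0:

The benefit of designated initializers: for a struct with many members you initialize only the ones you care about, which greatly reduces the work. The code also keeps working when members are added or reordered later.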

struct point
{
	int x, y;
};

struct point
{
	int x, y;
} first_point = { 5, 10 };

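// Full initialization
struct point first_point = { 5, 10 };

// Partial initialization: the remaining members are 0
struct point first_point = { 5 };

// Designated initializers (C99): members not mentioned are 0
struct point first_point = { .y = 10, .x = 5 };

// Obsolete GNU syntax for the same thing; prefer the .y = 10 form
struct point first_point = { y: 10, x: 5 };

Nested designators: struct foo x = {.a.b.c=12};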

struct rectangle
{
	struct point top_left, bottom_right;
};

// Brace initialization
struct rectangle my_rectangle = { {0, 5}, {10, 0} };

// Designated initialization (struct spaceship is defined below)
struct spaceship s = {
        .manufacturer="General Products",
        .ci={
          .window_count = 8,
          .o2level = 21
        }
};

// Nested designators
struct cabin_information {
	int window_count;
	int o2level;
};

struct spaceship {
	char *manufacturer;
	struct cabin_information ci;
};

int main(void)
{
	struct spaceship s = {
		.manufacturer="General Products",
		// nested designators
		.ci.window_count = 8,
		.ci.o2level = 21
	};

	printf("%s: %d seats, %d%% oxygen\n", s.manufacturer, s.ci.window_count, s.ci.o2level);
}

Initializing an array member of a struct:

#include <stdio.h>

struct passenger {
	char *name;
	int covid_vaccinated;
};

#define MAX_PASSENGERS 8

struct spaceship {
	char *manufacturer;
	// A constant macro is a compile-time constant, so it can be used as an array length
	struct passenger passenger[MAX_PASSENGERS];
};

int main(void) {
	struct spaceship s = {
		.manufacturer="General Products",
		.passenger = {
			// one member at a time
			[0].name = "Gridley, Lewis",
			[0].covid_vaccinated = 0,
			// the whole element at once
			[7] = {.name="Brown, Teela", .covid_vaccinated=1},
		}
	};
	printf("Passengers for %s ship:\n", s.manufacturer);

	for (int i = 0; i < MAX_PASSENGERS; i++)
		if (s.passenger[i].name != NULL)
			printf("    %s (%svaccinated)\n", s.passenger[i].name, s.passenger[i].covid_vaccinated? "": "not ");
}

Anonymous structs and typedef aliases

// An anonymous (untagged) struct is still a type; variables can be declared with it
struct {
	char *name;
	int leg_count, speed;
} a, b, c;

// These are assignments, not initialization: c.speed remains indeterminate (inside a function)
a.name = "antelope";
c.leg_count = 4;

// Both type names work
typedef struct animal {
	char *name;
	int leg_count, speed;
} animal;

struct animal y;
animal z;

// Only the typedef name animal is available
typedef struct {
	char *name;
	int leg_count, speed;
} animal;

//struct animal y;  // error
animal z;           // OK

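Structs and functions

Structs can be passed to and returned from functions, and variables of the same struct type can be assigned to each other. The compiler copies the bytes, so an array inside a struct is copied too, even though arrays themselves cannot be assigned. For large structs, pass a pointer rather than the struct by value.

Another advantage of passing a pointer is that it fits in a single register (Linux kernel functions conventionally take struct pointers). A struct passed by value may need several registers or stack space.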

void set_price(struct car *c, float new_price) {
	(*c).price = new_price;
}

// Values of the same struct type can be assigned
struct car a, b;
b = a;

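Anonymous struct/union members

A struct or union member can be declared with an anonymous struct or union type and no member name (standard since C11). Its inner members can then be accessed directly, as if they were members of the enclosing struct: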

#include <stdio.h>

struct person {
	char *name;
	char gender;
	int age;
	int weight;

	// anonymous member
	struct {
		int area_code;
		long phone_number;
	};
};

int main(void) {
	struct person p = {"jim", 'F', 28, 65, {21, 444444}};
	printf("%d\n", p.area_code);
	return 0;
}

// Anonymous union member
struct person {
	char *name;
	union {
		char gender;
		int id;
	};
	int age;
};

int main(void) {
	struct person jim = {"jim", 'F', 20};
	printf("jim.gender = %c, jim.id = %d\n", jim.gender, jim.id);
	return 0;
}

// A more complex case
struct v
{
	union {
		struct { int i, j; };
		struct { long k, l; } w;
	};
	int m;
} v1;

v1.i = 2;   // valid
v1.k = 3;   // invalid: inner structure is not anonymous
v1.w.k = 5; // valid

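Self-referential structs

Inside a struct, the struct type itself can only be referenced through a pointer, because the type is incomplete until its closing brace: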

#include <stdio.h>
#include <stdlib.h>

struct node {
	int data;
	struct node *next; // a pointer is fine; struct node next; would be an error
};

int main(void)
{
	struct node *head;

	head = malloc(sizeof(struct node));
	head->data = 11;
	head->next = malloc(sizeof(struct node));
	head->next->data = 22;
	head->next->next = malloc(sizeof(struct node));
	head->next->next->data = 33;
	head->next->next->next = NULL;

	for (struct node *cur = head; cur != NULL; cur = cur->next) {
		printf("%d\n", cur->data);
	}
}

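Struct pointers

A pointer to a struct, suitably converted, points to its first member, and vice versa. So when a struct's first member is another struct, a pointer to the outer struct can be used as a pointer to the inner one: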

#include <stdio.h>

struct parent {
	int a, b;
};

struct child {
	struct parent super;  // MUST be first
	int c, d;
};

// Making the argument `void*` so we can pass any type into it (namely a struct parent or struct child)
void print_parent(void *p)
{
	// Expects a struct parent--but a struct child will also work because the pointer points to the struct parent in the first field:
	struct parent *self = p;

	printf("Parent: %d, %d\n", self->a, self->b);
}

void print_child(struct child *self)
{
	printf("Child: %d, %d\n", self->c, self->d);
}

int main(void)
{
	struct child c = {.super.a=1, .super.b=2, .c=3, .d=4};

	print_child(&c);
	print_parent(&c);  // Also works even though it's a struct child!
}

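Accessing struct members

  1. Struct object: s.field;
  2. Pointer to struct: (*p).field or p->field;

Variable-length trailing arrays (not to be confused with VLAs): the last member of a struct can be an array whose length is decided when the struct is allocated. This is known as a flexible array member:

  1. Traditional approaches: over-allocate a struct that ends in a fixed-size array, or use the compiler-extension zero-length array char data[0]:

        // Classic "struct hack": extra bytes are allocated after data (indexing past data[7] is technically undefined behavior)
        struct len_string {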
       	int length;
       	char data[8];
        };
    
        struct len_string *s = malloc(sizeof *s + 40);
        s->length = 48;
        strcpy(s->data, "Hello, world!");
    
        // Or use the compiler-extension zero-length array, so that all the extra space allocated by malloc belongs to data.
        struct len_string {
       	int length;
       	char data[0];
        };
    
        struct len_string *s = malloc(sizeof *s + 40);
        s->length = 40;
        strcpy(s->data, "Hello, world!");
    
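  2. C99 added standard support for flexible array members (no compiler extension needed), but a flexible array member cannot be initialized with an initializer list: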

        struct len_string {
         	int length;
          // must be the last member of the struct, declared without a size
         	char data[];
        };
    
        struct len_string *len_string_from_c_string(char *s)
        {
         	int len = strlen(s);
         	struct len_string *ls = malloc(sizeof *ls + len);
         	ls->length = len;
         	memcpy(ls->data, s, len);
         	return ls;
        }
    
        struct s { int n; double d[]; }; // s.d is a flexible array member
        struct s t1 = { 0 };          // OK, d is as if double d[1], but UB to access
        struct s t2 = { 1, { 4.2 } }; // error: initialization ignores flexible array
    
        // if sizeof (double) == 8
        struct s *s1 = malloc(sizeof (struct s) + 64); // as if d was double d[8]
        struct s *s2 = malloc(sizeof (struct s) + 40); // as if d was double d[5]
    
        s1 = malloc(sizeof (struct s) + 10); // now as if d was double d[1]. Two bytes excess.
        double *dp = &(s1->d[0]);    // OK
        *dp = 42;                    // OK
        s1->d[1]++;                  // Undefined behavior. 2 excess bytes can't be accessed as double.
    
        s2 = malloc(sizeof (struct s) + 6);  // same, but UB to access because 2 bytes are missing to complete 1 double
        dp = &(s2->d[0]);            //  OK, can take address just fine
        *dp = 42;                    //  undefined behavior
    
        *s1 = *s2; // only copies s.n, not any element of s.d except those caught in sizeof (struct s)
    

struct padding

结构体(struct)的内存对齐和填充(padding)是编译器为了 提高访问速度和兼容硬件架构的要求 而进行的。

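  1. Memory alignment: an object must be stored at an address that is a multiple of its type's alignment requirement. For the basic types, this requirement is usually equal to the type's size. Different types have different alignment requirements, for example:

    1. char is usually aligned to 1 byte.
    2. short is usually aligned to 2 bytes.
    3. int is usually aligned to 4 bytes.
    4. double is usually aligned to 8 bytes.
  2. Alignment and padding rules for structs: the goal is to store every member according to its own alignment requirement, so that it can be accessed quickly. To achieve this, the compiler inserts padding bytes between members. It also adds padding at the end of the struct, so that the struct's size is a multiple of the largest member alignment requirement.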

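Rules

  1. Each member is aligned according to its own alignment requirement. If necessary, the compiler inserts padding bytes before the member so that its address is a multiple of that requirement.
  2. The total size of the struct is a multiple of the largest alignment requirement. The size is rounded up to a multiple of the largest member alignment, which may add padding bytes at the end of the struct.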

Example: consider the following struct definition:

struct Example {
    char a;    // 1 byte
    int b;     // 4 bytes
    short c;   // 2 bytes
};

The compiler aligns and pads this struct, giving the following memory layout:

Offset  Member   Size
0       a        1
1-3     padding  3
4       b        4
8-9     c        2
10-11   padding  2

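The total size is 12 bytes. int has a 4-byte alignment requirement, and the size of the struct must be a multiple of the largest alignment requirement (4 bytes).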

GNU #pragma pack
#

The #pragma pack directive, supported by GCC, sets the maximum alignment used for the members of the structs that follow it.

// Set alignment to a 1-byte boundary
#pragma pack(push, 1)
struct PackedExample {
 	char a;
 	int b;
 	short c;
};
// Restore the previous alignment
#pragma pack(pop)

In this case, the memory layout of PackedExample is:

Offset  Member   Size
0       a        1
1-4     b        4
5-6     c        2

The total size is 7 bytes, because all members are laid out with 1-byte alignment.

GNU attribute:packed
#

The GCC C extension __attribute__((packed)) removes the alignment padding of a struct. It has the same effect as #pragma pack(1).

Example: make the total size of PackedExample 7 bytes, i.e. without any padding bytes:

struct PackedExample {
    char a;
    int b;
    short c;
} __attribute__((packed));

GNU attribute:aligned
#

The GNU C extension __attribute__((aligned(N))) specifies an alignment requirement, in bytes, for an object or a type:

#include <stdio.h>

struct __attribute__ ((aligned (8))) S { short f[3]; };
typedef int more_aligned_int __attribute__ ((aligned (8)));

char c1 = 3;
char c2 __attribute__((aligned(16))) = 4 ;

int main(void)
{
 	printf("c1: %p\n", (void *)&c1);
 	printf("c2: %p\n", (void *)&c2);
 	return 0;
}

/* c1: 00402000 */
/* c2: 00402010 */

struct data {
 	char a;
 	short b __attribute__((aligned(4)));
 	int c ;
};

/* size: 12 */
/* &s.a: 0028FF30 */
/* &s.b: 0028FF34 */
/* &s.c: 0028FF38 */

struct data{
 	char a;
 	short b ;
 	int c ;
} __attribute__((packed,aligned(8)));

int main(void)
{
 	struct data s;
 	printf("size: %zu\n", sizeof(s));  // sizeof yields size_t, printed with %zu
 	printf("&s.a: %p\n", (void *)&s.a);
 	printf("&s.b: %p\n", (void *)&s.b);
 	printf("&s.c: %p\n", (void *)&s.c);
}

/* size: 8 */
/*  &s.a: 0028FF30 */
/*  &s.b: 0028FF31 */
/*  &s.c: 0028FF33 */

_Alignas and alignas alignment specifiers
#

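An alignment specifier is used in a declaration to set the alignment of the declared object (or struct member).

Since C11, the language has the built-in keyword _Alignas. The header <stdalign.h> provides the more convenient macro alignas, which expands to _Alignas. In C23, alignas itself becomes a keyword and the header is no longer required. An alignment specifier is not an expression and has no value.

  1. _Alignas ( expression or type ) (since C11, built-in keyword)
  2. alignas ( expression or type ) (C11/C17: macro from <stdalign.h>; since C23: keyword)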
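// Align to the alignment of a given type
char alignas(int) c1;

// Align to a given value (an integer constant expression)
char alignas(8) c2;

// Align to max_align_t (from <stddef.h>), the largest fundamental alignment
char alignas(max_align_t) c3;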

Example:

#include <stdalign.h>
#include <stdio.h>

// every object of type struct sse_t will be aligned to 16-byte boundary
struct sse_t
{
 	alignas(16) float sse_data[4];
};

// every object of type struct data will be aligned to 128-byte boundary
struct data
{
 	char x;
 	alignas(128) char cacheline[128]; // over-aligned array of char, not array of over-aligned chars
};

int main(void)
{

 	printf("sizeof(data) = %zu (1 byte + 127 bytes padding + 128-byte array)\n", sizeof(struct data));
 	printf("alignment of sse_t is %zu\n", alignof(struct sse_t));

 	alignas(2048) struct data d; // this instance of data is aligned even stricter
 	(void)d; // suppresses "maybe unused" warning
}

Before C23, the alignas and alignof macros come from the C11 header <stdalign.h>, which is what the example above includes. They expand to the keywords _Alignas and _Alignof. GCC additionally offers __attribute__((aligned(N))) and the __alignof__ keyword for the same purpose.

Output of the example above:

/* sizeof(data) = 256 (1 byte + 127 bytes padding + 128-byte array) */
/* alignment of sse_t is 16 */

_Alignof and alignof operators
#

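Returns the alignment, in bytes, of a type (not an expression). The operand is a type name. The result has type size_t (from <stddef.h>) and is printed with %zu.

  • _Alignof: since C11, a built-in operator;
  • alignof: a macro from <stdalign.h> in C11/C17 (expanding to _Alignof); a keyword since C23;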

Example:

#include <stdalign.h>
#include <stdio.h>     // for printf()
#include <stddef.h>    // for max_align_t

struct t {
 	int a;
 	char b;
 	float c;
};

int main(void)
{
 	printf("char       : %zu\n", alignof(char));
 	printf("short      : %zu\n", alignof(short));
 	printf("int        : %zu\n", alignof(int));
 	printf("long       : %zu\n", alignof(long));
 	printf("long long  : %zu\n", alignof(long long));
 	printf("double     : %zu\n", alignof(double));
 	printf("long double: %zu\n", alignof(long double));
 	printf("struct t   : %zu\n", alignof(struct t));
 	printf("max_align_t: %zu\n", alignof(max_align_t));
}

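Output on macOS (the values are platform-dependent; for example, long double and max_align_t differ between architectures):

char       : 1
short      : 2
int        : 4
long       : 8
long long  : 8
double     : 8
long double: 16
struct t   : 4
max_align_t: 16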

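Before C23, alignof is the C11 macro from <stdalign.h>, which expands to _Alignof. GCC also provides the __alignof__ keyword as an extension. More examples: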

#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>

int main(void)
{
 	printf("Alignment of char = %zu\n", alignof(char));
 	printf("Alignment of max_align_t = %zu\n", alignof(max_align_t));
 	printf("alignof(float[10]) = %zu\n", alignof(float[10]));
 	printf("alignof(struct{char c; int n;}) = %zu\n", alignof(struct {char c; int n;}));
}

The offsetof macro
#

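Because struct members may be separated by padding, use the offsetof macro from <stddef.h> to get a member's actual offset. offsetof has been standard since C89, and its result has type size_t.

GCC's <stddef.h> implements offsetof with a compiler builtin:

#define offsetof(type, member)  __builtin_offsetof (type, member)

A hand-written OFFSETOF macro is also common. Formally it relies on undefined behavior (member access through a null pointer), so prefer the standard offsetof: # define OFFSETOF(type, member) ((int)(intptr_t)&(((type *)(void*)0)->member))

  • How it works: a null pointer is converted to type *, and the address of member is taken. That address is converted to an integer through intptr_t (from <stdint.h>). On typical implementations the result is the member's offset from address 0.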
#include <stdio.h>
#include <stddef.h>

struct foo {
 	int a;
 	char b;
 	int c;
 	char d;
};

int main(void)
{
 	// offsetof yields size_t, so print it with %zu
 	printf("%zu\n", offsetof(struct foo, a)); // 0
 	printf("%zu\n", offsetof(struct foo, b)); // 4
 	printf("%zu\n", offsetof(struct foo, c)); // 8
 	printf("%zu\n", offsetof(struct foo, d));// 12
}

Bit Field
#

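Bit-fields reduce the space a struct occupies: each such member gets a specified number of bits instead of the full size of its type.

  1. The member type must be an integer type. The standard guarantees int, signed int, unsigned int and _Bool. Other types, such as char or long int, are implementation-defined extensions (GCC accepts them).
  2. The total size depends on the field widths, and the compiler may insert padding as needed;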
struct card
{
 	unsigned int suit : 2; // can hold 0-3
 	unsigned int face_value : 4; // can hold 0-15
};

the range of an unsigned bit field of N bits is from 0 to 2^N - 1, and the range of a signed bit field of N bits is from -(2^N) / 2 to ((2^N) / 2) - 1.

Adjacent bit-fields are packed together:

// sizeof(struct foo) == 12: a and c are packed into the same 4-byte unsigned int
struct foo {
 	unsigned int a:1;
 	unsigned int c:1;
 	unsigned int b;
 	unsigned int d;
};

Non-adjacent bit-fields are not merged, and padding may be inserted to keep the following members aligned:

// sizeof(struct foo) == 16
struct foo {
 	unsigned int a:1;   // not merged with b: a uses its own unsigned int unit (31 bits unused), so b starts at the next 4-byte boundary
 	unsigned int b;
 	unsigned int c:1;
 	unsigned int d;
};

Unnamed bit-fields: some bit-fields only occupy space and are never used, so they can be left unnamed:

struct foo {
 	unsigned char a:2;
 	unsigned char :5;   // <-- unnamed bit-field!
 	unsigned char b:1;
};

Zero-width unnamed bit-field: tells the compiler to start allocating the following bit-fields in a new unsigned int:

struct foo { // a and b share one unsigned int, c and d use another
 	unsigned int a:1;
 	unsigned int b:2;
 	unsigned int :0;   // <--Zero-width unnamed bit-field
 	unsigned int c:3;
 	unsigned int d:4;
};

union
#

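A union can declare several members, and all of them share the same storage. Usually it therefore only makes sense to read the member that was written last. This property can also be used deliberately, to read the same bytes through a different member.

Defining a union type and variables:

  • union xx as a whole is a type (like struct/enum).
  • Member declarations are separated by semicolons, as in struct and bit-fields. enum constants, by contrast, are separated by commas.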
union numbers
{
	int i;
	float f;
} first_number, second_number;

union numbers first_number, second_number;

Initialization:

union numbers
{
	int i;
	float f;
};
union numbers first_number = { 5 }; // initializes the first member
union numbers first_number = { f: 3.14159 }; // initializes a named member (obsolete GNU syntax)
union numbers first_number = { .f = 3.14159 }; // recommended: C99 designated initializer

union numbers
{
	int i;
	float f;
} first_number = { 5 };

Access members with . or ->, as with struct:

union numbers
{
	int i;
	float f;
};
union numbers first_number;
first_number.i = 5;
first_number.f = 3.9;

union numbers *second_number =&first_number;
second_number->i = 6;

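The size of a union is the size of its largest member, plus any padding needed for alignment. Only one member is in use at a time, so writing one member overwrites the value previously stored through another. Reading that other member afterwards may give a meaningless value.

  • A union can still contain trailing padding, because its size is rounded up to a multiple of its strictest member alignment. For example, union { char c[5]; int i; } is typically 8 bytes, not 5.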
// This size of a union is equal to the size of its largest member. Consider the first union example from this section:
union numbers
{
	int i;
	float f;
};

#include <stdio.h>

union foo {
	float b;
	short a;
};

int main(void)
{
	union foo x;
	x.b = 3.14159;
	printf("%f\n", x.b);  // 3.14159, fair enough
	printf("%d\n", x.a);  // But what about this?
}

// 3.141590
// 4048

GNU C extension: cast to a union type

union foo { int i; double d; };
int x;
double y;
union foo z;

// both x and y can be cast to type union foo and the following assignments
z = (union foo) x;
z = (union foo) y;

// are shorthand equivalents of these
z = (union foo) { .i = x };
z = (union foo) { .d = y };

Arrays of unions:

union numbers
{
	int i;
	float f;
};
union numbers number_array [3] = { {3}, {4}, {5} };

union numbers number_array [3];
number_array[0].i = 2;

Tagless (unnamed) struct types inside a union:

struct {
	int x, y;
} s;
s.x = 34;
s.y = 90;
printf("%d %d\n", s.x, s.y);

union foo {
	struct {       // unnamed!
		int x, y;
	} a;

	struct {       // unnamed!
		int z, w;
	} b;
};
union foo f;
f.a.x = 1;
f.a.y = 2;
// or
f.b.z = 3;
f.b.w = 4;

Pointers to unions:

#include <stdio.h>

// All of these members of foo share the same memory
union foo {
	int a, b, c, d, e, f;
	float g, h;
	char i, j, k, l;
};

int main(void)
{
	union foo x;

	int *foo_int_p = (int *)&x;  // both point to the start of x
	float *foo_float_p = (float *)&x;

	x.a = 12;
	printf("%d\n", x.a);           // 12
	printf("%d\n", *foo_int_p);    // 12, again

	x.g = 3.141592;
	printf("%f\n", x.g);           // 3.141592
	printf("%f\n", *foo_float_p);  // 3.141592, again
}

// The reverse conversion also works
union foo x;
int *foo_int_p = (int *)&x;             // Pointer to int field
union foo *p = (union foo *)foo_int_p;  // Back to pointer to union
p->a = 12;  // This line the same as...
x.a = 12;   // this one.

Common initial sequence in a union: If you have a union of structs, and all those structs begin with a common initial sequence, it’s valid to access members of that sequence from any of the union members.

#include <stdio.h>

struct common {
	int type;   // common initial sequence
};

struct antelope {
	int type;   // common initial sequence

	int loudness;
};

struct octopus {
	int type;   // common initial sequence

	int sea_creature;
	float intelligence;
};

union animal {
	struct common common;
	struct antelope antelope;
	struct octopus octopus;
};

#define ANTELOPE 1
#define OCTOPUS  2

void print_animal(union animal *x)
{
	switch (x->common.type) {
    case ANTELOPE:
  		printf("Antelope: loudness=%d\n", x->antelope.loudness);
  		break;
    case OCTOPUS:
  		printf("Octopus : sea_creature=%d\n", x->octopus.sea_creature);
  		printf("          intelligence=%f\n", x->octopus.intelligence);
  		break;
    default:
      printf("Unknown animal type\n");
	}
}

int main(void)
{
	union animal a = {.antelope.type=ANTELOPE, .antelope.loudness=12};
	union animal b = {.octopus.type=OCTOPUS, .octopus.sea_creature=1, .octopus.intelligence=12.8};

	print_animal(&a);
	print_animal(&b);
}

Anonymous members of a union:

  • The members of an anonymous member can be accessed directly;
  • Similar to struct, an unnamed member of a union whose type is a union without name is known as anonymous union. Every member of an anonymous union is considered to be a member of the enclosing struct or union keeping their union layout. This applies recursively if the enclosing struct or union is also anonymous.
struct v
{
	union // anonymous union
	{
		struct { int i, j; }; // anonymous structure
		struct { long k, l; } w;
	};

	int m;
} v1;

v1.i = 2;   // valid
v1.k = 3;   // invalid: inner structure is not anonymous
v1.w.k = 5; // valid

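C functions can take and return struct/enum/union values and pointers to them. The values are passed and returned by shallow copy. A function cannot return an array type: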

#include <stdio.h>

struct foo {
	int x, y;
};

struct foo f(void)
{
	return (struct foo){.x=34, .y=90};
}

int main(void)
{
	struct foo a = f();  // Copy is made
	printf("%d %d\n", a.x, a.y);
}

enum
#

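Enumeration constants belong to the enclosing file or block scope (the ordinary identifier namespace). Different enumeration types in the same scope therefore cannot use the same constant names.

  • By contrast, struct and union members are scoped to their own type.

The size of an enum type is implementation-defined. It is compatible with an integer type that can represent all of its values; GCC uses unsigned int when no value is negative, and int otherwise. Before C23, the enumeration constants themselves always have type int.

  • Each value defaults to the previous member's value + 1, and the first member defaults to 0;
  • Two enumeration constants may have the same value;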
enum app_status {PENDING, RUNNING, CANCELED, DONE, FAILED}; // define an enum type
// Alternative definition: DONE=11, FAILED=12
enum app_status {PENDING, RUNNING, CANCELED=10, DONE, FAILED};

// Values may repeat
enum {
	X=2,
	Y=2,
	Z=2
};

enum {
	A,    // 0, default starting value
	B,    // 1
	C=4,  // 4, manually set
	D,    // 5
	E,    // 6
	F=3,  // 3, manually set
	G,    // 4
	H     // 5
};

// A trailing comma after the last member is allowed
enum {
	X=2,
	Y=18,
	Z=-2,
};

// Declare the enum type and define variables at the same time
enum resource {
	SHEEP,
	WHEAT,
	WOOD,
	BRICK,
	ORE
} r = BRICK, s = WOOD;

// Anonymous enum
enum {
	SHEEP,
	WHEAT,
	WOOD,
	BRICK,
	ORE
} r = BRICK, s = WOOD;

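Usage:

  • Enumeration constants are compile-time constants, so they are usually named in uppercase.
  • Enumeration constants are integers, so they can be used anywhere an integer is required, such as array lengths and switch case labels;
    • They are not lvalues, so their address cannot be taken.
// Define an enum type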
enum resource {
	SHEEP,
	WHEAT,
	WOOD,
	BRICK,
	ORE
};

// Declare a variable of the enum type:
enum resource r = BRICK; // enumeration constants live in the enclosing scope, so they can be used directly.
if (r == BRICK) {
	printf("I'll trade you a brick for two sheep.\n");
}

// Define the enum type and two variables at the same time.
enum color { RED, GREEN, BLUE } c = RED, *cp = &c;

// Enumeration constants can be used as compile-time constants.
int myarr[WOOD];
enum color { RED, GREEN, BLUE } r = RED;

switch(r)
{
  case RED:
  	puts("red");
  	break;
  case GREEN:
  	puts("green");
  	break;
  case BLUE:
  	puts("blue");
  	break;
}
enum { TEN = 10 };
struct S { int x : TEN; }; // OK

// enum values can be used wherever an integer is expected
enum { ONE = 1, TWO } e;
long n = ONE; // promotion
double d = ONE; // conversion
e = 1.2; // conversion, e is now ONE
e = e + 1; // e is now TWO

#include <stdio.h>
enum Color {
	RED,
	GREEN,
	BLUE
};
int main() {
	enum Color color = RED;
	enum Color *pColor = &color; // OK: an enum variable is an object, so it has an address
	// int *pRed = &RED; // error: cannot take the address of an enumeration constant

	return 0;
}

Using typedef to alias an enum type:

typedef enum {
	SHEEP,
	WHEAT,
	WOOD,
	BRICK,
	ORE
} RESOURCE;

RESOURCE r = BRICK; // OK

enum color { RED, GREEN, BLUE };
typedef enum color color_t;

color_t x = GREEN; // OK

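An enum type can be defined inside a struct/union member declaration, and it can still be used outside the struct (C has no separate scope for types nested in a struct):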

struct Element
{
	int z;
	enum State { SOLID, LIQUID, GAS, PLASMA } state;
} oxygen = { 8, GAS };

void foo(void)
{
	enum State e = LIQUID; // OK
	printf("%d %d %d ", e, oxygen.state, PLASMA); // prints 1 2 3
}

Member separators:

  • enum: commas;
  • union/struct/bit-field: semicolons;

Forward declarations
#

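A forward declaration, i.e. an incomplete type, resolves mutual (circular) type dependencies in source files:

  • A forward declaration declares a type whose definition is not yet complete. It does not declare an external variable.
  • Until the type is completed, it can only be used through pointers, or in declarations such as extern. This works because the size of a pointer is known even without the full definition of the type it points to.

Incomplete types:

  1. A struct, union, or enum type that is declared without its members (ISO C does not allow a plain forward-declared enum; GCC accepts it as an extension);
  2. The void type is also an incomplete type;
  3. An array of unknown size, such as extern int my_array[];

Completing incomplete types: an incomplete struct, union, or enum type is completed by a later definition of that struct, union, or enum.

A function or expression can use values of the type only after the complete definition has been seen.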

struct foo;        // incomplete type
struct foo *p;     // pointer, no problem
// struct foo f;   // Error: incomplete type!

struct foo {
	int x, y, z;
};                 // Now the struct foo is complete!
struct foo f;      // Success!

// void is also an incomplete type
void *p;             // OK: pointer to incomplete type

Use cases:

  1. Self-referencing and mutually referencing structs:

          struct node {
           	int val;
           	struct node *next;  // struct node is incomplete, but that's OK!
          };
    
          struct a {
         	  struct b *x;
          };
    
          struct b {
         	  struct a *x;
          };
    
  2. Declaring an array variable in a header:

          // File: bar.h
          #ifndef BAR_H
          #define BAR_H
          extern int my_array[];  // Incomplete type
          #endif
    
          // File: bar.c
          int my_array[1024];     // Complete type!
    
          // File: foo.c
          #include <stdio.h>
          #include "bar.h"    // includes the incomplete type for my_array
    
          int main(void)
          {
            my_array[0] = 10;
            printf("%d\n", my_array[0]);
          }
    
          // gcc -o foo foo.c bar.c
    
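  3. Circular dependencies between headers:

    • A struct/union/enum type can be defined only once in each translation unit, so a header that defines one needs an include guard. When two headers would otherwise have to include each other, one of them can forward-declare the type instead. This works as long as it only uses that type through pointers or in function prototypes.
    • Other headers that use these types (for example as pointer members or function parameter types) can include the header that declares them. Alternatively, they can simply forward-declare the types themselves.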
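      // Forward declaration; the type is actually defined in another file.
      struct Rect;
    
      // It may be forward-declared again, in the same file or another file.
      struct Rect;
    
      // Function prototype that uses the incomplete type struct Rect.
      bool is_in_rect(My_Point point, struct Rect rect);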
    

Common compiler error messages involving incomplete types:

  • invalid application of ‘sizeof’ to incomplete type
  • invalid use of undefined type
  • dereferencing pointer to incomplete type

These usually mean you forgot to #include the header file that defines the type.

C99 compound literals
#

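Brace-enclosed initializers have some limitations:

  • They can only be used for initialization, not for later assignment;
  • For objects with static storage duration (and for any aggregate in C89), every value must be a constant expression;

C99 adds compound literals. They work with arrays, structs, unions and scalar types (including pointers), and can be used both for initialization and, except for arrays, for later assignment:

  • Syntax: (COMPOUND_TYPE){xx}, where COMPOUND_TYPE is the type. Compared with a brace initializer, it adds the leading (COMPOUND_TYPE);
  • An array cannot be assigned once it is defined (it is not a modifiable lvalue), so it cannot be assigned from a compound literal.
  • After a struct/union is initialized, it can later be assigned from a compound literal.
  • At block scope, the values in a compound literal can be constant expressions or variables. At file scope, they must be constant expressions.
  • A compound literal is an lvalue, so its address can be taken.

Use cases for compound literals:

  1. Initializing global or local variables;
  2. Assigning to variables;
  3. Passing a compound literal directly as a function argument, which avoids a temporary variable;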
#include <stdio.h>

struct MyStruct {
	int a;
	float b;
};

int main() {
	int val1 = 10;
	float val2 = 3.14;

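	// Brace initializer: cannot be used for assignment after initialization
	// (for a local variable, C99 also allows non-constant values here).
	struct MyStruct myStruct1 = {.a = 1, .b = 0.1};

	// myStruct1 = {.a = 1, .b = 0.1}; // error: braces cannot be used for assignment after initialization.

	// Initialize with a compound literal: it may contain variables and can also be used for later assignment
	struct MyStruct myStruct2 = (struct MyStruct){ .a = val1, .b = val2 };
	// Assignment: OK
	myStruct2 = (struct MyStruct){ .a = val1, .b = 4.5 };

	printf("a: %d, b: %.2f\n", myStruct2.a, myStruct2.b);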

	// Initialize a pointer with a compound literal (an unnamed array)
	int *p = (int []){1 ,2 ,3 ,4};

	return 0;
}

struct MyStruct {
	int a;
	float b;
};
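struct MyStruct globalStruct = (struct MyStruct){ .a = 10, .b = 3.14 }; // GNU extension: not a constant expression in ISO C

union MyUnion {
	int i;
	float f;
};
union MyUnion globalUnion = (union MyUnion){ .i = 10 }; // GNU extension, as above

// For arrays, a compound literal can only appear in the initializer, not in a later assignment (GNU extension at file scope)
int globalArray[] = (int[]){ 1, 2, 3, 4, 5 };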

struct MyStruct {
	int a;
	float b;
};
// Pointers to compound literals (valid ISO C: a file-scope compound literal has static storage duration)
struct MyStruct *globalStructPtr = &(struct MyStruct){ .a = 10, .b = 3.14 };
int *globalArrayPtr = (int[]){ 1, 2, 3, 4, 5 };

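A compound literal created inside a function has automatic storage duration. It lives only until the end of the enclosing block: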

#include <stdio.h>

int *get3490(void)
{
	// Don't do this
	return &(int){3490};
}

int main(void)
{
	printf("%d\n", *get3490());  // INVALID: (int){3490} fell out of scope
}

int *p;
{
	p = &(int){10};
}

printf("%d\n", *p);  // INVALID: The (int){10} fell out of scope

Implicit type conversion
#

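Implicit type conversions are performed automatically by the compiler.

In arithmetic, C automatically converts lower-ranked types to higher-ranked ones, to preserve precision and produce correct results:

  1. Integer promotion: char, short, enum and other types smaller than int are promoted to int when they take part in an operation.
    • If int cannot represent all values of the original type, the value is promoted to unsigned int instead.
  2. float is not promoted in ordinary arithmetic: float + float is computed in float, subject to FLT_EVAL_METHOD. It is promoted to double only by the default argument promotions, e.g. when passed to a variadic function such as printf.
int a = ~0xFF; // 0xFF already has type int (0x000000FF); ~ flips it to 0xFFFFFF00, i.e. -256 with a 32-bit int
int a = ~0; // 0 is an int; ~0 is 0xFFFFFFFF, i.e. -1 with a 32-bit int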

char a = 10;
short b = 20;
int result = a + b; // a and b are promoted to int before the addition

float x = 1.2f;
double y = 2.4;
double result2 = x + y; // x is converted to double because y is double

#include <stdio.h>
int main()
{
	char a = 30, b = 40, c = 10;
	char d = (a * b) / c; // a * b is computed in int (1200), so the intermediate result does not overflow
	printf ("%d ", d);
	return 0;
}

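When the operands of an operator have different types, C converts them to a common type according to the following rules (the usual arithmetic conversions):

  1. Floating operands: if either operand is long double, the other is converted to long double. Otherwise, if either is double, the other is converted to double. Otherwise, if either is float, the other is converted to float. An integer operand is thus converted to the floating type.
  2. Integer operands: the integer promotions are applied to both operands first. Then the operand of lower rank is converted to the type of the higher-ranked one:
    • If one operand is unsigned long long, the other is converted to unsigned long long.
    • If one operand is long long, the other is converted to long long.
    • If one operand is unsigned long, the other is converted to unsigned long.
    • If one operand is long, the other is converted to long.
    • If one operand is unsigned int, the other is converted to unsigned int.
    • This list is simplified. The signed operand may have the higher rank but be unable to represent every value of the unsigned operand's type (for example, long and unsigned int when both are 32 bits). In that case both are converted to the unsigned version of the signed type.
int a = 5;
double b = 6.7;
double result = a + b; // a is converted to double

short c = 3;
long d = 4;
long result2 = c + d; // c is promoted to int, then converted to long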

In an assignment, the right operand is converted to the type of the left operand:

double a = 3.14;
int b = a; // a is converted to int (truncated toward zero), so b is 3

In a conditional expression, the second and third operands are converted to a common type:

int a = 5;
double b = 6.7;
double result = (a < b) ? a : b; // a is converted to double, so the result type is double

Explicit type conversion (casts)
#

Syntax: (type_name) expression

Converting the result of an expression:

float x;
int y = 7;
int z = 3;
x = (float) (y / z);   // integer division first: 7 / 3 == 2, so x is 2.0
x = (y / (float)z);    // z is converted first, so x is about 2.333333

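Explicit casts only apply to scalar types, such as integers, floating types and pointers. An array or a struct cannot be cast to a struct type; work through a pointer to it instead, or copy the bytes with memcpy;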

struct fooTag { /* members ... */ };
struct fooTag foo;

unsigned char byteArray[8];

foo = (struct fooTag) byteArray; // error: cannot convert to a non-scalar type

void * can be converted to and from any object pointer type without a cast. This is why malloc returns void *:

int x = 10;
void *p = &x;
int *q = p;

# define OFFSETOF(type, member) ((int)(intptr_t)&(((type *)(void*)0)->member) )

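  • How it works: a null pointer is converted to type *, and &...->member yields the member's address relative to address 0. That address is converted to an integer through intptr_t (from <stdint.h>). Formally this is undefined behavior, so prefer the standard offsetof.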

Converting between numbers and strings
#

Number to string: the sprintf/snprintf functions from stdio.h:

#include <stdio.h>

int main(void)
{
	char s[10];
	float f = 3.14159;
	// Convert "f" to string, storing in "s", writing at most 10 characters including the NUL terminator
	snprintf(s, 10, "%f", f);
	printf("String value: %s\n", s);  // String value: 3.141590
}

String to number: the atoX and strtoX functions from stdlib.h:

Function   Description
atoi       String to int
atof       String to double
atol       String to long int
atoll      String to long long int

Or (the better choice):

Function   Description
strtol     String to long int
strtoll    String to long long int
strtoul    String to unsigned long int
strtoull   String to unsigned long long int
strtof     String to float
strtod     String to double
strtold    String to long double

The main problem with atoX is that it cannot tell whether a returned 0 is the real value or an error. The advantages of strtoX:

  1. The base of the input can be specified;
  2. Errors can be detected. Pass a char ** (endptr) to find where parsing stopped, and check errno for ERANGE on overflow:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	{
		char *pi = "3.14159";
		double f = atof(pi);  // atof returns double
		printf("%f\n", f);

		int x = atoi("what");  // "what" is not a number: returns 0, indistinguishable from "0"
		printf("%d\n", x);     // 0
	}

	{
		char *s = "101010";  // What's the meaning of this number?
		// Convert string s, a number in base 2, to an unsigned long int.
		unsigned long int x = strtoul(s, NULL, 2);
		printf("%lu\n", x);  // 42
	}

	{
		char *s = "34x90";  // "x" is not a valid digit in base 10!
		char *badchar;      // will point at the first unconverted character

		// Convert string s, a number in base 10, to an unsigned long int.
		// Pass the address of badchar so that strtoul can set it.
		unsigned long int x = strtoul(s, &badchar, 10);

		// It tries to convert as much as possible, so gets this far:
		printf("%lu\n", x);  // 34
		// But we can see the offending bad character because badchar points to it!
		printf("Invalid character: %c\n", *badchar);  // "x"
	}

	{
		char *s = "3490";  // all characters are valid digits in base 10
		char *badchar;
		// Convert string s, a number in base 10, to an unsigned long int.
		unsigned long int x = strtoul(s, &badchar, 10);

		// Check if things went well
		if (*badchar == '\0') {
			printf("Success! %lu\n", x);
		} else {
			printf("Partial conversion: %lu\n", x);
			printf("Invalid character: %c\n", *badchar);
		}
	}
}

typedef type aliases
#

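typedef defines a new name (an alias) for an existing type.

A typedef name follows the ordinary scope rules (file or block scope). Since C11, the same typedef may be repeated in the same scope as long as it names the same type; C99 and earlier reject any repetition. A conflicting redefinition is an error. Because every source file needs its own declaration, typedefs usually live in headers.

typedef is a declaration handled by the compiler, not a preprocessor directive, so it ends with a semicolon;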

typedef int antelope;  // Make "antelope" an alias for "int"
antelope x = 10;       // Type "antelope" is the same as type "int"
typedef int antelope, bagel, mushroom;  // These are all "int"

struct animal {
    char *name;
    int leg_count, speed;
};

//  original name      new name
//            |         |
//            v         v
//      |-----------| |----|
typedef struct animal animal;

struct animal y;  // OK
animal z;         // OK

typedef float app_float;
app_float f1, f2, f3;

// Pointer alias
typedef int *intptr;
int a = 10;
intptr x = &a;

// Array alias
typedef int five_ints[5];
five_ints x = {11, 22, 33, 44, 55};

typedef struct { double hi, lo; } range;
range z, *zp;

A single typedef can declare several aliases at once. Aliases can also name pointer, function-pointer, array and VLA types:

// declares char_t to be an alias for char
// char_p to be an alias for char*
// fp to be an alias for char(*)(void)
typedef char char_t, *char_p, (*fp)(void);

// A typedef for a VLA can only appear at block scope. The length of the array is evaluated each
// time the flow of control passes over the typedef declaration, as opposed to the declaration of
// the array itself:
void copyt(int n)
{
	typedef int B[n]; // B is a VLA, its size is n, evaluated now
	n += 1;
	B a; // size of a is n from before +=1
	int b[n]; // a and b are different sizes
	for (int i = 1; i < n; i++)
		a[i-1] = b[i];
}

// array of 5 pointers to functions returning pointers to arrays of 3 ints
int (*(*callbacks[5])(void))[3];
// same with typedefs
typedef int arr_t[3]; // arr_t is array of 3 int
typedef arr_t* (*fp)(void); // pointer to function returning arr_t*
fp callbacks[5];

#if defined(_LP64)
typedef int     wchar_t;
#else
typedef long    wchar_t;
#endif

A typedef name may refer to an incomplete type, such as a forward-declared struct:

// tnode in ordinary name space is an alias to tnode in tag name space
typedef struct tnode tnode;
// now tnode is also a complete type
struct tnode {
	int count;
	tnode *left, *right; // same as struct tnode *left, *right;
};
// same as struct tnode s, *sp;
tnode s, *sp;

typedef int A[]; // A is int[]
A a = {1, 2}, b = {3,4,5}; // type of a is int[2], type of b is int[3]

Variable declarations and definitions
#

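Variables and functions must be declared before they are used. They need to be declared, not necessarily defined; a definition is also a declaration. Declarations can appear at file scope or in a local scope inside a function.

Across the whole program, an object or function with external linkage may be defined only once, but it may be declared many times.

One line can declare and initialize several variables: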

int a, b, c=0;
int a=0, b=0, c=0;

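// The * belongs to each declarator (identifier), not to the type.
int *ap=NULL, b=0, *cp=&c; // ap and cp are pointers, b is an int

int a, b, c;
a=b=c=0; // an assignment is an expression with a value, so assignments can be chained.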

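Rules for repeated definitions or declarations:

  1. Macro definitions: may be repeated, but every repetition must be identical, otherwise the compiler warns;
  2. typedef definitions: may be repeated (since C11), but every repetition must be identical, otherwise the compiler reports an error;
  3. Function prototypes, and extern variable or constant declarations: may be repeated, but every repetition must be consistent, otherwise the compiler reports an error;
  4. Forward declarations of struct/union/enum (incomplete types): may be repeated;

What C does not allow to be defined more than once:

  1. Constant, variable and function definitions: only once in the whole program (for external linkage). They therefore normally go in a single .c file, not in a header that may be included by several source files;
  2. struct/union/enum type definitions: only once per translation unit, so a header that contains them needs an include guard;

Local variables: can be initialized with any expression of a compatible type;

Global variables: their initial values are stored in the compiled executable. They must therefore be initialized with constant expressions that can be evaluated at compile time. Such an expression:

  1. cannot allocate memory dynamically;
  2. cannot call functions;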

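Kinds of C constant expressions:

  1. Literal constants: integer, floating, character and string literals.
  2. Enumeration constants: the values defined in an enum declaration.
  3. sizeof expressions (except on variable length arrays).
  4. _Alignof expressions.
  5. Combined constant expressions: arithmetic or logical expressions whose operands are all constants.
  6. Address constants, such as &global_var, a string literal, or the address of a file-scope compound literal. Initializing an object directly from a compound literal value is a GNU extension.

Note: the list above restricts the initial values of global variables, which must be compile-time constant expressions. It does not restrict the types of global variables: basic types, struct/union/enum, pointers and so on are all allowed.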

// Constants and arithmetic defined with macros.
#define SIZE 10
#define MULTIPLIED_SIZE (SIZE * 2)
#define IS_POSITIVE (SIZE > 0)

// Valid initializers
int globalInt1 = 10;
float globalFloat1 = 3.14;
char *globalStr1 = "Hello";

// Invalid initializers
int globalInt2 = someFunction(); // error: computed at run time
float globalFloat2 = globalInt1 * 2; // error: not a constant expression

// Global variables without an initializer are zero-initialized
int globalInt3; // initialized to 0
float globalFloat3; // initialized to 0.0f
char *globalStr2; // initialized to NULL

// A global variable of struct type
struct MyStruct {
	int a;
	float b;
};
struct MyStruct globalStruct = {10, 3.14}; // valid
struct MyStruct globalStruct2; // zero-initialized: {0, 0.0f}

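Global and static variables without an initializer get the zero value of their type, such as 0, NULL, or an all-zero array (""). Local (automatic) variables without an initializer have an indeterminate value.

The compiler warns about unused local variables and parameters (e.g. with -Wall or -Wextra). Casting them to void avoids the warning: (void)arg

Variable qualifiers and specifiers:

  1. 4 type qualifiers: const, volatile, restrict, _Atomic
  2. 5 storage-class specifiers: auto, extern, register, static, _Thread_local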

scope
#

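Scope is the region in which an identifier is valid (visible). An identifier is in scope from the point of its declaration until the end of the file or the end of the enclosing block;

C has 4 kinds of scope:

  1. Block scope;
  2. File scope;
  3. Function scope (labels only);
  4. Function prototype scope

There is no separate "global scope". Identifiers with external linkage declared at file scope in different source files refer to the same object or function, which behaves like a program-wide scope.

Best practice: put the following in headers, and include the headers from the source files:

  • function prototypes;
  • typedefs;
  • macro constants and function-like macros;
  • extern declarations of constants and variables;
  • forward declarations of struct/union/enum.

All of these may be repeated across source files, so the whole program shares consistent definitions and declarations.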

int foo(double); // declaration
int foo(double x){ return x; } // definition

extern int n; // declaration
int n = 10; // definition

struct X; // forward declaration
struct X { int n; }; // definition

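block scope: since C99, variable definitions or declarations can appear anywhere inside a function, as long as they come before their use.

  • Variables of an outer block can be used in an inner block, but not the other way around.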
#include <stdio.h>

int main(void)
{
	int a = 12;
	if  (a == 12) {
		int b = 99;
		printf("%d %d\n", a, b);
	}
	printf("%d\n", a);
	// printf("%d\n", b);  // error: b is out of scope here
}

Variable shadowing: a declaration in an inner scope hides (but does not delete) an outer variable with the same name:

#include <stdio.h>

int main(void)
{
	int i = 10;
	{
		int i = 20;
		printf("%d\n", i);  // Inner scope i, 20 (outer i is hidden)
	}
	printf("%d\n", i);  // Outer scope i, 10
}

for-loop scope: supported since C99:

for (int i = 0; i < 10; i++)
	printf("%d\n", i);
printf("%d\n", i);  // error: i is only valid inside the for loop

label: function scope. A label is visible throughout its function, but not across functions.

const
#

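A const object cannot be modified after its definition, so it must get its value from an initializer:

// A const object should be initialized at its definition
const int const_i = 1;
// Wrong:
// const int const_j;  // accepted by C (unlike C++), but the value can never be set afterwards
// const_j = 1;        // error: assignment of read-only variable

const int x = 2;
// x = 4;  // error: a const object cannot be modified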

void foo(const int x)
{
    printf("%d\n", x + 30);  // OK
}

When const is combined with pointers, its position changes the meaning.

  • const int *p; // Can’t modify what p points to
  • int const *p; // Can’t modify what p points to, just like the previous line
  • int *const p; // We can’t modify “p” with pointer arithmetic
  • const int *const p; // Can’t modify p or *p!
char a[] = "abcd";
const char *p = a;
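p++;  // p itself can be modified;
p[0] = 'A'; // error: the memory p points to cannot be modified through p

int *const p;   // p itself is constant
p++;  // error

int x = 10;
int *const p = &x; // p cannot be modified, but the memory it points to can
*p = 20;   //  OK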

char **p;
p++;     // OK!
(*p)++;  // OK!

char **const p;
p++;     // Error!
(*p)++;  // OK!

char *const *p;
p++;     // OK!
(*p)++;  // Error!

char *const *const p;
p++;     // Error!
(*p)++;  // Error!

If you assign the address of a const object to a pointer to non-const, the compiler warns:

const int x = 20;
int *p = &x;
//    ^       ^
//    |       |
//  int*    const int*

// initialization discards 'const' qualifier from pointer type target

*p = 40;  // Undefined behavior--maybe it modifies "x", maybe not!

restrict
#

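restrict was introduced in C99.

restrict qualifies a pointer type. It promises the compiler that, for the lifetime of the pointer, an object modified through it is accessed only through that pointer, and not through any other pointer or name. This lets the compiler optimize. If the program breaks the promise, the behavior is undefined.

restrict can also appear in an array parameter declarator (e.g. int a[restrict]). The parameter then becomes a restrict-qualified pointer, and the promise covers every array element accessed through it.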

void f(int n, int * restrict p, int * restrict q)
{
	while (n-- > 0)
		*p++ = *q++;
	// none of the objects modified through *p is the same as any of the objects read through *q
	// compiler free to optimize, vectorize, page map, etc.
}

void g(void)
{
	extern int d[100];
	f(50, d + 50, d); // OK
	f(50, d + 1, d);  // Undefined behavior: d[1] is accessed through both p and q in f
}

// A restrict pointer can be assigned to a non-restrict pointer
void f(int n, float * restrict r, float * restrict s)
{
	float *p = r, *q = s; // OK
	while (n-- > 0)
		*p++ = *q++; // almost certainly optimized just like *r++ = *s++
}

volatile
#

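volatile tells the compiler that reads and writes of the object must not be optimized away. Its main use is accessing hardware registers.

For example, with MMIO (memory-mapped I/O), every read or write of a device register must actually access the memory address that represents the register. The compiler must not cache the value or remove the accesses.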

volatile float currentTemperature = 40.0;
volatile int *p;

atomic
#

https://beej.us/guide/bgc/html/split-wide/chapter-atomics.html

The _Atomic qualifier:

#include <stdio.h>
#include <stdatomic.h>

int main(void)
{
	struct point {
		float x, y;
	};

	_Atomic(struct point) p;
	struct point t;

	p = (struct point){1, 2};  // Atomic copy

	//printf("%f\n", p.x);  // Error

	t = p;   // Atomic copy

	printf("%f\n", t.x);  // OK!
}

auto
#

Local variables of a function are auto by default, so the keyword is normally omitted.

void foo (int value)
{
  auto int x = value;
  //…
  return;
}

static
#

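static is the opposite of auto. On a variable inside a function, static means the variable stays valid after the function returns, and later calls see the value set by the previous call. This is called static storage duration: the variable's lifetime is the whole program, not the enclosing function.

  • A static variable inside a function is initialized only once, before program startup, not each time the function is called. Without an explicit initializer it is zero.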
#include <stdio.h>

void counter(void)
{
	static int count = 1;  // initialized only once; later calls to counter() do not run this initialization again
	static int foo;      // defaults to 0

	printf("This has been called %d time(s)\n", count);

	count++;
}

int main(void)
{
	counter();  // "This has been called 1 time(s)"
	counter();  // "This has been called 2 time(s)"
	counter();  // "This has been called 3 time(s)"
	counter();  // "This has been called 4 time(s)"
}

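static can also be used at top level (file scope, outside any function) on variable or function declarations and definitions. The variable or function is then visible only within that file. static variables or functions in different files cannot see each other, so they may share a name. This is called internal linkage.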

extern
#

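In a variable or function declaration, extern means that the definition is in another file or later in this file. The compiler can then use the name without having seen its definition; the linker resolves it.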

// bar.c
int a = 37;

// foo.c
#include <stdio.h>

extern int a;

int main(void)
{
    printf("%d\n", a);  // 37, from bar.c!
    a = 99;
    printf("%d\n", a);  // Same "a" from bar.c, but it's now 99
}

// foo.c (alternative: the extern declaration can also be at block scope)
#include <stdio.h>

int main(void)
{
    extern int a;
    printf("%d\n", a);  // 37, from bar.c!
    a = 99;
    printf("%d\n", a);  // Same "a" from bar.c, but it's now 99
}

register
#

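Tells the compiler that the variable is used very frequently, so it should keep it in a register if possible. This is a hint, not a requirement.

  • The address of a register variable cannot be taken.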
#include <stdio.h>

int main(void)
{
    register int a;   // Make "a" as fast to use as possible.

    for (a = 0; a < 10; a++)
        printf("%d\n", a);
}

register int a;
int *p = &a;    // COMPILER ERROR! Can't take address of a register

register int a[] = {11, 22, 33, 44, 55};
int *p = a;  // COMPILER ERROR! Can't take address of a[0]

register int a[] = {11, 22, 33, 44, 55};
int b = a[2];  // a[2] needs the array's address: undefined behavior in ISO C, and compilers typically reject it

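_Thread_local
#

Thread-local storage: each thread gets its own instance of the variable (since C11; C23 also spells it thread_local).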

Operators and expressions
#

  • Expressions
  • Assignment Operators
  • Incrementing and Decrementing
  • Arithmetic Operators
  • Complex Conjugation
  • Comparison Operators
  • Logical Operators
  • Bit Shifting
  • Bitwise Logical Operators
  • Pointer Operators
  • The sizeof Operator
  • Type Casts
  • Array Subscripts
  • Function Calls as Expressions
  • The Comma Operator
  • Member Access Expressions
  • Conditional Expressions
  • Statements and Declarations in Expressions
  • Operator Precedence
  • Order of Evaluation

表达式是至少包含一个操作数+可选的运算符组成,表达式可以组合形成更复杂的表达式:

  • 运算符具有优先级和结合性规则。
  • 通过括号 () 来调整计算优先级。
47
2 + 2
cosine(3.14159) /* We presume this returns a floating point value. */
( 2 * ( ( 3 + 10 ) - ( 2 * 6 ) ) )

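An expression describes a computation. Usually you want its result, but sometimes you evaluate it only for the side effects it produces, such as reading or writing a file.

Besides the usual arithmetic, relational, logical, and bitwise operators, C has:

  1. Assignment operators: an assignment expression yields the assigned value, so assignments can be chained (a = b = 0).
  2. Increment and decrement operators;
  3. The sizeof operator;
  4. Type cast operators;
  5. The array subscript operator;
  6. Pointer operators;
  7. Function call expressions
  8. Member access expressions
  9. Conditional expressions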

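The Comma Operator
#

In C, the comma operator introduces a sequence point. It lets a single expression perform several operations and yields the result of the last one. The syntax is (expression1, expression2): expression1 is evaluated first and its value is discarded, then expression2 is evaluated, and its value becomes the value of the whole expression.

The comma operator separates related expressions, for example when an earlier expression affects the value of a later one: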

x++, y = x * x;

// The most common use: the init and increment clauses of a for loop
for (x = 1, y = 10;  x <=10 && y >=1;  x++, y--)
{
	// …
}

// A comma operator inside a function call (the parentheses are required): the second argument passed to foo is x
foo(x, (y=47, x), z);

x = (1, 2, 3);
printf("x is %d\n", x);  // Prints 3, because 3 is rightmost in the comma list
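The following short sketch contrasts a parenthesized comma expression with an unparenthesized one. In the second case the comma operator's lowest precedence changes the meaning. comma_demo is a hypothetical name.

```c
/* Hypothetical helper: returns the value of a parenthesized comma
 * expression and stores, via *without_parens, what the unparenthesized
 * form assigns. */
int comma_demo(int *without_parens)
{
	int x = 1;
	int y = (x++, x * 10);   /* x++ completes (sequence point), then y = 2 * 10 */

	x = 1;
	/* ',' has the lowest precedence, so this is (*without_parens = x++), x * 10;
	 * the right operand is computed and discarded (GCC warns it has no effect). */
	*without_parens = x++, x * 10;

	return y;   /* 20; *without_parens is 1 */
}
```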

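The sizeof Operator
#

sizeof is an operator that yields the size in bytes of a type, or of the type of any expression. When the operand is a type name, the parentheses are required: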

size_t a = sizeof(int);
size_t b = sizeof(float);
size_t c = sizeof(5);
size_t d = sizeof(5.143);
size_t e = sizeof a;

printf("%zu\n", sizeof(2 + 7));
printf("%zu\n", sizeof 3.14);

// Define pointer n, then use sizeof *n to get the size of the type of *n (n itself is not dereferenced).
int *n = malloc(sizeof *n);

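sizeof can give surprising results in two cases:

  1. A struct that ends with a flexible array member (or with a zero-length array, a GCC extension): the elements of the trailing array are not counted in the struct's size;
  2. An array declared as a function parameter: the parameter is really a pointer, so sizeof yields the size of a pointer.

sizeof yields a value of type size_t, defined in <stddef.h> and several other headers. size_t is an unsigned integer type whose width is implementation-defined (for example unsigned long on 64-bit Linux). Print it with %zu in printf.

Except when its operand is a variable-length array, sizeof is evaluated at compile time and does not evaluate its operand. The result is a compile-time constant, so it can be used to initialize global variables, which require constant expressions.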

#include <stddef.h> // size_t
#include <stdio.h>

static const int values[] = { 1, 2, 48, 681 };
#define ARRAYSIZE(x) (sizeof (x) / sizeof((x)[0]))  // both sizeof operands are expressions; x must be an array name, not a pointer to it

int main (int argc, char *argv[])
{
	size_t i;
	for (i = 0; i < ARRAYSIZE(values); i++)
	{
		printf("%d\n", values[i]);
	}
	return 0;
}
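A short sketch can check both the array-parameter pitfall above and the fact that sizeof does not evaluate its operand. The function names are hypothetical, and ARRAY_SIZE is a parenthesized variant of ARRAYSIZE.

```c
#include <stddef.h>

/* Parenthesized variant of ARRAYSIZE: only valid for real arrays. */
#define ARRAY_SIZE(a) (sizeof (a) / sizeof((a)[0]))

/* 'int a[10]' in a parameter list is adjusted to 'int *a', so sizeof a is
 * the size of a pointer (GCC warns with -Wsizeof-array-argument). */
size_t param_sizeof(int a[10])
{
	return sizeof a;
}

/* The operand of sizeof is not evaluated (it is not a VLA here), so the
 * increment never happens. */
int sizeof_does_not_evaluate(void)
{
	int i = 0;
	size_t s = sizeof(i++);
	(void)s;
	return i;   /* still 0 */
}
```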

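The offsetof Macro
#

Defined in the header <stddef.h>:

#define offsetof(type, member) /*implementation-defined*/

It yields the byte offset of member from the start of type, as a value of type size_t (also defined in <stddef.h>), so print it with %zu.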

#include <stdio.h>
#include <stddef.h>

struct S {
    char c;
    double d;
};

int main(void)
{
    printf("the first element is at offset %zu\n", offsetof(struct S, c));
    printf("the double is at offset %zu\n", offsetof(struct S, d));
    return 0;
}
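offsetof gives the distance in bytes from the start of the struct to the member, so adding it to the struct's address yields the member's address. The sketch below checks this. offsetof_matches_address is a hypothetical name. The actual offset of d depends on the ABI's padding rules, so the checks avoid assuming a specific value.

```c
#include <stddef.h>

struct S {
	char c;
	double d;
};

/* Hypothetical helper: returns 1 if the struct's start address plus
 * offsetof(struct S, d) is exactly the address of member d. */
int offsetof_matches_address(const struct S *s)
{
	const char *base = (const char *)s;   /* byte-wise pointer arithmetic */
	return (const void *)(base + offsetof(struct S, d)) == (const void *)&s->d;
}
```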

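The typeof Operator
#

typeof yields the type of a type name or of an expression. It is mainly used in macro definitions in header files.

It started as a GNU C extension. In a header file that must also work when included in programs compiled in strict ISO C mode (-std=c99, -std=c11, ...), write __typeof__ instead of typeof.

C23 adds the standard typeof and typeof_unqual operators: https://en.cppreference.com/w/c/language/typeof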

typeof (x[0](1))
typeof (int *)

#define max(a,b)				\
 	({ typeof (a) _a = (a);			\
		typeof (b) _b = (b);		\
		_a > _b ? _a : _b; })

typeof (*x) y[4];

#define pointer(T)  typeof(T *)
#define array(T, N) typeof(T [N])
// array (pointer (char), 4) y;


typeof (int *) y;     // y is a pointer to int, same as: int *y;
typeof (int)  *y;     // also declares y as a pointer to int
typeof (*x) y;        // y has the type that pointer x points to
typeof (int) y[4];    // same as: int y[4]
typeof (*x) y[4];     // y is an array of 4 elements of the type x points to
typeof (typeof (char *)[4]) y;  // array of 4 char pointers, same as: char *y[4];
typeof (int [4]) y;   // same as: int y[4]

#define max(x, y) ({                \
    typeof(x) _max1 = (x);          \
    typeof(y) _max2 = (y);          \
    (void) (&_max1 == &_max2);      \
    _max1 > _max2 ? _max1 : _max2; })
// (void) (&_max1 == &_max2); checks that x and y have the same type: comparing pointers to
// incompatible types makes the compiler emit a warning, alerting the developer.
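The sketch below contrasts a plain macro with a __typeof__-based one. When the argument has a side effect, the plain macro evaluates the winning argument twice. MAX_ONCE, MAX_TWICE, and the demo functions are hypothetical names. Statement expressions and __typeof__ require GCC or Clang.

```c
/* Each argument is evaluated exactly once, into a temporary of its own type. */
#define MAX_ONCE(a, b) ({            \
	__typeof__(a) _ma = (a);     \
	__typeof__(b) _mb = (b);     \
	_ma > _mb ? _ma : _mb; })

/* Plain macro: the winning argument is evaluated twice. */
#define MAX_TWICE(a, b) ((a) > (b) ? (a) : (b))

int max_once_demo(int *i_after)
{
	int i = 5, j = 1;
	int m = MAX_ONCE(i++, j);    /* i++ runs once: m == 5, i == 6 */
	*i_after = i;
	return m;
}

int max_twice_demo(int *i_after)
{
	int i = 5, j = 1;
	int m = MAX_TWICE(i++, j);   /* (i++) > (j) ? (i++) : (j): m == 6, i == 7 */
	*i_after = i;
	return m;
}
```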

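Conditional Operator
#

Form: a ? b : c

a, b, and c are all expressions. a is evaluated first, then exactly one of b and c. b and c must have compatible types:

  1. both arithmetic types (the usual arithmetic conversions are applied)
  2. compatible struct or union types
  3. pointers to compatible types (one of which might be the NULL pointer)

Alternatively, one operand is a pointer and the other is a void* pointer.

As a GNU C extension, b can be omitted. The result is then the value of a, and a is evaluated only once:

a ? : c
// equivalent to (except that a is evaluated only once)
a ? a : c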
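Here is a sketch of a typical use of the GNU ?: shorthand: falling back to a default string. It is shown next to the portable ISO C form. name_or_default and name_or_default_iso are hypothetical helpers, and the first one needs GCC or Clang.

```c
#include <stddef.h>

/* GNU extension: s ?: "unknown" evaluates s once; a non-null s is the
 * result, otherwise "unknown" is. */
const char *name_or_default(const char *s)
{
	return s ?: "unknown";
}

/* Portable ISO C form: s appears twice, which matters only if the
 * operand has side effects. */
const char *name_or_default_iso(const char *s)
{
	return s ? s : "unknown";
}
```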

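Operator Precedence and Associativity
#

In an expression made of operands and operators, grouping is determined by operator precedence. When an operand sits between two operators, it binds to the one with higher precedence. When both operators have the same precedence, their associativity (left to right or right to left) decides.

Precedence, from highest to lowest: postfix > unary > multiplicative > additive > shift > relational > equality > bitwise AND > bitwise XOR > bitwise OR > logical AND > logical OR > conditional > assignment > comma. For example, foo = *p++; is equivalent to foo = *(p++);.

In more detail, from highest to lowest: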

  • Function calls, array subscripting, and membership access operator expressions.
  • Unary operators, including logical negation, bitwise complement, increment, decrement, unary positive, unary negative, indirection operator, address operator, type casting, and sizeof expressions. When several unary operators are consecutive, the later ones are nested within the earlier ones: !-x means !(-x).
  • Multiplication, division, and modular division expressions.
  • Addition and subtraction expressions.
  • Bitwise shifting expressions.
  • Greater-than, less-than, greater-than-or-equal-to, and less-than-or-equal-to expressions.
  • Equal-to and not-equal-to expressions.
  • Bitwise AND expressions.
  • Bitwise exclusive OR expressions.
  • Bitwise inclusive OR expressions.
  • Logical AND expressions.
  • Logical OR expressions.
  • Conditional expressions (using ?:). When several appear in one expression, they group right to left: a ? b : c ? d : e means a ? b : (c ? d : e).
  • All assignment expressions, including compound assignment. When several assignments appear in a single larger expression, they group right to left: a = b = c means a = (b = c).
  • Comma operator expressions.
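The sketch below shows two classic consequences of this ordering: == binds tighter than &, and postfix ++ binds tighter than unary *. The function names are hypothetical.

```c
/* == binds tighter than &, so this parses as x & (1 == 0), i.e. x & 0:
 * it always returns 0 (GCC suggests parentheses with -Wparentheses). */
int is_even_buggy(int x)
{
	return x & 1 == 0;
}

/* Parenthesize to get the intended grouping. */
int is_even(int x)
{
	return (x & 1) == 0;
}

/* Postfix ++ binds tighter than unary *: *p++ means *(p++), which reads
 * the current element and then advances the pointer. */
int sum_with_postfix(const int *p, int n)
{
	int total = 0;
	while (n-- > 0)
		total += *p++;
	return total;
}
```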

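Side Effects
#

The usual purpose of evaluating an expression is to obtain its value. Sometimes, however, an expression is evaluated for the side effects it produces instead:

  1. modifying an object;
  2. reading or writing a file;
  3. calling a function that produces any of the side effects above.

When compiling, the compiler may reorder instructions, so the generated code need not follow the order of the source file. It must still make sure the side effects happen as the program specifies.

To define when side effects must be complete, C89/C90 specifies the following sequence points: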

  • a call to a function (after argument evaluation is complete)
  • the end of the left-hand operand of the and operator &&
  • the end of the left-hand operand of the or operator ||
  • the end of the left-hand operand of the comma operator ,
  • the end of the first operand of the ternary operator a ? b : c
  • the end of a full declarator
  • the end of an initialisation expression
  • the end of an expression statement (i.e. an expression followed by ;)
  • the end of the controlling expression of an if or switch statement
  • the end of the controlling expression of a while or do statement
  • the end of any of the three controlling expressions of a for statement
  • the end of the expression in a return statement
  • immediately before the return of a library function
  • after the actions associated with an item of formatted I/O (as used for example with the strftime or the printf and scanf families of functions).
  • immediately before and after a call to a comparison function (as called for example by qsort)

At a sequence point, all the side effects of previous expression evaluations must be complete, and no side effects of later evaluations may have taken place.

This may seem a little hard to grasp, but there is another way to consider this. Imagine you wrote a library (some of whose functions are external and perhaps others not) and compiled it, allowing someone else to call one of your functions from their code. The definitions above ensure that, at the time they call your function, the data they pass in has values which are consistent with the behaviour specified by the abstract machine, and any data returned by your function has a state which is also consistent with the abstract machine. This includes data accessible via pointers (i.e. not just function parameters and identifiers with external linkage).

The above is a slight simplification, since compilers exist that perform whole-program optimisation at link time. Importantly however, although they might perform optimisations, the visible side effects of the program must be the same as if they were produced by the abstract machine.

Between two sequence points,

  1. an object may have its stored value modified at most once by the evaluation of an expression
  2. the prior value of the object shall be read only to determine the value to be stored.

So the following two expressions (statements) are not allowed. Their behavior is undefined:

i = ++i + 1;
int x = 0; foo(++x, ++x);
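Sequence points are also what make some common idioms well-defined. In the sketch below, && guards a pointer dereference. i = i + 1 is allowed because the prior value of i is read only to compute its new value. The helper names are hypothetical.

```c
#include <stddef.h>

/* The end of the left operand of && is a sequence point, and the right
 * operand is evaluated only when the left one is true, so *p is never
 * read through a null pointer. */
int positive_at(const int *p)
{
	return p != NULL && *p > 0;
}

/* Well-defined: i is modified once, and its prior value is read only to
 * compute the value being stored (unlike i = ++i + 1). */
int increment_once(int i)
{
	i = i + 1;
	return i;
}
```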

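Unspecified Evaluation Order
#

In C, the standard generally does not specify the order in which the subexpressions of an expression are evaluated. So you cannot assume that subexpressions are evaluated in the order that looks natural to you.

  1. Unspecified evaluation order

The C standard leaves the evaluation order of many expressions unspecified. Different compilers, or the same compiler with different options, may therefore evaluate subexpressions in different orders, and which part of an expression runs first is not fixed.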

int x = 10;
int y = (x + 1) * (x + 2);

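Here the compiler may compute (x + 1) first or (x + 2) first. In this simple example the result is the same either way, but in more complex expressions the order can change the behavior.

  2. Side effects and evaluation order

A side effect is a change in the state of storage caused by evaluating an expression, such as assigning to a variable or calling a function. If an expression contains side effects and its outcome depends on the evaluation order, the result becomes unpredictable.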

int i = 1;
int result = (i++) + (i++);

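Here i is modified twice with no sequence point in between. The result is not merely compiler-dependent: the behavior is undefined, so result cannot be relied on at all.

  3. Evaluation order of function arguments

In a function call, the order in which the arguments are evaluated is also unspecified. It depends on the compiler implementation.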

void foo(int a, int b) {
    printf("a: %d, b: %d\n", a, b);
}

int main() {
    int x = 1;
    foo(x++, x++);
    return 0;
}

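In foo(x++, x++), both x++ modify x with no sequence point between them. The call therefore has undefined behavior, and the values foo receives are unpredictable.

  4. Keeping behavior deterministic

To keep code predictable and correct, avoid putting several subexpressions with side effects in the same expression. Split complex expressions and do not rely on an unspecified evaluation order, so the code behaves consistently.

Here are the examples above rewritten so that their behavior is well-defined: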

int i = 1;
int a = i++;
int b = i++;
int result = a + b;

// or
int x = 1;
int a = x++;
int b = x++;
foo(a, b);

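Summary

  1. Unspecified evaluation order: the C standard does not specify the evaluation order of subexpressions in many expressions.
  2. Side effects: using several subexpressions with side effects in one expression can lead to unpredictable, even undefined, behavior.
  3. Function argument evaluation order: unspecified, so different compilers may produce different results.
  4. Determinism: split complex expressions and avoid relying on unspecified evaluation order to keep code deterministic and readable.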

Order of Evaluation
#

The correspondence between the program you write and the things the computer actually does is specified in terms of side effects and sequence points.

Statement Expressions
#

Statement expressions are a GNU extension; see https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html

  • Statements and Declarations in Expressions

    A compound statement enclosed in parentheses may appear as an expression in GNU C. This allows you to use loops, switches, and local variables within an expression.

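    A parenthesized compound statement (a brace-enclosed block) can be used as an expression. The value of the last expression statement in the block becomes the value of the whole construct.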

    
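    #include <stdio.h>
    
    // Not a statement expression: an ordinary parenthesized expression.
    // Each argument may be evaluated twice.
    #define max(a,b) ((a) > (b) ? (a) : (b))
    
    // Statement expression: a brace-enclosed block wrapped in parentheses.
    // It may only appear inside a function body.
    #define maxint(a,b)					\
    	({int _a = (a), _b = (b); _a > _b ? _a : _b; })
    
    // Use names other than _a/_b here: maxint (maxint (_a, _b), _c) would expand
    // to "int _a = (_a)", initializing the inner _a with itself (shadowing bug).
    #define maxint3(a, b, c)						\
    	({int _x = (a), _y = (b), _z = (c); maxint (maxint (_x, _y), _z); })
    
    // __typeof__ lets the macro work for any argument type
    // (in C++, a template function would play this role).
    #define macro(a)  ({__typeof__(a) b = (a); b + 3; })
    
    int abs_value (int y)
    {
    	// The value of the block is the value of its last statement: z
    	return ({ int z;
    		if (y > 0) z = y;
    		else z = - y;
    		z; });
    }
    
    int main(void)
    {
    	int sum = 0;
    	sum = ({
    			int s = 0;
    			for( int i = 0; i < 10; i++)
    				s = s + i;
    			s;
    		});
    	printf("sum = %d, maxint3 = %d, macro = %d, abs = %d\n",
    	       sum, maxint3(3, 9, 4), macro(1), abs_value(-5));
    	return 0;
    }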
    

Statements
#

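Operators + operands → expressions → statements.

Statements perform computations, control the flow of execution, and so on.

  • The most common statement is the expression statement: an expression followed by a semicolon.
  • A block statement creates a new scope and can contain several statements.

Expression statements must end with a semicolon, but not every statement does; if, switch, and while statements, for example, do not.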

// Expression statements
5;
2 + 2;
10 >= 9;
x++;
y = x + 25; // assignment expression
puts ("Hello, user!");
*cucumber;

You write statements to cause actions and to control flow within your programs. You can also write statements that do not do anything at all, or do things that are uselessly trivial.

  • Expression Statements: adding a semicolon after an expression makes it an expression statement.

The remaining statements are used for control flow, branching, and jumps:

  • Labels
  • The if Statement
  • The switch Statement
  • The while Statement
  • The do Statement
  • The for Statement
  • Blocks
  • The Null Statement
  • The goto Statement
  • The break Statement
  • The continue Statement
  • The return Statement
  • The typedef Statement

switch
#

switch syntax:

  • test and each compare-N are expressions;
  • test must have integer type, and each case label compare-N must be an integer constant expression;
switch (test)
  {
    case compare-1:
      if-equal-statement-1
    case compare-2:
      if-equal-statement-2
    default:
      default-statement
  }

Once a case matches, execution continues through that case and all the following ones (fall-through) until a break is reached:

int x = 0;
switch (x)
  {
    case 0:
      puts ("x is 0");
    case 1:
      puts ("x is 1");
    default:
      puts ("x is something else");
  }

/* Output: */
/* x is 0 */
/* x is 1 */
/* x is something else */

// Fix: end each case with break
switch (x)
  {
    case 0:
      puts ("x is 0");
      break;
    case 1:
      puts ("x is 1");
      break;
    default:
      puts ("x is something else");
      break;
  }

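GNU C extension: a case label can match a range, written case low ... high:

case 'A' ... 'Z':
case 1 ... 5:
// rather than:
// case 1...5:   (without spaces, 1...5 is lexed as a single, invalid number)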
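Here is a sketch of case ranges in a hypothetical classify() helper. It needs GCC or Clang, since this is a GNU extension. It also assumes the letters 'A' to 'Z' and 'a' to 'z' are contiguous, as they are in ASCII. The standard guarantees only that '0' to '9' are contiguous.

```c
/* GNU extension: case low ... high (spaces around "..." are required). */
int classify(char c)
{
	switch (c) {
	case 'A' ... 'Z':
		return 1;   /* upper-case letter */
	case 'a' ... 'z':
		return 2;   /* lower-case letter */
	case '0' ... '9':
		return 3;   /* digit */
	default:
		return 0;   /* anything else */
	}
}
```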

block
#

A block is a set of zero or more statements enclosed in braces. Blocks are also known as compound statements.

for (x = 1; x <= 10; x++)
  {
    if ((x % 2) == 0)
      {
        printf ("x is %d\n", x);
        printf ("%d is even\n", x);
      }
    else
      {
        printf ("x is %d\n", x);
        printf ("%d is odd\n", x);
      }
  }

You can declare variables inside a block; such variables are local to that block. In C89, declarations must occur before other statements, and so sometimes it is useful to introduce a block simply for this purpose:

{
  int x = 5;
  printf ("%d\n", x);
}
printf ("%d\n", x);   /* Compilation error! x exists only
                       in the preceding block. */

goto
#

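Labels have function scope. The label targeted by goto must be in the same function, but it may appear anywhere in that function, before or after the goto statement.

  • To jump across functions, for example from deep inside a nested call chain back to a point recorded earlier, use setjmp and longjmp from <setjmp.h>.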
// continue
for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
		for (int k = 0; k < 3; k++) {
			printf("%d, %d, %d\n", i, j, k);

			goto continue_i;   // Now continuing the i loop!!
		}
        }
continue_i: ;
}


// break
for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
		printf("%d, %d\n", i, j);
		goto break_i;   // Now breaking out of the i loop!
        }
}
break_i:
printf("Done!\n");


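// Retry a system call interrupted by a signal (fragment; needs <unistd.h> and <errno.h>)
retry:
byte_count = read(0, buf, sizeof(buf) - 1);  // Unix read() syscall
if (byte_count == -1) {            // An error occurred...
        if (errno == EINTR) {          // But it was just interrupted
		printf("Restarting...\n");
		goto retry;
        }
}

goto can also cause scope problems. Jumping into a block skips the initialization of its variables, and the compiler warns: warning: ‘x’ is used uninitialized in this function: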

goto label;
{
        int x = 12345;
label:
        printf("%d\n", x);
};

{
	int x = 10;
label:
	printf("%d\n", x);
}
goto label;

// Fix: declare x without an initializer and assign it after the label
goto label;
{
        int x;
label:
        x = 12345;
        printf("%d\n", x);
}
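The following sketch packages the break-out-of-nested-loops pattern above as a reusable function. A single goto leaves both loops, which one break cannot do. find_in_grid is a hypothetical name.

```c
/* Finds the first (row, col) with grid[row][col] == target.
 * Returns 1 and fills *row/*col if found, otherwise returns 0. */
int find_in_grid(int grid[3][3], int target, int *row, int *col)
{
	for (int i = 0; i < 3; i++) {
		for (int j = 0; j < 3; j++) {
			if (grid[i][j] == target) {
				*row = i;
				*col = j;
				goto found;   /* leaves both loops at once */
			}
		}
	}
	return 0;   /* not found */

found:
	return 1;
}
```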

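Functions
#

Function declarations (prototypes) may be repeated, as long as all of them have the same signature. Function signatures are usually declared in header files.

  • Function declarations are extern by default, so the extern keyword can be omitted. Standard C does not allow nested function definitions, although GNU C does;
  • A function must specify its return type; use void if it returns nothing. If it takes no parameters, write void rather than an empty parameter list (before C23, () means "parameters unspecified", not "no parameters");
  • A function declaration may appear in any scope, but a function definition may appear only at file scope;
  • A function declaration may omit parameter names, but it must give the type of every parameter;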
int function(void);
void func(void);

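Function definition: a function may be defined only once in the whole program, otherwise the build fails (a duplicate in one file is a compile error, a duplicate across files is a link error). That is why functions are not defined in header files.

The compiler allocates no storage for a function declaration; the declaration is used mainly for compile-time checking. For a function definition, the compiler emits the compiled instructions. The function name, or a function pointer, refers to the memory address of the function's first instruction. Unlike an ordinary data pointer, a function pointer can be used to call the function.

Function pointers and data pointers are not compatible (a cast may compile, but ISO C does not define converting between them): Function pointers and data pointers are not compatible, in the sense that you cannot expect to store the address of a function into a data pointer, and then copy that into a function pointer and call it successfully. It might work on some systems, but it’s not a portable technique.

A function call is an expression and can be used anywhere an expression or value is expected.

  • Arguments are passed by value, not by reference;
  • When a prototype is in scope, the compiler automatically converts each argument to the type of the corresponding parameter;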
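A short sketch of the two rules above, using hypothetical function names. The function works on a copy of each argument, so to change the caller's variable you pass its address. Arguments are converted to the parameter types.

```c
/* n is a copy of the caller's argument: assigning to it changes only the copy. */
void set_copy(int n)
{
	n = 42;
	(void)n;
}

/* To change the caller's variable, pass its address and write through it. */
void set_via_pointer(int *n)
{
	*n = 42;
}

/* With a prototype in scope, the argument is converted to the parameter
 * type as if by assignment: 3.9 becomes 3 (truncated toward zero). */
int as_int(int n)
{
	return n;
}
```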

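Implicit Function Declarations
#

If a function is called without being declared first (not recommended), pre-C99 compilers implicitly declare it:

  1. Parameters: unspecified, as if declared int myFunction(); the arguments undergo the default argument promotions;
  2. Return type: always int.

This implicit declaration may not match the later definition. C99 removed implicit declarations. GCC warns with implicit declaration of function ‘myFunction’ [-Wimplicit-function-declaration], and GCC 14 and later treat it as an error by default.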

#include <stdio.h>

int main() {
    int result = myFunction(5);  // myFunction 未声明
    printf("Result: %d\n", result);
    return 0;
}

/*
Compiling produces a warning or an error like the following (depending on the compiler and the C standard in use):
gcc -Wall -o test test.c
test.c: In function 'main':
test.c:4:16: warning: implicit declaration of function 'myFunction' [-Wimplicit-function-declaration]
    4 |     int result = myFunction(5);
      |                ^~~~~~~~~~~
*/

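Nested Functions
#

As a GNU C extension (supported by GCC, not by Clang), a function can be defined inside another function. A nested function definition may appear wherever a declaration is allowed inside a block, mixed with the other declarations and statements: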

int factorial (int x)
{
	int factorial_helper (int a, int b)
	{
		if (a < 1)
		{
			return b;
		}
		else
		{
			return factorial_helper ((a - 1), (a * b));
		}
	}

	return factorial_helper (x, 1);
}

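Function Parameters
#

Function arguments are passed by value.

If a function takes no parameters, declare it with void rather than an empty list, to avoid compiler warnings.

When passing a multidimensional array to a function, every dimension except the first must be specified.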

#include <stdio.h>

// This function takes no arguments and returns no value:

void hello(void)
{
    printf("Hello, world!\n");
}

int main(void)
{
    hello();  // Prints "Hello, world!"
}
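The sketch below shows the multidimensional-array rule in practice. sum_2d and sum_2d_vla are hypothetical names. The second version uses a C99 variable-length array parameter, so it needs a compiler with VLA support, which is optional since C11.

```c
/* The first dimension can be omitted (m is adjusted to int (*m)[3]), but
 * the column count 3 is needed to compute the address of m[i][j]. */
int sum_2d(int rows, int m[][3])
{
	int total = 0;
	for (int i = 0; i < rows; i++)
		for (int j = 0; j < 3; j++)
			total += m[i][j];
	return total;
}

/* VLA parameter: the column count comes from an earlier parameter. */
int sum_2d_vla(int rows, int cols, int m[rows][cols])
{
	int total = 0;
	for (int i = 0; i < rows; i++)
		for (int j = 0; j < cols; j++)
			total += m[i][j];
	return total;
}
```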

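Function Return Values
#

  1. Supported return types:
    1. Basic types: integer types (int, short, long, char, ...), floating types (float, double, ...);
    2. void: the function returns no value;
    3. struct/enum/union types. A struct is returned by value as a shallow copy: pointer members are copied, not the data they point to;
    4. Pointer types: pointers to basic types, structs, unions, void, and so on.
  2. Unsupported return types:
    1. An array type cannot be returned, but a pointer to an array, or a struct wrapping an array, can;
    2. A function type cannot be returned, but a function pointer can.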
// Returning a function pointer:
// Wrong:
// int *(int, int) myfunc(int, int)
// Right: func takes (int, int) and returns a function pointer of type int (*)(int, int)
int (*func(int, int))(int, int);

// Returning a pointer to an array:
// func2 takes (int, int) and returns int (*)[4], a pointer to an array of 4 ints
int (*func2(int, int))[4];
// With one more *, the return type becomes int (**)[4], a pointer to a pointer to such an array
int (**func3(int, int))[4];

// C cannot return an array type directly: error: function cannot return array type 'int[4]'
// int (func(int, int))[4];

// These are function declarations, not variable definitions, so extern is optional:
extern int (*func2(int, int))[4];

// Returning a function pointer
int add(int a, int b) { return a + b; }
int (*getAddFunc(void))(int, int) {
    return add;
}
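A typedef makes these declarators much easier to read. In the sketch below, binop_fn and get_op are hypothetical names. get_op has the same type as int (*get_op(char))(int, int).

```c
#include <stddef.h>

/* binop_fn: pointer to a function taking (int, int) and returning int. */
typedef int (*binop_fn)(int, int);

static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }

/* Same type as: int (*get_op(char op))(int, int)
 * Returns NULL for an unsupported operator. */
binop_fn get_op(char op)
{
	switch (op) {
	case '+': return op_add;
	case '-': return op_sub;
	default:  return NULL;
	}
}
```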

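Function Pointers
#

When a function name is used as an rvalue, it converts to a pointer to the start of the function body; calling through that pointer executes the function's instructions.

  • When used as an rvalue, max and &max are equivalent: both yield a pointer to the function;
  • When declaring a function pointer variable, the * and the name must be parenthesized, as in (*name). Without the parentheses, the declaration is a prototype of a function that returns a pointer.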
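// Function prototype declaration (extern is optional: function declarations are extern by default)
int max(int a, int b);

// Function definition (with a body)
int max(int a, int b) {return a>b?a:b;}

// Function pointer variable declaration: unlike the prototype, (*maxb) makes maxb a pointer.
int (*maxb)(int, int);

// Assigning to the function pointer variable
maxb = max; // or: maxb = &max

// Function pointer type definition: the typedef name must also be written as (*name)
typedef int (*max_fn)(int, int);
max_fn mymax = max; // or: max_fn mymax = &max;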

#include <stdio.h>

// Function definition: the compiler allocates storage holding the function's instructions,
// and foo designates the address of that storage
void foo (int i)
{
	printf ("foo %d!\n", i);
}

void bar (int i)
{
	printf ("%d bar!\n", i);
}

void message (void (*func)(int), int times)
{
	int j;
	for (j=0; j<times; ++j)
		func (j);  /* (*func) (j); would be equivalent. */
}

void example (int want_foo)
{
	// pf is a pointer to a function of type void (int)
	void (*pf)(int) = &bar; /* The & is optional. */
	if (want_foo)
		pf = foo;
	message (pf, 5);
}

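Arrays of function pointers: C has no arrays of functions, but it does support arrays of function pointers.

// C does not allow arrays of functions: error: 'fa' declared as array of functions of type 'int (int, int)'
// int (fa[4])(int, int);

// Use an array of function pointers instead:
// Definition: fa is an array of 4 elements, each of type int (*)(int, int)
int(*fa[4])(int, int);
// Note: the line above defines the array; to declare an array defined elsewhere, add extern
extern int(*fa[4])(int, int);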
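Arrays of function pointers are typically used as dispatch tables. The sketch below uses hypothetical names (enum op, op_table, apply). C99 designated initializers make each slot line up with its enum constant.

```c
enum op { OP_ADD, OP_SUB, OP_MUL, OP_DIV };

static int do_add(int a, int b) { return a + b; }
static int do_sub(int a, int b) { return a - b; }
static int do_mul(int a, int b) { return a * b; }
/* Hypothetical policy: division by zero returns 0 instead of trapping. */
static int do_div(int a, int b) { return b != 0 ? a / b : 0; }

/* op_table: array of 4 const pointers to int (int, int), the same shape
 * as fa above; designated initializers tie each slot to its enum value. */
static int (*const op_table[4])(int, int) = {
	[OP_ADD] = do_add,
	[OP_SUB] = do_sub,
	[OP_MUL] = do_mul,
	[OP_DIV] = do_div,
};

int apply(enum op op, int a, int b)
{
	if ((unsigned)op >= sizeof op_table / sizeof op_table[0])
		return 0;   /* out-of-range operation */
	return op_table[op](a, b);
}
```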

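A complex declaration: void (*signal(int, void (*)(int)))(int)

  • signal is a function whose parameters are int and void (*)(int). The second parameter is a function pointer type that takes an int and returns nothing;
  • signal returns void (*)(int), a function pointer that takes an int and returns nothing;
  • How to read it:
    1. Start at the identifier: parentheses immediately to the right of signal mean signal is a function.
    2. Find the parameters by reading to the right: signal(int, void (*)(int));
    3. Find the return type by reading to the left. Remove signal and its parameter list; what remains, void (*)(int), is the return type, a function pointer.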

Variadic Functions
#

Use the macros va_start(), va_arg(), va_end() and the va_list type from <stdarg.h>:

#include <stdio.h>
#include <stdarg.h>

// In a variadic function, ... must be the last parameter (and, before C23, follow at least one named parameter)
int add(int count, ...)
{
	int total = 0;
	va_list va;

	va_start(va, count);   // Start with arguments after "count"

	for (int i = 0; i < count; i++) {
		int n = va_arg(va, int);   // Get the next int
		total += n;
	}

	va_end(va);  // All done

	return total;
}

int main(void)
{
	printf("%d\n", add(4, 6, 2, -4, 17));  // 6 + 2 - 4 + 17 = 21
	printf("%d\n", add(2, 22, 44));        // 22 + 44 = 66
}

The standard library functions vprintf/vfprintf/vsprintf/vsnprintf() accept a va_list argument:

#include <stdio.h>
#include <stdarg.h>

// In a variadic function, ... must be the last parameter (and, before C23, follow at least one named parameter)
int my_printf(int serial, const char *format, ...)
{
	va_list va;

	// Do my custom work
	printf("The serial number is: %d\n", serial);

	// Then pass the rest off to vprintf()
	va_start(va, format);
	int rv = vprintf(format, va);  // vprintf 使用 va_list
	va_end(va);

	return rv;
}

int main(void)
{
	int x = 10;
	float y = 3.2;

	my_printf(3490, "x is %d, y is %f\n", x, y);
}
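The same technique works with vsnprintf(), which also reports how long the full output would have been. format_into is a hypothetical helper.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: like snprintf, but forwards its variable arguments
 * to vsnprintf() through a va_list. */
int format_into(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);   /* fmt is the last named parameter */
	int n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);

	/* Length the full output would have (excluding '\0'); if n >= size,
	 * the output was truncated, but buf is still null-terminated. */
	return n;
}
```

Comparing the return value with the buffer size is the usual way to detect truncation.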

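Function-like macros also accept variable arguments. ##__VA_ARGS__, which drops the preceding comma when no arguments are given, is a GNU extension; C23 provides __VA_OPT__ for the same purpose: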

#define pr_info(fmt, ...)    __pr(__pr_info, fmt, ##__VA_ARGS__)
#define pr_debug(fmt, ...)    __pr(__pr_debug, fmt, ##__VA_ARGS__)

inline
#

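A frequently called function can be declared inline. The compiler may then expand its body at each call site and avoid the function-call overhead (argument passing, return, and so on). This suits performance-sensitive code, but it can increase the size of the executable.

inline is only a hint, and whether a call is actually inlined is up to the compiler. inline also does not change linkage. A non-static inline function still has external linkage. Several files may each contain an inline definition of it, but under C99 rules exactly one file must provide its external (non-inline) definition.

inline is therefore usually combined with static, which gives the function internal linkage (file scope). Each file that defines the function, for example by including a header, gets its own private copy. If the compiler decides not to inline a call, it simply emits a local out-of-line copy, which avoids a series of surprises.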

static inline int add(int x, int y) {
	return x + y;
}

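If static is removed, gcc may fail at link time when optimization is off:

  • At -O0 GCC does not inline the call, and a plain inline definition (C99 semantics) emits no external symbol, so linking fails with an undefined reference. Enabling optimization often hides the error. The portable fix is an extern declaration, such as extern inline int add(int, int);, in exactly one .c file;
  • A non-static inline function must not refer to an identifier with internal linkage, such as a static variable at file scope;
  • A static variable defined inside such an inline function must be const.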
#include <stdio.h>

// static removed
inline int add(int x, int y)
{
	return x + y;
}

int main(void)
{
	printf("%d\n", add(1, 2));
}


static int b = 13;
inline int add(int x, int y)
{
	return x + y + b;  // BAD -- can't refer to b
}

inline int add(int x, int y)
{
	// static int b = 13;  // BAD -- can't define static
	static const int b = 13;  // OK -- static const
	return x + y + b;
}

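Suppose one file contains an inline definition of a function and another file contains an ordinary (external) definition with the same name. The C standard leaves it unspecified which one a given call uses. With GCC in practice:

  1. With optimization on, calls in the file with the inline definition typically use the inline version, and other files use the external definition;
  2. With optimization off, calls use the external definition.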

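GCC also provides function attributes that control inlining, such as always_inline and noinline. In C90 mode, which has no inline keyword, use the __inline__ spelling: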

__attribute__((__noinline__,__noclone__))
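Here is a single-file sketch of the portable fix for non-static inline functions. It assumes C99 inline semantics, which is GCC's default gnu17 mode; -fgnu89-inline changes the rules. square is a hypothetical name. In a real project the inline definition would live in a header and the extern declaration in exactly one .c file.

```c
/* On its own, this file-scope definition is only an "inline definition":
 * it emits no external symbol, so an out-of-line call (e.g. at -O0) would
 * fail to link. */
inline int square(int x)
{
	return x * x;
}

/* One declaration with extern turns the definition above into the
 * external definition for this translation unit, so calls link at any
 * -O level and the function's address can be taken. */
extern inline int square(int x);
```

Calling square through a function pointer forces an out-of-line call. Without the extern declaration, that is exactly the case that fails to link.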

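noreturn and _Noreturn
#

Tells the compiler that a function never returns to its caller because it leaves through another mechanism, such as exit() or abort(). The compiler can then optimize the code around the call.

  • Option 1: the built-in keyword _Noreturn (C11; deprecated in C23);
  • Option 2: the noreturn macro from <stdnoreturn.h> (C11; the header is also deprecated in C23, where the attribute [[noreturn]] is preferred);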
#include <stdio.h>
#include <stdlib.h>
#include <stdnoreturn.h>

noreturn void foo(void)
{
	printf("Happy days\n");
	exit(1);            // And it doesn't return--it exits here!
}

int main(void)
{
	foo();
}

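Running Programs
#

Command-Line Arguments and Environment Variables
#

Execution of a C program starts at main. The execution environment captures the value main returns, or the status passed to exit, and uses it as the program's exit status.

At the ELF binary or assembly level, the real entry point is the _start symbol in the .text section. When linking an executable, gcc adds startup and cleanup code around main. The C runtime startup files (crt*.o, such as Scrt1.o in the output below) provide _start, which calls main(); after main returns, it performs cleanup and exits.

  • Reference: https://akaedu.github.io/book/ch19s02.html

Pass -v to gcc to see the arguments it passes to as and ld when compiling and linking:

  • -Wp,-v: verbose preprocessor output;
  • -v: verbose compilation, including how the cc1 compiler is run;
  • -Wa,-v: assembler details, showing the as command line;
  • -Wl,-v: linker details, showing how collect2 invokes ld;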
alizj@ubuntu:/Users/alizj/docs/lang/c$ gcc -g -v -Wp,-v -Wa,-v -Wl,-v array.c
#...
# Command line of the link wrapper collect2, which invokes ld internally
GNU assembler version 2.42 (aarch64-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.42
COMPILER_PATH=/usr/libexec/gcc/aarch64-linux-gnu/13/:/usr/libexec/gcc/aarch64-linux-gnu/13/:/usr/libexec/gcc/aarch64-linux-gnu/:/usr/lib/gcc/aarch64-linux-gnu/13/:/usr/lib/gcc/aarch64-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/aarch64-linux-gnu/13/:/usr/lib/gcc/aarch64-linux-gnu/13/../../../aarch64-linux-gnu/:/usr/lib/gcc/aarch64-linux-gnu/13/../../../../lib/:/lib/aarch64-linux-gnu/:/lib/../lib/:/usr/lib/aarch64-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/aarch64-linux-gnu/13/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-g' '-v' '-mlittle-endian' '-mabi=lp64' '-dumpdir' 'a.'
 /usr/libexec/gcc/aarch64-linux-gnu/13/collect2 -plugin /usr/libexec/gcc/aarch64-linux-gnu/13/liblto_plugin.so -plugin-opt=/usr/libexec/gcc/aarch64-linux-gnu/13/lto-wrapper -plugin-opt=-fresolution=/tmp/ccZXDhqQ.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --build-id --eh-frame-hdr --hash-style=gnu --as-needed -dynamic-linker /lib/ld-linux-aarch64.so.1 -X -EL -maarch64linux --fix-cortex-a53-843419 -pie -z now -z relro /usr/lib/gcc/aarch64-linux-gnu/13/../../../aarch64-linux-gnu/Scrt1.o /usr/lib/gcc/aarch64-linux-gnu/13/../../../aarch64-linux-gnu/crti.o /usr/lib/gcc/aarch64-linux-gnu/13/crtbeginS.o -L/usr/lib/gcc/aarch64-linux-gnu/13 -L/usr/lib/gcc/aarch64-linux-gnu/13/../../../aarch64-linux-gnu -L/usr/lib/gcc/aarch64-linux-gnu/13/../../../../lib -L/lib/aarch64-linux-gnu -L/lib/../lib -L/usr/lib/aarch64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/aarch64-linux-gnu/13/../../.. -v /tmp/cc0XD2tu.o -lgcc --push-state --as-needed -lgcc_s --pop-state -lc -lgcc --push-state --as-needed -lgcc_s --pop-state /usr/lib/gcc/aarch64-linux-gnu/13/crtendS.o /usr/lib/gcc/aarch64-linux-gnu/13/../../../aarch64-linux-gnu/crtn.o
collect2 version 13.3.0

# The ld command invoked by collect2
/usr/bin/ld -plugin /usr/libexec/gcc/aarch64-linux-gnu/13/liblto_plugin.so -plugin-opt=/usr/libexec/gcc/aarch64-linux-gnu/13/lto-wrapper -plugin-opt=-fresolution=/tmp/ccZXDhqQ.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --build-id --eh-frame-hdr --hash-style=gnu --as-needed -dynamic-linker /lib/ld-linux-aarch64.so.1 -X -EL -maarch64linux --fix-cortex-a53-843419 -pie -z now -z relro /usr/lib/gcc/aarch64-linux-gnu/13/../../../aarch64-linux-gnu/Scrt1.o /usr/lib/gcc/aarch64-linux-gnu/13/../../../aarch64-linux-gnu/crti.o /usr/lib/gcc/aarch64-linux-gnu/13/crtbeginS.o -L/usr/lib/gcc/aarch64-linux-gnu/13 -L/usr/lib/gcc/aarch64-linux-gnu/13/../../../aarch64-linux-gnu -L/usr/lib/gcc/aarch64-linux-gnu/13/../../../../lib -L/lib/aarch64-linux-gnu -L/lib/../lib -L/usr/lib/aarch64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/aarch64-linux-gnu/13/../../.. -v /tmp/cc0XD2tu.o -lgcc --push-state --as-needed -lgcc_s --pop-state -lc -lgcc --push-state --as-needed -lgcc_s --pop-state /usr/lib/gcc/aarch64-linux-gnu/13/crtendS.o /usr/lib/gcc/aarch64-linux-gnu/13/../../../aarch64-linux-gnu/crtn.o
GNU ld (GNU Binutils for Ubuntu) 2.42
COLLECT_GCC_OPTIONS='-g' '-v' '-mlittle-endian' '-mabi=lp64' '-dumpdir' 'a.'

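main must return int. Since C99, reaching the closing } of main without a return statement has the same effect as return 0:

  • The last element of argv is a null pointer: argv[argc] == NULL
int main(void); // void explicitly states that there are no parameters
int main(int argc, char **argv); // or: int main(int argc, char *argv[])

// ISO C does not declare environ in any header; declare it yourself
// (glibc's <unistd.h> declares it when _GNU_SOURCE is defined)
extern char **environ;

// Or, as a common but non-standard extension, a third parameter holding the environment
int main(int argc, char **argv, char **envp);

environ: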

#include <stdio.h>

extern char **environ;  // MUST be extern AND named "environ"

int main(void)
{
	for (char **p = environ; *p != NULL; p++) {
		printf("%s\n", *p);
	}

	// Or you could do this:
	for (int i = 0; environ[i] != NULL; i++) {
		printf("%s\n", environ[i]);
	}
}

// Or use getenv()
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *val = getenv("FROTZ");  // Try to get the value

    // Check to make sure it exists
    if (val == NULL) {
        printf("Cannot find the FROTZ environment variable\n");
        return EXIT_FAILURE;
    }

    printf("Value: %s\n", val);
}

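Program Termination
#

The exit, termination, and cleanup functions below are declared in <stdlib.h>, except _exit, which is POSIX and declared in <unistd.h>.

  • atexit: registers a function to be called on exit() invocation
  • exit: calls the functions registered with atexit() (in reverse order of registration), flushes and closes all open streams, and removes files created by tmpfile().
  • _exit (POSIX): does not call the atexit() handlers and does not flush or close standard I/O streams;
  • _Exit (C99): the standard C counterpart of POSIX _exit; it does not call atexit() handlers, and whether it flushes streams is implementation-defined (on glibc it behaves like _exit);
  • abort: causes abnormal program termination (without cleaning up)
  • quick_exit (C11): causes normal program termination without completely cleaning up
  • at_quick_exit (C11): registers a function to be called on quick_exit invocation

<stdlib.h> defines two standard exit status macros:

  • EXIT_SUCCESS (or 0): the program terminated successfully.
  • EXIT_FAILURE: the program terminated with an error.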
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        printf("usage: mult x y\n");
        return EXIT_FAILURE;   // Indicate to shell that it didn't work
    }

    printf("%d\n", atoi(argv[1]) * atoi(argv[2]));

    return 0;  // same as EXIT_SUCCESS, everything was good.
}

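Normal termination: When you exit a program normally, all open I/O streams are flushed and temporary files removed. Basically it's a nice exit where everything gets cleaned up and handled. It's what you want to do almost all the time unless you have reasons to do otherwise.

By contrast, exiting with _exit()/_Exit(), or abnormally through abort(), skips this cleanup:

  1. Streams: buffered output in open stdio streams is not flushed.
  2. Temporary files: files created by tmpfile() may not be removed.
  3. Cleanup functions: functions registered with atexit() are not called.

(On typical operating systems, memory and file descriptors are still released by the OS when the process ends.)

A program terminates normally when:

  1. main returns, either through an explicit return or by reaching the end of the function, which is equivalent to return 0;
  2. exit(N) is called. You can register exit handlers with atexit(). They run when main returns or exit() is called, in reverse order of registration: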
#include <stdio.h>
#include <stdlib.h>

void on_exit_1(void)
{
	printf("Exit handler 1 called!\n");
}

void on_exit_2(void)
{
	printf("Exit handler 2 called!\n");
}

int main(void)
{
	atexit(on_exit_1);
	atexit(on_exit_2);

	printf("About to exit...\n");
}

/* About to exit... */
/* Exit handler 2 called! */
/* Exit handler 1 called! */

Quicker Exits with quick_exit() : This is similar to a normal exit, except:

  • Open files might not be flushed.
  • Temporary files might not be removed.
  • atexit() handlers won’t be called.

But there is a way to register exit handlers: call at_quick_exit() analogously to how you’d call atexit().

#include <stdio.h>
#include <stdlib.h>

void on_quick_exit_1(void)
{
	printf("Quick exit handler 1 called!\n");
}

void on_quick_exit_2(void)
{
	printf("Quick exit handler 2 called!\n");
}

void on_normal_exit(void)   // not named on_exit: glibc's <stdlib.h> already declares an on_exit() function
{
	printf("Normal exit--I won't be called!\n");
}

int main(void)
{
	at_quick_exit(on_quick_exit_1);
	at_quick_exit(on_quick_exit_2);

	atexit(on_normal_exit);  // This won't be called

	printf("About to quick exit...\n");

	quick_exit(0);
}

/* About to quick exit... */
/* Quick exit handler 2 called! */
/* Quick exit handler 1 called! */

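To exit immediately without any cleanup, call _exit(N) or _Exit(N).

Other ways a program can terminate:

  1. assert(): if the condition is false (and NDEBUG is not defined), it prints a diagnostic to stderr and calls abort(). No cleanup is done, and a core dump may be produced if the system allows it.

      #include <assert.h>
      goats -= 100;
      assert(goats >= 0);  // Can't have negative goats
    
  2. abort(): raises the SIGABRT signal, which terminates the program abnormally if the signal is not caught.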

References
#

  1. Beej's Guide to C Programming: https://beej.us/guide/bgc/html/split-wide/index.html
  2. GNU C Language Intro and Reference Manual: https://lists.gnu.org/archive/html/info-gnu/2022-09/msg00005.html
  3. The Development of the C Language: https://www.bell-labs.com/usr/dmr/www/chist.html
  4. An Introduction to GCC, or the GNU Compilers gcc and g++, Revised and updated: https://www.seas.upenn.edu/~ese5320/fall2022/handouts/_downloads/788d972ffe62083c2f1e3f86b7c03f5d/gccintro.pdf

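Major C Version Features
#

Reference: https://en.cppreference.com/w/c/language/history

New in C99
#

Compared with C89, C99 introduced many new features, including:

  1. New data types
    • long long int, at least 64 bits wide.
    • _Bool, for boolean values.
    • _Complex and _Imaginary, for complex numbers.
  2. Mixed declarations and code: variables can be declared anywhere in a block, not only at its beginning (including in the first clause of a for loop).
  3. Compound literals: create an unnamed array or struct object anywhere an expression is allowed.
  4. Variable-length arrays: the length of an array can be determined at run time.
  5. Non-constant initializers for aggregates: the initializer lists of local (automatic) arrays and structs may contain run-time values.
  6. Single-line comments: // comments are supported.
  7. New standard library headers
    • Type-generic math functions in <tgmath.h>.
    • The boolean type and constants in <stdbool.h>.
    • Complex types and functions in <complex.h>.
    • Fixed-width integer types in <stdint.h>.
  8. Inline functions: the inline keyword.
  9. Preprocessor and related additions
    • The _Pragma operator, which lets macro expansion produce a #pragma.
    • The predefined identifier __func__, which holds the name of the current function.
  10. Improved floating-point support: the floating-point environment (<fenv.h>, including rounding-mode control), hexadecimal floating constants, and new math library functions.
  11. Variadic macros: macros that take a variable number of arguments (__VA_ARGS__).
  12. Designated initializers: initialize specific array indexes or struct members.
  13. The restrict keyword: tells the compiler that a pointer is the only means of accessing the object it points to, which helps optimization.
  14. __STDC_VERSION__ is defined as 199901L (the macro itself was introduced by C95), so code can check for C99.
  15. Improved I/O functions: for example snprintf() and vsnprintf(), and new length modifiers such as %lld, %zu, and %hhd.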

Example:

#include <stdio.h>
#include <stdbool.h>
#include <complex.h>
#include <tgmath.h>
#include <stdint.h>

// Single-line comment (new in C99)
struct Point {
	int x, y;
};

// Inline function (static, so no separate external definition is needed)
static inline int square(int x) {
	return x * x;
}

int main(void) {
	// Declare variables where they are needed
	for (int i = 0; i < 5; i++) {
		printf("i = %d, square = %d\n", i, square(i));
	}

	// Compound literal
	struct Point p = (struct Point){.x = 1, .y = 2};
	printf("Point p: (%d, %d)\n", p.x, p.y);

	// Variable-length array
	int n = 5;
	int arr[n];
	for (int i = 0; i < n; i++) {
		arr[i] = i * i;
		printf("arr[%d] = %d\n", i, arr[i]);
	}

	// The _Bool type and the <stdbool.h> header
	_Bool flag = true;
	printf("flag = %d\n", flag);

	// Complex types and the <complex.h> header
	double complex z = 1.0 + 2.0 * I;
	printf("Complex z: %.2f + %.2fi\n", creal(z), cimag(z));

	// Type-generic math (sqrt from <tgmath.h>)
	double result = sqrt(4.0);
	printf("sqrt(4.0) = %.2f\n", result);

	// Designated initializers
	int arr2[5] = {[1] = 10, [3] = 20};
	for (int i = 0; i < 5; i++) {
		printf("arr2[%d] = %d\n", i, arr2[i]);
	}

	// Variadic macro
	#define PRINT(...) printf(__VA_ARGS__)
	PRINT("This is a test: %d\n", 123);

	return 0;
}

C11 新增特性
#

原子操作

  • 提供了原子操作和锁自由编程的支持,提供了 <stdatomic.h> 头文件。
  • 使用 _Atomic 类型说明符和相关函数。

多线程支持

  • 引入了多线程支持,提供了 <threads.h> 头文件。
  • 包括 thrd_t 类型、mtx_t 类型、cnd_t 类型等。

泛型选择

  • 使用 _Generic 关键字,实现类型安全的泛型编程。

静态断言

  • 使用 _Static_assert 关键字,在编译时进行断言检查。

对齐支持

  • 提供了对齐支持,提供了 <stdalign.h> 头文件。
  • 提供了 alignasalignof 关键字。

变长数组的改进

  • 更加严格的变长数组初始化和使用规则。

匿名结构体和联合体

  • 允许在结构体和联合体中使用匿名成员。

增强的 Unicode 支持

  • 提供了对 Unicode 字符的支持,提供了 <uchar.h> 头文件。
  • 提供了 char16_tchar32_t 类型。

内存模型

  • 定义了明确的内存模型,提供更好的并发编程支持。

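_Noreturn

  • Adds the _Noreturn function specifier, plus the noreturn macro in <stdnoreturn.h>, to indicate that a function does not return.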

改进的标准库函数

  • 新增了一些标准库函数,如 aligned_alloc

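Bounds-checking interfaces (Annex K)

  • The optional Annex K adds bounds-checked library functions such as strcpy_s and memcpy_s.
  • They are declared only if __STDC_WANT_LIB_EXT1__ is defined to 1, and implementations that provide them define __STDC_LIB_EXT1__.
  • <stdckdint.h> is unrelated: it is the C23 header for checked integer arithmetic such as ckd_add.

Removal of gets()

  • gets() was removed from the library because it cannot limit the input length.
  • K&R-style function definitions were not removed in C11. They had been obsolescent since C89 and were finally removed in C23.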

示例:

#include <stdio.h>
#include <stdatomic.h>
#include <threads.h>
#include <stdalign.h>
#include <uchar.h>
#include <stdlib.h>

// 原子操作示例
atomic_int atomic_var = 0;

// Thread function: each thread atomically increments the shared counter
int thread_func(void *arg) {
    (void)arg; // unused
    atomic_fetch_add(&atomic_var, 1);
    return 0;
}

// 静态断言示例
_Static_assert(sizeof(int) == 4, "int size is not 4 bytes");

// 对齐支持示例
struct AlignedStruct {
    alignas(16) int x;
    alignas(16) int y;
};

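// _Generic example: select a function by the type of (a), then call it
static int max_int(int a, int b) { return a > b ? a : b; }
static double max_double(double a, double b) { return a > b ? a : b; }
#define max(a, b) _Generic((a), int: max_int, double: max_double)(a, b)

int main(void) {
    int a = 5, b = 10;
    printf("max(a, b) = %d\n", max(a, b));
    printf("max(1.5, 0.5) = %.1f\n", max(1.5, 0.5));

    // Alignment: in ISO C11, alignof takes a type name, not an expression
    struct AlignedStruct s = {0};
    printf("Alignment of struct AlignedStruct: %zu, size: %zu\n",
           alignof(struct AlignedStruct), sizeof s);

    // Unicode: char16_t/char32_t strings are not wchar_t strings, so print code units
    char16_t u16_str[] = u"Hello";
    char32_t u32_str[] = U"World";
    for (size_t i = 0; u16_str[i] != 0; i++)
        printf("%04x ", (unsigned)u16_str[i]);
    printf("\n");
    for (size_t i = 0; u32_str[i] != 0; i++)
        printf("%08lx ", (unsigned long)u32_str[i]);
    printf("\n");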

    // 多线程示例
    thrd_t threads[10];
    for (int i = 0; i < 10; ++i) {
        thrd_create(&threads[i], thread_func, NULL);
    }
    for (int i = 0; i < 10; ++i) {
        thrd_join(threads[i], NULL);
    }
    printf("atomic_var = %d\n", atomic_load(&atomic_var));

    return 0;
}
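The C11 example above does not show anonymous structs and unions. Here is a minimal sketch; struct value and value_as_double are illustrative names, not from any library. With an anonymous union, the members i and d are accessed directly as v.i and v.d. They can even be named in designated initializers.

```c
/* A tagged value: the anonymous union's members are accessed as v.i / v.d,
 * with no intermediate member name. */
struct value {
    enum { V_INT, V_DOUBLE } kind;
    union {
        int i;
        double d;
    };
};

static double value_as_double(struct value v)
{
    return v.kind == V_INT ? (double)v.i : v.d;
}
```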

C23 新增特性
#

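GCC 14 implements much of C23 through -std=c23. Earlier releases spell the option -std=c2x, and GCC 14.2 was released in August 2024.

  1. Binary integer constants
int x = 0b1010; // binary literal, prefixed with 0b
  2. Digit separators
int x = 1'000'000; // a single quote separates digits for readability
  3. #elifdef and #elifndef preprocessing directives
#ifdef X
   // ...
#elifdef Y
   // ...
#elifndef Z
   // ...
#endif
  4. constexpr for objects (unlike C++, C23 constexpr cannot be applied to functions)
constexpr int size = 16;
  5. Standard [[...]] attribute syntax
[[nodiscard]] int func(void);
  6. bool, true and false become keywords
bool flag = true; // no need to include stdbool.h
  7. Complex numbers (the snippet below has been valid since C99)
#include <complex.h>
double complex z = 1.0 + 2.0*I;
  8. u8 string literals now have element type char8_t (unsigned char, declared in <uchar.h>)
#include <uchar.h>
const char8_t *str = u8"Hello, 世界"; // UTF-8 string
  9. Variably modified types are mandatory again; only VLA objects with automatic storage duration remain optional (__STDC_NO_VLA__)

  10. memccpy, strdup and strndup join the standard library

  11. Signed integers must use two's complement representation (unsigned wraparound was already well defined)

  12. Note: declaring several variables in a for-init clause is not new in C23; it has been allowed since C99

for (int i = 0, j = 0; i < 10; i++, j++)
  13. alignas (and alignof) become keywords, so <stdalign.h> is no longer needed
alignas(8) int x;
  14. The typeof (and typeof_unqual) operators
typeof(int) a = 34;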

GNU C 扩展
#

完整列表参考:https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html

语句表达式: Statements and Declarations in Expressions
#

https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html

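A statement expression is a compound statement enclosed in parentheses that can be used as an expression. Its value is the value of the last expression statement inside it. This makes it especially useful in macro definitions.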

// 语法:
//  ({ 表达式1; 表达式2; 表达式3; })

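// For comparison: an ordinary macro whose body is a single parenthesized expression (not a statement expression)
#define max(a,b) ((a) > (b) ? (a) : (b))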

// 括号封装的多条复合语句,多条语句需要用大括号 block 包围
({ int y = foo (); int z;
  if (y > 0) z = y;
  else z = - y;
  z;
  })

#define maxint(a,b)					\
  ({int _a = (a), _b = (b); _a > _b ? _a : _b; })

#define maxint3(a, b, c)						\
  ({int _a = (a), _b = (b), _c = (c); maxint (maxint (_a, _b), _c); })

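// C++ comparison from the GCC manual (X is some class type). The macro and the
// template destroy the temporary X at different points: in the macro, just after
// b is initialized; in the template, when the function returns.
#define macro(a)  ({__typeof__(a) b = (a); b + 3; })
template<typename T> T function(T a) { T b = a; return b + 3; }

void foo ()
{
  macro (X ());
  function (X ());
}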

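// A goto may jump out of a statement expression (jumping into one is not allowed).
// Here the goto skips the assignment and the first printf, so the program prints
// "here:" and then "sum = 0".
#include <stdio.h>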
int main(void)
{
      int sum = 0;
      sum =
	({
		int s = 0;
		for( int i = 0; i < 10; i++)
			s = s + i;
		goto here;
		s;
	});
      printf("sum = %d\n", sum);
here:
      printf("here:\n");
      printf("sum = %d\n", sum);
      return 0;
}

宏:

// 良好
#define MAX(x,y) ((x) > (y) ? (x) : (y))

int main(void)
{
        int i = 2;
        int j = 6;
        printf("max=%d", MAX(i++,j++)); // expands to ((i++) > (j++) ? (i++) : (j++)): j++ runs twice, printing max=7 and leaving j == 8
        return 0;
}
// 解决办法: 使用语句表达式
#define MAX(x,y)({				\
			int _x = x;		\
			int _y = y;		\
			_x > _y ? _x : _y;	\
		})
int main(void)
{
        int i = 2;
        int j = 6;
        printf("max=%d", MAX(i++,j++));
        return 0;
}
// 上面的 MAX 只适用于 int 类型

// 优秀: 适用于任意类型
#define MAX(type,x,y)({				\
			type _x = x;		\
			type _y = y;		\
			_x > _y ? _x : _y;	\
		})
int main(void)
{
        int i = 2;
        int j = 6;
        printf("max=%d\n", MAX(int, i++, j++));
        printf("max=%f\n", MAX(float, 3.14, 3.15));
        return 0;
}

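// Better: GNU typeof works for any type; (void)(&_x == &_y) makes GCC warn
// (comparison of distinct pointer types) when x and y have different types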
#define max(x, y) ({				\
			typeof(x) _x = (x);	\
			typeof(y) _y = (y);	\
			(void) (&_x == &_y);	\
			_x > _y ? _x : _y; })
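The following sketch checks that the typeof-based version really evaluates each argument only once. It renames the macro max_once and adds a hypothetical call counter, next(), purely for illustration.

```c
/* GNU C statement expression: each argument is evaluated exactly once;
 * the pointer comparison only exists to trigger a type-mismatch warning. */
#define max_once(x, y) ({               \
        __typeof__(x) _x = (x);         \
        __typeof__(y) _y = (y);         \
        (void)(&_x == &_y);             \
        _x > _y ? _x : _y; })

static int calls; /* counts how often next() runs */

static int next(int v)
{
    calls++;
    return v;
}
```

Unlike the plain MAX(x,y) macro, max_once(i++, j++) increments i and j exactly once each.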

内核中应用:

#define min_t(type, x, y) ({					\
   		type __min1 = (x);			\
   		type __min2 = (y);			\
   		__min1 < __min2 ? __min1 : __min2; })
#define max_t(type, x, y) ({					\
   		type __max1 = (x);			\
   		type __max2 = (y);			\
   		__max1 > __max2 ? __max1 : __max2; })

指定初始化: Designated Initializers
#

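C99 standardized designated initializers. GNU C also accepts them in C90 mode and adds the [first ... last] range form: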

int a[6] = { [4] = 29, [2] = 15 };
// 支持数组 index 范围
int widths[] = { [0 ... 9] = 1, [10 ... 99] = 2, [100] = 3 };

union foo { int i; double d; };
union foo f = { .d = 4 };
int a[6] = { [1] = v1, v2, [4] = v4 };

struct point ptarray[10] = { [2].y = yv2, [2].x = xv2, [0].x = xv0 };

// 内核示例
static const struct file_operations ab3100_otp_operations = {
	.open        = ab3100_otp_open,
	.read        = seq_read,
	.llseek      = seq_lseek,
	.release     = single_release,
};
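A typical use of the GNU [first ... last] range designator is a lookup table. This sketch assumes an ASCII execution character set. The table name and the CH_* constants are illustrative.

```c
/* Character-class table built with GNU range designators;
 * elements that are not mentioned are zero-initialized (CH_OTHER). */
enum { CH_OTHER = 0, CH_DIGIT = 1, CH_ALPHA = 2 };

static const unsigned char char_class[128] = {
    ['0' ... '9'] = CH_DIGIT,
    ['A' ... 'Z'] = CH_ALPHA,
    ['a' ... 'z'] = CH_ALPHA,
    ['_']         = CH_ALPHA,
};
```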

typeof 表达式: Referring to a Type with typeof
#

typeof__typeof__ 返回类型或表达式值的类型。

C23 standardizes typeof (and typeof_unqual), as well as auto type inference for object definitions (e.g. auto n = 10;):

typeof (x[0](1))
typeof (int *)

#define max(a,b)				\
({ typeof (a) _a = (a);			\
	typeof (b) _b = (b);		\
	_a > _b ? _a : _b; })

typeof (*x) y[4];

#define pointer(T)  typeof(T *)
#define array(T, N) typeof(T [N])
// array (pointer (char), 4) y;

示例:

int main(void)
{
  int i = 2;
  typeof(i) k = 6;
  int *p = &k;
  typeof(p) q = &i;
  printf("k = %d\n", k);
  printf("*p= %d\n", *p);
  printf("i = %d\n" ,i);
  printf("*q= %d\n", *q);
  return 0;
}

/* k  = 6 */
/* *p = 6 */
/* i  = 2 */
/* *q = 2 */

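typeof (int *) y;     // y is a pointer to int, i.e. int *y;
typeof (int)  *y;     // also declares y as a pointer to int
typeof (*x) y;        // y has the type that pointer x points to
typeof (int) y[4];    // equivalent to int y[4];
typeof (*x) y[4];     // y is an array of 4 elements of the type x points to
typeof (typeof (char *)[4]) y; // array of 4 char pointers: char *y[4];
typeof (int [4]) y;   // equivalent to int y[4]; typeof cannot take a declaration such as int x[4]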

#define MAX(x,y)({				\
			typeof(x) _x = (x);	\
			typeof(y) _y = (y);	\
			_x > _y ? _x : _y;	\
		})
int main(void)
{
        int i = 2;
        int j = 6;
        printf("max: %d\n", MAX(i, j));
        printf("max: %f\n", MAX(3.14, 3.15));
        return 0;
}

#define swap(a, b)				\
	do {					\
		typeof(a) __tmp = (a);		\
		(a) = (b);			\
		(b) = __tmp;			\
	} while (0)

内核的 container_of 宏:

  1. type:结构体类型
  2. member:结构体内的成员
  3. ptr:结构体内成员member的地址
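#include <stdio.h>

// Kernel-style offsetof; it clashes with the standard offsetof if <stddef.h> is also included
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)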
#define  container_of(ptr, type, member) ({				\
   		const typeof( ((type *)0)->member ) *__mptr = (ptr); \
   		(type *)( (char *)__mptr - offsetof(type,member) );})

struct student
{
       int age;
       int num;
       int math;
};
int main(void)
{
   struct student stu = { 20, 1001, 99};
   int *p = &stu.math;
   struct student *stup = NULL;
   stup = container_of( p, struct student, math);
   printf("%p\n", (void *)stup);
   printf("age: %d\n", stup->age);
   printf("num: %d\n", stup->num);
   return 0;
}
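container_of is what makes intrusive data structures work, and the kernel uses them throughout. The list links are embedded in the payload struct, and the payload is recovered from a link pointer. Below is a minimal sketch with hypothetical list_node and item types. Its container_of is built on the standard offsetof from <stddef.h> instead of the null-pointer trick above.

```c
#include <stddef.h>

/* container_of on top of the standard offsetof. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* A minimal intrusive singly linked list (illustrative names). */
struct list_node {
    struct list_node *next;
};

struct item {
    int value;
    struct list_node node; /* embedded link, deliberately not at offset 0 */
};

/* Walk the embedded nodes and recover each enclosing struct item. */
static int sum_items(struct list_node *head)
{
    int sum = 0;
    for (struct list_node *n = head; n != NULL; n = n->next)
        sum += container_of(n, struct item, node)->value;
    return sum;
}
```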

本地标签 Locally Declared Labels
#

https://gcc.gnu.org/onlinedocs/gcc/Local-Labels.html

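By default a label has function scope. Labels declared with this extension (__label__) have block scope instead, which is especially useful in macros.

// Declare local labels first, at the start of the block: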

__label__ label;
__label__ label1, label2, /* … */;

#define SEARCH(value, array, target)					\
	do {								\
		__label__ found;					\
		typeof (target) _SEARCH_target = (target);		\
		typeof (*(array)) *_SEARCH_array = (array);		\
		int i, j;						\
		/* max: bound assumed to be defined by the caller */	\
		for (i = 0; i < max; i++)				\
			for (j = 0; j < max; j++)			\
				if (_SEARCH_array[i][j] == _SEARCH_target) \
				{ (value) = i; goto found; }		\
		(value) = -1;						\
	found:;								\
	} while (0)

// 等效的, 用语句表达式来重写:
#define SEARCH(array, target)						\
	({								\
		__label__ found;					\
		typeof (target) _SEARCH_target = (target);		\
		typeof (*(array)) *_SEARCH_array = (array);		\
		int i, j;						\
		int value;						\
		for (i = 0; i < max; i++)				\
			for (j = 0; j < max; j++)			\
				if (_SEARCH_array[i][j] == _SEARCH_target) \
				{ value = i; goto found; }		\
		value = -1;						\
	found:								\
		value;							\
	})
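The SEARCH macros above depend on a bound named max from the surrounding code. The sketch below is self-contained and passes the bounds explicitly; SEARCH_ROW and its parameters are illustrative names. It also shows why __label__ matters. The test expands the macro three times in the same function, which would produce duplicate labels with an ordinary label.

```c
/* Evaluates to the row index of the first element equal to target, or -1.
 * __label__ must come first in the block and keeps `found` local to each
 * expansion. */
#define SEARCH_ROW(array, rows, cols, target)                   \
    ({                                                          \
        __label__ found;                                        \
        __typeof__(target) _t = (target);                       \
        int _r, _c, _value;                                     \
        for (_r = 0; _r < (rows); _r++)                         \
            for (_c = 0; _c < (cols); _c++)                     \
                if ((array)[_r][_c] == _t) {                    \
                    _value = _r;                                \
                    goto found;                                 \
                }                                               \
        _value = -1;                                            \
    found:                                                      \
        _value;                                                 \
    })
```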

Labels as Values
#

https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

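The unary && operator, a GNU extension, yields the address of a label defined in the current function. The result is a constant of type void *, and it can be used anywhere a constant of that type is valid: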

void *ptr;
/* … */
ptr = &&foo;

Jump to that address with a computed goto:

goto *ptr;

示例:

// 创建一个数组
static void *array[] = { &&foo, &&bar, &&hack };
goto *array[i];

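// Alternative: store offsets from one label; this needs fewer dynamic
// relocations, which suits code that lives in shared libraries
static const int offsets[] = { &&foo - &&foo, &&bar - &&foo, &&hack - &&foo };
goto *(&&foo + offsets[i]);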
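The main use of label values is fast dispatch in interpreters (threaded code). The following is a minimal sketch of that pattern with a made-up four-instruction bytecode. run(), the OP_* opcodes and the fixed stack size are assumptions for illustration, and the sketch does no bounds checking.

```c
enum { OP_PUSH1, OP_ADD, OP_DOUBLE, OP_HALT };

/* Each opcode indexes a table of label addresses; `goto *` jumps straight
 * to the handler, with no switch statement in between. */
static int run(const unsigned char *code)
{
    static void *const dispatch[] = { &&do_push1, &&do_add, &&do_double, &&do_halt };
    int stack[16];
    int sp = 0;

#define NEXT() goto *dispatch[*code++]
    NEXT();

do_push1:                      /* push the constant 1 */
    stack[sp++] = 1;
    NEXT();
do_add:                        /* pop two values, push their sum */
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT();
do_double:                     /* double the top of the stack */
    stack[sp - 1] *= 2;
    NEXT();
do_halt:                       /* return the top of the stack */
    return stack[sp - 1];
}
#undef NEXT
```

For example, the program PUSH1, PUSH1, ADD, DOUBLE, HALT computes (1 + 1) * 2.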

Nested Functions
#

C 不允许在函数中定义嵌套函数,而该扩展则允许定义嵌套函数(GNU C++ 不支持嵌套函数)。

嵌套函数名称只在所定义的 block 中有效,嵌套函数可以访问所在函数的变量(称为 lexical scoping)。

double foo (double a, double b)
{
	double square (double z) { return z * z; }
	return square (a) + square (b);
}

int bar (int *array, int offset, int size)
{
	// access reads offset from the enclosing function (lexical scoping)
	int access (int *array, int index) { return array[index + offset]; }
	int i, total = 0;
	for (i = 0; i < size; i++)
		total += access (array, i);
	return total;
}

嵌套函数可以跳转到所在函数的 local label:

int bar (int *array, int offset, int size)
{
	__label__ failure;
	int access (int *array, int index)
	{
		if (index > size)
			goto failure;	// non-local goto to the enclosing function's label
		return array[index + offset];
	}
	int i, total = 0;
	for (i = 0; i < size; i++)
		total += access (array, i);
	return total;

	/* Control comes here from access
	   if it detects an error.  */
failure:
	return -1;
}

Conditionals with Omitted Operands
#

x ? : y
// 等效于
x ? x : y
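The omitted-operand form matters when x has side effects or is expensive, because x is evaluated only once. A small sketch with a hypothetical counting function, lookup():

```c
#include <stddef.h>

static int evaluations; /* counts calls to lookup() */

static const char *lookup(const char *value)
{
    evaluations++;
    return value;
}

/* lookup(value) runs once; its result is reused when it is non-null. */
static const char *name_or_default(const char *value)
{
    return lookup(value) ?: "(none)";
}
```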

Arrays of Length Zero
#

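C99 standardized this idea as the flexible array member, written char contents[]. The zero-length form [0] used below remains a GNU extension: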

struct line {
	int length;
	char contents[0]; // 占用空间为 0
};

struct line *thisline = (struct line *) malloc (sizeof (struct line) + this_length);
thisline->length = this_length;
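Here is the same allocation pattern written with the C99 flexible array member, using char contents[] instead of char contents[0]. line_new is a hypothetical constructor that copies a C string into a single allocation.

```c
#include <stdlib.h>
#include <string.h>

/* contents[] adds nothing to sizeof(struct line), so one malloc
 * holds both the header and the variable-size payload. */
struct line {
    int length;
    char contents[];
};

static struct line *line_new(const char *text)
{
    size_t len = strlen(text);
    struct line *l = malloc(sizeof *l + len + 1); /* +1 for the '\0' */
    if (l == NULL)
        return NULL;
    l->length = (int)len;
    memcpy(l->contents, text, len + 1);
    return l;
}
```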

Structures with No Members
#

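In GNU C this struct has size 0. G++ (C++) instead treats an empty struct as if it had a single char member, so its size is 1.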

struct empty {};

Unions with Flexible Array Members
#

GCC permits a C99 flexible array member (FAM) to be in a union:

union with_fam {
  int a;
  int b[];
};

Structures with only Flexible Array Members
#

GCC permits a C99 flexible array member (FAM) to be alone in a structure:

struct only_fam {
 int b[]; // 大小为 0
};

Arrays of Variable Length
#

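Variable-length arrays (VLAs) are standard in ISO C99 (optional since C11), and GCC also accepts them in C90 mode and in C++ as an extension. The array length can be any run-time expression: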

FILE * concat_fopen (char *s1, char *s2, char *mode)
{
	char str[strlen (s1) + strlen (s2) + 1];
	strcpy (str, s1);
	strcat (str, s2);
	return fopen (str, mode);
}

// 函数参数也可以使用可变长度
struct entry tester (int len, char data[len][len])
{
	/* … */
}
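VLA parameters let a function take a two-dimensional array whose size is only known at run time. The following minimal sketch sums the diagonal of an n-by-n matrix; trace is an illustrative name.

```c
/* n must be declared before m so that it can size m's type;
 * m[i][i] then uses the run-time row length n. */
static int trace(int n, int m[n][n])
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += m[i][i];
    return sum;
}
```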

Macros with a Variable Number of Arguments.
#

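// C99 standard: ... and __VA_ARGS__ (in C99 through C17, at least one argument must be passed for the ...)
#define debug(format, ...) fprintf (stderr, format, __VA_ARGS__)

// GNU C extensions
// Named variadic parameter: the arguments matched by args... are referred to as args
#define debug(format, args...) fprintf (stderr, format, args)
// ## before __VA_ARGS__ deletes the preceding comma when no variadic arguments are given,
// so debug ("message\n") also works (C23 standardizes __VA_OPT__ for this purpose)
#define debug(format, ...) fprintf (stderr, format, ## __VA_ARGS__)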
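The difference between the C99 form and the GNU ## __VA_ARGS__ form shows up when the macro is called with no variadic arguments. In this sketch, FORMAT is a hypothetical macro that writes into a buffer with snprintf so the result can be checked. The ## removes the trailing comma. With the plain C99 form, FORMAT(buf, n, "plain") would expand to snprintf(buf, n, "plain", ) and fail to compile.

```c
#include <stdio.h>

/* ", ## __VA_ARGS__": the comma disappears when no variadic arguments are passed. */
#define FORMAT(buf, n, fmt, ...) snprintf((buf), (n), fmt, ## __VA_ARGS__)
```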

非常量初始化 Non-Constant Initializers
#

void foo (float f, float g)
{
  float beat_freqs[2] = { f-g, f+g };
  /* … */
}

复合字面量 Compound Literals
#

struct foo {int a; char b[2];} structure;
structure = ((struct foo) {x + y, 'a', 0});

// 等效为
{
  struct foo temp = {x + y, 'a', 0};
  structure = temp;
}

Case Ranges
#

case 'A' ... 'Z':
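Here is a minimal sketch of case ranges in a classifier. classify is an illustrative function, and ASCII letters and digits are assumed.

```c
/* `low ... high` matches every value in [low, high]. Spaces around ...
 * are recommended, because something like 1...5 can be mis-lexed as one number. */
static const char *classify(int c)
{
    switch (c) {
    case '0' ... '9':
        return "digit";
    case 'A' ... 'Z':
    case 'a' ... 'z':
        return "letter";
    default:
        return "other";
    }
}
```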

Mixed Declarations, Labels and Code
#

https://gcc.gnu.org/onlinedocs/gcc/Mixed-Labels-and-Declarations.html

ISO C99 and ISO C++ allow declarations and code to be freely mixed within compound statements. ISO C23 allows labels to be placed before declarations and at the end of a compound statement. As an extension, GNU C also allows all this in C90 mode. For example, you could do:

int i;
/* … */
i++;
int j = i + 2;

Determining the Alignment of Functions, Types or Variables
#

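__alignof__ queries the alignment requirement of a function, object or type. C11 standardizes this as the _Alignof operator, with the alignof macro in <stdalign.h>, and the ISO form accepts only type names: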

#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>

int main(void)
{
	printf("Alignment of char = %zu\n", alignof(char));
	printf("Alignment of max_align_t = %zu\n", alignof(max_align_t));
	printf("alignof(float[10]) = %zu\n", alignof(float[10]));
	printf("alignof(struct{char c; int n;}) = %zu\n", alignof(struct {char c; int n;}));
}

An Inline Function is As Fast As a Macro
#

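GNU C lets you control inlining with the inline keyword and the noinline and always_inline attributes. In header files that may be compiled with -ansi or -std=c90, write __inline__ instead of inline: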

C99 提供了内置 inline 支持。

static inline int inc (int *a)
{
  return (*a)++;
}

Getting the Return or Frame Address of a Function
#

// Built-in Function:
void * __builtin_return_address (unsigned int level);

// Built-in Function:
void * __builtin_extract_return_addr (void *addr);

// Built-in Function:
void * __builtin_frame_address (unsigned int level);

// Built-in Function:
void * __builtin_stack_address ();

Support for offsetof
#

primary:
      "__builtin_offsetof" "(" typename "," offsetof_member_designator ")"
offsetof_member_designator:
        identifier
      | offsetof_member_designator "." identifier
      | offsetof_member_designator "[" expr "]"

#define offsetof(type, member)  __builtin_offsetof (type, member)
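The offsetof_member_designator grammar above means that the member argument can be a path through nested members and array elements. A sketch with the illustrative types inner and outer:

```c
#include <stddef.h>

struct inner {
    char tag;
    int arr[4];
};

struct outer {
    long id;
    struct inner in;
};

/* Offset of outer.in.arr[2], computed in one constant expression. */
static const size_t arr2_offset = __builtin_offsetof(struct outer, in.arr[2]);
```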

Alternate Keywords
#

https://gcc.gnu.org/onlinedocs/gcc/Alternate-Keywords.html

-ansi and the various -std options disable certain keywords. This causes trouble when you want to use GNU C extensions, or a general-purpose header file that should be usable by all programs, including ISO C programs.

The keywords asm, typeof and inline are not available in programs compiled with -ansi or -std (although inline can be used in a program compiled with -std=c99 or a later standard). The ISO C99 keyword restrict is only available when -std=gnu99 (which will eventually be the default) or -std=c99 (or the equivalent -std=iso9899:1999), or an option for a later standard version, is used.

The way to solve these problems is to put ‘__’ at the beginning and end of each problematical keyword. For example, use __asm__ instead of asm, and __inline__ instead of inline.

Other C compilers won’t accept these alternative keywords; if you want to compile with another compiler, you can define the alternate keywords as macros to replace them with the customary keywords. It looks like this:

#ifndef __GNUC__
#define __asm__ asm
#endif

-pedantic and other options cause warnings for many GNU C extensions. You can suppress such warnings using the keyword __extension__. Specifically:

  • Writing __extension__ before an expression prevents warnings about extensions within that expression. In C, writing [[__extension__ …]] suppresses warnings about using ‘[[]]’ attributes in C versions that predate C23.

__extension__ has no effect aside from this.

Function Names as Strings
#

https://gcc.gnu.org/onlinedocs/gcc/Function-Names.html

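C99 provides the predefined identifier __func__. It is an identifier, not a macro or a constant expression, and it behaves as if every function body began with: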

static const char __func__[] = "function-name";

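Before C99, GNU C provided __FUNCTION__ for the same purpose; in C it is simply another name for __func__. __PRETTY_FUNCTION__ is also the same as __func__ in C, but in C++ it includes the full signature, as this C++ example shows: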

extern "C" int printf (const char *, ...);

class a {
 public:
  void sub (int i)
    {
      printf ("__FUNCTION__ = %s\n", __FUNCTION__);
      printf ("__PRETTY_FUNCTION__ = %s\n", __PRETTY_FUNCTION__);
    }
};

int
main (void)
{
  a ax;
  ax.sub (0);
  return 0;
}

/* __FUNCTION__ = sub */
/* __PRETTY_FUNCTION__ = void a::sub(int) */
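The example above needs a C++ compiler. In C, the same identifiers can be used as follows. describe is a hypothetical function that writes its own name twice, once through __func__ and once through __FUNCTION__.

```c
#include <stdio.h>

/* Both identifiers expand to the enclosing function's name, "describe". */
static int describe(char *buf, size_t n)
{
    return snprintf(buf, n, "%s/%s", __func__, __FUNCTION__);
}
```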

Binary Constants using the ‘0b’ Prefix
#

https://gcc.gnu.org/onlinedocs/gcc/Binary-constants.html

i =       42;
i =     0x2a;
i =      052;
i = 0b101010;

C 标准预定义的 Macros
#

https://en.cppreference.com/w/c/preprocessor/replace

The following macro names are predefined in every implementation:

  • __STDC__: expands to the integer constant 1. This macro is intended to indicate a conforming implementation.
  • __STDC_VERSION__ (C95): expands to an integer constant of type long whose value increases with each version of the C standard:
    • 199409L (C95)
    • 199901L (C99)
    • 201112L (C11)
    • 201710L (C17)
    • 202311L (C23)
  • __STDC_HOSTED__ (C99): expands to the integer constant 1 if the implementation is hosted (runs under an OS), 0 if freestanding (runs without an OS).
  • __FILE__: expands to the name of the current file, as a character string literal. It can be changed by the #line directive.
  • __LINE__: expands to the source file line number, an integer constant. It can be changed by the #line directive.
  • __DATE__: expands to the date of translation, a character string literal of the form "Mmm dd yyyy". The name of the month is as if generated by asctime, and the first character of "dd" is a space if the day of the month is less than 10.
  • __TIME__: expands to the time of translation, a character string literal of the form "hh:mm:ss", as in the time generated by asctime().
  • __STDC_UTF_16__ (C23): expands to 1 to indicate that char16_t uses UTF-16 encoding.
  • __STDC_UTF_32__ (C23): expands to 1 to indicate that char32_t uses UTF-32 encoding.
  • __STDC_EMBED_NOT_FOUND__, __STDC_EMBED_FOUND__ and __STDC_EMBED_EMPTY__ (C23): expand to 0, 1 and 2, respectively.

The following additional macro names may be predefined by an implementation:

  • __STDC_ISO_10646__ (C99): expands to an integer constant of the form yyyymmL if wchar_t uses Unicode. The date indicates the latest revision of Unicode supported.
  • __STDC_IEC_559__ (C99): expands to 1 if IEC 60559 is supported (deprecated since C23).
  • __STDC_IEC_559_COMPLEX__ (C99): expands to 1 if IEC 60559 complex arithmetic is supported (deprecated since C23).
  • __STDC_UTF_16__ (C11): expands to 1 if char16_t uses UTF-16 encoding.
  • __STDC_UTF_32__ (C11): expands to 1 if char32_t uses UTF-32 encoding.
  • __STDC_MB_MIGHT_NEQ_WC__ (C99): expands to 1 if 'x' == L'x' might be false for a member of the basic character set, such as on EBCDIC-based systems that use Unicode for wchar_t.
  • __STDC_ANALYZABLE__ (C11): expands to 1 if analyzability is supported.
  • __STDC_LIB_EXT1__ (C11): expands to the integer constant 201112L if bounds-checking interfaces are supported.
  • __STDC_NO_ATOMICS__ (C11): expands to 1 if atomic types and the atomic operations library are not supported.
  • __STDC_NO_COMPLEX__ (C11): expands to 1 if complex types and the complex math library are not supported.
  • __STDC_NO_THREADS__ (C11): expands to 1 if multithreading is not supported.
  • __STDC_NO_VLA__ (C11): expands to 1 if variable-length arrays are not supported. Until C23 this also covered variably modified types; since C23 it concerns only VLAs of automatic storage duration.
  • __STDC_IEC_60559_BFP__ (C23): expands to 202311L if IEC 60559 binary floating-point arithmetic is supported.
  • __STDC_IEC_60559_DFP__ (C23): expands to 202311L if IEC 60559 decimal floating-point arithmetic is supported.
  • __STDC_IEC_60559_COMPLEX__ (C23): expands to 202311L if IEC 60559 complex arithmetic is supported.
  • __STDC_IEC_60559_TYPES__ (C23): expands to 202311L if IEC 60559 interchange and extended types are supported.

The values of these macros (except for __FILE__ and __LINE__) remain constant throughout the translation unit. Attempts to redefine or undefine these macros result in undefined behavior.

The predefined variable __func__ (see function definition for details) is not a preprocessor macro, even though it is sometimes used together with __FILE__ and __LINE__, e.g. by assert.

Other Built-in Functions Provided by GCC
#

https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

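Most GCC built-in functions start with __builtin_. Examples are __builtin_expect and __builtin_fabsfN, where N is a type width such as 32 or 64. Some families, such as the __atomic_* and __sync_* built-ins, use other prefixes.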
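Here is a sketch of two widely used built-ins, both also accepted by Clang. __builtin_expect gives the compiler a branch-probability hint; the likely/unlikely wrappers mirror the Linux kernel's macros. __builtin_popcount counts the set bits of an unsigned int. bits_set is an illustrative function name.

```c
/* !!(x) normalizes any non-zero condition to 1 before passing the hint. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

static int bits_set(unsigned v)
{
    if (unlikely(v == 0))  /* hint: the zero case is rare */
        return 0;
    return __builtin_popcount(v);
}
```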

Pragmas Accepted by GCC
#

https://gcc.gnu.org/onlinedocs/gcc/Pragmas.html

示例
#

// Zero-length array (GNU extension; the C99 standard form is the flexible array member char contents[];)
struct line {
   int length;
   char contents[0];
};

struct line *thisline = (struct line *)malloc (sizeof (struct line) + this_length);
thisline->length = this_length;

// 变长数组,数组长度可以是变量或表达式而非常量值;
FILE * concat_fopen (char *s1, char *s2, char *mode)
{
   char str[strlen (s1) + strlen (s2) + 1];
   strcpy (str, s1);
   strcat (str, s2);
   return fopen (str, mode);
}

// 数组成员的初始化可以是非常量表达式
void foo (float f, float g)
{
   float beat_freqs[2] = { f-g, f+g };
/* . . . */
}

// Compound literal: standard since C99 and usable anywhere in an expression; GCC also accepts it in C90 mode and in C++ as an extension
struct foo {int a; char b[2];} structure;
structure = (struct foo) {x + y, 'a', 0};
// 等效为
{
   struct foo temp = {x + y, 'a', 0};
   structure = temp;
}

// 也可以对数组使用复合字面量初始化
char **foo = (char *[]) { "x", "y", "z" };

// Compound literals for scalar types and union types are also allowed.
int i = ++(int) { 1 };

// C99 要求静态变量必须用常量来初始化, 但是 GNU C 允许使用复合字面量来初始化;
static struct foo x = (struct foo) {1, 'a', 'b'};
static int y[] = (int []) {1, 2, 3};
static int z[] = (int [3]) {1};

// 复合类型的数组初始化:
// 结构体数组
struct point
{
   int x, y;
};
struct point point_array[2] = { {4, 5}, {8, 9} };
point_array[0].x = 3;

// 使用变量作为成员初始值
struct point ptarray[10] = { [2].y = yv2, [2].x = xv2, [0].x = xv0 };

// 多维数组
int two_dimensions[2][5] = { {1, 2, 3, 4, 5}, {6, 7, 8, 9, 10} };

// 联合数组
union numbers
{
   int i;
   float f;
};
union numbers number_array [3] = { {3}, {4}, {5} };


// Case ranges: write spaces around ..., otherwise something like 1...5 may be misparsed as a single number
case 'A' ... 'Z':
case 1 ... 5:

// 条件运算符忽略操作数
x?:y // 约等于 x?x:y, 但是 x 值会被求值一次

GNU C attribute
#

GNU C 支持为 Function、Variable、Type 指定 attribute:

  • GNU extension syntax (also supported by IBM compilers; there is no trailing semicolon): __attribute__ ((attribute-list))
  • 可以在整个声明语句之前, 声明的标识符名称之后, 在函数完整的声明之前或之后指定 attribute;

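GNU C uses __attribute__((ATTRIBUTE)) to declare special properties of a function, variable or type. The main purpose is to guide the compiler's optimizations and checks. For example, an attribute can specify a variable's alignment.

__attribute__ must be followed by two pairs of parentheses, and the ATTRIBUTE inside them names the attribute being declared. Examples: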

  • section
  • aligned
  • packed
  • format
  • weak
  • alias
  • noinline
  • always_inline
  • ……

在这些属性中,aligned 和 packed 用来显式指定一个变量的存储边界对齐方式:

char c2 __attribute__((aligned(8))) = 4;
int global_val __attribute__((section(".data")));

// Possible attribute placements
char c2 __attribute__((packed,aligned(4)));
char c2 __attribute__((packed,aligned(4))) = 4;
__attribute__((packed,aligned(4))) char c2 = 4;
// Not valid: an attribute cannot follow the initializer
// char c2 = 4 __attribute__((packed,aligned(4)));

int global_val = 8;
int uninit_val __attribute__((section(".data")));
int main(void)
{
   return 0;
}

char __image_copy_start[0] __attribute__((section(".__image_copy_start")));
char __image_copy_end[0] __attribute__((section(".__image_copy_end")));

int a = 1;
int b = 2;
char c1 = 3;
char c2 __attribute__((aligned(4))) = 4;

int main(void) {
   printf("a: %p\n", &a);
   printf("b: %p\n", &b);
   printf("c1:%p\n", &c1);
   printf("c2:%p\n", &c2);
   return 0;
}

struct data {
 char a;
 short b __attribute__((aligned(4)));
 int c ;
};

struct data{
 char a;
 short b;
 int c ;
}__attribute__((aligned(16)));

https://www.zhaixue.cc/c-arm/c-arm-align.html

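The aligned attribute lets us specify a variable's alignment explicitly. Does the compiler always honor the requested alignment? Not necessarily.

The attribute only requests a minimum alignment, and the result cannot exceed the maximum alignment the toolchain supports for that kind of object. As the GCC manual notes, many linkers can only align variables up to a certain maximum, which on some linkers is very small.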

char c1 = 3;
char c2 __attribute__((aligned(16))) = 4 ;
int main(void)
{
   printf("c1: %p\n", &c1);
   printf("c2: %p\n", &c2);
   return 0;
}

在这个程序中,我们指定 char 型的变量 c2 以16字节对齐,然后运行结果为:

c1: 00402000
c2: 00402010

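As the output shows, the address assigned to c2 is 16-byte aligned. On the toolchain used for this example, changing the request to 32 bytes still produced a 16-byte-aligned address, because 32 exceeded the maximum that toolchain supports. That limit is toolchain-specific, so other targets may honor larger alignments.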

属性声明:packed

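The aligned attribute is generally used to increase alignment, and the resulting padding leaves holes between members. The packed attribute does the opposite: it reduces alignment, asking that the variable or type use the smallest possible alignment.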

struct data {
	char a;
	short b __attribute__((packed));
	int c __attribute__((packed));
};

int main(void)
{
	struct data s;
	printf("size: %zu\n", sizeof(s));
	printf("&s.a: %p\n", (void *)&s.a);
	printf("&s.b: %p\n", (void *)&s.b);
	printf("&s.c: %p\n", (void *)&s.c);
	return 0;
}

Here members b and c are declared packed. This tells the compiler to place them with the smallest possible alignment, minimizing padding. The program prints:

 size: 7
 &s.a: 0028FF30
 &s.b: 0028FF31
 &s.c: 0028FF33

通过结果我们看到,结构体内各个成员地址的分配,使用最小1字节的对齐方式,导致整个结构体的大小只有7个字节。

这个特性在底层驱动开发中还是非常有用的。比如,你想定义一个结构体,封装一个 IP 控制器的各种寄存器。在 ARM 芯片中,每一个控制器的寄存器地址空间一般是连续存在的。如果考虑数据对齐,结构体内有空洞,这样就跟实际连续的寄存器地址不一致了,使用 packed 就可以避免这个问题,结构体的每个成员都紧挨着依次分配存储地址,这样就避免了各个成员元素因地址对齐而造成的内存空洞。

 struct data{
     char a;
     short b ;
     int c ;
 }__attribute__((packed));

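Adding packed to the whole struct has the same effect as adding it to each member. After this change, the program prints the same result: the struct is 7 bytes and the member addresses are unchanged.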

Linux内核中 aligned、packed 属性声明

在 Linux 内核中,我们经常看到 aligned 和 packed 一起使用,即对一个变量或类型同时使用 aligned 和 packed 属性声明。这样做的好处是,既避免了结构体内因地址对齐产生的内存空洞,又指定了整个结构体的对齐方式。

struct data {
	char a;
	short b;
	int c;
} __attribute__((packed,aligned(8)));

int main(void)
{
	struct data s;
	printf("size: %zu\n", sizeof(s));
	printf("&s.a: %p\n", (void *)&s.a);
	printf("&s.b: %p\n", (void *)&s.b);
	printf("&s.c: %p\n", (void *)&s.c);
	return 0;
}

程序运行结果如下。

 size: 8
 &s.a: 0028FF30
 &s.b: 0028FF31
 &s.c: 0028FF33

在这个程序中,结构体 data 虽然使用 packed 属性声明,整个长度变为7,但是我们同时又使用了 aligned(8) 指定其按8字节地址对齐,所以编译器要在结构体后面填充1个字节,这样整个结构体的大小就变为8字节,按8字节地址对齐。
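The sizes discussed above can be checked directly with sizeof, offsetof and __alignof__ instead of by printing addresses. This sketch declares the three layouts side by side. The expected values assume the usual 1-, 2- and 4-byte char, short and int of GCC on x86-64 or ARM, and the struct names are illustrative.

```c
#include <stddef.h>

/* natural:            1 byte of padding after a, sizeof 8
 * packed7:            no padding at all, sizeof 7
 * packed_align8:      no inner padding, tail-padded to 8, aligned to 8 */
struct natural       { char a; short b; int c; };
struct packed7       { char a; short b; int c; } __attribute__((packed));
struct packed_align8 { char a; short b; int c; } __attribute__((packed, aligned(8)));
```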

  1. Function attributes:
// 函数完整声明之前,注意:__attribute__ 不以分号结尾。
__attribute__ ((access (read_only, 1)))
int puts (const char*);

// 函数完整声明之后
void f () __attribute__ ((weak, alias ("__f")));
void* my_memalign (size_t, size_t) __attribute__ ((alloc_align (1)));

#define StrongAlias(TargetFunc, AliasDecl)				\
   extern __typeof__ (TargetFunc) AliasDecl			\
   __attribute__ ((alias (#TargetFunc), copy (TargetFunc)));

extern __attribute__ ((alloc_size (1), malloc, nothrow))
void* allocate (size_t);

StrongAlias (allocate, alloc);

int old_fn () __attribute__ ((deprecated));
void fatal () __attribute__ ((noreturn));
  2. Type attributes:
// struct 属性位于标识符之前
struct __attribute__ ((aligned)) S { short f[3]; };
struct __attribute__ ((aligned (8))) S { short f[3]; };
struct my_unpacked_struct
{
   char c;
   int i;
};

struct __attribute__ ((__packed__)) my_packed_struct
{
   char c;
   int i;
   struct my_unpacked_struct s;
};

// For typedefs, the attribute goes after the new type name
typedef int more_aligned_int __attribute__ ((aligned (8)));

typedef int T1 __attribute__ ((deprecated));
T1 x;

typedef T1 T2;
T2 y;

typedef T1 T3 __attribute__ ((deprecated));

// 变量属性位于变量名之后
T3 z __attribute__ ((deprecated));
  3. Variable attributes (placed after the identifier):
int var_target;
extern int __attribute__ ((alias ("var_target"))) var_alias;

int x __attribute__ ((aligned (16))) = 0;

struct foo { int x[2] __attribute__ ((aligned (8))); };

struct __attribute__ ((aligned (16))) foo
{
   int i1;
   int i2;
   unsigned long long x __attribute__ ((warn_if_not_aligned (16)));
};

struct foo
{
   char a;
   int x[2] __attribute__ ((packed));
};


struct duart a __attribute__ ((section ("DUART_A"))) = { 0 };
struct duart b __attribute__ ((section ("DUART_B"))) = { 0 };

char stack[10000] __attribute__ ((section ("STACK"))) = { 0 };
int init_data __attribute__ ((section ("INITDATA")));
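Among the function attributes listed earlier, format is one of the most useful in practice, because it lets GCC check printf-style arguments at every call site. Below is a sketch with a hypothetical wrapper, fmt_into, around vsnprintf. A call such as fmt_into(buf, n, "%d", "text") would then trigger a -Wformat warning.

```c
#include <stdarg.h>
#include <stdio.h>

/* format(printf, 3, 4): parameter 3 is the format string and the
 * variadic arguments start at position 4. */
static int fmt_into(char *buf, size_t n, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

static int fmt_into(char *buf, size_t n, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int len = vsnprintf(buf, n, fmt, ap);
    va_end(ap);
    return len;
}
```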

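Since C23, attributes can also be written with the standard [[...]] syntax. Standard attributes (nodiscard, deprecated, fallthrough, maybe_unused, noreturn, ...) are written without a prefix. GCC-specific attributes must use the gnu:: prefix, e.g. [[gnu::always_inline]]: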

[[gnu::always_inline]] [[gnu::hot]] [[gnu::const]] [[nodiscard]]
inline int f(); // declare f with four attributes

[[noreturn]] void f()
{
   // Some code that does not return
   // back the control to the caller
   // In this case the function returns
   // back to the caller without a value
   // This is the reason why the
   // warning "noreturn' function does return' arises
}

// deprecated can take an optional reason string
[[deprecated("use new_api() instead")]] void old_api(void);

// For struct/union/enum types
struct [[deprecated]] S;

// For functions
[[deprecated]] void g(void);

// (C++ also allows attributes on namespaces; C has no namespaces)

// For variables
[[deprecated]] int x;

// maybe_unused suppresses "unused variable" warnings
[[maybe_unused]] char mg_brk = 'D';

enum alert { RED, ORANGE, YELLOW, GREEN };

void process_alert(enum alert alert)
{
	switch (alert) {
	case RED:
		evacuate();
		// Implicit fall-through: the compiler may warn here
		// (-Wimplicit-fallthrough), assuming it is a mistake
	case ORANGE:
		trigger_alarm();
		// this attribute needs a trailing semicolon
		[[fallthrough]];
		// warning suppressed by [[fallthrough]]
	case YELLOW:
		record_alert();
		return;
	case GREEN:
		return;
	}
}

Attribute Syntax
#

https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html

GCC provides two different ways to specify attributes: the standard C and C++ syntax using double square brackets, and the older GNU extension syntax using the __attribute__ keyword, which predates the adoption of the standard syntax and is still widely used in older code.

The standard ‘[[]]’ attribute syntax is recognized by GCC’s default language dialect for both C and C++. More specifically, this syntax was first introduced in the C++11 language standard (see Language Standards Supported by GCC), and is supported by GCC in C++ code with -std=c++11 or -std=gnu++11 or later. It is also part of the C23 language standard and is supported when compiling C code with -std=c23 or -std=gnu17 or later.

When using GNU-specific attributes in the standard syntax, you must prefix their names with ‘gnu::’, such as gnu::section. Refer to the relevant language standards for exact details on the placement of ‘[[]]’ attributes within your code, as they differ in some details from the rules for the GNU attribute syntax.

The remainder of this section describes the details of the GNU extension attribute syntax, and the constructs to which attribute specifiers bind, for the C language. Some details may vary for C++ and Objective-C. Because of limitations in the grammar for attributes, some forms described here may not be successfully parsed in all cases.

There are some problems with the semantics of attributes in C++. For example, there are no manglings for attributes, although they may affect code generation, so problems may arise when attributed types are used in conjunction with templates or overloading. Similarly, typeid does not distinguish between types with different attributes. Support for attributes in C++ may be restricted in future to attributes on declarations only, but not on nested declarators.

See Declaring Attributes of Functions, for details of the semantics of attributes applying to functions. See Specifying Attributes of Variables, for details of the semantics of attributes applying to variables. See Specifying Attributes of Types, for details of the semantics of attributes applying to structure, union and enumerated types. See Label Attributes, for details of the semantics of attributes applying to labels. See Enumerator Attributes, for details of the semantics of attributes applying to enumerators. See Statement Attributes, for details of the semantics of attributes applying to statements.

An attribute specifier is of the form __attribute__ ((attribute-list)). An attribute list is a possibly empty comma-separated sequence of attributes, where each attribute is one of the following:

  • Empty. Empty attributes are ignored.
  • An attribute name (which may be an identifier such as unused, or a reserved word such as const).
  • An attribute name followed by a parenthesized list of parameters for the attribute. These parameters take one of the following forms:
    • An identifier. For example, mode attributes use this form.
    • An identifier followed by a comma and a non-empty comma-separated list of expressions. For example, format attributes use this form.
    • A possibly empty comma-separated list of expressions. For example, format_arg attributes use this form with the list being a single integer constant expression, and alias attributes use this form with the list being a single string constant.

An attribute specifier list is a sequence of one or more attribute specifiers, not separated by any other tokens.

You may optionally specify attribute names with ‘__’ preceding and following the name. This allows you to use them in header files without being concerned about a possible macro of the same name. For example, you may use the attribute name __noreturn__ instead of noreturn.

Label Attributes

In GNU C, an attribute specifier list may appear after the colon following a label, other than a case or default label. GNU C++ only permits attributes on labels if the attribute specifier is immediately followed by a semicolon (i.e., the label applies to an empty statement). If the semicolon is missing, C++ label attributes are ambiguous, as it is permissible for a declaration, which could begin with an attribute list, to be labelled in C++. Declarations cannot be labelled in C90 or C99, so the ambiguity does not arise there.

Enumerator Attributes

In GNU C, an attribute specifier list may appear as part of an enumerator. The attribute goes after the enumeration constant, before ‘=’, if present. The optional attribute in the enumerator appertains to the enumeration constant. It is not possible to place the attribute after the constant expression, if present.

Statement Attributes

In GNU C, an attribute specifier list may appear as part of a null statement. The attribute goes before the semicolon. Some attributes in new style syntax are also supported on non-null statements.

Type Attributes

An attribute specifier list may appear as part of a struct, union or enum specifier. It may go either immediately after the struct, union or enum keyword, or after the closing brace. The former syntax is preferred. Where attribute specifiers follow the closing brace, they are considered to relate to the structure, union or enumerated type defined, not to any enclosing declaration the type specifier appears in, and the type defined is not complete until after the attribute specifiers.

All other attributes

Otherwise, an attribute specifier appears as part of a declaration, counting declarations of unnamed parameters and type names, and relates to that declaration (which may be nested in another declaration, for example in the case of a parameter declaration), or to a particular declarator within a declaration. Where an attribute specifier is applied to a parameter declared as a function or an array, it should apply to the function or array rather than the pointer to which the parameter is implicitly converted, but this is not yet correctly implemented.

Any list of specifiers and qualifiers at the start of a declaration may contain attribute specifiers, whether or not such a list may in that context contain storage class specifiers. (Some attributes, however, are essentially in the nature of storage class specifiers, and only make sense where storage class specifiers may be used; for example, section.) There is one necessary limitation to this syntax: the first old-style parameter declaration in a function definition cannot begin with an attribute specifier, because such an attribute applies to the function instead by syntax described below (which, however, is not yet implemented in this case). In some other cases, attribute specifiers are permitted by this grammar but not yet supported by the compiler. All attribute specifiers in this place relate to the declaration as a whole. In the obsolescent usage where a type of int is implied by the absence of type specifiers, such a list of specifiers and qualifiers may be an attribute specifier list with no other specifiers or qualifiers.

At present, the first parameter in a function prototype must have some type specifier that is not an attribute specifier; this resolves an ambiguity in the interpretation of void f(int (__attribute__((foo)) x)), but is subject to change. At present, if the parentheses of a function declarator contain only attributes then those attributes are ignored, rather than yielding an error or warning or implying a single parameter of type int, but this is subject to change.

An attribute specifier list may appear immediately before a declarator (other than the first) in a comma-separated list of declarators in a declaration of more than one identifier using a single list of specifiers and qualifiers. Such attribute specifiers apply only to the identifier before whose declarator they appear. For example, in

__attribute__((noreturn)) void d0 (void), __attribute__((format(printf, 1, 2))) d1 (const char *, ...), d2 (void);

the noreturn attribute applies to all the functions declared; the format attribute only applies to d1.

An attribute specifier list may appear immediately before the comma, ‘=’, or semicolon terminating the declaration of an identifier other than a function definition. Such attribute specifiers apply to the declared object or function. Where an assembler name for an object or function is specified (see Controlling Names Used in Assembler Code), the attribute must follow the asm specification.

An attribute specifier list may, in future, be permitted to appear after the declarator in a function definition (before any old-style parameter declarations or the function body).

Attribute specifiers may be mixed with type qualifiers appearing inside the [] of a parameter array declarator, in the C99 construct by which such qualifiers are applied to the pointer to which the array is implicitly converted. Such attribute specifiers apply to the pointer, not to the array, but at present this is not implemented and they are ignored.

An attribute specifier list may appear at the start of a nested declarator. At present, there are some limitations in this usage: the attributes correctly apply to the declarator, but for most individual attributes the semantics this implies are not implemented. When attribute specifiers follow the * of a pointer declarator, they may be mixed with any type qualifiers present. The following describes the formal semantics of this syntax. It makes the most sense if you are familiar with the formal specification of declarators in the ISO C standard.

Consider (as in C99 subclause 6.7.5 paragraph 4) a declaration T D1, where T contains declaration specifiers that specify a type Type (such as int) and D1 is a declarator that contains an identifier ident. The type specified for ident for derived declarators whose type does not include an attribute specifier is as in the ISO C standard.

If D1 has the form ( attribute-specifier-list D ), and the declaration T D specifies the type “derived-declarator-type-list Type” for ident, then T D1 specifies the type “derived-declarator-type-list attribute-specifier-list Type” for ident.

If D1 has the form * type-qualifier-and-attribute-specifier-list D, and the declaration T D specifies the type “derived-declarator-type-list Type” for ident, then T D1 specifies the type “derived-declarator-type-list type-qualifier-and-attribute-specifier-list pointer to Type” for ident.

For example,

void (__attribute__((noreturn)) ****f) (void);

specifies the type “pointer to pointer to pointer to pointer to non-returning function returning void”. As another example,

char *__attribute__((aligned(8))) *f;

specifies the type “pointer to 8-byte-aligned pointer to char”. Note again that this does not work with most attributes; for example, the usage of ‘aligned’ and ‘noreturn’ attributes given above is not yet supported.

For compatibility with existing code written for compiler versions that did not implement attributes on nested declarators, some laxity is allowed in the placing of attributes. If an attribute that only applies to types is applied to a declaration, it is treated as applying to the type of that declaration. If an attribute that only applies to declarations is applied to the type of a declaration, it is treated as applying to that declaration; and, for compatibility with code placing the attributes immediately before the identifier declared, such an attribute applied to a function return type is treated as applying to the function type, and such an attribute applied to an array element type is treated as applying to the array type. If an attribute that only applies to function types is applied to a pointer-to-function type, it is treated as applying to the pointer target type; if such an attribute is applied to a function return type that is not a pointer-to-function type, it is treated as applying to the function type.
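The placement rules above can be condensed into one small sketch. The names classify, helper_a, helper_b, enum color and struct header are hypothetical, chosen only for illustration, and the sketch assumes a reasonably recent GCC (label, enumerator and fallthrough statement attributes are GNU C features; other compilers may warn about or ignore some of them). It shows a type attribute after the struct keyword, an enumerator attribute before ‘=’, an attribute that applies to only one declarator in a list, a statement attribute on a null statement, and a label attribute followed by the semicolon of an empty statement.

```c
/* Type attribute right after the struct keyword (the preferred position). */
struct __attribute__((packed)) header {
    char tag;
    int  length;   /* packed: no padding between tag and length */
};

/* Enumerator attribute: after the constant, before '='. */
enum color {
    RED,
    GREEN __attribute__((deprecated("use BLUE"))) = 1,
    BLUE  = 2
};

/* An attribute before a non-first declarator applies only to that one:
   helper_b is marked unused, helper_a is not. */
static int helper_a(int), __attribute__((unused)) helper_b(int);

static int helper_a(int x) { return x + 1; }
static int helper_b(int x) { return x - 1; }   /* never called, no warning */

int classify(int n)
{
    int score = 0;

    switch (n) {
    case 0:
        score += 1;
        __attribute__((fallthrough));   /* statement attribute: before the ';' */
    case 1:
        score += 10;
        break;
    default:
        goto out_of_range;
    }
    return helper_a(score);

out_of_range:
    __attribute__((cold));   /* label attribute; the ';' is an empty statement */
    return -1;
}
```

With this sketch, classify(0) falls through from case 0 into case 1 and returns 12, classify(1) returns 11, and any other value takes the cold label path and returns -1.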

Declaring Attributes of Functions
#

https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html

In GNU C and C++, you can use function attributes to specify certain function properties that may help the compiler optimize calls or check code more carefully for correctness. For example, you can use attributes to specify that a function never returns (noreturn), returns a value depending only on the values of its arguments (const), or has printf-style arguments (format).

GCC provides two different ways to specify attributes: the traditional GNU syntax using ‘__attribute__ ((...))’ annotations, and the newer standard C and C++ syntax using ‘[[...]]’ with the ‘gnu::’ prefix on attribute names. Note that the exact rules for placement of attributes in your source code are different depending on which syntax you use. See Attribute Syntax, for details.

Common Function Attributes: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html

access (access-mode, ref-index) access (access-mode, ref-index, size-index)

__attribute__ ((access (read_only, 1)))
int puts (const char*);

__attribute__ ((access (read_only, 2, 3)))
void* memcpy (void*, const void*, size_t);
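The two declarations above only show the syntax. In access (access-mode, ref-index, size-index), access-mode is one of read_only, write_only, read_write or none, ref-index is the 1-based position of the pointer argument, and size-index (optional) is the position of the argument holding its size. The following is a sketch with a hypothetical copy_bytes function, assuming GCC 10 or later (older GCC and Clang ignore the attribute with a warning). It annotates both pointer arguments so GCC can diagnose some out-of-bounds or uninitialized accesses at call sites.

```c
#include <stddef.h>

/* Argument 1 is only written and argument 2 is only read;
   argument 3 gives the number of elements in each. */
__attribute__((access(write_only, 1, 3)))
__attribute__((access(read_only, 2, 3)))
size_t copy_bytes(unsigned char *dst, const unsigned char *src, size_t n);

size_t copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    return n;
}
```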

alias (“target”)

The alias attribute causes the declaration to be emitted as an alias for another
symbol, which must have been previously declared with the same type, and for
variables, also the same size and alignment. Declaring an alias with a different
type than the target is undefined and may be diagnosed. As an example, the
following declarations:
 void __f () { /* Do something. */; }
 void f () __attribute__ ((weak, alias ("__f")));
define ‘f’ to be a weak alias for ‘__f’. In C++, the mangled name for the target
must be used. It is an error if ‘__f’ is not defined in the same translation
unit.

This attribute requires assembler and object file support, and may not be
available on all targets.
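A minimal sketch of a plain (non-weak) function alias, assuming an ELF target such as Linux (Mach-O on macOS does not support this attribute). The names add_impl and add are hypothetical. The target is defined first, in the same translation unit, with the same type as the alias.

```c
/* The real implementation, defined in this translation unit. */
int add_impl(int a, int b)
{
    return a + b;
}

/* add is emitted as a second name for the add_impl symbol;
   no separate code is generated for it. */
int add(int a, int b) __attribute__((alias("add_impl")));
```

Calling add(2, 3) executes add_impl and returns 5.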

aligned aligned (alignment)

The aligned attribute specifies a minimum alignment for the first instruction of
the function, measured in bytes. When specified, alignment must be an integer
constant power of 2. Specifying no alignment argument implies the ideal alignment
for the target. The __alignof__ operator can be used to determine what that is (see
Determining the Alignment of Functions, Types or Variables). The attribute has no
effect when a definition for the function is not provided in the same translation
unit.

The attribute cannot be used to decrease the alignment of a function previously
declared with a more restrictive alignment; only to increase it. Attempts to do
otherwise are diagnosed. Some targets specify a minimum default alignment for
functions that is greater than 1. On such targets, specifying a less restrictive
alignment is silently ignored. Using the attribute overrides the effect of the
-falign-functions (see Options That Control Optimization) option for this
function.

Note that the effectiveness of aligned attributes may be limited by inherent
limitations in the system linker and/or object file format. On some systems, the
linker is only able to arrange for functions to be aligned up to a certain
maximum alignment. (For some linkers, the maximum supported alignment may be very
very small.) See your linker documentation for further information.

The aligned attribute can also be used for variables and fields (see Specifying
Attributes of Variables.)

alloc_align (position)

The alloc_align attribute may be applied to a function that returns a pointer and takes at least one argument of an integer or enumerated type. It indicates that the returned pointer is aligned on a boundary given by the function argument at position. Meaningful alignments are powers of 2 greater than one. GCC uses this information to improve pointer alignment analysis.

The function parameter denoting the allocated alignment is specified by one constant integer argument whose number is the argument of the attribute. Argument numbering starts at one.

For instance,

void* my_memalign (size_t, size_t) __attribute__ ((alloc_align (1)));

declares that my_memalign returns memory with minimum alignment given by parameter 1.

alloc_size (position) alloc_size (position-1, position-2)

The alloc_size attribute may be applied to a function that returns a pointer and takes at least one argument of an integer or enumerated type. It indicates that the returned pointer points to memory whose size is given by the function argument at position-1, or by the product of the arguments at position-1 and position-2. Meaningful sizes are positive values less than PTRDIFF_MAX. GCC uses this information to improve the results of __builtin_object_size.

The function parameter(s) denoting the allocated size are specified by one or two integer arguments supplied to the attribute. The allocated size is either the value of the single function argument specified or the product of the two function arguments specified. Argument numbering starts at one for ordinary functions, and at two for C++ non-static member functions.

For instance,

void* my_calloc (size_t, size_t) __attribute__ ((alloc_size (1, 2)));
void* my_realloc (void*, size_t) __attribute__ ((alloc_size (2)));

declares that my_calloc returns memory of the size given by the product of parameter 1 and 2 and that my_realloc returns memory of the size given by parameter 2.
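To see what alloc_size feeds into, the sketch below defines a hypothetical calloc wrapper xcalloc with alloc_size (1, 2) and queries __builtin_object_size on its result. It assumes GCC or Clang. The inferred size is typically only available when optimizing; at -O0, __builtin_object_size usually returns (size_t)-1, which means "unknown".

```c
#include <stdlib.h>

/* The returned block holds nmemb * size bytes; alloc_size (1, 2) says so. */
__attribute__((alloc_size(1, 2)))
void *xcalloc(size_t nmemb, size_t size)
{
    void *p = calloc(nmemb, size);
    if (p == NULL)
        abort();
    return p;
}

/* Returns what the compiler inferred about the object size:
   4 * 8 = 32 when it can see the allocation, or (size_t)-1 if unknown. */
size_t inferred_size(void)
{
    char *p = xcalloc(4, 8);
    size_t n = __builtin_object_size(p, 0);
    free(p);
    return n;
}
```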

always_inline

Generally, functions are not inlined unless optimization is specified. For functions declared inline, this attribute inlines the function independent of any restrictions that otherwise apply to inlining. Failure to inline such a function is diagnosed as an error. Note that if such a function is called indirectly the compiler may or may not inline it depending on optimization level and a failure to inline an indirect call may or may not be diagnosed.
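A short sketch of always_inline on a function that is also declared inline, as the paragraph above requires. The names clamp and clamp_percent are hypothetical. With GCC, the direct call inside clamp_percent is inlined even at -O0, and compilation fails if it cannot be.

```c
/* always_inline together with inline: every direct call is inlined. */
static inline __attribute__((always_inline))
int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

int clamp_percent(int v)
{
    return clamp(v, 0, 100);   /* direct call, inlined regardless of -O level */
}
```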

cold

The cold attribute on functions is used to inform the compiler that the function
is unlikely to be executed.

const

Calls to functions whose return value is not affected by changes to the observable state of the program and that have no observable effects on such state other than to return a value may lend themselves to optimizations such as common subexpression elimination. Declaring such functions with the const attribute allows GCC to avoid emitting some calls in repeated invocations of the function with the same argument values.

For example,

int square (int) __attribute__ ((const));
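Completing the declaration above into a compilable sketch: square depends only on its argument and touches no global state, so the const attribute is valid for it. The helper twice_square is hypothetical; because the two calls have the same argument, GCC may evaluate square(x) only once.

```c
int square(int x) __attribute__((const));

int square(int x)
{
    return x * x;   /* reads no memory, has no side effects */
}

int twice_square(int x)
{
    return square(x) + square(x);   /* candidate for common subexpression elimination */
}
```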

deprecated deprecated (msg)

The deprecated attribute results in a warning if the function is used anywhere in the source file. This is useful when identifying functions that are expected to be removed in a future version of a program. The warning also includes the location of the declaration of the deprecated function, to enable users to easily find further information about why the function is deprecated, or what they should do instead. Note that the warning only occurs for uses:

int old_fn () __attribute__ ((deprecated));
int old_fn ();
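int (*fn_ptr)() = old_fn;

results in a warning on line 3 but not line 2. The optional msg argument, which must be a string, is printed in the warning if present.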

error (“message”) warning (“message”)

If the error or warning attribute is used on a function declaration and a call to
such a function is not eliminated through dead code elimination or other
optimizations, an error or warning (respectively) that includes message is
diagnosed.

format (archetype, string-index, first-to-check)

The format attribute specifies that a function takes printf, scanf, strftime or strfmon style arguments that should be type-checked against a format string. For example, the declaration:

extern int
my_printf (void *my_object, const char *my_format, ...)
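      __attribute__ ((format (printf, 2, 3)));

causes the compiler to check the arguments in calls to my_printf for consistency with the printf style format string argument my_format.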
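A runnable variant of the declaration above, as a sketch: format_into is a hypothetical snprintf-like helper. Its format string is argument 3 and the variadic arguments start at 4, so it uses format (printf, 3, 4). With -Wall (which enables -Wformat), a call such as format_into(buf, sizeof buf, "%d", "text") is diagnosed at compile time.

```c
#include <stdarg.h>
#include <stdio.h>

/* Argument 3 is a printf-style format; checking starts at argument 4. */
__attribute__((format(printf, 3, 4)))
int format_into(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, size, fmt, ap);   /* forward to the v* variant */
    va_end(ap);
    return n;   /* characters that would have been written */
}
```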

gnu_inline

This attribute should be used with a function that is also declared with the
inline keyword. It directs GCC to treat the function as if it were defined in
gnu90 mode even when compiling in C99 or gnu99 mode.

hot

The hot attribute on a function is used to inform the compiler that the function
is a hot spot of the compiled program.
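The cold and hot attributes are usually used together, as in the sketch below. The names report_bad_input and sum_positive are hypothetical. Marking the error reporter cold lets GCC treat branches leading to it as unlikely and optimize it for size. Marking the inner loop function hot asks for more aggressive optimization of it.

```c
#include <stdio.h>

/* Rarely executed error path. */
static __attribute__((cold)) void report_bad_input(int v)
{
    fprintf(stderr, "bad input: %d\n", v);
}

/* Hypothetical hot spot: sums the non-negative entries of a. */
__attribute__((hot)) int sum_positive(const int *a, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] < 0) {            /* predicted unlikely because the callee is cold */
            report_bad_input(a[i]);
            continue;
        }
        s += a[i];
    }
    return s;
}
```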

noclone

This function attribute prevents a function from being considered for cloning—a mechanism that produces specialized copies of functions and which is (currently) performed by interprocedural constant propagation.

noinline

This function attribute prevents a function from being considered for inlining. It also disables some other interprocedural optimizations; it’s preferable to use the more comprehensive noipa attribute instead if that is your goal.

Even if a function is declared with the noinline attribute, there are optimizations other than inlining that can cause calls to be optimized away if it does not have side effects, although the function call is live. To keep such calls from being optimized away, put

asm ("");

in the called function, to serve as a special side effect.
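A minimal sketch of the technique above, with hypothetical function names. It uses __asm__ rather than asm because asm is not a keyword under strict -std=c99/c11 modes. noinline keeps each function out of line, and the empty asm statement is the side effect that stops GCC from deleting calls whose results look unused.

```c
/* Empty function whose calls should survive optimization, e.g. as a
   breakpoint or profiling marker. */
__attribute__((noinline)) void trace_point(void)
{
    __asm__("");   /* special side effect: calls are not removed */
}

__attribute__((noinline)) int add_one(int x)
{
    __asm__("");
    return x + 1;
}
```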

noreturn

A few standard library functions, such as abort and exit, cannot return. GCC knows this automatically. Some programs define their own functions that never return. You can declare them noreturn to tell the compiler this fact. For example,

void fatal () __attribute__ ((noreturn));

void
fatal (/* … */)
{
  /* … */ /* Print error message. */ /* … */
  exit (1);
}
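A self-contained version of the pattern above, as a sketch with hypothetical names die and checked_div. Because die is declared noreturn, GCC knows control cannot continue after the call. So it does not emit the "control reaches end of non-void function" warning for checked_div, even though no return statement follows die().

```c
#include <stdio.h>
#include <stdlib.h>

/* Never returns: always terminates the program. */
__attribute__((noreturn)) void die(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(1);
}

int checked_div(int a, int b)
{
    if (b != 0)
        return a / b;
    die("division by zero");   /* no return needed after a noreturn call */
}
```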

optimize (level, …) optimize (string, …)

The optimize attribute is used to specify that a function is to be compiled with
different optimization options than specified on the command line.

section (“section-name”)

Normally, the compiler places the code it generates in the text section. Sometimes, however, you need additional sections, or you need certain particular functions to appear in special sections. The section attribute specifies that a function lives in a particular section. For example, the declaration:

extern void foobar (void) __attribute__ ((section ("bar")));

puts the function foobar in the bar section.

Some file formats do not support arbitrary sections so the section attribute is not available on all platforms. If you need to map the entire contents of a module to a particular section, consider using the facilities of the linker instead.

target (string, …)

Multiple target back ends implement the target attribute to specify that a function is to be compiled with different target options than specified on the command line. The original target command-line options are ignored. One or more strings can be provided as arguments. Each string consists of one or more comma-separated suffixes to the -m prefix jointly forming the name of a machine-dependent option. See Machine-Dependent Options.

The target attribute can be used for instance to have a function compiled with a different ISA (instruction set architecture) than the default. ‘#pragma GCC target’ can be used to specify target-specific options for more than one function. See Function Specific Option Pragmas, for details about the pragma.

For instance, on an x86, you could declare one function with the target("sse4.1,arch=core2") attribute and another with target("sse4a,arch=amdfam10"). This is equivalent to compiling the first function with -msse4.1 and -march=core2 options, and the second function with -msse4a and -march=amdfam10 options. It is up to you to make sure that a function is only invoked on a machine that supports the particular ISA it is compiled for (for example by using cpuid on x86 to determine what feature bits and architecture family are used).

int core2_func (void) __attribute__ ((__target__ ("arch=core2")));
int sse3_func (void) __attribute__ ((__target__ ("sse3")));

unused

This attribute, attached to a function, means that the function is meant to be possibly unused. GCC does not produce a warning for this function.

used

This attribute, attached to a function, means that code must be emitted for the
function even if it appears that the function is not referenced. This is useful,
for example, when the function is referenced only in inline assembly.

weak

The weak attribute causes a declaration of an external symbol to be emitted as a weak symbol rather than a global. This is primarily useful in defining library functions that can be overridden in user code, though it can also be used with non-function declarations. The overriding symbol must have the same type as the weak symbol. In addition, if it designates a variable it must also have the same size and alignment as the weak symbol. Weak symbols are supported for ELF targets, and also for a.out targets when using the GNU assembler and linker.
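A sketch of the two common uses of weak, assuming an ELF target such as Linux with the GNU toolchain. The names plugin_version and optional_hook are hypothetical. A weak definition provides a library default that a strong definition elsewhere in the program would replace at link time. A weak declaration without any definition links successfully, and the symbol's address is then NULL.

```c
#include <stddef.h>

/* Library default; a strong plugin_version in another file would win. */
__attribute__((weak)) int plugin_version(void)
{
    return 1;
}

/* Weak reference to a hook that may or may not be linked in. */
extern void optional_hook(void) __attribute__((weak));

int run_hook_if_present(void)
{
    if (optional_hook != NULL) {   /* NULL when nothing defines the hook */
        optional_hook();
        return 1;
    }
    return 0;
}
```

When this file is linked alone, plugin_version() returns the default 1 and run_hook_if_present() returns 0.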

weakref weakref (“target”)

The weakref attribute marks a declaration as a weak reference. Without arguments, it should be accompanied by an alias attribute naming the target symbol. Alternatively, target may be given as an argument to weakref itself, naming the target definition of the alias. The target must have the same type as the declaration. In addition, if it designates a variable it must also have the same size and alignment as the declaration. In either form of the declaration weakref implicitly marks the declared symbol as weak. Without a target given as an argument to weakref or to alias, weakref is equivalent to weak (in that case the declaration may be extern).

/* Given the declaration: */
extern int y (void);

/* the following... */
static int x (void) __attribute__ ((weakref ("y")));

/* is equivalent to... */
static int x (void) __attribute__ ((weakref, alias ("y")));

/* or, alternatively, to... */
static int x (void) __attribute__ ((weakref));
static int x (void) __attribute__ ((alias ("y")));

A weak reference is an alias that does not by itself require a definition to be given for the target symbol. If the target symbol is only referenced through weak references, then it becomes a weak undefined symbol. If it is directly referenced, however, then such strong references prevail, and a definition is required for the symbol, not necessarily in the same translation unit.

The effect is equivalent to moving all references to the alias to a separate translation unit, renaming the alias to the aliased symbol, declaring it as weak, compiling the two separate translation units and performing a link with relocatable output (i.e. ld -r) on them.

A declaration to which weakref is attached and that is associated with a named target must be static.

Common Variable Attributes
#

https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html

alias (“target”)

The alias variable attribute causes the declaration to be emitted as an alias for another symbol known as an alias target. Except for top-level qualifiers the alias target must have the same type as the alias. For instance, the following
 int var_target;
 extern int __attribute__ ((alias ("var_target"))) var_alias;
defines var_alias to be an alias for the var_target variable.

It is an error if the alias target is not defined in the same translation unit as the alias.

Note that in the absence of the attribute GCC assumes that distinct declarations with external linkage denote distinct objects. Using both the alias and the alias target to access the same object is undefined in a translation unit without a declaration of the alias with the attribute.

This attribute requires assembler and object file support, and may not be
available on all targets.

aligned aligned (alignment)

The aligned attribute specifies a minimum alignment for the variable or structure field, measured in bytes. When specified, alignment must be an integer constant power of 2. Specifying no alignment argument implies the maximum alignment for the target, which is often, but by no means always, 8 or 16 bytes.

For example, the declaration:

int x __attribute__ ((aligned (16))) = 0;

causes the compiler to allocate the global variable x on a 16-byte boundary. On a 68040, this could be used in conjunction with an asm expression to access the move16 instruction which requires 16-byte aligned operands.

You can also specify the alignment of structure fields. For example, to create a double-word aligned int pair, you could write:

struct foo { int x[2] __attribute__ ((aligned (8))); };

This is an alternative to creating a union with a double member, which forces the union to be double-word aligned.

As in the preceding examples, you can explicitly specify the alignment (in bytes) that you wish the compiler to use for a given variable or structure field. Alternatively, you can leave out the alignment factor and just ask the compiler to align a variable or field to the default alignment for the target architecture you are compiling for. The default alignment is sufficient for all scalar types, but may not be enough for all vector types on a target that supports vector operations. The default alignment is fixed for a particular target ABI.

GCC also provides a target specific macro __BIGGEST_ALIGNMENT__, which is the largest alignment ever used for any data type on the target machine you are compiling for. For example, you could write:

short array[3] __attribute__ ((aligned (__BIGGEST_ALIGNMENT__)));

The compiler automatically sets the alignment for the declared variable or field to __BIGGEST_ALIGNMENT__. Doing this can often make copy operations more efficient, because the compiler can use whatever instructions copy the biggest chunks of memory when performing copies to or from the variables or fields that you have aligned this way. Note that the value of __BIGGEST_ALIGNMENT__ may change depending on command-line options.

When used on a struct, or struct member, the aligned attribute can only increase the alignment; in order to decrease it, the packed attribute must be specified as well. When used as part of a typedef, the aligned attribute can both increase and decrease alignment, and specifying the packed attribute generates a warning.

Note that the effectiveness of aligned attributes for static variables may be limited by inherent limitations in the system linker and/or object file format. On some systems, the linker is only able to arrange for variables to be aligned up to a certain maximum alignment. (For some linkers, the maximum supported alignment may be very very small.) If your linker is only able to align variables up to a maximum of 8-byte alignment, then specifying aligned(16) in an __attribute__ still only provides you with 8-byte alignment. See your linker documentation for further information.

Stack variables are not affected by linker restrictions; GCC can properly align them on any target.

The aligned attribute can also be used for functions (see Common Function
Attributes.)
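The variable and field examples above, combined into one sketch with hypothetical names (counter16, struct pair8, is_aligned, stack_buffer_is_aligned). It also checks a stack buffer, since stack variables are not subject to linker alignment limits. The global assumes the linker can honour 16-byte alignment, which holds on common ELF targets.

```c
#include <stdint.h>

/* Global on a 16-byte boundary (subject to linker limits). */
int counter16 __attribute__((aligned(16))) = 0;

/* Field alignment raises the alignment of the whole struct to at least 8. */
struct pair8 { int x[2] __attribute__((aligned(8))); };

int is_aligned(const void *p, uintptr_t a)
{
    return ((uintptr_t)p % a) == 0;
}

/* Stack variable: GCC aligns it itself, no linker involved. */
int stack_buffer_is_aligned(void)
{
    char buf[64] __attribute__((aligned(32)));
    buf[0] = 0;
    return is_aligned(buf, 32);
}
```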

counted_by (count)

The counted_by attribute may be attached to the C99 flexible array member of a structure. It indicates that the number of the elements of the array is given by the field "count" in the same structure as the flexible array member.

This attribute is available only in C for now. In C++ this attribute is ignored.

GCC may use this information to improve detection of object size information for such structures and provide better results in compile-time diagnostics and runtime features like the array bound sanitizer and the __builtin_dynamic_object_size.

For instance, the following code:

struct P {
  size_t count;
  char other;
  char array[] __attribute__ ((counted_by (count)));
} *p;

specifies that the array is a flexible array member whose number of elements is given by the field count in the same structure.

The field that represents the number of the elements should have an integer type. Otherwise, the compiler reports an error and ignores the attribute.

When the field that represents the number of the elements is assigned a negative integer value, the compiler treats the value as zero.

An explicit counted_by annotation defines a relationship between two objects, p->array and p->count, and the following requirements apply to the relationship between this pair:

    p->count must be initialized before the first reference to p->array;
    p->array has at least p->count number of elements available all the time. This relationship must hold even after any of these related objects are updated during the program.

It’s the user’s responsibility to make sure the above requirements hold at all times. Otherwise, the compiler reports warnings, and the results of the array bound sanitizer and __builtin_dynamic_object_size are undefined.

One important feature of the attribute is, a reference to the flexible array member field uses the latest value assigned to the field that represents the number of the elements before that reference. For example,

  p->count = val1;
  p->array[20] = 0;  // ref1 to p->array
  p->count = val2;
  p->array[30] = 0;  // ref2 to p->array

in the above, ref1 uses val1 as the number of the elements in p->array, and ref2
uses val2 as the number of elements in p->array.
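A sketch of using counted_by while staying compilable on compilers that do not know it yet. Only recent GCC and Clang releases accept the attribute, so it is wrapped in a hypothetical COUNTED_BY macro guarded by __has_attribute. The struct buf type and buf_new constructor are also hypothetical. Note that count is assigned right after allocation, before any element of data[] is touched, as the requirements above demand.

```c
#include <stdlib.h>

#if defined(__has_attribute)
#  if __has_attribute(counted_by)
#    define COUNTED_BY(member) __attribute__((counted_by(member)))
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(member)   /* older compiler: no annotation */
#endif

struct buf {
    size_t count;                   /* number of elements in data[] */
    int data[] COUNTED_BY(count);   /* C99 flexible array member */
};

struct buf *buf_new(size_t n)
{
    struct buf *b = malloc(sizeof *b + n * sizeof b->data[0]);
    if (b == NULL)
        return NULL;
    b->count = n;                   /* set before the first access to data */
    for (size_t i = 0; i < n; i++)
        b->data[i] = (int)i;
    return b;
}
```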

alloc_size (position) alloc_size (position-1, position-2)

The alloc_size variable attribute may be applied to the declaration of a pointer to a function that returns a pointer and takes at least one argument of an integer type. It indicates that the returned pointer points to an object whose size is given by the function argument at position, or by the product of the arguments at position-1 and position-2. Meaningful sizes are positive values less than PTRDIFF_MAX. Other sizes are diagnosed when detected. GCC uses this information to improve the results of __builtin_object_size.

For instance, the following declarations

typedef __attribute__ ((alloc_size (1, 2))) void*
  (*calloc_ptr) (size_t, size_t);
typedef __attribute__ ((alloc_size (1))) void*
  (*malloc_ptr) (size_t);

specify that calloc_ptr is a pointer of a function that, like the standard C
function calloc, returns an object whose size is given by the product of
arguments 1 and 2, and similarly, that malloc_ptr, like the standard C function
malloc, returns an object whose size is given by argument 1 to the function.

cleanup (cleanup_function)

The cleanup attribute runs a function when the variable goes out of scope. This attribute can only be applied to auto function scope variables; it may not be applied to parameters or variables with static storage duration. The function must take one parameter, a pointer to a type compatible with the variable. The return value of the function (if any) is ignored.

When multiple variables in the same scope have cleanup attributes, at exit from the scope their associated cleanup functions are run in reverse order of definition (last defined, first cleanup).

If -fexceptions is enabled, then cleanup_function is run during the stack
unwinding that happens during the processing of the exception. Note that the
cleanup attribute does not allow the exception to be caught, only to perform an
action. It is undefined what happens if cleanup_function does not return
normally.
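A minimal sketch of cleanup used as a scoped free. The names free_charp, use_scoped_buffer and cleanup_count are hypothetical, and the counter exists only to make the effect observable. The cleanup function receives a pointer to the variable (here char **). It runs on every exit from the scope, after the return value has been computed.

```c
#include <stdlib.h>

static int cleanups_run = 0;   /* counts how often the cleanup fired */

static void free_charp(char **p)
{
    free(*p);                  /* free(NULL) is harmless */
    cleanups_run++;
}

int use_scoped_buffer(void)
{
    char *buf __attribute__((cleanup(free_charp))) = malloc(16);
    if (buf == NULL)
        return -1;             /* free_charp still runs here */
    buf[0] = 'x';
    return buf[0] == 'x';      /* evaluated before buf is freed */
}

int cleanup_count(void)
{
    return cleanups_run;
}
```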

common nocommon

The common attribute requests GCC to place a variable in “common” storage. The nocommon attribute requests the opposite—to allocate space for it directly.

These attributes override the default chosen by the -fno-common and -fcommon
flags respectively.

copy copy (variable)

The copy attribute applies the set of attributes with which variable has been
declared to the declaration of the variable to which the attribute is
applied. The attribute is designed for libraries that define aliases that are
expected to specify the same set of attributes as the aliased symbols. The copy
attribute can be used with variables, functions or types. However, the kind of
symbol to which the attribute is applied (either variable or function) must match
the kind of symbol to which the argument refers. The copy attribute copies only
syntactic and semantic attributes but not attributes that affect a symbol’s
linkage or visibility such as alias, visibility, or weak. The deprecated
attribute is also not copied. See Common Function Attributes. See Common Type
Attributes.

deprecated deprecated (msg)

The deprecated attribute results in a warning if the variable is used anywhere in the source file. This is useful when identifying variables that are expected to be removed in a future version of a program. The warning also includes the location of the declaration of the deprecated variable, to enable users to easily find further information about why the variable is deprecated, or what they should do instead. Note that the warning only occurs for uses:

extern int old_var __attribute__ ((deprecated));
extern int old_var;
int new_fn () { return old_var; }

results in a warning on line 3 but not line 2. The optional msg argument, which must be a string, is printed in the warning if present.

The deprecated attribute can also be used for functions and types (see Common Function Attributes, see Common Type Attributes).

The message attached to the attribute is affected by the setting of the
-fmessage-length option.

mode (mode)

This attribute specifies the data type for the declaration—whichever type corresponds to the mode mode. This in effect lets you request an integer or floating-point type according to its width.

See Machine Modes in GNU Compiler Collection (GCC) Internals, for a list of the
possible keywords for mode. You may also specify a mode of byte or __byte__ to
indicate the mode corresponding to a one-byte integer, word or __word__ for the
mode of a one-word integer, and pointer or __pointer__ for the mode used to
represent pointers.
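A sketch of mode with a few machine modes, assuming GCC (or Clang, which accepts the same names) on a target with 8-bit bytes. The typedef names are hypothetical. QI and byte select a one-byte integer, DI an eight-byte integer, and pointer an integer as wide as a pointer.

```c
typedef int          qi_int         __attribute__((mode(QI)));       /* 1 byte */
typedef int          byte_int       __attribute__((mode(byte)));     /* same as QI */
typedef int          di_int         __attribute__((mode(DI)));       /* 8 bytes */
typedef unsigned int ptr_sized_uint __attribute__((mode(pointer)));  /* pointer width */
```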

no_icf

This variable attribute prevents a variable from being merged with another
equivalent variable.

noinit

Any data with the noinit attribute will not be initialized by the C runtime startup code, or the program loader. Not initializing data in this way can reduce program startup times.

This attribute is specific to ELF targets and relies on the linker script to
place sections with the .noinit prefix in the right location.

nonstring

The nonstring variable attribute specifies that an object or member declaration with type array of char, signed char, or unsigned char, or pointer to such a type is intended to store character arrays that do not necessarily contain a terminating NUL. This is useful in detecting uses of such arrays or pointers with functions that expect NUL-terminated strings, and to avoid warnings when such an array or pointer is used as an argument to a bounded string manipulation function such as strncpy. For example, without the attribute, GCC will issue a warning for the strncpy call below because it may truncate the copy without appending the terminating NUL character. Using the attribute makes it possible to suppress the warning. However, when the array is declared with the attribute, the call to strlen is diagnosed because when the array doesn’t contain a NUL-terminated string the call is undefined. To copy, compare, or search non-string character arrays, use the memcpy, memcmp, memchr, and other functions that operate on arrays of bytes. In addition, calling strnlen and strndup with such arrays is safe provided a suitable bound is specified, and not diagnosed.

struct Data
{
  char name [32] __attribute__ ((nonstring));
};

int f (struct Data *pd, const char *s)
{
  strncpy (pd->name, s, sizeof pd->name);
  …
  return strlen (pd->name);   // unsafe, gets a warning
}
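The following sketch shows the recommended alternative to strlen for a nonstring field. The length is found with memchr, bounded by the field size, so a completely full field without a NUL is handled safely. The field width of 8 and the helpers set_name and name_len are hypothetical.

```c
#include <string.h>

/* A fixed-width field that may be completely filled, i.e. hold no NUL. */
struct Data {
    char name[8] __attribute__ ((nonstring));
};

/* Copy s into the field; a truncated copy without a NUL is intended here,
   so nonstring keeps GCC from warning about the strncpy call. */
static void set_name (struct Data *pd, const char *s)
{
    strncpy (pd->name, s, sizeof pd->name);
}

/* Length of the stored name, never reading past the end of the field. */
static size_t name_len (const struct Data *pd)
{
    const char *end = memchr (pd->name, '\0', sizeof pd->name);
    return end ? (size_t) (end - pd->name) : sizeof pd->name;
}
```

To print such a field, use a bounded format such as `printf ("%.*s", (int) name_len (pd), pd->name)`.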

objc_nullability (nullability kind) (Objective-C and Objective-C++ only)

This attribute applies to pointer variables only. It allows marking the pointer with one of four possible values describing the conditions under which the pointer might have a nil value. In most cases, the attribute is intended to be an internal representation for property and method nullability (specified by language keywords); it is not recommended to use it directly.

When nullability kind is "unspecified" or 0, nothing is known about the conditions in which the pointer might be nil. Making this state specific serves to avoid false positives in diagnostics.

When nullability kind is "nonnull" or 1, the pointer has no meaning if it is nil and thus the compiler is free to emit diagnostics if it can be determined that the value will be nil.

When nullability kind is "nullable" or 2, the pointer might be nil and carry meaning as such.

When nullability kind is "resettable" or 3 (used only in the context of property
attribute lists) this describes the case in which a property setter may take the
value nil (which perhaps causes the property to be reset in some manner to a
default) but for which the property getter will never validly return nil.

packed

The packed attribute specifies that a structure member should have the smallest possible alignment—one bit for a bit-field and one byte otherwise, unless a larger value is specified with the aligned attribute. The attribute does not apply to non-member objects.

For example in the structure below, the member array x is packed so that it immediately follows a with no intervening padding:

struct foo
{
  char a;
  int x[2] __attribute__ ((packed));
};
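A quick way to observe the effect of the attribute is to check offsetof and sizeof on the struct foo example above. This sketch assumes GCC or Clang; foo_get is a hypothetical accessor. Reading the packed member directly is fine because the compiler knows it may be misaligned. Taking its address as a plain int * is the risky part, and GCC warns about it with -Waddress-of-packed-member.

```c
#include <stddef.h>

struct foo
{
  char a;
  int x[2] __attribute__ ((packed));  /* x follows a with no padding */
};

/* Direct member access: the compiler emits code that tolerates the
   misaligned int, unlike a dereference through a separately taken int *. */
static int foo_get (const struct foo *f, size_t i)
{
  return f->x[i];
}
```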

Note: The 4.1, 4.2 and 4.3 series of GCC ignore the packed attribute on
bit-fields of type char. This has been fixed in GCC 4.4 but the change can lead
to differences in the structure layout. See the documentation of
-Wpacked-bitfield-compat for more information.

persistent

Any data with the persistent attribute will not be initialized by the C runtime startup code, but will be initialized by the program loader. This enables the value of the variable to ‘persist’ between processor resets.

This attribute is specific to ELF targets and relies on the linker script to
place the sections with the .persistent prefix in the right
location. Specifically, some type of non-volatile, writeable memory is required.

section (“section-name”)

Normally, the compiler places the objects it generates in sections like data and bss. Sometimes, however, you need additional sections, or you need certain particular variables to appear in special sections, for example to map to special hardware. The section attribute specifies that a variable (or function) lives in a particular section. For example, this small program uses several specific section names:

#include <string.h>

/* struct duart, init_sp (), init_duart () and the linker-defined symbols
   data and edata are board-specific and assumed to be declared elsewhere.  */
struct duart a __attribute__ ((section ("DUART_A"))) = { 0 };
struct duart b __attribute__ ((section ("DUART_B"))) = { 0 };
char stack[10000] __attribute__ ((section ("STACK"))) = { 0 };
int init_data __attribute__ ((section ("INITDATA")));

int
main (void)
{
  /* Initialize stack pointer */
  init_sp (stack + sizeof (stack));

  /* Initialize initialized data */
  memcpy (&init_data, &data, &edata - &data);

  /* Turn on the serial ports */
  init_duart (&a);
  init_duart (&b);
}

Use the section attribute with global variables and not local variables, as shown in the example.

You may use the section attribute with initialized or uninitialized global variables but the linker requires each object be defined once, with the exception that uninitialized variables tentatively go in the common (or bss) section and can be multiply “defined”. Using the section attribute changes what section the variable goes into and may cause the linker to issue an error if an uninitialized variable has multiple definitions. You can force a variable to be initialized with the -fno-common flag or the nocommon attribute.

Some file formats do not support arbitrary sections so the section attribute is
not available on all platforms. If you need to map the entire contents of a
module to a particular section, consider using the facilities of the linker
instead.
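Besides mapping objects to hardware, named sections are commonly used to build link-time tables: every object placed in the same section ends up contiguous, and the table is walked at run time. This sketch assumes an ELF target linked with GNU ld or LLVM lld. Those linkers define `__start_SEC` and `__stop_SEC` for any output section whose name is a valid C identifier. The section name demo_cmds and all other names here are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical command-table entry.  aligned (16) gives every entry the same
   size and alignment, so the entries sit back to back in the section. */
struct cmd {
    const char *name;
    int (*fn) (int);
} __attribute__ ((aligned (16)));

static int twice (int x)  { return 2 * x; }
static int square (int x) { return x * x; }

/* `used` keeps the entries even though nothing refers to them by name. */
static const struct cmd cmd_twice
    __attribute__ ((used, section ("demo_cmds"))) = { "twice", twice };
static const struct cmd cmd_square
    __attribute__ ((used, section ("demo_cmds"))) = { "square", square };

/* Assumption: defined by GNU ld / lld at the start and end of demo_cmds. */
extern const struct cmd __start_demo_cmds[];
extern const struct cmd __stop_demo_cmds[];

static size_t cmd_count (void)
{
    return (size_t) (__stop_demo_cmds - __start_demo_cmds);
}

/* Look up an entry by name; the order of entries in the section is not guaranteed. */
static const struct cmd *cmd_find (const char *name)
{
    for (const struct cmd *c = __start_demo_cmds; c < __stop_demo_cmds; c++)
        if (strcmp (c->name, name) == 0)
            return c;
    return NULL;
}
```

Because entries can live in any translation unit, new commands are registered just by defining another object with the same section attribute.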

strict_flex_array (level)

The strict_flex_array attribute should be attached to the trailing array field of a structure. It controls when that trailing array field is treated as a flexible array member for the purposes of accessing its elements. level must be an integer between 0 and 3.

level=0 is the least strict level: all trailing arrays of structures are treated as flexible array members. level=3 is the strictest level: a trailing array is treated as a flexible array member only when it is declared as one in the form defined by the C99 and later standards (‘[]’).

There are two more levels between 0 and 3. They support older code that uses the GCC zero-length array extension (‘[0]’) or a one-element array (‘[1]’) as a flexible array member. When level is 1, the trailing array is treated as a flexible array member when it is declared as ‘[]’, ‘[0]’, or ‘[1]’. When level is 2, it is treated as a flexible array member when it is declared as ‘[]’ or ‘[0]’.

This attribute can be used with or without the -fstrict-flex-arrays command-line option. When both the attribute and the option are present at the same time, the level of the strictness for the specific trailing array field is determined by the attribute.

The strict_flex_array attribute interacts with the -Wstrict-flex-arrays option. See Options to Request or Suppress Warnings, for more information.

tls_model (“tls_model”)

The tls_model attribute sets thread-local storage model (see Thread-Local Storage) of a particular __thread variable, overriding -ftls-model= command-line switch on a per-variable basis. The tls_model argument should be one of global-dynamic, local-dynamic, initial-exec or local-exec.

Not all targets support this attribute.

unavailable / unavailable (msg)

The unavailable attribute indicates that the variable so marked is not available, if it is used anywhere in the source file. It behaves in the same manner as the deprecated attribute except that the compiler will emit an error rather than a warning.

It is expected that items marked as deprecated will eventually be withdrawn from interfaces, and then become unavailable. This attribute allows for marking them appropriately.

The unavailable attribute can also be used for functions and types (see Common Function Attributes, see Common Type Attributes).

unused

This attribute, attached to a variable or structure field, means that the variable or field is meant to be possibly unused. GCC does not produce a warning for this variable or field.

used

This attribute, attached to a variable with static storage, means that the variable must be emitted even if it appears that the variable is not referenced.

When applied to a static data member of a C++ class template, the attribute also means that the member is instantiated if the class itself is instantiated.

retain

For ELF targets that support the GNU or FreeBSD OSABIs, this attribute will save the variable from linker garbage collection. To support this behavior, variables that have not been placed in specific sections (e.g. by the section attribute, or the -fdata-sections option), will be placed in new, unique sections.

This additional functionality requires Binutils version 2.36 or later.

uninitialized

This attribute, attached to a variable with automatic storage, means that the variable should not be automatically initialized by the compiler when the option -ftrivial-auto-var-init is present.

With the option -ftrivial-auto-var-init, all automatic variables that do not have explicit initializers are initialized by the compiler. These additional initializations may incur run-time overhead, sometimes dramatically. This attribute can be used to exclude selected variables from such automatic initialization in order to reduce that overhead.

This attribute has no effect when the option -ftrivial-auto-var-init is not present.
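A typical candidate for the attribute is a large scratch buffer that is always fully written before it is read. In that case the zero-fill added by something like -ftrivial-auto-var-init=zero is pure overhead. This sketch assumes a recent GCC or Clang that recognizes the attribute (older compilers warn and ignore it). The function sum_copy and the 256-byte buffer size are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Sum the byte values of (at most 256 bytes of) src via a local copy.
   Only the first n bytes of buf are ever read, and memcpy writes them
   first, so buf is opted out of compiler auto-initialization. */
static unsigned sum_copy (const unsigned char *src, size_t n)
{
    unsigned char buf[256] __attribute__ ((uninitialized));
    unsigned sum = 0;

    if (n > sizeof buf)
        n = sizeof buf;
    memcpy (buf, src, n);
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}
```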

vector_size (bytes)

This attribute specifies the vector size for the type of the declared variable, measured in bytes. The type to which it applies is known as the base type. The bytes argument must be a positive power-of-two multiple of the base type size. For example, the declaration:

int foo __attribute__ ((vector_size (16)));

causes the compiler to set the mode for foo to be 16 bytes, divided into int-sized units. Assuming a 32-bit int, foo’s type is a vector of four units of four bytes each, and the corresponding mode of foo is V4SI. See Using Vector Instructions through Built-in Functions, for details of manipulating vector variables.

This attribute is only applicable to integral and floating scalars, although arrays, pointers, and function return values are allowed in conjunction with this construct.

Aggregates with this attribute are invalid, even if they are of the same size as a corresponding scalar. For example, the declaration:

struct S { int a; };
struct S  __attribute__ ((vector_size (16))) foo;

is invalid even if the size of the structure is the same as the size of the int.
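In practice the attribute is usually put on a typedef so the vector type can be reused. GCC and Clang then apply the ordinary arithmetic operators lane by lane and allow element access with subscripts. This minimal sketch assumes a 4-byte int; the names v4si and axpy4 are hypothetical.

```c
/* Four ints in one 16-byte vector (assuming a 4-byte int). */
typedef int v4si __attribute__ ((vector_size (16)));

/* Computes a * x + y element-wise, one operation per lane. */
static v4si axpy4 (int a, v4si x, v4si y)
{
    v4si av = { a, a, a, a };   /* broadcast the scalar into every lane */
    return av * x + y;
}
```

On targets with SIMD registers the compiler can map each operator to a single vector instruction. Elsewhere it falls back to scalar code with the same result.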

visibility (“visibility_type”)

This attribute affects the linkage of the declaration to which it is attached. The visibility attribute is described in Common Function Attributes.

warn_if_not_aligned (alignment)

This attribute specifies a threshold for the structure field, measured in bytes. If the structure field is aligned below the threshold, a warning will be issued. For example, the declaration:

struct foo
{
  int i1;
  int i2;
  unsigned long long x __attribute__ ((warn_if_not_aligned (16)));
};

causes the compiler to issue a warning on struct foo, like ‘warning: alignment 8 of 'struct foo' is less than 16’. The compiler also issues a warning, like ‘warning: 'x' offset 8 in 'struct foo' isn't aligned to 16’, when the structure field has the misaligned offset:

struct __attribute__ ((aligned (16))) foo
{
  int i1;
  int i2;
  unsigned long long x __attribute__ ((warn_if_not_aligned (16)));
};

This warning can be disabled by -Wno-if-not-aligned. The warn_if_not_aligned attribute can also be used for types (see Common Type Attributes.)

weak

The weak attribute is described in Common Function Attributes.

*** Common Type Attributes

https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html

aligned / aligned (alignment)

The aligned attribute specifies a minimum alignment (in bytes) for variables of the specified type. When specified, alignment must be a power of 2. Specifying no alignment argument implies the maximum alignment for the target, which is often, but by no means always, 8 or 16 bytes. For example, the declarations:

struct __attribute__ ((aligned (8))) S { short f[3]; };
typedef int more_aligned_int __attribute__ ((aligned (8)));

force the compiler to ensure (as far as it can) that each variable whose type is struct S or more_aligned_int is allocated and aligned at least on an 8-byte boundary. On a SPARC, having all variables of type struct S aligned to 8-byte boundaries allows the compiler to use the ldd and std (doubleword load and store) instructions when copying one variable of type struct S to another, thus improving run-time efficiency.

Note that the alignment of any given struct or union type is required by the ISO C standard to be at least a perfect multiple of the lowest common multiple of the alignments of all of the members of the struct or union in question. This means that you can effectively adjust the alignment of a struct or union type by attaching an aligned attribute to any one of the members of such a type, but the notation illustrated in the example above is a more obvious, intuitive, and readable way to request the compiler to adjust the alignment of an entire struct or union type.

As in the preceding example, you can explicitly specify the alignment (in bytes) that you wish the compiler to use for a given struct or union type. Alternatively, you can leave out the alignment factor and just ask the compiler to align a type to the maximum useful alignment for the target machine you are compiling for. For example, you could write:

struct __attribute__ ((aligned)) S { short f[3]; };

Whenever you leave out the alignment factor in an aligned attribute specification, the compiler automatically sets the alignment for the type to the largest alignment that is ever used for any data type on the target machine you are compiling for. Doing this can often make copy operations more efficient, because the compiler can use whatever instructions copy the biggest chunks of memory when performing copies to or from the variables that have types that you have aligned this way.

In the example above, if the size of each short is 2 bytes, the members of struct S occupy 6 bytes. Because no alignment factor is given, the alignment of struct S is set to the target's largest alignment (the value GCC predefines as __BIGGEST_ALIGNMENT__, for example 16 bytes on x86-64 without AVX), and the size of struct S is padded up to a multiple of that alignment.

Note that although you can ask the compiler to select a time-efficient alignment for a given type and then declare only individual stand-alone objects of that type, the compiler’s ability to select a time-efficient alignment is primarily useful only when you plan to create arrays of variables having the relevant (efficiently aligned) type. If you declare or use arrays of variables of an efficiently-aligned type, then it is likely that your program also does pointer arithmetic (or subscripting, which amounts to the same thing) on pointers to the relevant type, and the code that the compiler generates for these pointer arithmetic operations is often more efficient for efficiently-aligned types than for other types.

Note that the effectiveness of aligned attributes may be limited by inherent limitations in your linker. On many systems, the linker is only able to arrange for variables to be aligned up to a certain maximum alignment. (For some linkers, the maximum supported alignment may be very very small.) If your linker is only able to align variables up to a maximum of 8-byte alignment, then specifying aligned (16) in an __attribute__ still only provides you with 8-byte alignment. See your linker documentation for further information.

When used on a struct, or struct member, the aligned attribute can only increase the alignment; in order to decrease it, the packed attribute must be specified as well. When used as part of a typedef, the aligned attribute can both increase and decrease alignment, and specifying the packed attribute generates a warning.
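The two declarations from the start of this entry can be checked directly with the C11 `_Alignof` operator and sizeof. This sketch assumes a 2-byte short; the helper is_aligned is hypothetical. Note that the struct's size is rounded up to its new alignment.

```c
#include <stdint.h>

struct __attribute__ ((aligned (8))) S { short f[3]; };
typedef int more_aligned_int __attribute__ ((aligned (8)));

/* Nonzero if p lies on an n-byte boundary. */
static int is_aligned (const void *p, uintptr_t n)
{
    return ((uintptr_t) p % n) == 0;
}
```

With a 2-byte short, `sizeof (struct S)` is 8 rather than 6: the 2 bytes of tail padding keep every element of an array of struct S on an 8-byte boundary.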

alloc_size (position) / alloc_size (position-1, position-2)

The alloc_size type attribute may be applied to the definition of a type of a function that returns a pointer and takes at least one argument of an integer type. It indicates that the returned pointer points to an object whose size is given by the function argument at position (or position-1), or by the product of the arguments at position-1 and position-2. Meaningful sizes are positive values less than PTRDIFF_MAX. Other sizes are diagnosed when detected. GCC uses this information to improve the results of __builtin_object_size.

For instance, the following declarations

typedef __attribute__ ((alloc_size (1, 2))) void*
  calloc_type (size_t, size_t);
typedef __attribute__ ((alloc_size (1))) void*
  malloc_type (size_t);

specify that calloc_type is a type of a function that, like the standard C function calloc, returns an object whose size is given by the product of arguments 1 and 2, and that malloc_type, like the standard C function malloc, returns an object whose size is given by argument 1 to the function.
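This sketch declares a hypothetical allocator, my_alloc, through the malloc_type typedef and asks GCC for the size of a fresh allocation. __builtin_object_size is evaluated at compile time. When the size cannot be determined (typically without optimization), it returns (size_t)-1 for type 0. The sketch therefore only assumes that one of those two outcomes is observed.

```c
#include <stddef.h>
#include <stdlib.h>

/* The function type from the text: result size is given by argument 1. */
typedef __attribute__ ((alloc_size (1))) void *malloc_type (size_t);

/* Hypothetical allocator declared through that type, then defined. */
malloc_type my_alloc;
void *my_alloc (size_t n) { return malloc (n); }

/* Size the compiler can deduce for a 40-byte allocation, or (size_t)-1
   if it cannot; deduction usually requires optimization (e.g. -O2). */
static size_t known_size (void)
{
    char *p = my_alloc (40);
    size_t sz = __builtin_object_size (p, 0);
    free (p);
    return sz;
}
```

When the size is known, fortified string functions and -Wstringop-overflow can use it to catch out-of-bounds writes into the returned buffer.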

copy / copy (expression)

The copy attribute applies the set of attributes with which the type of the expression has been declared to the declaration of the type to which the attribute is applied. The attribute is designed for libraries that define aliases that are expected to specify the same set of attributes as the aliased symbols. The copy attribute can be used with types, variables, or functions. However, the kind of symbol to which the attribute is applied (either variable or function) must match the kind of symbol to which the argument refers. The copy attribute copies only syntactic and semantic attributes but not attributes that affect a symbol’s linkage or visibility such as alias, visibility, or weak. The deprecated attribute is also not copied. See Common Function Attributes. See Common Variable Attributes.

For example, suppose struct A below is defined in some third-party library header to have the alignment requirement N and to force a warning whenever a variable of the type is not so aligned due to attribute packed. Specifying the copy attribute on the definition of the unrelated struct B has the effect of copying all relevant attributes from the type referenced by the pointer expression to struct B.

struct __attribute__ ((aligned (N), warn_if_not_aligned (N)))
A { /* … */ };
struct __attribute__ ((copy ((struct A *)0))) B { /* … */ };

deprecated / deprecated (msg)

The deprecated attribute results in a warning if the type is used anywhere in the source file. This is useful when identifying types that are expected to be removed in a future version of a program. If possible, the warning also includes the location of the declaration of the deprecated type, to enable users to easily find further information about why the type is deprecated, or what they should do instead. Note that the warnings only occur for uses and then only if the type is being applied to an identifier that itself is not being declared as deprecated.

typedef int T1 __attribute__ ((deprecated));
T1 x;
typedef T1 T2;
T2 y;
typedef T1 T3 __attribute__ ((deprecated));
T3 z __attribute__ ((deprecated));

results in a warning on lines 2 and 3 but not on lines 4, 5, or 6. No warning is issued for line 4 because T2 is not explicitly deprecated. Line 5 has no warning because T3 is explicitly deprecated. Similarly for line 6. The optional msg argument, which must be a string, is printed in the warning if present. Control characters in the string will be replaced with escape sequences, and if the -fmessage-length option is set to 0 (its default value) then any newline characters will be ignored.

The deprecated attribute can also be used for functions and variables (see Declaring Attributes of Functions, see Specifying Attributes of Variables.)

The message attached to the attribute is affected by the setting of the -fmessage-length option.

designated_init

This attribute may only be applied to structure types. It indicates that any initialization of an object of this type must use designated initializers rather than positional initializers. The intent of this attribute is to allow the programmer to indicate that a structure’s layout may change, and that therefore relying on positional initialization will result in future breakage.

GCC emits warnings based on this attribute by default; use -Wno-designated-init to suppress them.
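In this sketch, the struct config type and make_default_config function are hypothetical. It shows the style the attribute enforces. Designated initializers stay correct if members are reordered or added, and any member not named is zero-initialized. The positional form, left as a comment, is what -Wdesignated-init would flag.

```c
/* Hypothetical configuration struct whose layout may change over time. */
struct __attribute__ ((designated_init)) config {
    int port;
    int timeout_ms;
    const char *host;
};

static struct config make_default_config (void)
{
    /* Accepted: members are named, and timeout_ms is zero-initialized. */
    struct config c = { .host = "localhost", .port = 8080 };
    /* struct config bad = { 8080, 0, "localhost" };  would be diagnosed */
    return c;
}
```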

flag_enum

This attribute may be applied to an enumerated type to indicate that its enumerators are used in bitwise operations, so e.g. -Wswitch should not warn about a case that corresponds to a bitwise combination of enumerators.

hardbool / hardbool (false_value) / hardbool (false_value, true_value)

This attribute may only be applied to integral types in C, to introduce hardened boolean types. It turns the integral type into a boolean-like type with the same size and precision, that uses the specified values as representations for false and true. Underneath, it is actually an enumerated type, but its observable behavior is like that of _Bool, except for the strict internal representations, verified by runtime checks.

If true_value is omitted, the bitwise negation of false_value is used. If false_value is omitted, zero is used. The named representation values must be different when converted to the original integral type. Narrower bitfields are rejected if the representations become indistinguishable.

Values of such types automatically decay to _Bool, at which point, the selected representation values are mapped to the corresponding _Bool values. When the represented value is not determined, at compile time, to be either false_value or true_value, runtime verification calls __builtin_trap if it is neither. This is what makes them hardened boolean types.

When converting scalar types to such hardened boolean types, implicitly or explicitly, behavior corresponds to a conversion to _Bool, followed by a mapping from false and true to false_value and true_value, respectively.

typedef char __attribute__ ((__hardbool__ (0x5a))) hbool;

void
f (void)  /* hypothetical function: auto and non-constant initializers need block scope */
{
  hbool first = 0;       /* False, stored as (char)0x5a.  */
  hbool second = !first; /* True, stored as ~(char)0x5a.  */

  static hbool zeroinit; /* False, stored as (char)0x5a.  */
  auto hbool uninit;     /* Undefined, may trap.  */
}

When zero-initializing a variable or field of hardened boolean type (presumably held in static storage) the implied zero initializer gets converted to _Bool, and then to the hardened boolean type, so that the initial value is the hardened representation for false. Using that value is well defined. This is not the case when variables and fields of such types are uninitialized (presumably held in automatic or dynamic storage): their values are indeterminate, and using them invokes undefined behavior. Using them may trap or not, depending on the bits held in the storage (re)used for the variable, if any, and on optimizations the compiler may perform on the grounds that using uninitialized values invokes undefined behavior.

Users of -ftrivial-auto-var-init should be aware that the bit patterns used as initializers are not converted to hardbool types, so using a hardbool variable that is implicitly initialized by -ftrivial-auto-var-init may trap if the representation values chosen for false and true do not match the initializer.

Since this is a language extension only available in C, interoperation with other languages may pose difficulties. It should interoperate with Ada Booleans defined with the same size and equivalent representation clauses, and with enumerations or other languages’ integral types that correspond to C’s chosen integral type.

may_alias

Accesses through pointers to types with this attribute are not subject to type-based alias analysis, but are instead assumed to be able to alias any other type of objects. In the context of section 6.5 paragraph 7 of the C99 standard, an lvalue expression dereferencing such a pointer is treated like having a character type. See -fstrict-aliasing for more information on aliasing issues. This extension exists to support some vector APIs, in which pointers to one vector type are permitted to alias pointers to a different vector type.

Note that an object of a type with this attribute does not have any special semantics.

Example of use:

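#include <stdlib.h>   /* abort, exit */

typedef short __attribute__ ((__may_alias__)) short_a;

int
main (void)
{
  int a = 0x12345678;
  short_a *b = (short_a *) &a;

  /* Clears one half of a; which half depends on the byte order.  */
  b[1] = 0;

  /* may_alias forces the compiler to assume the store above changed a.  */
  if (a == 0x12345678)
    abort();

  exit(0);
}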

If you replaced short_a with short in the variable declaration, the above program would abort when compiled with -fstrict-aliasing, which is on by default at -O2 or above.

mode (mode)

This attribute specifies the data type for the declaration: whichever type corresponds to the machine mode given by the mode argument. In effect, this lets you request an integer or floating-point type according to its width.

See Machine Modes in GNU Compiler Collection (GCC) Internals, for a list of the possible keywords for mode. You may also specify a mode of byte or __byte__ to indicate the mode corresponding to a one-byte integer, word or __word__ for the mode of a one-word integer, and pointer or __pointer__ for the mode used to represent pointers.

objc_root_class (Objective-C and Objective-C++ only)

This attribute marks a class as being a root class, and thus allows the compiler to elide any warnings about a missing superclass and to make additional checks for mandatory methods as needed.

packed

This attribute, attached to a struct, union, or C++ class type definition, specifies that each of its members (other than zero-width bit-fields) is placed to minimize the memory required. This is equivalent to specifying the packed attribute on each of the members.

When attached to an enum definition, the packed attribute indicates that the smallest integral type should be used. Specifying the -fshort-enums flag on the command line is equivalent to specifying the packed attribute on all enum definitions.

In the following example struct my_packed_struct’s members are packed closely together, but the internal layout of its s member is not packed—to do that, struct my_unpacked_struct needs to be packed too.

struct my_unpacked_struct
 {
    char c;
    int i;
 };

struct __attribute__ ((__packed__)) my_packed_struct
  {
     char c;
     int  i;
     struct my_unpacked_struct s;
  };

You may only specify the packed attribute on the definition of an enum, struct, union, or class, not on a typedef that does not also define the enumerated type, structure, union, or class.
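The layout claims above can be checked with offsetof and sizeof. This sketch reuses the two structs from the text and adds a hypothetical packed enum, color. Packing the outer struct removes padding between its members, including before s, but s keeps its own internal padding.

```c
#include <stddef.h>

struct my_unpacked_struct { char c; int i; };

struct __attribute__ ((__packed__)) my_packed_struct {
    char c;
    int  i;                        /* immediately follows c */
    struct my_unpacked_struct s;   /* immediately follows i; internal padding kept */
};

/* On an enum, packed selects the smallest integer type that holds all values. */
enum __attribute__ ((__packed__)) color { RED, GREEN, BLUE };
```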

scalar_storage_order (“endianness”)

When attached to a union or a struct, this attribute sets the storage order, aka endianness, of the scalar fields of the type, as well as the array fields whose component is scalar. The supported endiannesses are big-endian and little-endian. The attribute has no effect on fields which are themselves a union, a struct or an array whose component is a union or a struct, and it is possible for these fields to have a different scalar storage order than the enclosing type.

Note that neither pointer nor vector fields are considered scalar fields in this context, so the attribute has no effect on these fields.

This attribute is supported only for targets that use a uniform default scalar storage order (fortunately, most of them), i.e. targets that store the scalars either all in big-endian or all in little-endian.

Additional restrictions are enforced for types with the reverse scalar storage order with regard to the scalar storage order of the target:

    Taking the address of a scalar field of a union or a struct with reverse scalar storage order is not permitted and yields an error.
    Taking the address of an array field, whose component is scalar, of a union or a struct with reverse scalar storage order is permitted but yields a warning, unless -Wno-scalar-storage-order is specified.
    Taking the address of a union or a struct with reverse scalar storage order is permitted.

These restrictions exist because the storage order attribute is lost when the address of a scalar or the address of an array with scalar component is taken, so storing indirectly through this address generally does not work. The second case is nevertheless allowed to be able to perform a block copy from or to the array.

Moreover, the use of type punning or aliasing to toggle the storage order is not supported; that is to say, if a given scalar object can be accessed through distinct types that assign a different storage order to it, then the behavior is undefined.
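A common use is a wire-format header whose fields are big-endian regardless of the host. Fields are read and written as ordinary C lvalues, and GCC byte-swaps on little-endian hosts. The whole struct may be copied as a block, which the rules above permit because only the struct's address is taken. This sketch assumes GCC: the attribute is a GCC extension that other compilers may ignore with a warning, in which case the byte checks would not hold. The struct wire_hdr and the encode/decode_length helpers are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical header: length occupies bytes 0-3 and type bytes 4-5,
   both stored big-endian whatever the host byte order. */
struct __attribute__ ((scalar_storage_order ("big-endian"))) wire_hdr {
    uint32_t length;
    uint16_t type;
};

/* Fill a header and copy it out as raw bytes (taking &h is allowed). */
static void encode (unsigned char out[sizeof (struct wire_hdr)],
                    uint32_t length, uint16_t type)
{
    struct wire_hdr h;
    memset (&h, 0, sizeof h);   /* also clears the tail padding */
    h.length = length;
    h.type = type;
    memcpy (out, &h, sizeof h);
}

/* Copy raw bytes back into a header and read a field in host order. */
static uint32_t decode_length (const unsigned char in[sizeof (struct wire_hdr)])
{
    struct wire_hdr h;
    memcpy (&h, in, sizeof h);
    return h.length;
}
```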

strub

This attribute defines stack-scrubbing properties of functions and variables, so that functions that access sensitive data can have their stack frames zeroed-out upon returning or propagating exceptions. This may be enabled explicitly, by selecting certain strub modes for specific functions, or implicitly, by means of strub variables.

Being a type attribute, it attaches to types, even when specified in function and variable declarations. When applied to function types, it takes an optional string argument. When applied to a pointer-to-function type, if the optional argument is given, it gets propagated to the function type.

/* A strub variable.  */
int __attribute__ ((strub)) var;
/* A strub variable that happens to be a pointer.  */
__attribute__ ((strub)) int *strub_ptr_to_int;
/* A pointer type that may point to a strub variable.  */
typedef int __attribute__ ((strub)) *ptr_to_strub_int_type;

/* A declaration of a strub function.  */
extern int __attribute__ ((strub)) foo (void);
/* A pointer to that strub function.  */
int __attribute__ ((strub ("at-calls"))) (*ptr_to_strub_fn)(void) = foo;

A function associated with at-calls strub mode (strub("at-calls"), or just strub) undergoes interface changes. Its callers are adjusted to match the changes, and to scrub (overwrite with zeros) the stack space used by the called function after it returns. The interface change makes the function type incompatible with an unadorned but otherwise equivalent type, so every declaration and every type that may be used to call the function must be associated with this strub mode.

A function associated with internal strub mode (strub("internal")) retains an unmodified, type-compatible interface, but it may be turned into a wrapper that calls the wrapped body using a custom interface. The wrapper then scrubs the stack space used by the wrapped body. Though the wrapped body has its stack space scrubbed, the wrapper does not, so arguments and return values may remain unscrubbed even when such a function is called by another function that enables strub. This is why, when compiling with -fstrub=strict, a strub context is not allowed to call internal strub functions.

/* A declaration of an internal-strub function.  */
extern int __attribute__ ((strub ("internal"))) bar (void);

int __attribute__ ((strub))
baz (void)
{
  /* Ok, foo was declared above as an at-calls strub function.  */
  foo ();
  /* Not allowed in strict mode, otherwise allowed.  */
  bar ();
}

An automatically-allocated variable associated with the strub attribute causes the (immediately) enclosing function to have strub enabled.

A statically-allocated variable associated with the strub attribute causes functions that read it, through its strub data type, to have strub enabled. Reading data by dereferencing a pointer to a strub data type has the same effect. Note: The attribute does not carry over from a composite type to the types of its components, so the intended effect may not be obtained with non-scalar types.

When selecting a strub-enabled mode for a function that is not explicitly associated with one, because of strub variables or data pointers, the function must satisfy internal mode viability requirements (see below), even when at-calls mode is also viable and, being more efficient, ends up selected as an optimization.

/* zapme is implicitly strub-enabled because of strub variables.
   Optimization may change its strub mode, but not the requirements.  */
static int
zapme (int i)
{
  /* A local strub variable enables strub.  */
  int __attribute__ ((strub)) lvar;
  /* Reading strub data through a pointer-to-strub enables strub.  */
  lvar = * (ptr_to_strub_int_type) &i;
  /* Writing to a global strub variable does not enable strub.  */
  var = lvar;
  /* Reading from a global strub variable enables strub.  */
  return var;
}

A strub context is the body (as opposed to the interface) of a function that has strub enabled, be it explicitly, by at-calls or internal mode, or implicitly, due to strub variables or command-line options.

A function of a type associated with the disabled strub mode (strub("disabled")) will not have its own stack space scrubbed. Such functions cannot be called from within strub contexts.

In order to enable a function to be called from within strub contexts without having its stack space scrubbed, associate it with the callable strub mode (strub("callable")).

When a function is not assigned a strub mode, explicitly or implicitly, the mode defaults to callable, except when compiling with -fstrub=strict, which makes the mode default to disabled.

extern int __attribute__ ((strub ("callable"))) bac (void);
extern int __attribute__ ((strub ("disabled"))) bad (void);
/* Implicitly disabled with -fstrub=strict, otherwise callable.  */
extern int bah (void);

int __attribute__ ((strub))
bal (void)
{
  /* Not allowed, bad is not strub-callable.  */
  bad ();
  /* Ok, bac is strub-callable.  */
  bac ();
  /* Not allowed with -fstrub=strict, otherwise allowed.  */
  bah ();
}

Function types marked callable and disabled are not mutually compatible types, but the underlying interfaces are compatible, so it is safe to convert pointers between them, and to use such pointers or alternate declarations to call them. Interfaces are also interchangeable between them and internal (but not at-calls!), but adding internal to a pointer type will not cause the pointed-to function to perform stack scrubbing.

void __attribute__ ((strub))
bap (void)
{
  /* Assign a callable function to pointer-to-disabled.
     Flagged as not quite compatible with -Wpedantic.  */
  int __attribute__ ((strub ("disabled"))) (*d_p) (void) = bac;
  /* Not allowed: calls disabled type in a strub context.  */
  d_p ();

  /* Assign a disabled function to pointer-to-callable.
     Flagged as not quite compatible with -Wpedantic.  */
  int __attribute__ ((strub ("callable"))) (*c_p) (void) = bad;
  /* Ok, safe.  */
  c_p ();

  /* Assign an internal function to pointer-to-callable.
     Flagged as not quite compatible with -Wpedantic.  */
  c_p = bar;
  /* Ok, safe.  */
  c_p ();

  /* Assign an at-calls function to pointer-to-callable.
     Flagged as incompatible.  */
  c_p = bal;
  /* The call through an interface-incompatible type will not use the
     modified interface expected by the at-calls function, so it is
     likely to misbehave at runtime.  */
  c_p ();
}

Strub contexts are never inlined into non-strub contexts. When an internal-strub function is split up, the wrapper can often be inlined, but the wrapped body never is. A function marked as always_inline, even if explicitly assigned internal strub mode, will not undergo wrapping, so its body gets inlined as required.

inline int __attribute__ ((strub ("at-calls")))
inl_atc (void)
{
  /* This body may get inlined into strub contexts.  */
}

inline int __attribute__ ((strub ("internal")))
inl_int (void)
{
  /* This body NEVER gets inlined, though its wrapper may.  */
}

inline int __attribute__ ((strub ("internal"), always_inline))
inl_int_ali (void)
{
  /* No internal wrapper, so this body ALWAYS gets inlined,
     but it cannot be called from non-strub contexts.  */
}

void __attribute__ ((strub ("disabled")))
bat (void)
{
  /* Not allowed, cannot inline into a non-strub context.  */
  inl_int_ali ();
}

Some -fstrub=* command-line options enable strub modes implicitly where viable. A strub mode is only viable for a function if the function is eligible for that mode, and if other conditions, detailed below, are satisfied. If it’s not eligible for a mode, attempts to explicitly associate it with that mode are rejected with an error message. If it is eligible, that mode may be assigned explicitly through this attribute, but implicit assignment through command-line options may involve additional viability requirements.

A function is ineligible for at-calls strub mode if a different strub mode is explicitly requested, if attribute noipa is present, or if it calls __builtin_apply_args. At-calls strub mode, if not requested through the function type, is only viable for an eligible function if the function is not visible to other translation units, if it doesn’t have its address taken, and if it is never called with a function type overrider.

/* bav is eligible for at-calls strub mode,
   but not viable for that mode because it is visible to other units.
   It is eligible and viable for internal strub mode.  */
void bav () {}

/* setp is eligible for at-calls strub mode,
   but not viable for that mode because its address is taken.
   It is eligible and viable for internal strub mode.  */
void setp (void) { static void (*p)(void); p = setp; }

A function is ineligible for internal strub mode if a different strub mode is explicitly requested, or if attribute noipa is present. For an always_inline function, meeting these requirements is enough to make it eligible. Any function that has attribute noclone, that uses such extensions as non-local labels, computed gotos, alternate variable argument passing interfaces, __builtin_next_arg, or __builtin_return_address, or that takes too many (about 64Ki) arguments is ineligible, unless it is always_inline. For internal strub mode, all eligible functions are viable.

/* flop is not eligible, thus not viable, for at-calls strub mode.
   Likewise for internal strub mode.  */
__attribute__ ((noipa)) void flop (void) {}

/* flip is eligible and viable for at-calls strub mode.
   It would be ineligible for internal strub mode, because of noclone,
   if it weren't for always_inline.  With always_inline, noclone is not
   an obstacle, so it is also eligible and viable for internal strub mode.  */
inline __attribute__ ((noclone, always_inline)) void flip (void) {}

transparent_union

This attribute, attached to a union type definition, indicates that any function parameter having that union type causes calls to that function to be treated in a special way.

First, the argument corresponding to a transparent union type can be of any type in the union; no cast is required. Also, if the union contains a pointer type, the corresponding argument can be a null pointer constant or a void pointer expression; and if the union contains a void pointer type, the corresponding argument can be any pointer expression. If the union member type is a pointer, qualifiers like const on the referenced type must be respected, just as with normal pointer conversions.

Second, the argument is passed to the function using the calling conventions of the first member of the transparent union, not the calling conventions of the union itself. All members of the union must have the same machine representation; this is necessary for this argument passing to work properly.

Transparent unions are designed for library functions that have multiple interfaces for compatibility reasons. For example, suppose the wait function must accept either a value of type int * to comply with POSIX, or a value of type union wait * to comply with the 4.1BSD interface. If wait’s parameter were void *, wait would accept both kinds of arguments, but it would also accept any other pointer type and this would make argument type checking less useful. Instead, <sys/wait.h> might define the interface as follows:

typedef union __attribute__ ((__transparent_union__))
  {
    int *__ip;
    union wait *__up;
  } wait_status_ptr_t;

pid_t wait (wait_status_ptr_t);

This interface allows either int * or union wait * arguments to be passed, using the int * calling convention. The program can call wait with arguments of either type:

int w1 () { int w; return wait (&w); }
int w2 () { union wait w; return wait (&w); }

With this interface, wait’s implementation might look like this:

pid_t wait (wait_status_ptr_t p)
{
  return waitpid (-1, p.__ip, 0);
}

unavailable
unavailable (msg)

The unavailable attribute behaves in the same manner as the deprecated one, but emits an error rather than a warning. It is used to indicate that a (perhaps previously deprecated) type is no longer usable.

The unavailable attribute can also be used for functions and variables (see Declaring Attributes of Functions, see Specifying Attributes of Variables.)

unused

When attached to a type (including a union or a struct), this attribute means that variables of that type are meant to appear possibly unused. GCC does not produce a warning for any variables of that type, even if the variable appears to do nothing. This is often the case with lock or thread classes, which are usually defined and then not referenced, but contain constructors and destructors that have nontrivial bookkeeping functions.

vector_size (bytes)

This attribute specifies the vector size for the type, measured in bytes. The type to which it applies is known as the base type. The bytes argument must be a positive power-of-two multiple of the base type size. For example, the following declarations:

typedef __attribute__ ((vector_size (32))) int int_vec32_t ;
typedef __attribute__ ((vector_size (32))) int* int_vec32_ptr_t;
typedef __attribute__ ((vector_size (32))) int int_vec32_arr3_t[3];

define int_vec32_t to be a 32-byte vector type composed of int sized units. With int having a size of 4 bytes, the type defines a vector of eight units, four bytes each. The mode of variables of type int_vec32_t is V8SI. int_vec32_ptr_t is then defined to be a pointer to such a vector type, and int_vec32_arr3_t to be an array of three such vectors. See Using Vector Instructions through Built-in Functions, for details of manipulating objects of vector types.

This attribute is only applicable to integral and floating scalar types. In function declarations the attribute applies to the function return type.

For example, the following:

__attribute__ ((vector_size (16))) float get_flt_vec16 (void);

declares get_flt_vec16 to be a function returning a 16-byte vector with the base type float.

visibility

In C++, attribute visibility (see Declaring Attributes of Functions) can also be applied to class, struct, union and enum types. Unlike other type attributes, the attribute must appear between the initial keyword and the name of the type; it cannot appear after the body of the type.

Note that the type visibility is applied to vague linkage entities associated with the class (vtable, typeinfo node, etc.). In particular, if a class is thrown as an exception in one shared object and caught in another, the class must have default visibility. Otherwise the two shared objects are unable to use the same typeinfo node and exception handling will break.

warn_if_not_aligned (alignment)

This attribute specifies a threshold for the structure field, measured in bytes. If the structure field is aligned below the threshold, a warning will be issued. For example, the declaration:

typedef unsigned long long __u64
   __attribute__((aligned (4), warn_if_not_aligned (8)));

struct foo
{
  int i1;
  int i2;
  __u64 x;
};

causes the compiler to issue a warning on struct foo, like ‘warning: alignment 4 of 'struct foo' is less than 8’. It is used to define struct foo in such a way that struct foo has the same layout and the structure field x has the same alignment when __u64 is aligned at either 4 or 8 bytes. Aligning struct foo to 8 bytes:

struct __attribute__ ((aligned (8))) foo
{
  int i1;
  int i2;
  __u64 x;
};

silences the warning. The compiler also issues a warning, like ‘warning: 'x' offset 12 in 'struct foo' isn't aligned to 8’, when the structure field has the misaligned offset:

struct __attribute__ ((aligned (8))) foo
{
  int i1;
  int i2;
  int i3;
  __u64 x;
};

This warning can be disabled by -Wno-if-not-aligned.

To specify multiple attributes, separate them by commas within the double parentheses: for example, ‘__attribute__ ((aligned (16), packed))’.

GNU C asm
#

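GNU C supports two kinds of inline assembly:

  1. Basic asm: assembler instructions without operands. It takes no input or output operands; it can appear inside a function, and it is the only form allowed at file scope (top level), outside any function.
    asm asm-qualifiers ( AssemblerInstructions )
  2. Extended asm: assembler instructions with C expression operands. It must appear inside a function.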
asm asm-qualifiers ( AssemblerTemplate
: OutputOperands
[ : InputOperands
[ : Clobbers ] ])

asm asm-qualifiers ( AssemblerTemplate
: OutputOperands
: InputOperands
: Clobbers
: GotoLabels)

Examples:

__asm__ ("some instructions"
: /* No outputs. */
: "r" (Offset / 8));


int src = 1;
int dst;
asm ("mov %1, %0\n\t" // %1 and %0 refer to the operands listed below (x86, AT&T syntax)
"add $1, %0"
: "=r" (dst) // output operand %0
: "r" (src)); // input operand %1

printf("%d\n", dst); // prints 2

Give operands symbolic names and refer to them by name in the template:

uint32_t Mask = 1234;
uint32_t Index;
asm ("bsfl %[aMask], %[aIndex]" // %[aIndex] refers to the operand named aIndex
: [aIndex] "=r" (Index) // [aIndex] gives this output operand a symbolic name
: [aMask] "r" (Mask)
: "cc");

Attributes supported by Clang: https://clang.llvm.org/docs/AttributeReference.html

C and GNU C extensions supported by Clang
#

References:

  1. Building Linux with Clang/LLVM: https://docs.kernel.org/kbuild/llvm.html
  2. Clang Language Extensions: https://releases.llvm.org/3.1/tools/clang/docs/LanguageExtensions.html

stdio.h file I/O
#

Standard streams (each is a predefined FILE *):

  • stdin: standard input, generally the keyboard by default
  • stdout: standard output, generally the screen by default
  • stderr: standard error, generally the screen by default as well

printf("Hello, world!\n");
fprintf(stdout, "Hello, world!\n");  // printf to a file

Reading a single character:

#include <stdio.h>

int main(void)
{
    FILE *fp;                      // Variable to represent open file

    fp = fopen("hello.txt", "r");  // Open file for reading
    if (fp == NULL) {              // fopen() returns NULL on failure
        perror("hello.txt");
        return 1;
    }

    int c = fgetc(fp);             // Read a single character; must be int so EOF fits
    if (c != EOF)
        printf("%c\n", c);         // Print char to stdout

    fclose(fp);                    // Close the file when done
    return 0;
}

Reading character by character until EOF:

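#include <stdio.h>

int main(void)
{
    FILE *fp;
    int c;      // int, not char: EOF must be distinguishable from every byte value

    fp = fopen("hello.txt", "r");
    if (fp == NULL) {
        perror("hello.txt");
        return 1;
    }

    while ((c = fgetc(fp)) != EOF)  // fgetc() returns EOF at end of file or on error
        printf("%c", c);

    fclose(fp);
    return 0;
}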

Reading one line at a time:

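#include <stdio.h>

int main(void)
{
    FILE *fp;
    char s[1024];  // Big enough for any line this program will encounter
    int linecount = 0;

    fp = fopen("quote.txt", "r");
    if (fp == NULL) {
        perror("quote.txt");
        return 1;
    }

    // fgets() keeps the trailing '\n' and returns NULL at EOF or on error
    while (fgets(s, sizeof s, fp) != NULL)
        printf("%d: %s", ++linecount, s);

    fclose(fp);
    return 0;
}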

Writing a file:

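#include <stdio.h>

int main(void)
{
    FILE *fp;
    int x = 32;

    fp = fopen("output.txt", "w");  // "w" creates the file or truncates it
    if (fp == NULL) {
        perror("output.txt");
        return 1;
    }

    fputc('B', fp);
    fputc('\n', fp);   // newline
    fprintf(fp, "x = %d\n", x);
    fputs("Hello, world!\n", fp);

    fclose(fp);
    return 0;
}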

stdlib.h memory allocation
#

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    // Allocate space for a single int (sizeof(int) bytes-worth).
    // malloc() returns void *, which converts to int * implicitly in C.
    int *p = malloc(sizeof(int));
    if (p == NULL)
        return 1;
    *p = 12;  // Store something there

    printf("%d\n", *p);  // Print it: 12

    free(p);  // All done with that memory

    //*p = 3490;  // ERROR: undefined behavior! Use after free()!

    int *x;

    if ((x = malloc(sizeof(int) * 10)) == NULL) {
        printf("Error allocating 10 ints\n");
        // do something here to handle it
        return 1;
    }

    free(x);
    return 0;
}
