
gas: the GNU as assembler

gcc invokes the assembler command as, provided by the binutils package, to translate assembly code into machine instructions.

# Using Homebrew
brew install gcc
brew install binutils

# Invoke as through gcc; -Wa, passes the comma-separated options on to as
gcc -c -g -O -Wa,-alh,-L file.c

gas manual notes

comment: Anything from ‘/*’ through the next ‘*/’ is a comment. On x86, anything from # to the end of the line is also a comment; the line-comment character is target-specific.

symbol: A symbol is one or more characters chosen from the set of all letters (both upper and lower case), digits and the three characters ‘_.$’.

If the symbol begins with a dot ‘.’ then the statement is an assembler directive: typically valid for any computer.
If the symbol begins with a letter the statement is an assembly language instruction: it assembles into a machine language instruction.

Statement: A statement ends at a newline character (‘\n’) or a line separator character.

label:     .directive    followed by something
another_label:           # This is an empty statement.
           instruction   operand_1, operand_2, …

Constants: A constant is a number, written so that its value is known by inspection, without knowing any context.

.byte  74, 0112, 092, 0x4A, 0X4a, 'J, '\J # All the same value.
.ascii "Ring the bell\7"                  # A string constant.
.octa  0x123456789abcdef0123456789ABCDEF0 # A bignum.
.float 0f-314159265358979323846264338327\
95028841971.693993751E-40                 # - pi, a flonum.

Character constants (‘a) and string constants (“a”) both support backslash escapes.

Section: Roughly, a section is a range of addresses, with no gaps; all data “in” those addresses is treated the same for some particular purpose. For example there may be a “read only” section.

The linker ld reads many object files (partial programs) and combines their contents to form a runnable program. When as emits an object file, the partial program is assumed to start at address 0. ld assigns the final addresses for the partial program, so that different partial programs do not overlap. This is actually an oversimplification, but it suffices to explain how as uses sections.

ld moves blocks of bytes of your program to their run-time addresses. These blocks slide to their run-time addresses as rigid units; their length does not change and neither does the order of bytes within them. Such a rigid unit is called a section. Assigning run-time addresses to sections is called relocation. It includes the task of adjusting mentions of object-file addresses so they refer to the proper run-time addresses.

An object file written by as has at least three sections, any of which may be empty. These are named text, data and bss sections.

When it generates COFF or ELF output, as can also generate whatever other named sections you specify using the ‘.section’ directive (see .section). If you do not use any directives that place output in the ‘.text’ or ‘.data’ sections, these sections still exist, but are empty.

Within the object file, the text section starts at address 0, the data section follows, and the bss section follows the data section.

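To let ld know which data changes when the sections are relocated, and how to change that data, as also writes to the object file details of the relocation needed. To perform relocation ld must know, each time an address in the object file is mentioned:

Where in the object file is the beginning of this reference to an address?
How long (in bytes) is this reference?
Which section does the address refer to? What is the numeric value of (address) − (start-address of section)?
Is the reference to an address “Program-Counter relative”?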

In fact, every address as ever uses is expressed as

(section) + (offset into section)

Further, most expressions as computes have this section-relative nature.

In this manual we use the notation {secname N} to mean “offset N into section secname.”

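AT&T assembly

GNU/Linux tools use AT&T assembly syntax (the GAS default), whereas Windows tools and the Intel manuals use Intel syntax:

Feature	AT&T syntax	Intel syntax
Operand order	source on the left, destination on the right	destination on the left, source on the right
Immediates	$ prefix	no prefix
Registers	% prefix	no prefix
Memory addressing	`disp(base,index,scale)`	`[base + index*scale + disp]`
Operand size	instruction suffix (b, w, l, q)	keyword (BYTE PTR, etc.)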

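Example:

# AT&T syntax
movl $42, %eax              # immediate to register
movl $42, -4(%rbp)          # immediate to memory
movl -4(%rbp), %eax         # memory to register
movl (%rbx,%rcx,4), %eax    # scaled-index addressing: load from rbx + rcx * 4

Data definition:

# Example
.section .data
message: .string "Hello"

.section .text
.global _start
_start:
    movq $1, %rax       # sys_write; source operand on the left, destination on the right
    movq $1, %rdi       # file descriptor 1 (stdout)
    movq $message, %rsi # address of the buffer
    movq $5, %rdx       # number of bytes to write
    syscall

gdb, objdump and the Linux kernel all use AT&T syntax by default; gdb and objdump can be switched to Intel syntax with an option.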

gdb:

# AT&T syntax (default)
(gdb) disas main
Dump of assembler code for function main:
   0x0000555555555129 <+0>:     push   %rbp
   0x000055555555512a <+1>:     mov    %rsp,%rbp
   0x000055555555512d <+4>:     sub    $0x10,%rsp

# Switch to Intel syntax
(gdb) set disassembly-flavor intel
(gdb) disas main
Dump of assembler code for function main:
   0x0000555555555129 <+0>:     push    rbp
   0x000055555555512a <+1>:     mov     rbp,rsp
   0x000055555555512d <+4>:     sub     rsp,0x10

objdump:

# AT&T syntax (default)
$ objdump -d program

0000000000001129 <main>:
    1129:       55                      push   %rbp
    112a:       48 89 e5                mov    %rsp,%rbp
    112d:       48 83 ec 10             sub    $0x10,%rsp
    1131:       89 7d fc                mov    %edi,-0x4(%rbp)

# Switch to Intel syntax
$ objdump -M intel -d program
0000000000001129 <main>:
    1129:       55                      push    rbp
    112a:       48 89 e5                mov     rbp,rsp
    112d:       48 83 ec 10             sub     rsp,0x10
    1131:       89 7d fc                mov     DWORD PTR [rbp-0x4],edi

Data definitions

.byte 42         # 8-bit value
.byte 'A'        # Character literal
.word 1000       # 16-bit integer
.long 65536      # 32-bit integer
.quad 1000000    # 64-bit integer

.comm buffer, 100  # Reserve 100 bytes of uninitialized memory

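Symbols

# GAS section directives
.section .data
.section .text

.globl main      # Make symbol visible globally
.extern printf   # Declare an external symbol (gas accepts this for compatibility and ignores it; undefined symbols are external by default)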

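Registers

x86_64 provides 16 general-purpose registers, each 64 bits wide.

64-bit: %rax, %rbx, %rcx, %rdx, %rsi, %rdi, %rbp, %rsp, %r8-%r15
32-bit: %eax, %ebx, %ecx, %edx, %esi, %edi, %ebp, %esp, %r8d-%r15d
16-bit: %ax, %bx, %cx, %dx, %si, %di, %bp, %sp, %r8w-%r15w
8-bit: %al, %bl, %cl, %dl, %sil, %dil, %bpl, %spl, %r8b-%r15b

Special-purpose registers:

RSP: stack pointer; points to the current top of the stack.
RBP: base pointer; conventionally holds the base address of the current stack frame.
RIP: instruction pointer; points to the next instruction to execute.
RFLAGS: status flags register; holds the condition codes produced by arithmetic and the CPU state flags.

Segment registers: originally selected the code, data and stack segments. In 64-bit mode segmentation is largely vestigial: CS, DS, ES and SS are treated as having base 0, while FS and GS keep a usable base and are commonly used for thread-local storage.

%cs - code segment
%ds - data segment
%ss - stack segment
%es - extra segment
%fs - additional segment
%gs - additional segment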

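Data transfer instructions

Byte order: x86_64 is little-endian; the least significant byte is stored at the lowest address.

Operand sizes: 8-bit (byte, suffix b), 16-bit (word, w), 32-bit (doubleword, l) and 64-bit (quadword, q).

# mov
movq $60, %rax        # immediate to register
movq %rax, %rbx       # register to register
movq (%rax), %rbx     # memory to register
movq %rax, (%rbx)     # register to memory

# Zero-extending moves
movzbq %al, %rax          # byte to quadword, zero-extended
movzwq %ax, %rax          # word to quadword, zero-extended

# Sign-extending moves
movsbq %al, %rax          # byte to quadword, sign-extended
movswq %ax, %rax          # word to quadword, sign-extended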

movq $42, %rax   # Move immediate to 64-bit register
movl $100, %eax  # Move immediate to 32-bit register
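The difference between the zero-extending and sign-extending moves is easiest to see on a byte whose top bit is set. The following Python sketch models the two behaviours rather than executing any assembly; the helper names zero_extend and sign_extend and the sample value are made up for illustration.

```python
MASK64 = (1 << 64) - 1

def zero_extend(value, bits):
    """Model of movz*: keep the low `bits` bits and fill the rest with zeros."""
    return value & ((1 << bits) - 1)

def sign_extend(value, bits):
    """Model of movs*: copy bit `bits - 1` into every higher bit of a 64-bit result."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return ((value ^ sign_bit) - sign_bit) & MASK64

al = 0xF0  # hypothetical contents of %al; read as a signed byte this is -16
print(hex(zero_extend(al, 8)))  # 0xf0                (movzbq %al, %rax)
print(hex(sign_extend(al, 8)))  # 0xfffffffffffffff0  (movsbq %al, %rax)
```

For a byte whose top bit is clear, such as 0x7F, both functions return the same value.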

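Memory addressing

Basic addressing format:

AT&T syntax: offset(base, index, scale)

Effective address = offset + base + (index * scale)

offset: displacement (optional), a constant or a symbol
base: base register (optional)
index: index register (optional)
scale: scale factor (optional), must be 1, 2, 4 or 8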
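The formula above can be checked with a few lines of Python. This is only a model of the address arithmetic; the function name effective_address and the register values in the usage line are assumptions chosen for illustration.

```python
def effective_address(disp=0, base=0, index=0, scale=1):
    """Model of AT&T disp(base, index, scale): disp + base + index * scale."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    # Address arithmetic wraps around at 64 bits.
    return (disp + base + index * scale) & 0xFFFFFFFFFFFFFFFF

# 8(%rbx, %rcx, 4) with hypothetical values rbx = 0x1000 and rcx = 3
print(hex(effective_address(disp=8, base=0x1000, index=3, scale=4)))  # 0x1014
```

Omitted components default to zero, which is why -8(%rbp) and (%rax) are special cases of the same formula.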

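Direct addressing: access a fixed memory address

movq value, %rax      # load from the address of the label value
movq 0x123456, %rax   # load from absolute address 0x123456 (with a $ prefix it would be an immediate, not a memory access)

Register indirect addressing: access memory through the address held in a register

movq (%rax), %rbx     # use the address in rax
movq (%rsp), %rax     # read the top of the stack

Base addressing: base + offset

movq 8(%rbp), %rax    # rbp + 8
movq -8(%rbp), %rax   # rbp - 8
movq 16(%rsp), %rax   # rsp + 16

Indexed addressing: (index * scale) + offset

movq array(, %rcx, 8), %rax   # array + rcx * 8
movq (%rcx, %rdx, 4), %rax    # rcx + rdx * 4 (rcx acts as the base here)

Base-indexed addressing: base + (index * scale) + offset

movq 8(%rbx, %rcx, 4), %rax   # rbx + rcx * 4 + 8
leaq data(%rip), %rdx         # %rip cannot be combined with an index register,
movq (%rdx, %rcx, 8), %rax    # so load the base RIP-relatively, then index: data + rcx * 8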

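Examples:

Array access:

# int array[10];
# Access array[i], assuming i is in %rcx

# Method 1: base-indexed addressing
leaq array(%rip), %rax      # get the base address of the array
movl (%rax, %rcx, 4), %edx  # load array[i]

# Method 2: use the label directly (absolute address, so it only links in non-PIE code)
movl array(, %rcx, 4), %edx # load array[i]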

Struct access:

# struct Point {
#     int x;   // offset 0
#     int y;   // offset 4
#     long z;  // offset 8
# };

# Assume the struct pointer is in %rax
movl (%rax), %edx          # load p->x
movl 4(%rax), %edx         # load p->y
movq 8(%rax), %rdx         # load p->z
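The displacements 0, 4 and 8 used above are the field offsets the C compiler assigns. One way to double-check such offsets without writing C is Python's ctypes module, which lays structures out with the platform's C rules. In this sketch the fixed-width types c_int32 and c_int64 stand in for int and long, which is an assumption that holds on x86_64 Linux.

```python
import ctypes

class Point(ctypes.Structure):
    # Mirrors: struct Point { int x; int y; long z; } on an LP64 target
    _fields_ = [
        ("x", ctypes.c_int32),
        ("y", ctypes.c_int32),
        ("z", ctypes.c_int64),
    ]

for name in ("x", "y", "z"):
    print(name, getattr(Point, name).offset)  # x 0, y 4, z 8
print("size", ctypes.sizeof(Point))           # size 16
```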

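Multi-dimensional array access:

# int matrix[3][4];
# Access matrix[i][j]
# i is in %rcx, j is in %rdx

# offset = i * (4 columns * 4 bytes) + j * 4 bytes
movq %rcx, %rax
imulq $16, %rax           # i * 16 (4 * 4)
leaq (%rax, %rdx, 4), %rax
movl matrix(%rax), %edx

Stack frame access:

# Accessing local variables and arguments
    pushq %rbp
    movq %rsp, %rbp
    subq $16, %rsp           # allocate stack space

# Local variables
movq $1, -8(%rbp)       # first local variable
movq $2, -16(%rbp)      # second local variable

# Arguments (assuming they were passed on the stack)
movq 16(%rbp), %rax     # first stack argument
movq 24(%rbp), %rdx     # second stack argument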

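RIP-relative and base addressing

RIP-relative addressing is a special addressing mode:

# Format: label(%rip)
# Computation: target address = address of the next instruction + displacement

# Common in position-independent code
leaq message(%rip), %rdi    # load the address of a string
movq value(%rip), %rax      # load a global variable

Base addressing:

# Format: offset(base_register)
# Computation: target address = contents of the base register + offset

movq 16(%rbp), %rax        # stack-passed argument (8(%rbp) holds the return address)
movq -8(%rbp), %rax        # local variable
movq 16(%rsp), %rax        # data on the stack

# Multi-level indirection
    movq (%rax), %rbx           # first level
    movq (%rbx), %rcx           # second level

# Combined offset computation
    movq 8(%rax, %rcx, 8), %rdx # base + index * scale + offset

RIP-relative addressing:

Used for position-independent code (PIC)
Relative to the position of the current instruction
Suited to accessing global data
Compatible with ASLR
The displacement is fixed at link time

Base addressing:

Used for local data
Relative to the contents of a register
Suited to accessing stack data
The address is computed at run time
The offset is a constant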

Memory layout examples:

RIP-relative addressing

Code section:
0x1000: leaq message(%rip), %rdi
0x1007: ...                     # next instruction

Data section:
0x2000: message: "Hello"

Computation: 0x1007 + (0x2000 - 0x1007) = 0x2000
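The computation above can be replayed in Python. This is a sketch of the arithmetic only: the function names are invented for illustration, and the instruction length of 7 bytes is taken from the addresses in the layout (0x1007 - 0x1000).

```python
def rip_displacement(target, insn_addr, insn_len):
    """Displacement stored in the instruction: target minus the address of the next instruction."""
    disp = target - (insn_addr + insn_len)
    if not -2**31 <= disp < 2**31:
        raise ValueError("target is out of range for a signed 32-bit displacement")
    return disp

def rip_target(disp, insn_addr, insn_len):
    """What the CPU computes at run time: next instruction address plus displacement."""
    return insn_addr + insn_len + disp

disp = rip_displacement(target=0x2000, insn_addr=0x1000, insn_len=7)
print(hex(disp))                         # 0xff9
print(hex(rip_target(disp, 0x1000, 7)))  # 0x2000
```

Because only the distance between code and data is encoded, the result stays correct when the whole image is loaded at a different base address.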

Base addressing

Stack:
High addresses  argument 2        [%rbp + 24]
                argument 1        [%rbp + 16]
                return address    [%rbp + 8]
%rbp ->         old %rbp          [%rbp]
                local variable 1  [%rbp - 8]
                local variable 2  [%rbp - 16]
%rsp ->         ...
Low addresses

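Usage comparison:

Cases suited to RIP-relative addressing

# 1. Accessing a global variable
movq global_var(%rip), %rax

# 2. Loading a string constant
leaq string_const(%rip), %rdi

# 3. Accessing a static array
leaq static_array(%rip), %rsi

# 4. Jump tables
leaq jump_table(%rip), %rax

Cases suited to base addressing

# 1. Function arguments
movq 16(%rbp), %rax    # first stack argument

# 2. Local variables
movq -8(%rbp), %rax    # local variable

# 3. Array indexing
movq (%rbx,%rcx,8), %rax  # array element

# 4. Struct members
movq 8(%rdi), %rax     # struct field

Example:

# RIP-relative addressing: global data
.section .data
global_var:
    .quad 123

.section .text
func:
    movq global_var(%rip), %rax   # load the global variable

# Base addressing: local variables (a separate example; the label func is reused)
func:
    pushq %rbp
    movq %rsp, %rbp
    subq $16, %rsp

    movq $1, -8(%rbp)      # write a local variable
    movq 16(%rbp), %rax    # read a stack-passed argument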

LEAQ

leaq (Load Effective Address, quadword) computes an address without accessing memory. It is commonly used for:

Address computation
Simple arithmetic
Pointer manipulation

Examples:

# Basic usage
leaq (%rdi), %rax           # same effect as movq %rdi, %rax
leaq 8(%rdi), %rax         # rax = rdi + 8
leaq (%rdi,%rsi), %rax     # rax = rdi + rsi
leaq (%rdi,%rsi,4), %rax   # rax = rdi + rsi * 4

# Arithmetic
leaq (%rdi,%rdi,2), %rax   # rax = rdi * 3
leaq (%rdi,%rdi,4), %rax   # rax = rdi * 5

# Array addressing
leaq array(,%rdi,4), %rax  # rax = &array[rdi]

# RIP-relative addressing
leaq message(%rip), %rdi   # load the address of a string

Arithmetic instructions

# Addition
addq $1, %rax         # rax = rax + 1
addq %rbx, %rax       # rax = rax + rbx

# Subtraction
subq $1, %rax         # rax = rax - 1
subq %rbx, %rax       # rax = rax - rbx

# Multiplication
imulq $2, %rax        # rax = rax * 2
imulq %rbx, %rax      # rax = rax * rbx

# Division
cqto                  # sign-extend rax into rdx:rax before a signed divide
idivq %rbx            # rdx:rax / rbx; quotient in rax, remainder in rdx
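idivq truncates the quotient toward zero and gives the remainder the sign of the dividend, which differs from Python's own // and % operators for negative operands. The sketch below models that behaviour; the function name idivq is reused purely as a label and the operands are sample values.

```python
def idivq(dividend, divisor):
    """Model of signed idivq: returns (quotient, remainder) as left in (%rax, %rdx)."""
    if divisor == 0:
        raise ZeroDivisionError("idivq raises a divide error on division by zero")
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient              # truncate toward zero, not toward minus infinity
    remainder = dividend - quotient * divisor
    if not -2**63 <= quotient < 2**63:
        raise OverflowError("quotient does not fit in %rax")
    return quotient, remainder

print(idivq(-7, 2))      # (-3, -1)
print(-7 // 2, -7 % 2)   # -4 1   (Python floors instead)
```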

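PUSH/POP stack instructions

# Stack operations
pushq %rax                # push
popq %rbx                 # pop

# Saving and restoring several registers (pop in reverse order)
pushq %rbp
pushq %rbx
# ...
popq %rbx
popq %rbp

The function-call instructions call, leave and ret also operate on the stack:

call: pushes the return address (the address of the instruction following the call) and jumps to the called function;
leave: equivalent to movq %rbp, %rsp; popq %rbp;
ret: pops the return address from the stack into RIP, so execution resumes in the caller.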

Logical instructions

# AND
andq $0xF, %rax       # rax = rax & 0xF

# OR
orq $0xF, %rax        # rax = rax | 0xF

# XOR
xorq %rax, %rax       # zero rax

# Shifts
shlq $1, %rax         # shift left by 1
shrq $1, %rax         # logical shift right by 1

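Comparison and jump instructions

# Compare (sets the condition codes in RFLAGS)
cmpq $10, %rax        # compare rax with 10
cmpq %rbx, %rax       # compare rax with rbx

testq %rax, %rax         # test rax (commonly used to check for zero)

# Set a byte from the condition codes (after cmpq %rbx, %rax)
setl %al                 # al = 1 if rax < rbx (signed), else 0
setg %al                 # al = 1 if rax > rbx (signed), else 0
sete %al                 # al = 1 if equal, else 0

# Unconditional jump
jmp label             # jump to label

# Conditional jumps (jg/jge/jl/jle are signed comparisons)
je  label             # jump if equal
jne label             # jump if not equal
jg  label             # jump if greater
jge label             # jump if greater or equal
jl  label             # jump if less
jle label             # jump if less or equal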

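Bitwise operations

# AND
andq %rbx, %rax          # rax &= rbx

# OR
orq %rbx, %rax           # rax |= rbx

# XOR
xorq %rax, %rax          # zero rax
xorq %rbx, %rax          # rax ^= rbx

# Shifts
shlq $1, %rax            # shift left by 1 (multiply by 2)
shrq $1, %rax            # logical shift right by 1 (unsigned divide by 2)
sarq $1, %rax            # arithmetic shift right by 1 (keeps the sign bit)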
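The two right shifts only differ for values whose top bit is set. The following Python model treats registers as 64-bit patterns; the helper names mirror the instruction mnemonics but are otherwise made up for this sketch.

```python
MASK64 = (1 << 64) - 1

def to_signed(value):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

def shrq(value, count):
    """Logical shift right: zeros are shifted in from the left."""
    return (value & MASK64) >> count

def sarq(value, count):
    """Arithmetic shift right: copies of the sign bit are shifted in."""
    return (to_signed(value) >> count) & MASK64

rax = -8 & MASK64                 # 0xfffffffffffffff8
print(hex(shrq(rax, 1)))          # 0x7ffffffffffffffc
print(to_signed(sarq(rax, 1)))    # -4
```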

String copy

rep movsb        # Copy %rcx bytes from (%rsi) to (%rdi), advancing both pointers

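CALL/RET: function call and return

The CALL instruction performs two main operations:

Pushes the address of the next instruction (the return address) onto the stack
Jumps to the address of the target function

Before execution:
%rip -> the call instruction
%rsp -> top of stack

After execution:
%rip -> target function address
%rsp -> old top of stack - 8
[%rsp] = return address

Before the call:
                   |              |
                   |--------------|
            %rsp ->|  caller data |
                   |--------------|

After the call:
                   |  caller data |
                   |--------------|
            %rsp ->|  return addr |  <-- return address pushed, i.e. the address of the instruction after call
                   |--------------|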

Forms of the CALL instruction:

# 1. Direct call (label)
call function_name

# 2. Indirect call (through a register)
call *%rax

# 3. Indirect call (through a memory location)
call *(%rax)

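The RET instruction performs two main operations:

Pops the return address from the stack
Jumps to the return address

Before execution:
%rip -> the ret instruction
%rsp -> return address

After execution:
%rip -> return address
%rsp -> old top of stack + 8

Before the return:
                   |  caller data |
                   |--------------|
            %rsp ->|  return addr |
                   |--------------|

After the return:
            %rsp ->|  caller data |  <-- return address popped; %rsp moved up by 8
                   |--------------|
                   |  (stale)     |
                   |--------------|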

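Example:

# Function call
call function          # push the return address (next instruction) and jump
ret                    # pop the return address into RIP

# Indirect calls
call *%rax             # call the address held in rax
call *(%rax)           # call the address stored in memory at (rax)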
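The effect of call and ret on %rsp and %rip can be traced with a toy model. This Python sketch is not an emulator: the Machine class, the addresses, and the 5-byte length assumed for a direct call are all illustrative choices.

```python
class Machine:
    """Toy model of %rsp, %rip and stack memory, just enough for call/ret."""

    def __init__(self, rsp, rip):
        self.rsp, self.rip = rsp, rip
        self.memory = {}

    def push(self, value):
        self.rsp -= 8                  # the stack grows toward lower addresses
        self.memory[self.rsp] = value

    def pop(self):
        value = self.memory[self.rsp]
        self.rsp += 8
        return value

    def call(self, target, insn_len=5):
        self.push(self.rip + insn_len) # return address = instruction after the call
        self.rip = target

    def ret(self):
        self.rip = self.pop()

m = Machine(rsp=0x7000, rip=0x401000)
m.call(0x402000)
print(hex(m.rsp), hex(m.memory[m.rsp]), hex(m.rip))  # 0x6ff8 0x401005 0x402000
m.ret()
print(hex(m.rsp), hex(m.rip))                        # 0x7000 0x401005
```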

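Example: computing Fibonacci numbers

.section .text
.globl fib
fib:
    pushq %rbp
    movq %rsp, %rbp

    cmpq $2, %rdi           # is n less than 2?
    jge .L2
    movq %rdi, %rax         # return n
    jmp .L3

.L2:
    pushq %rbx              # save callee-saved registers
    pushq %r12              # (two pushes keep %rsp 16-byte aligned at the calls)

    movq %rdi, %rbx        # keep n
    subq $1, %rdi          # compute fib(n-1)
    call fib

    movq %rax, %r12        # keep fib(n-1) in a callee-saved register; %rsi would be clobbered by the next call
    movq %rbx, %rdi
    subq $2, %rdi          # compute fib(n-2)
    call fib

    addq %r12, %rax        # fib(n-1) + fib(n-2)

    popq %r12              # restore registers in reverse order
    popq %rbx

.L3:
    movq %rbp, %rsp
    popq %rbp
    ret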

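System calls

Linux x86_64 system call numbers and argument-passing convention:

The system call number goes in %rax
Arguments go in %rdi, %rsi, %rdx, %r10, %r8, %r9, in that order
The syscall instruction enters the kernel; the result is returned in %rax, and %rcx and %r11 are clobbered

Common system calls: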

sys_read    = 0
sys_write   = 1
sys_open    = 2
sys_close   = 3
sys_exit    = 60

Example:

.section .data
msg:
    .ascii "Hello, World!\n"
    len = . - msg

.section .text
.globl _start
_start:
    # write(1, msg, len)
    movq $1, %rax            # sys_write
    movq $1, %rdi            # stdout
    movq $msg, %rsi          # message
    movq $len, %rdx          # length
    syscall

    # exit(0)
    movq $60, %rax          # sys_exit
    xorq %rdi, %rdi         # status = 0
    syscall

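Macros

.macro print_msg
    movq $1, %rax            # sys_write
    movq $1, %rdi            # stdout
    movq $message, %rsi      # buffer; assumes a message label defined elsewhere
    movq $14, %rdx           # length in bytes
    syscall
.endm

Conditional assembly

# gas-native form; define the symbol on the command line with: as --defsym DEBUG=1
.ifdef DEBUG
    # Debug-specific code
.endif
# The C preprocessor form (#ifdef DEBUG ... #endif) only works in .S files, which gcc runs through cpp first.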

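Function calls and stack frames

A stack frame is a contiguous region of memory allocated for each function call. It holds:

Local variables
Saved register values
Function arguments
The return address

Two registers are involved:

%rsp: always points to the top of the stack
%rbp: points to the base of the current frame and is used to locate local variables and arguments

x86 is little-endian: for a 32-bit value, the lowest address holds the least significant byte.

Stack frame layout

Caller's frame: the address %rbp points to and everything above it.

Current frame: from the address %rsp points to up to the address %rbp points to.

High addresses
                   |                                |
                   |--------------------------------|
                   |  stack arguments (7th onward)  |
                   |--------------------------------|
                   |  return address                |
                   |--------------------------------|
                   |  saved %rbp                    |  <-- %rbp
                   |--------------------------------|
                   |  local variables               |
                   |  ...                           |
                   |--------------------------------|
                   |  temporaries / spill area      |
                   |--------------------------------|
                   |  callee-saved registers        |  <-- %rsp
Low addresses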

Creating a stack frame

# Function prologue
pushq %rbp           # save the caller's frame base
movq %rsp, %rbp      # set the new frame base
subq $16, %rsp       # allocate stack space for local variables

# ... function body ...

# Function epilogue
movq %rbp, %rsp      # restore the stack pointer
popq %rbp            # restore the caller's frame base
ret                  # return

Step by step:

1. Initial state:
   %rbp -> |  old frame base  |
           |       ...        |
   %rsp -> |                  |

2. After pushq %rbp:

   %rbp -> |  old frame base  |
           |       ...        |
           |  old %rbp        | <-- %rsp

3. After movq %rsp, %rbp:

           |  old frame base  |
           |       ...        |
   %rbp -> |  old %rbp        | <-- %rsp

4. After subq $16, %rsp:

           |  old frame base  |
           |       ...        |
   %rbp -> |  old %rbp        |
           |  local variable  |
   %rsp -> |  local variable  |

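Caller's responsibilities

Before the call, the caller must:

pushq %r10              # save any caller-saved registers it still needs
pushq %r11

Set up the arguments, in one of two ways:

In registers only
In registers and on the stack: once the 6 argument registers are used up, the 7th and later arguments are pushed on the stack.

Call the function:

call function

After the function returns:

addq $16, %rsp         # optional: remove stack-passed arguments; the amount depends on how much was pushed
popq %r11              # optional: restore the caller-saved registers, here r11 and r10
popq %r10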

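Callee's responsibilities

function:
    # Function prologue
    pushq %rbp
    movq %rsp, %rbp

    # Save the callee-saved registers the function uses
    pushq %rbx
    pushq %r12
    pushq %r13
    pushq %r14
    pushq %r15

    # ... function body ...

    # Restore the registers in reverse order
    popq %r15
    popq %r14
    popq %r13
    popq %r12
    popq %rbx

    # Function epilogue
    movq %rbp, %rsp
    popq %rbp
    ret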

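Passing arguments in registers

Under the System V AMD64 ABI calling convention, integer and pointer arguments use the following registers in order:

1st argument: %rdi
2nd argument: %rsi
3rd argument: %rdx
4th argument: %rcx
5th argument: %r8
6th argument: %r9
Further arguments: passed on the stack (pushed right to left)

Floating-point arguments use the XMM registers in order, counted separately from the integer arguments:

1st argument: %xmm0
2nd argument: %xmm1
3rd argument: %xmm2
4th argument: %xmm3
5th argument: %xmm4
6th argument: %xmm5
7th argument: %xmm6
8th argument: %xmm7
Further arguments: passed on the stack

Integer return values are placed in %rax

Caller-saved: %rax, %rcx, %rdx, %rsi, %rdi, %r8-%r11

Callee-saved: %rbx, %rbp, %r12-%r15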
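Integer and floating-point arguments are numbered independently, which is easy to get wrong when a signature mixes both. The following Python sketch assigns locations for scalar arguments only; the function name assign_arguments and the class labels are assumptions for illustration, and structs, long double and variadic calls are deliberately left out.

```python
INT_REGS = ["%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"]
SSE_REGS = ["%xmm" + str(i) for i in range(8)]

def assign_arguments(params):
    """params: list of (name, cls), cls being 'INTEGER' or 'SSE'. Returns {name: location}."""
    pools = {"INTEGER": iter(INT_REGS), "SSE": iter(SSE_REGS)}
    locations = {}
    for name, cls in params:
        if cls not in pools:
            raise ValueError("unsupported class: " + cls)
        # Each class draws from its own register sequence; once it is
        # exhausted, the argument is passed on the stack.
        locations[name] = next(pools[cls], "stack")
    return locations

# void func(int a, double b, long c, float d)
print(assign_arguments([("a", "INTEGER"), ("b", "SSE"), ("c", "INTEGER"), ("d", "SSE")]))
# {'a': '%rdi', 'b': '%xmm0', 'c': '%rsi', 'd': '%xmm1'}
```

Note that c lands in %rsi, the second integer register, even though it is the third parameter in the signature.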

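Example:

# Calling func(1, 2, 3, 4, 5, 6, 7, 8)
movq $1, %rdi        # 1st argument
movq $2, %rsi        # 2nd argument
movq $3, %rdx        # 3rd argument
movq $4, %rcx        # 4th argument
movq $5, %r8         # 5th argument
movq $6, %r9         # 6th argument
pushq $8             # 8th argument (pushed first; stack arguments are pushed right to left)
pushq $7             # 7th argument
call func            # call the function
addq $16, %rsp       # remove the stack-passed arguments

# Another example:
# void func(int a, double b, long c, float d)
# a in %edi
# b in %xmm0
# c in %rsi  (second integer argument; floating-point arguments do not consume integer registers)
# d in %xmm1

.globl func
func:
    pushq %rbp
    movq %rsp, %rbp

    # Use the arguments
    movl %edi, -8(%rbp)    # save the int argument a
    movsd %xmm0, -16(%rbp) # save the double argument b
    movq %rsi, -24(%rbp)   # save the long argument c
    movss %xmm1, -28(%rbp) # save the float argument d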

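Passing a struct:

# struct Point { int x; int y; };
# void func(struct Point p)

func:
    pushq %rbp
    movq %rsp, %rbp
    subq $16, %rsp

    # Small structs (size <= 16 bytes) are passed in registers
    movq %rdi, -16(%rbp)  # store the whole struct

    # Access the members
    movl -16(%rbp), %eax  # x
    movl -12(%rbp), %edx  # y

Accessing stack-passed arguments: these are normally addressed by their offset from %rbp (base addressing):

(%rbp): holds the saved old %rbp;
8(%rbp): holds the return address

# First stack-passed argument (the 7th integer argument)
movq 16(%rbp), %rax   # read from rbp+16

# Second stack-passed argument
movq 24(%rbp), %rax   # read from rbp+24

Variadic functions:

# int sum(int count, ...)
sum:
    pushq %rbp
    movq %rsp, %rbp

    # Variadic arguments are passed in registers and on the stack like ordinary ones
    # The caller sets %al to an upper bound on the number of vector (XMM) registers used

    # Access the variadic arguments
    movq %rsi, -8(%rbp)   # first variadic argument
    movq %rdx, -16(%rbp)  # second variadic argument
    # ...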

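Red Zone:

A function may use the 128 bytes below %rsp without adjusting the stack pointer
Signal and interrupt handlers in user space do not clobber this area; it is only safe in leaf functions, because a call would overwrite it

func:
    # The red zone can be used directly
    movq %rdi, -8(%rsp)   # safe
    movq %rsi, -16(%rsp)  # safe
    # ... down to -128(%rsp) at most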

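Local variables

Reference: https://akaedu.github.io/book/ch19s03.html

Local variables are allocated on the stack and addressed relative to %rbp:

Allocating local variables:

.globl function
function:
    pushq %rbp
    movq %rsp, %rbp

    # Allocate space for local variables
    subq $32, %rsp        # 32 bytes of local variable space

    # Initialize the local variables
    movq $0, -8(%rbp)     # 1st variable, 8 bytes
    movq $0, -16(%rbp)    # 2nd variable, 8 bytes
    movl $0, -20(%rbp)    # 3rd variable, 4 bytes
    movb $0, -21(%rbp)    # 4th variable, 1 byte

    # Align %rsp down to a 16-byte boundary
    andq $-16, %rsp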
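The alignment trick works because -16 in two's complement has its low four bits clear, so the AND rounds %rsp down to a multiple of 16. Python integers behave the same way under &, which makes the arithmetic easy to try out. The helper names and sample addresses below are illustrative only.

```python
def align_down(address, alignment=16):
    """What andq $-16, %rsp does: clear the low bits, rounding down."""
    return address & -alignment

def frame_size(bytes_needed, alignment=16):
    """Round a local-variable size up to the next multiple of the alignment."""
    return (bytes_needed + alignment - 1) & -alignment

print(hex(align_down(0x7FFDE3A8)))  # 0x7ffde3a0
print(frame_size(21))               # 32  (the 21 bytes of locals above fit in a 32-byte frame)
```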

Locals are normally accessed by their offset from %rbp:

# First local variable (assumed to be 8 bytes)
movq -8(%rbp), %rax   # read from rbp-8

# Second local variable
movq -16(%rbp), %rax  # read from rbp-16

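Return values

Return value conventions:

Integer and pointer return values: %rax
Floating-point return values: %xmm0
Large structs (over 16 bytes): returned in memory through a hidden pointer that the caller passes in %rdi; the callee returns that pointer in %rax

Example:

# A function returning two values
.globl get_two_values
get_two_values:
    movq $1, %rax        # first return value
    movq $2, %rdx        # second return value (as for a 16-byte struct of two integers)
    ret

Structs of integers between 9 and 16 bytes are returned in the rax:rdx pair:

# struct SmallStruct { long a; long b; };
func:
    movq $1, %rax    # a
    movq $2, %rdx    # b
    ret

Structs larger than 16 bytes are returned through a hidden pointer: the caller allocates the space and passes its address as the first argument

# int calc(int a, int b)
.globl calc
calc:
    pushq %rbp
    movq %rsp, %rbp

    # Compute a + b
    movl %edi, %eax   # first argument
    addl %esi, %eax   # add the second argument

    # The return value is already in %eax
    movq %rbp, %rsp
    popq %rbp
    ret

# Returning a struct
# struct Point { int x, y; } get_point()
get_point:
    pushq %rbp
    movq %rsp, %rbp

    # Return {1, 2}: an 8-byte struct fits entirely in %rax,
    # with x in the low 32 bits and y in the high 32 bits
    movabsq $0x0000000200000001, %rax

    movq %rbp, %rsp
    popq %rbp
    ret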
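Under the System V AMD64 ABI an 8-byte struct of two ints travels in a single 64-bit register, with the first member in the low half. The packing can be reproduced with Python's struct module; this is a sketch of the byte layout only, using the sample values x = 1 and y = 2.

```python
import struct

# struct Point { int x, y; } with x = 1, y = 2, laid out little-endian in memory
raw = struct.pack("<ii", 1, 2)
print(raw.hex())            # 0100000002000000

# The same 8 bytes read as one 64-bit register value
(rax,) = struct.unpack("<Q", raw)
print(hex(rax))             # 0x200000001

# Splitting the register value back into the two members
x = rax & 0xFFFFFFFF
y = rax >> 32
print(x, y)                 # 1 2
```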

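Examples

.globl example_function
example_function:
    # Function prologue
    pushq %rbp              # save the old frame pointer
    movq  %rsp, %rbp        # set the new frame pointer
    subq  $16, %rsp         # 16 bytes of local variable space (not counting the registers saved next)

    # Save callee-saved registers
    pushq %rbx
    pushq %r12

    # Function body
    # ... function code ...

    # Restore registers
    popq %r12
    popq %rbx

    # Function epilogue
    movq %rbp, %rsp         # restore the stack pointer
    popq %rbp               # restore the frame pointer
    ret                     # return

Recursive function (computing a Fibonacci number):

.globl fib
fib:
    # Function prologue
    pushq %rbp
    movq %rsp, %rbp
    pushq %rbx            # save callee-saved registers
    pushq %r12            # %r12 is callee-saved too and is used below

    # Check the base case
    cmpq $2, %rdi
    jge .L2

    # n < 2: return n
    movq %rdi, %rax
    jmp .L3

.L2:
    # Keep n
    movq %rdi, %rbx

    # Compute fib(n-1)
    decq %rdi
    call fib
    movq %rax, %r12      # keep the result of fib(n-1)

    # Compute fib(n-2)
    movq %rbx, %rdi
    subq $2, %rdi
    call fib

    # Combine the results
    addq %r12, %rax

.L3:
    # Function epilogue
    popq %r12
    popq %rbx
    movq %rbp, %rsp
    popq %rbp
    ret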

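Function with local variables:

.globl calculate
calculate:
    # Function prologue
    pushq %rbp
    movq %rsp, %rbp
    subq $16, %rsp        # stack space for two local variables

    # Copy the register arguments into stack locals
    movq %rdi, -8(%rbp)   # first local variable
    movq %rsi, -16(%rbp)  # second local variable

    # Computation
    movq -8(%rbp), %rax
    addq -16(%rbp), %rax  # the return value is left in rax

    # Function epilogue
    movq %rbp, %rsp
    popq %rbp
    ret # at this point %rsp points at the return address

Simple function call:

.section .text
.globl main
main:
    # Set up before the call
    pushq %rbp          # save the old frame pointer
    movq %rsp, %rbp     # set the new frame pointer

    call function       # call; the return address is pushed onto the stack

    # Execution continues here after the function returns
    movq %rbp, %rsp
    popq %rbp
    ret

function:
    pushq %rbp
    movq %rsp, %rbp

    # Function body

    movq %rbp, %rsp
    popq %rbp
    ret                # return to the call site

Function call with arguments:

# Calling func(1, 2)
    movq $1, %rdi       # first argument; the first six integer arguments go in registers
    movq $2, %rsi       # second argument
    call func           # call the function
    # the return value is in %rax

func:
    pushq %rbp
    movq %rsp, %rbp

    # Use the arguments in %rdi and %rsi

    movq %rbp, %rsp
    popq %rbp
    ret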

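Nested function calls:

Call chain: main -> func1 -> func2

Stack state:
High addresses
                   |  main arguments       |
                   |-----------------------|
                   |  main return address  |
                   |  saved rbp            |
                   |-----------------------|
                   |  func1 return address |
                   |  saved rbp            |
                   |-----------------------|
                   |  func2 return address |
                   |  saved rbp            |
            %rsp ->|                       |
Low addresses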

Multiple function calls:

# Struct definitions (for reference only)
# struct LargeStruct {
#     long a[8];      # 64 bytes
#     int b;          # 4 bytes (plus 4 bytes of padding: sizeof is 72)
# };
#
# struct Point {
#     int x, y;       # 8 bytes
# };
#
# Function prototypes:
# int main();
# int func1(int a, char b, const char* str);
# long func2(int a, const struct LargeStruct* big, struct Point* p,
#           char c, double d, const char* str);

.section .data
str1:
    .string "Hello"
str2:
    .string "World"

.section .text
.globl main
main:
    # Function prologue
    pushq %rbp
    movq %rsp, %rbp
    subq $152, %rsp           # locals (LargeStruct is 72 bytes) plus padding; 152 keeps %rsp 16-byte aligned after the three pushes below

    # Save callee-saved registers
    pushq %rbx
    pushq %r12
    pushq %r13

    # Set up the arguments for func1
    movl $42, %edi          # int a = 42
    movb $'A', %sil         # char b = 'A'
    leaq str1(%rip), %rdx   # const char* str = "Hello"

    # Call func1
    call func1
    movl %eax, %r12d        # keep the return value of func1

    # Set up the arguments for func2
    # First build the LargeStruct on the stack
    movq $1, -72(%rbp)      # big.a[0]
    movq $2, -64(%rbp)      # big.a[1]
    movq $3, -56(%rbp)      # big.a[2]
    movq $4, -48(%rbp)      # big.a[3]
    movq $5, -40(%rbp)      # big.a[4]
    movq $6, -32(%rbp)      # big.a[5]
    movq $7, -24(%rbp)      # big.a[6]
    movq $8, -16(%rbp)      # big.a[7]
    movl $9, -8(%rbp)       # big.b (offset 64 in the struct, right after a[7])

    # Build the Point struct
    subq $16, %rsp          # stack space for the Point
    movl $10, (%rsp)        # p->x = 10
    movl $20, 4(%rsp)       # p->y = 20
    movq %rsp, %r13         # keep the Point pointer

    # Load the func2 arguments
    movl $100, %edi         # int a = 100

    # LargeStruct is passed by explicit pointer
    leaq -72(%rbp), %rsi    # const struct LargeStruct* big

    movq %r13, %rdx         # struct Point* p
    movb $'B', %cl          # char c = 'B'
    movsd .LC0(%rip), %xmm0 # double d = 3.14159 (does not consume an integer register)
    leaq str2(%rip), %r8    # const char* str = "World" (5th integer argument)

    # Call func2
    call func2

    # Restore the stack and registers
    addq $16, %rsp          # release the Point struct
    popq %r13
    popq %r12
    popq %rbx

    # Function epilogue
    movq %rbp, %rsp
    popq %rbp
    ret

# First function
.globl func1
func1:
    pushq %rbp
    movq %rsp, %rbp

    # Function body (simple example)
    movl %edi, %eax        # return the first argument

    movq %rbp, %rsp
    popq %rbp
    ret

# Second function
.globl func2
func2:
    pushq %rbp
    movq %rsp, %rbp
    subq $16, %rsp         # space for local variables

    # Where the arguments are
    # %edi holds int a
    # %rsi holds the LargeStruct pointer
    # %rdx holds the Point pointer
    # %cl holds char c
    # %xmm0 holds double d
    # %r8 holds the string pointer

    # Function body (simple example)
    movslq (%rdx), %rax    # return p->x, sign-extended to long

    movq %rbp, %rsp
    popq %rbp
    ret

.section .rodata
.LC0:
    .double 3.14159        # double-precision constant

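Notes:

Argument passing:
- func1 uses the ordinary integer argument registers
- func2 has a more varied argument list:
  - int a is passed in %edi
  - the LargeStruct is passed by pointer in %rsi
  - Point* is passed in %rdx
  - char c is passed in %cl
  - double d is passed in %xmm0
  - const char* is passed in %r8, the 5th integer register, because the double does not use an integer register
Struct handling:
- The LargeStruct is passed by pointer. Under the System V AMD64 ABI a struct larger than 16 bytes passed by value is copied onto the stack by the caller; it is not turned into a hidden reference (that is the Microsoft x64 convention)
- The Point struct pointer is passed directly in a register
Stack frame management:
- main allocates enough space for its locals and the temporary structs
- %rsp is kept 16-byte aligned at every call
- Callee-saved registers are saved and restored
Memory layout:
- The string constants are in the .data section
- The floating-point constant is in the .rodata section
- The temporary structs are built on the stack
Register usage:
- Follows the System V AMD64 ABI calling convention
- Arguments are placed in the registers the ABI assigns
- Registers are saved and restored as required
Safety considerations:
- Correct stack alignment
- Adequate stack space allocation
- Saving and restoring registers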

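The call sequence in diagrams:

Initial state (program just started)

High addresses
                   |-----------------------|
                   | environment variables |
                   |-----------------------|
                   | command-line arguments|
                   |-----------------------|
                   | return address        |
            %rsp ->|-----------------------|
Low addresses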

Start of main (after the prologue and register saves)

High addresses
                   |-------------------|
                   |  return address   |
                   |-------------------|
                   |  old %rbp         |  <-- %rbp
                   |-------------------|
                   |                   |
                   |  LargeStruct      |
                   |  (72 bytes)       |
                   |                   |
                   |-------------------|
                   |  padding          |
                   |-------------------|
                   |  %rbx             |
                   |  %r12             |
                   |  %r13             |
            %rsp ->|-------------------|
Low addresses

Before calling func1

                   |-------------------|
                   |  main frame       |
            %rsp ->|-------------------|

Arguments (all in registers, nothing is pushed):
argument 1: %edi = 42
argument 2: %sil = 'A'
argument 3: %rdx = str1

While func1 runs

High addresses
                   |-------------------|
                   |  main frame       |  # func1 takes all its arguments in registers
                   |-------------------|
                   |  return address   |
                   |-------------------|
                   |  old %rbp         |  <-- %rbp
            %rsp ->|-------------------|
Low addresses

Register state:
%edi = 42
%sil = 'A'
%rdx = address of str1

After func1 returns, before calling func2

High addresses
                   |-------------------|
                   |  main frame       |
                   |-------------------|
                   |  LargeStruct      |
                   |    b = 9          |
                   |    a[7] = 8       |
                   |    a[6] = 7       |
                   |    a[5] = 6       |
                   |    a[4] = 5       |
                   |    a[3] = 4       |
                   |    a[2] = 3       |
                   |    a[1] = 2       |
                   |    a[0] = 1       |
                   |-------------------|
                   |  padding, saved   |
                   |  %rbx, %r12, %r13 |
                   |-------------------|
                   |  Point struct     |
                   |    y = 20         |
                   |    x = 10         |
            %rsp ->|-------------------|
Low addresses

Register state:
%edi = 100                  # int a
%rsi = LargeStruct pointer  # const struct LargeStruct*
%rdx = Point pointer        # struct Point*
%cl = 'B'                   # char c
%xmm0 = 3.14159             # double d
%r8 = address of str2       # const char*

While func2 runs

High addresses
                   |-------------------|
                   |  main frame       |
                   |-------------------|
                   |  return address   |
                   |-------------------|
                   |  old %rbp         |  <-- %rbp
                   |-------------------|
                   |  local variables  |
            %rsp ->|  (16 bytes)       |
                   |-------------------|
Low addresses

Complete memory layout

High addresses
                   |------------------------|
                   | environment variables  |
                   |------------------------|
                   | command-line arguments |
                   |------------------------|
                   | main return address    |
                   |------------------------|
                   | main old %rbp          |
                   |------------------------|
                   | LargeStruct            |
                   |  (72 bytes)            |
                   |------------------------|
                   | padding                |
                   |------------------------|
                   | saved registers        |
                   |   %rbx                 |
                   |   %r12                 |
                   |   %r13                 |
                   |------------------------|
                   | Point struct           |
                   |  (8 bytes + padding)   |
                   |------------------------|
                   | func2 return address   |
                   |------------------------|
                   | func2 old %rbp         |
                   |------------------------|
                   | func2 local variables  |
            %rsp ->|------------------------|
Low addresses

.section .data:
str1: "Hello\0"
str2: "World\0"

.section .rodata:
LC0: 3.14159 (double precision)

Data flow

main()
   |
   |---> func1(42, 'A', "Hello")
   |      return value in %eax
   |
   |---> func2(100, &largeStruct, &point, 'B', 3.14159, "World")
         return value in %rax

Argument passing:
main -> func1:
%edi <-- 42
%sil <-- 'A'
%rdx <-- address of str1

main -> func2:
%edi <-- 100
%rsi <-- LargeStruct pointer
%rdx <-- Point pointer
%cl  <-- 'B'
%xmm0 <-- 3.14159
%r8  <-- address of str2