An introduction to using bpftrace, along with its limitations and pitfalls.
BPF does not support DWARF unwinding #
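Because the kernel does not support DWARF and BPF relies on kernel facilities, BPF cannot use DWARF unwinding to walk the stack of a userspace program. It can only use frame pointers (FP) for userspace stack unwinding:
- For ustack() in a bpftrace program to work, the traced program must be compiled with frame pointers enabled, i.e. with -fno-omit-frame-pointer or --enable-frame-pointer.
- BPF provides the bpf_get_stackid()/bpf_get_stack() helper functions to fetch the userspace stack, but they require the userspace program to be compiled with frame pointer support.
Unlike perf and SystemTap, which can unwind using DWARF, bpftrace and bcc currently do not support DWARF unwinding. They do not use the DWARF CFI information in the ELF .eh_frame section or in the .debug_frame section of debuginfo files.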
- Comparing SystemTap and bpftrace: https://lwn.net/Articles/852112/
- User-space backtrace support for programs built without frame pointers #1744: https://github.com/iovisor/bpftrace/issues/1744
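Here is a minimal Go sketch of the frame-pointer walk that FP-based unwinding performs on x86-64. Each frame stores the caller's saved RBP at [RBP] and the return address at [RBP+8]. The memory image (a map from address to 8-byte word) and the function name walkFramePointers are hypothetical; a real unwinder such as bpf_get_stack() reads process memory instead. In a function built without frame pointers, RBP is just a general-purpose register, so the chain breaks and the stack comes back truncated.

```go
package main

import "fmt"

// walkFramePointers follows the x86-64 frame-pointer chain in a
// hypothetical memory image: [rbp] holds the caller's saved RBP and
// [rbp+8] holds the return address into the caller.
func walkFramePointers(mem map[uint64]uint64, rip, rbp uint64, maxDepth int) []uint64 {
	pcs := []uint64{rip}
	for len(pcs) < maxDepth && rbp != 0 {
		ret, ok := mem[rbp+8]
		if !ok {
			break // RBP does not point at a valid frame: the chain is broken
		}
		pcs = append(pcs, ret)
		next, ok := mem[rbp]
		// The stack grows down, so a caller's frame must sit at a higher address.
		if !ok || next <= rbp {
			break
		}
		rbp = next
	}
	return pcs
}

func main() {
	mem := map[uint64]uint64{
		0x1000: 0x1100, 0x1008: 0x401234, // frame of the current function
		0x1100: 0, 0x1108: 0x401500, // outermost frame (saved RBP is 0)
	}
	for _, pc := range walkFramePointers(mem, 0x400100, 0x1000, 16) {
		fmt.Printf("%#x\n", pc)
	}
}
```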
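Although bpftrace does not use DWARF for unwinding, it does use DWARF to resolve the parameters of user functions. For example, bpftrace -lv 'uprobe:/bin/bash:readline' shows the parameter list of readline by parsing the function name and parameters from the debug symbols. If bpftrace cannot find the debug symbols, it reports: No DWARF found for XX,cannot show parameter info
- Reference: https://github.com/iovisor/bpftrace/blob/master/src/dwarf_parser.cpp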
Installing bpftrace #
Installing from RPM/Deb packages:
apt install bpftrace
echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse
deb http://ddebs.ubuntu.com $(lsb_release -cs)-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com $(lsb_release -cs)-proposed main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/ddebs.list
# On Ubuntu, the bpftrace-dbgsym package also needs to be installed:
apt install bpftrace-dbgsym
bpftrace -e 'BEGIN { printf("hello world\n"); }'
Building and installing bpftrace from source
- Reference: bpftrace Install: https://github.com/iovisor/bpftrace/blob/master/INSTALL.md#building-bpftrace
- Install bpftrace to /usr/local/bin and the tools to /usr/local/share/bpftrace/tools/:
# apt remove bpftrace
# cd /root
# rm -rf bpftrace/
# git clone --recurse-submodules https://github.com/iovisor/bpftrace # also fetches the vendored bcc/libbpf
# mkdir bpftrace/build
# cd bpftrace/build
# unset_proxy
# apt-get update
# apt-get install -y bison cmake flex g++ git libelf-dev zlib1g-dev libfl-dev systemtap-sdt-dev binutils-dev libcereal-dev llvm-dev llvm-runtime libclang-dev clang libpcap-dev libgtest-dev libgmock-dev asciidoctor libdw-dev pahole
# ../build-libs.sh
# cmake -DCMAKE_BUILD_TYPE=Release ..
# make -j8
# make install
# /usr/local/bin/bpftrace --version
# ls /usr/local/share/bpftrace/tools/
bashreadline.bt biosnoop.bt capable.bt doc killsnoop.bt naptime.bt opensnoop.bt runqlen.bt sslsnoop.bt syncsnoop.bt tcpconnect.bt tcpretrans.bt undump.bt writeback.bt
biolatency-kp.bt biostacks.bt cpuwalk.bt execsnoop.bt loads.bt old pidpersec.bt setuids.bt statsnoop.bt syscount.bt tcpdrop.bt tcpsynbl.bt vfscount.bt xfsdist.bt
biolatency.bt bitesize.bt dcsnoop.bt gethostlatency.bt mdflush.bt oomkill.bt runqlat.bt ssllatency.bt swapin.bt tcpaccept.bt tcplife.bt threadsnoop.bt vfsstat.bt
# ls /usr/local/bin/
bpftrace gopls lima-guestagent
# echo 'PATH=/usr/local/bin:$PATH' >> ~/.bashrc
# source ~/.bashrc
# bpftrace --version
bpftrace v0.18.0-97-ge010d
bpftrace supports #include.
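bpftrace info: viewing bpftrace information #
- Build: e.g. whether libdw is supported. -lv can only show the parameter list of uprobe functions when libdw is supported;
- Kernel helpers: the eBPF kernel helpers supported by the kernel;
- Kernel features: the eBPF features supported by the kernel;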
root@lima-ebpf-dev:~# bpftrace --info
System
OS: Linux 5.15.0-78-generic #85-Ubuntu SMP Fri Jul 7 15:25:09 UTC 2023
Arch: x86_64
Build
version: v0.18.0-97-ge010d
LLVM: 14.0.0
unsafe probe: no
bfd: yes
libdw (DWARF support): yes
Kernel helpers
probe_read: yes
probe_read_str: yes
probe_read_user: yes
probe_read_user_str: yes
probe_read_kernel: yes
probe_read_kernel_str: yes
get_current_cgroup_id: yes
send_signal: yes
override_return: yes
get_boot_ns: yes
dpath: yes
skboutput: yes
get_tai_ns: no
get_func_ip: yes
Kernel features
Instruction limit: 1000000
Loop support: yes
btf: yes
module btf: yes
map batch: yes
uprobe refcount (depends on Build:bcc bpf_attach_uprobe refcount): yes
Map types
hash: yes
percpu hash: yes
array: yes
percpu array: yes
stack_trace: yes
perf_event_array: yes
ringbuf: yes
Probe types
kprobe: yes
tracepoint: yes
perf_event: yes
kfunc: yes
kprobe_multi: no
raw_tp_special: yes
iter: yes
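bpftrace -lv: listing probes and function parameters #
bpftrace -l "tracepoint:*": lists the probes of a given type.
bpftrace -lv "tracepoint:syscalls:sys_enter_execve": shows the parameter list of a tracepoint/syscall/kfunc/uprobe function. For uprobe, the binary's DWARF info is required. It can be built into the binary or come from the matching debuginfo package (on Ubuntu, usually XX-dbgsym).
- kprobe and similar probe types cannot show parameters with -lv.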
# apt install bash-dbgsym
# bpftrace -lv 'uprobe:/bin/bash:readline'
uprobe:/bin/bash:readline
const char* prompt
# bpftrace -lv 'tracepoint:syscalls:sys_enter_write'
tracepoint:syscalls:sys_enter_write
int __syscall_nr
unsigned int fd
const char * buf
size_t count
bpftrace Cheat Sheet: https://www.brendangregg.com/BPF/bpftrace-cheat-sheet.html
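Printing call stacks: ustack, kstack #
ustack prints the userspace stack. The optional mode (e.g. perf) selects the output format, and an optional limit restricts the number of frames. For example, ustack(perf, 3) prints only the 3 innermost frames of the userspace stack in perf format.
[ku]stack([bpftrace|perf|raw][, limit])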
# bpftrace -e 'uprobe:/cloud/my-agent:*doSaveNetworkLocateInfo {printf("%s\n", ustack(perf, 2));}'
12e0480 git.com/my-agent/pkg/storage.(*ProcessStore).doSaveNetworkLocateInfo+0 (/cloud/my-agent)
12e557d git.com/my-agent/pkg/network/processor/pidricher.(*pidEnricher).Process+1405 (/cloud/my-agent)
Measuring function latency #
# Version 1: a single global variable; concurrent calls interfere with each other
#!/usr/bin/bpftrace
uprobe:/usr/bin/dockerd:"github.com/docker/docker/api/server/router/network.(*networkRouter).getNetworksList" {
@start = nsecs;
}
uretprobe:/usr/bin/dockerd:"github.com/docker/docker/api/server/router/network.(*networkRouter).getNetworksList" {
printf("getNetworksList took %d ms\n", (nsecs - @start) / 1000000);
}
# Version 2 (correct): a per-thread variable keyed by tid
#!/usr/bin/bpftrace
uprobe:/usr/bin/dockerd:"github.com/docker/docker/api/server/router/network.(*networkRouter).getNetworksList" {
@start[tid] = nsecs;
}
uretprobe:/usr/bin/dockerd:"github.com/docker/docker/api/server/router/network.(*networkRouter).getNetworksList" {
if (@start[tid] != 0) {
printf("getNetworksList took %d ms\n", (nsecs - @start[tid]) / 1000000);
delete(@start[tid]);
}
}
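The difference between the two versions can be simulated outside BPF. This Go sketch replays a hypothetical sequence of uprobe/uretprobe events from two threads whose calls overlap. With a single shared start time (version 1), the first call is measured from the wrong start. With a start time keyed by thread ID (version 2), each call gets its own latency. The event type and timestamps are made up for illustration.

```go
package main

import "fmt"

// event is a hypothetical probe firing: enter=true for the uprobe,
// false for the uretprobe.
type event struct {
	enter bool
	tid   int
	ns    uint64
}

// latenciesGlobal mimics version 1: one shared @start.
func latenciesGlobal(evs []event) []uint64 {
	var start uint64
	var out []uint64
	for _, e := range evs {
		if e.enter {
			start = e.ns
		} else {
			out = append(out, e.ns-start)
		}
	}
	return out
}

// latenciesPerThread mimics version 2: @start[tid], deleted after use.
func latenciesPerThread(evs []event) []uint64 {
	start := map[int]uint64{}
	var out []uint64
	for _, e := range evs {
		if e.enter {
			start[e.tid] = e.ns
		} else if s, ok := start[e.tid]; ok {
			out = append(out, e.ns-s)
			delete(start, e.tid)
		}
	}
	return out
}

func main() {
	evs := []event{
		{enter: true, tid: 1, ns: 0},
		{enter: true, tid: 2, ns: 5},
		{enter: false, tid: 1, ns: 10},
		{enter: false, tid: 2, ns: 20},
	}
	fmt.Println("global @start:    ", latenciesGlobal(evs))    // [5 15]
	fmt.Println("per-thread @start:", latenciesPerThread(evs)) // [10 15]
}
```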
Printing structs with print() #
- https://github.com/bpftrace/bpftrace/issues/3036
- https://github.com/bpftrace/bpftrace/commit/059c25c1e4035a1e96adfcd1544c5587084a3d0f
- First declare the definition of struct xx and cast the argument to a struct xx pointer. Only then can print() print it.
- https://stackoverflow.com/questions/62515301/how-to-use-structure-in-bpftracing-scripting
- https://github.com/bpftrace/bpftrace/commit/ded5b31166219f498362672c2a81f8a5f83a522a
NAME print_non_map_struct
RUN bpftrace -v -e 'struct Foo { int m; int n; } uprobe:./testprogs/simple_struct:func { $f = *((struct Foo *) arg0); print($f); exit(); }'
EXPECT { .m = 2, .n = 3 }
TIMEOUT 5
AFTER ./testprogs/simple_struct
NAME struct assignment into map
RUN bpftrace -v -e 'struct Foo { int m; int n; } u:./testprogs/simple_struct:func { @s = *((struct Foo *)arg0); exit(); }'
EXPECT @s: { .m = 2, .n = 3 }
TIMEOUT 4
AFTER ./testprogs/simple_struct
NAME nested struct assignment into map
RUN bpftrace -v -e 'struct Foo { struct { int m[1] } y; struct { int n } a; } u:./testprogs/simple_struct:func { @s = *((struct Foo *)arg0); exit(); }'
EXPECT @s: { .a = { .n = 3 }, .y = { .m = \[2\] } }
TIMEOUT 4
AFTER ./testprogs/simple_struct
join #
join() reads at most 16 arguments, each up to 1024 bytes long.
// https://github.com/iovisor/bpftrace/blob/0b3392baa881f501ce684637acbd4136f8a29ed3/src/bpftrace.h#L190C1-L191C37
unsigned int join_argnum_ = 16;
unsigned int join_argsize_ = 1024;
// https://github.com/iovisor/bpftrace/blob/0b3392baa881f501ce684637acbd4136f8a29ed3/src/bpftrace.cpp#L463C1-L478C4
else if (printf_id == asyncactionint(AsyncAction::join))
{
uint64_t join_id = (uint64_t) * (static_cast<uint64_t *>(data) + 1);
auto delim = bpftrace->resources.join_args[join_id].c_str();
std::stringstream joined;
for (unsigned int i = 0; i < bpftrace->join_argnum_; i++) {
auto *arg = arg_data + 2*sizeof(uint64_t) + i * bpftrace->join_argsize_;
if (arg[0] == 0)
break;
if (i)
joined << delim;
joined << arg;
}
bpftrace->out_->message(MessageType::join, joined.str());
return;
}
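The user-space half of join() can be mirrored in Go to show what the 16 × 1024 limit means. The buffer holds fixed-size slots, each slot is a NUL-terminated string, and the first empty slot ends the list. The limits come from bpftrace.h above. packSlots is a hypothetical stand-in for what the BPF side writes, assuming each argument is truncated to argsize-1 bytes so a NUL always fits.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// Limits taken from bpftrace.h above.
const (
	joinArgnum  = 16
	joinArgsize = 1024
)

// packSlots (hypothetical) copies up to argnum strings into fixed-size,
// NUL-terminated slots, truncating each to argsize-1 bytes.
func packSlots(args []string, argnum, argsize int) []byte {
	buf := make([]byte, argnum*argsize)
	for i, a := range args {
		if i == argnum {
			break // arguments beyond argnum are dropped
		}
		if len(a) > argsize-1 {
			a = a[:argsize-1]
		}
		copy(buf[i*argsize:], a)
	}
	return buf
}

// joinSlots mirrors the C++ loop: stop at the first empty slot and
// join the remaining strings with delim.
func joinSlots(buf []byte, argnum, argsize int, delim string) string {
	var parts []string
	for i := 0; i < argnum; i++ {
		slot := buf[i*argsize : (i+1)*argsize]
		if slot[0] == 0 {
			break
		}
		if n := bytes.IndexByte(slot, 0); n >= 0 {
			slot = slot[:n]
		}
		parts = append(parts, string(slot))
	}
	return strings.Join(parts, delim)
}

func main() {
	buf := packSlots([]string{"ls", "-l", "/tmp"}, joinArgnum, joinArgsize)
	fmt.Println(joinSlots(buf, joinArgnum, joinArgsize, " "))
}
```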
argN/sargN/reg/args/retval #
N starts from 0: arg0 is the function's first argument, arg1 the second, and so on.
- arg0, arg1, …: Arguments to the traced function; assumed to be 64 bits wide
- Applies to: kprobes, uprobes, usdt
- sarg0, sarg1, …: Arguments to the traced function (for programs that store arguments on the stack); assumed to be 64 bits wide
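- Applies to: kprobes, uprobes
Note:
- If an argument does not occupy exactly one 64-bit slot (e.g. a struct passed by value rather than a struct pointer), it may be passed in several registers. Kernel functions conventionally take struct pointers, though, so in practice argN is usually the Nth argument.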
# bpftrace -e 'uprobe:/home/bgregg/func:main.add { printf("%d %d\n", arg0, arg1); }'
Attaching 1 probe...
42 13
# bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Attaching 1 probe...
opening: /proc/cpuinfo
opening: /proc/stat
opening: /proc/diskstats
opening: /proc/stat
opening: /proc/vmstat
[...]
reg(const string name): a bpftrace builtin that returns the value of the named register, e.g. ax/bx/cx/sp on amd64 (without the r prefix);
- Applies to: kprobe, uprobe
# bpftrace -e 'uprobe:/home/bgregg/Lang/go/func:main*add { printf("%d %d\n", *(reg("sp") + 8), *(reg("sp") + 16)); }'
Attaching 1 probe...
42 13
args: The struct with all arguments of the traced function. Available in tracepoint, kfunc, and uprobe (with DWARF) probes. Use args.x to access argument x or args to get a record with all arguments.
- https://github.com/iovisor/bpftrace/commit/7e77f6896b1285a6b6eba044e16880c88faa2f44
- Kernel functions (tracepoint, kfunc) need BTF. For user functions (uprobe), the binary needs DWARF info;
- args.<name> accesses an argument by name and supports dereferencing struct types;
root@lima-ebpf-dev:~# bpftrace -lv 'kfunc:vmlinux:__traceiter_net_dev_start_xmit'
kfunc:vmlinux:__traceiter_net_dev_start_xmit
void * __data
const struct sk_buff * skb
const struct net_device * dev
int retval
root@lima-ebpf-dev:~# bpftrace -e 'kfunc:vmlinux:__traceiter_net_dev_start_xmit {printf("%x\n", args.skb->protocol);}'
Attaching 1 probe...
retval: Value returned by the function being traced (kretprobe, uretprobe, fexit). For kretprobe and uretprobe, its type is uint64, but for fexit it depends. You can look up the type using bpftrace -lv.
- Applies to: kretprobe, uretprobe, fexit
# bpftrace -e 'kretprobe:do_sys_open { printf("returned: %d\n", retval); }'
Attaching 1 probe...
returned: 8
returned: 21
returned: -2
returned: 21
[...]
For kernel function arguments of pointer type, you can either:
- include the header that defines the struct, and then access its fields;
bpftrace can read system headers, such as the kernel headers below.
# cat path.bt
#include <linux/path.h>
#include <linux/dcache.h>
kprobe:vfs_open
{
	printf("open path: %s\n", str(((struct path *)arg0)->dentry->d_name.name));
}
# bpftrace path.bt
Attaching 1 probe...
open path: dev
open path: if_inet6
open path: retrans_time_ms
[...]
- or, if the kernel has built-in BTF, access the fields directly without including any header;
# bpftrace -e 'kprobe:vfs_open { printf("open path: %s\n", str(((struct path *)arg0)->dentry->d_name.name)); }'
Attaching 1 probe...
open path: cmdline
open path: interrupts
[...]
Argument passing in C vs. Go #
C/C++ follows the AMD64 ABI and passes function arguments in registers: rdi, rsi, rdx, rcx, r8, r9, then on the stack...
bpftrace reads these register-passed values as arg0, arg1, arg2, ... argN.
# bpftrace -e 'uprobe:/home/bgregg/func:main.add { printf("%d %d\n", arg0, arg1); }'
Attaching 1 probe...
42 13
Before Go 1.17, Go passed arguments on the stack:
- https://github.com/bpftrace/bpftrace/issues/740
bpftrace reads stack-passed values with sarg0, sarg1, sarg2, ... sargN:
sarg0 == *(reg("sp") + 8), sarg1 == *(reg("sp") + 16)
# bpftrace -e 'uprobe:/home/bgregg/Lang/go/func:main*add { printf("%d %d\n", *(reg("sp") + 8), *(reg("sp") + 16)); }'
Attaching 1 probe...
42 13
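The sargN offsets follow from the old stack layout. At function entry, 0(SP) holds the return address, and argument words follow in 8-byte steps. This Go sketch computes where each argument's words live. The function name is hypothetical, and it assumes only 8-byte integer/pointer words, with no alignment padding for smaller types. The output also shows why a string argument takes two consecutive sargN slots.

```go
package main

import "fmt"

// stackArgOffsets returns, for each argument given its size in 8-byte
// words, the SP-relative offsets of its words at function entry under the
// pre-1.17 stack-based Go ABI on amd64. 0(SP) holds the return address.
func stackArgOffsets(words []int) [][]int {
	off := 8
	out := make([][]int, len(words))
	for i, w := range words {
		for j := 0; j < w; j++ {
			out[i] = append(out[i], off)
			off += 8
		}
	}
	return out
}

func main() {
	// main.add(a, b int): one word each, so sarg0 is 8(SP) and sarg1 is 16(SP).
	fmt.Println(stackArgOffsets([]int{1, 1})) // [[8] [16]]
	// (a, b string): pointer+length each, so a is 8/16(SP) and b is 24/32(SP).
	fmt.Println(stackArgOffsets([]int{2, 2})) // [[8 16] [24 32]]
}
```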
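From Go 1.17 on, arguments are passed mainly in registers, with the stack as a fallback (disassemble the binary to confirm the details for a given function). The register order, however, is rax, rbx, rcx, rdi, rsi, r8, r9, r10, r11, then the stack. This differs from the order in the C AMD64 ABI, so bpftrace's argN does not work for newer Go versions.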
- https://github.com/bpftrace/bpftrace/issues/2547
- https://go.googlesource.com/go/+/refs/heads/dev.regabi/src/cmd/compile/internal-abi.md#amd64-architecture
- Calling convention: https://mechpen.github.io/posts/2022-10-30-golang-bpf/
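The mismatch can be tabulated. This Go sketch lists the integer argument registers of both conventions, in the order given above. For each argN, it prints the register each ABI uses and the name bpftrace's reg() expects. Legacy registers drop the r prefix, as in reg("ax"), while r8 to r11 keep their names. It covers integer/pointer words only; floating-point arguments use separate registers in both ABIs.

```go
package main

import "fmt"

// Integer argument registers, in order, as listed in the text above.
var (
	sysvIntRegs = []string{"rdi", "rsi", "rdx", "rcx", "r8", "r9"}
	goIntRegs   = []string{"rax", "rbx", "rcx", "rdi", "rsi", "r8", "r9", "r10", "r11"}
)

// argReg returns the register holding integer word n, or "stack" once
// the registers run out.
func argReg(regs []string, n int) string {
	if n < len(regs) {
		return regs[n]
	}
	return "stack"
}

// bpftraceRegName maps "rax" -> "ax", "rdi" -> "di", but keeps "r8".."r11".
func bpftraceRegName(r string) string {
	if len(r) > 1 && r[0] == 'r' && (r[1] < '0' || r[1] > '9') {
		return r[1:]
	}
	return r
}

func main() {
	for n := 0; n < len(goIntRegs); n++ {
		c, g := argReg(sysvIntRegs, n), argReg(goIntRegs, n)
		fmt.Printf("arg%d: C=%-5s Go=%-4s reg(%q)\n", n, c, g, bpftraceRegName(g))
	}
}
```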
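Another problem is that some Go types, such as string, actually consist of an address plus a length (uintptr + int64). A single Go argument of such a type is therefore passed in two registers, and argN no longer necessarily corresponds to the Nth parameter.
The same problem applies to C and Go functions that take a struct by value instead of a pointer: argN and the function's Nth parameter no longer correspond one to one!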
# https://godbolt.org/z/67a4Yde5e
#include <stdint.h>
struct Foo {
uint64_t a;
uint64_t b;
};
void byval(Foo f);
void bar() {
Foo f = {
.a = 1,
.b = 2,
};
byval(f);
}
### Disassembly
bar():
mov edi, 1
mov esi, 2
jmp byval(Foo)
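Multi-word arguments are what break the one-to-one mapping. This Go sketch approximates the Go internal ABI's integer register assignment on amd64, for arguments made only of pointer/integer words. Each argument takes the next consecutive registers. If it does not fit, the whole argument goes to the stack, but later arguments may still use the remaining registers. This is a simplification (no floats, arrays, or sub-word packing) meant only to predict which registers to read with reg(). Confirm the result against the disassembly.

```go
package main

import "fmt"

var goIntRegs = []string{"rax", "rbx", "rcx", "rdi", "rsi", "r8", "r9", "r10", "r11"}

// assignGoRegs takes each argument's size in 8-byte integer/pointer words
// and returns the registers it occupies; nil means "passed on the stack".
func assignGoRegs(words []int) [][]string {
	out := make([][]string, len(words))
	next := 0
	for i, w := range words {
		if next+w > len(goIntRegs) {
			continue // does not fit: the whole argument goes on the stack
		}
		out[i] = goIntRegs[next : next+w]
		next += w
	}
	return out
}

func main() {
	// concat(a, b string): each string is pointer + length.
	fmt.Println(assignGoRegs([]int{2, 2})) // [[rax rbx] [rcx rdi]]
	// A by-value struct of two uint64 fields, then an int.
	fmt.Println(assignGoRegs([]int{2, 1})) // [[rax rbx] [rcx]]
}
```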
If an argument is an address or a struct, cast it to the corresponding data structure before parsing it. For example, a Go string is internally a struct, so a string parameter can be parsed as follows:
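struct GoString {
  char * str;
  long len; // Go int is 64 bits on amd64; a C int would only cover the low 32 bits
};
uprobe:./string:main.join
{
  // Pre-1.17 stack ABI: the first string occupies sp+8 (pointer) and sp+16 (length),
  // the second sp+24 and sp+32. sarg0 is the data pointer itself, not a pointer
  // to a GoString, so cast the stack address instead.
  $p1 = (struct GoString *)(reg("sp") + 8);
  printf("arg1[%d]:%s\n", $p1->len, str($p1->str, $p1->len));
  $p2 = (struct GoString *)(reg("sp") + 24);
  printf("arg2[%d]:%s\n", $p2->len, str($p2->str, $p2->len));
}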
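So the safest way to get function arguments accurately in bpftrace is to disassemble the function and check how the arguments are passed, i.e. which registers or stack slots are used.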
With DWARF info in the Go binary, print(args) prints all arguments at once (see https://github.com/bpftrace/bpftrace/commit/7e77f6896b1285a6b6eba044e16880c88faa2f44):
# /tmp/bpftrace3 -v -e 'uprobe:myagentt/current/bin/my-agent/my-agent:"git.com/my-agent/pkg/storage.(*ProcessStore).doSaveNetworkLocateInfo" { print(args) }'
C/Go disassembly #
gdb, objdump, and go tool objdump can all disassemble a binary without running the program.
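Disassembling a function with gdb #
Go function names usually include the full import path, e.g. git.com/my-agent/pkg/containers/docker.GetSandboxLabels.
The ., / and - characters in such names are not valid in C identifiers, so gdb cannot take the name directly.
Workarounds:
- info functions accepts a regular expression to match function names.
- Quote the full name in single quotes ('...'), e.g.:
  - x/i 'git.com/my-agent/pkg/containers/docker.GetSandboxLabels'
  - print 'git.com/my-agent/pkg/containers/docker.GetSandboxLabels'
  - disassemble 'git.com/my-agent/pkg/containers/docker.GetSandboxLabels'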
# gdb /tmp/my-agent
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /tmp/my-agent...done.
# List all functions
(gdb) info functions
# List functions matching a regular expression
# info functions <Regexp> matches function names against the regular expression
(gdb) info functions GetSandboxLabels
All functions matching regular expression "GetSandboxLabels":
File /Users/alizj/go/src/git.com/my-agent/pkg/containers/docker/docker.go:
179: void git.com/my-agent/pkg/containers/docker.GetSandboxLabels(string, map[string]string);
(gdb) info functions git.com/my-agent/pkg/containers/docker.GetSandboxLabels
All functions matching regular expression "git.com/my-agent/pkg/containers/docker.GetSandboxLabels":
File /Users/alizj/go/src/git.com/my-agent/pkg/containers/docker/docker.go:
179: void git.com/my-agent/pkg/containers/docker.GetSandboxLabels(string, map[string]string);
# Show the function address; the full function name must be enclosed in single quotes
(gdb) x/i 'git.com/my-agent/pkg/containers/docker.GetSandboxLabels'
0x1535ce0 <git.com/my-agent/pkg/containers/docker.GetSandboxLabels>: lea -0x68(%rsp),%r12
(gdb) print 'git.com/my-agent/pkg/containers/docker.GetSandboxLabels'
$1 = {void (string, map[string]string)} 0x1535ce0 <git.com/my-agent/pkg/containers/docker.GetSandboxLabels>
(gdb) info address 'git.com/my-agent/pkg/containers/docker.GetSandboxLabels'
No symbol "'git.com/my-agent/pkg/containers/docker.GetSandboxLabels'" in current context.
# Disassemble a function
## By address
(gdb) disassemble 0x1535ce0
Dump of assembler code for function git.com/my-agent/pkg/containers/docker.GetSandboxLabels:
0x0000000001535ce0 <+0>: lea -0x68(%rsp),%r12
0x0000000001535ce5 <+5>: cmp 0x10(%r14),%r12
## A name containing special characters must be enclosed in single quotes; unquoted, it fails:
(gdb) disassemble git.com/my-agent/pkg/containers/docker.GetSandboxLabels
No symbol "gitlab" in current context.
## By name (quoted)
(gdb) disassemble 'git.com/my-agent/pkg/containers/docker.GetSandboxLabels'
Dump of assembler code for function git.com/my-agent/pkg/containers/docker.GetSandboxLabels:
0x0000000001535ce0 <+0>: lea -0x68(%rsp),%r12
0x0000000001535ce5 <+5>: cmp 0x10(%r14),%r12
0x0000000001535ce9 <+9>: jbe 0x1535f84 <git.com/my-agent/pkg/containers/docker.GetSandboxLabels+676>
0x0000000001535cef <+15>: push %rbp
0x0000000001535cf0 <+16>: mov %rsp,%rbp
0x0000000001535cf3 <+19>: sub $0xe0,%rsp
0x0000000001535cfa <+26>: mov %rbx,0xf8(%rsp)
0x0000000001535d02 <+34>: mov %rax,0xf0(%rsp)
0x0000000001535d0a <+42>: cmpq $0x0,0x1d2571e(%rip) # 0x325b430 <git.com/my-agent/pkg/containers/options.cache>
0x0000000001535d12 <+50>: jne 0x1535d4c <git.com/my-agent/pkg/containers/docker.GetSandboxLabels+108>
Add /m or /s to show the function's source code alongside the disassembly:
(gdb) disassemble /m 0x1535ce0
Dump of assembler code for function git.com/my-agent/pkg/containers/docker.GetSandboxLabels:
179 /Users/alizj/go/src/git.com/my-agent/pkg/containers/docker/docker.go: No such file or directory.
0x0000000001535ce0 <+0>: lea -0x68(%rsp),%r12
0x0000000001535ce5 <+5>: cmp 0x10(%r14),%r12
0x0000000001535ce9 <+9>: jbe 0x1535f84 <git.com/my-agent/pkg/containers/docker.GetSandboxLabels+676>
0x0000000001535cef <+15>: push %rbp
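Disassembling a function with objdump #
Use the standard Linux objdump tool:
- objdump only disassembles binaries of its own architecture. Other architectures need the matching objdump, e.g. aarch64-linux-gnu-objdump -d
- llvm-objdump -d can disassemble all the architectures it supports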
# objdump -d /tmp/my-agent >asm
# grep GetSandboxLabels asm
15353b2: e8 29 09 00 00 callq 1535ce0 <git.com/my-agent/pkg/containers/docker.GetSandboxLabels>
0000000001535ce0 <git.com/my-agent/pkg/containers/docker.GetSandboxLabels>:
1535ce9: 0f 86 95 02 00 00 jbe 1535f84 <git.com/my-agent/pkg/containers/docker.GetSandboxLabels+0x2a4>
1535d12: 75 38 jne 1535d4c <git.com/my-agent/pkg/containers/docker.GetSandboxLabels+0x6c>
# grep -A 30 0000000001535ce0 asm
0000000001535ce0 <git.com/my-agent/pkg/containers/docker.GetSandboxLabels>:
1535ce0: 4c 8d 64 24 98 lea -0x68(%rsp),%r12
1535ce5: 4d 3b 66 10 cmp 0x10(%r14),%r12
1535ce9: 0f 86 95 02 00 00 jbe 1535f84 <git.com/my-agent/pkg/containers/docker.GetSandboxLabels+0x2a4>
1535cef: 55 push %rbp
1535cf0: 48 89 e5 mov %rsp,%rbp
1535cf3: 48 81 ec e0 00 00 00 sub $0xe0,%rsp
1535cfa: 48 89 9c 24 f8 00 00 mov %rbx,0xf8(%rsp)
1535d01: 00
1535d02: 48 89 84 24 f0 00 00 mov %rax,0xf0(%rsp)
1535d09: 00
1535d0a: 48 83 3d 1e 57 d2 01 cmpq $0x0,0x1d2571e(%rip) # 325b430 <git.com/my-agent/pkg/containers/options.cache>
1535d11: 00
1535d12: 75 38 jne 1535d4c <git.com/my-agent/pkg/containers/docker.GetSandboxLabels+0x6c>
1535d14: e8 a7 72 ae ff callq 101cfc0 <git.com/my-agent/pkg/containers/options.newCache>
1535d19: 83 3d a0 4c 29 02 00 cmpl $0x0,0x2294ca0(%rip) # 37ca9c0 <runtime.writeBarrier>
1535d20: 74 13 je 1535d35 <git.com/my-agent/pkg/containers/docker.GetSandboxLabels+0x55>
1535d22: e8 19 b4 f3 fe callq 471140 <runtime.gcWriteBarrier2>
1535d27: 49 89 03 mov %rax,(%r11)
1535d2a: 48 8b 15 ff 56 d2 01 mov 0x1d256ff(%rip),%rdx # 325b430 <git.com/my-agent/pkg/containers/options.cache>
1535d31: 49 89 53 08 mov %rdx,0x8(%r11)
1535d35: 48 89 05 f4 56 d2 01 mov %rax,0x1d256f4(%rip) # 325b430 <git.com/my-agent/pkg/containers/options.cache>
1535d3c: 48 8b 84 24 f0 00 00 mov 0xf0(%rsp),%rax
1535d43: 00
1535d44: 48 8b 9c 24 f8 00 00 mov 0xf8(%rsp),%rbx
1535d4b: 00
1535d4c: 48 8b 15 dd 56 d2 01 mov 0x1d256dd(%rip),%rdx # 325b430 <git.com/my-agent/pkg/containers/options.cache>
1535d53: 48 85 d2 test %rdx,%rdx
1535d56: 74 46 je 1535d9e <git.com/my-agent/pkg/containers/docker.GetSandboxLabels+0xbe>
1535d58: 48 8b 12 mov (%rdx),%rdx
1535d5b: 4c 8d 1d 5e ed 85 00 lea 0x85ed5e(%rip),%r11 # 1d94ac0 <github.com/jellydator/ttlcache/v3..dict.Cache[string,*git.com/my-agent/pkg/containers/options.Container]>
Alternatively, get the function address with nm and disassemble starting exactly at that address:
# nm /tmp/my-agent |grep GetSandboxLabels
0000000001535ce0 T git.com/my-agent/pkg/containers/docker.GetSandboxLabels
# objdump -d --start-address 0x0000000001535ce0 /tmp/my-agent | head -30
/tmp/my-agent: file format elf64-x86-64
Disassembly of section .text:
0000000001535ce0 <git.com/my-agent/pkg/containers/docker.GetSandboxLabels>:
1535ce0: 4c 8d 64 24 98 lea -0x68(%rsp),%r12
1535ce5: 4d 3b 66 10 cmp 0x10(%r14),%r12
1535ce9: 0f 86 95 02 00 00 jbe 1535f84 <git.com/my-agent/pkg/containers/docker.GetSandboxLabels+0x2a4>
1535cef: 55 push %rbp
1535cf0: 48 89 e5 mov %rsp,%rbp
1535cf3: 48 81 ec e0 00 00 00 sub $0xe0,%rsp
1535cfa: 48 89 9c 24 f8 00 00 mov %rbx,0xf8(%rsp)
1535d01: 00
1535d02: 48 89 84 24 f0 00 00 mov %rax,0xf0(%rsp)
1535d09: 00
1535d0a: 48 83 3d 1e 57 d2 01 cmpq $0x0,0x1d2571e(%rip) # 325b430 <git.com/my-agent/pkg/containers/options.cache>
1535d11: 00
1535d12: 75 38 jne 1535d4c <git.com/my-agent/pkg/containers/docker.GetSandboxLabels+0x6c>
1535d14: e8 a7 72 ae ff callq 101cfc0 <git.com/my-agent/pkg/containers/options.newCache>
1535d19: 83 3d a0 4c 29 02 00 cmpl $0x0,0x2294ca0(%rip) # 37ca9c0 <runtime.writeBarrier>
1535d20: 74 13 je 1535d35 <git.com/my-agent/pkg/containers/docker.GetSandboxLabels+0x55>
1535d22: e8 19 b4 f3 fe callq 471140 <runtime.gcWriteBarrier2>
1535d27: 49 89 03 mov %rax,(%r11)
1535d2a: 48 8b 15 ff 56 d2 01 mov 0x1d256ff(%rip),%rdx # 325b430 <git.com/my-agent/pkg/containers/options.cache>
1535d31: 49 89 53 08 mov %rdx,0x8(%r11)
1535d35: 48 89 05 f4 56 d2 01 mov %rax,0x1d256f4(%rip) # 325b430 <git.com/my-agent/pkg/containers/options.cache>
1535d3c: 48 8b 84 24 f0 00 00 mov 0xf0(%rsp),%rax
1535d43: 00
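Disassembling a function with go tool objdump #
- go tool compile -S main.go # compiles the code and prints the generated assembly.
- go tool objdump # shows the machine code, assembly instructions and offsets of any function.
- go tool objdump [-S] [-s regexp] binary # disassembles all text symbols (code) in the binary. With -s, only symbols whose names match the regular expression are disassembled; -S interleaves the Go source.
https://mp.weixin.qq.com/s?src=11&timestamp=1736415416&ver=5740&signature=KtTwQoSjWCwn9k4ezcDBlqieFR7Lk4HAOFmp1-O7IL0ON8R4QcxpXKQpnPoifyWviewm1mAwgPhEGeT1q7nV3uIiDF8ylxsK1gfs1aetugiJzGQGMy27976jytIDjD&new=1
Raw machine code (strings of 0s and 1s) is hard for humans to read. Assembly instructions are numerous, and they differ considerably between machines. x86 assembly, for example, comes in two common syntaxes:
- Intel syntax: DOS and Windows (including the 8086 processor); used by the Windows VC compiler
- AT&T syntax: Linux, Unix, macOS; used by GCC
go tool objdump uses Go's Plan 9 assembly syntax rather than standard AMD64 assembly: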
root@lima-dev2:/Users/alizj/go/src/git.com/my-agent# go tool objdump -S -s GetSandboxLabels ./my-agent
TEXT git.com/my-agent/pkg/containers/docker.GetSandboxLabels(SB) /Users/alizj/go/src/git.com/my-agent/pkg/containers/docker/docker.go
func GetSandboxLabels(sanboxId string) map[string]string {
0x1535ce0 4c8d642498 LEAQ -0x68(SP), R12
0x1535ce5 4d3b6610 CMPQ R12, 0x10(R14)
0x1535ce9 0f8695020000 JBE 0x1535f84
0x1535cef 55 PUSHQ BP
0x1535cf0 4889e5 MOVQ SP, BP
0x1535cf3 4881ece0000000 SUBQ $0xe0, SP
0x1535cfa 48899c24f8000000 MOVQ BX, 0xf8(SP)
0x1535d02 48898424f0000000 MOVQ AX, 0xf0(SP)
if cache == nil {
0x1535d0a 48833d1e57d20100 CMPQ git.com/my-agent/pkg/containers/options.cache(SB), $0x0
0x1535d12 7538 JNE 0x1535d4c
cache = newCache()
0x1535d14 e8a772aeff CALL git.com/my-agent/pkg/containers/options.newCache(SB)
0x1535d19 833da04c290200 CMPL runtime.writeBarrier(SB), $0x0
0x1535d20 7413 JE 0x1535d35
0x1535d22 e819b4f3fe CALL runtime.gcWriteBarrier2(SB)
0x1535d27 498903 MOVQ AX, 0(R11)
0x1535d2a 488b15ff56d201 MOVQ git.com/my-agent/pkg/containers/options.cache(SB), DX
Go disassembly example #
To figure out go calling conventions, we could compose some simple functions, then look at their generated assembly code. For example, the following simple function:
func fewArgsTest(a1, a2, a3, a4 uint64) (uint64, uint64, uint64, uint64) {
return a1+0x11, a2+0x22, a3+0x33, a4+0x44
}
We can disassemble this function (at 0x0000000000459d20):
$ gdb -batch -ex 'file ./example' -ex 'disassemble 0x0000000000459d20'
Dump of assembler code for function main.fewArgsTest:
0x0000000000459d20 <+0>: add $0x11,%rax
0x0000000000459d24 <+4>: add $0x22,%rbx
0x0000000000459d28 <+8>: add $0x33,%rcx
0x0000000000459d2c <+12>: add $0x44,%rdi
0x0000000000459d30 <+16>: ret
End of assembler dump.
It’s easy to see that the four arguments are passed in the registers rax, rbx, rcx, and rdi. This example code has more such functions. The following are some results for Go 1.19.
Below is the demo code main.go. The goal is to analyze the inputs and output of the concat function with bpftrace (source: https://cloud.tencent.com/developer/article/1918230):
package main
func main() {
println(concat("ab", "cd"))
}
func concat(a, b string) string {
return a + b
}
Let's use gdb to see how go1.17 passes string arguments:
shell> go build -gcflags="-l" ./main.go
shell> gdb ./main
(gdb) # set a breakpoint
(gdb) b main.concat
(gdb) # run
(gdb) r
(gdb) # show the arguments
(gdb) i args
x = 0x461513 "ab"
y = 0x461515 "cd"
(gdb) # show the registers
(gdb) i r
rax 0x461513 4592915
rbx 0x2 2
rcx 0x461515 4592917
rdi 0x2 2
(gdb) # examine the bytes at 0x461513
(gdb) x/2cb 0x461513
0x461513: 97 'a' 98 'b'
(gdb) # examine the bytes at 0x461515
(gdb) x/2cb 0x461515
0x461515: 99 'c' 100 'd'
(gdb) # after concat returns (e.g. via finish), show the registers again
(gdb) i r
rax 0xc00001a0e0 824633827552
rbx 0x4 4
(gdb) # examine the bytes at 0xc00001a0e0
(gdb) x/4cb 0xc00001a0e0
0xc00001a0e0: 97 'a' 98 'b' 99 'c' 100 'd'
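As shown above, passing two string arguments to main.concat actually uses 4 registers. Each string argument takes two registers, its address and its length, which matches the string data structure: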
type StringHeader struct {
Data uintptr
Len int
}
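The two-word layout can be checked from Go itself. This sketch assumes Go 1.20+ (for unsafe.StringData) and a 64-bit platform. It prints the size of a string header and the pointer/length pair that the register ABI places in a register pair such as RAX/RBX for the first string argument.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringWords returns the two words of a Go string header: the data
// pointer and the length, i.e. what str(reg("ax"), reg("bx")) reads for
// the first string argument.
func stringWords(s string) (uintptr, int) {
	return uintptr(unsafe.Pointer(unsafe.StringData(s))), len(s)
}

// headerSize is 16 on 64-bit platforms: one pointer plus one int.
func headerSize() uintptr { return unsafe.Sizeof("") }

func main() {
	p, n := stringWords("ab")
	fmt.Printf("header=%d bytes ptr=%#x len=%d\n", headerSize(), p, n)
}
```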
With this in mind, the following bpftrace command monitors the inputs and output of concat:
shell> bpftrace -e '
uprobe:./main:main.concat {
printf("a: %s b: %s\n",
str(reg("ax"), reg("bx")),
str(reg("cx"), reg("di"))
)
}
uretprobe:./main:main.concat {
printf("retval: %s\n", str(reg("ax"), reg("bx")))
// printf("retval: %s\n", str(retval))
}
'
a: ab b: cd
retval: abcd
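That covers analyzing Go programs with bpftrace when the arguments and return values are integers or strings. More complex types, such as structs, work on the same principle and are not covered here for brevity. Interested readers can refer to the related links at the end of the source article.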
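Go disassembly example 2 #
Analyzing Go function arguments and return values with bpftrace
Below is the demo code main.go. The goal is to analyze the inputs and output of the sum function with bpftrace: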
package main
func main() { println(sum(11, 22)) }
func sum(a, b int) int { return a + b }
When building, remember to disable inlining. Otherwise sum may be inlined, and eBPF has no function to attach a probe to:
shell> go build -gcflags="-l" ./main.go
shell> objdump -t ./main | grep -w sum
000000000045dd60 g F .text 0000000000000033 main.sum
With that in place, the following bpftrace command monitors the inputs and output of sum:
shell> bpftrace -e '
uprobe:./main:main.sum {printf("a: %d b: %d\n", sarg0, sarg1)}
uretprobe:./main:main.sum {printf("retval: %d\n", retval)}
'
a: 11 b: 22
retval: 33
However, testing shows that the command above only works with Go versions before 1.17. From go1.17 on, sargN no longer returns the arguments, because they are passed in registers instead of on the stack. The Go internal ABI specification describes this in detail:
amd64 architecture
The amd64 architecture uses the following sequence of 9 registers for integer arguments and results: RAX, RBX, RCX, RDI, RSI, R8, R9, R10, R11
Let's verify this with gdb:
shell> gdb ./main
(gdb) # set a breakpoint
(gdb) b main.sum
(gdb) # run
(gdb) r
(gdb) # show the registers
(gdb) i r
rax 0xb 11
rbx 0x16 22
As shown above, the first argument of main.sum is in the rax register and the second in rbx, consistent with the Go internal ABI specification.
With this understood, here is how to monitor the inputs and output with bpftrace on go1.17 and later:
shell> bpftrace -e '
uprobe:./main:main.sum {printf("a: %d b: %d\n", reg("ax"), reg("bx"))}
uretprobe:./main:main.sum {printf("retval: %d\n", retval)}
'
a: 11 b: 22
retval: 33
Go disassembly example 3 #
Take the following Go function as an example:
- a Go string occupies two 64-bit words
func GetSandboxLabels(sanboxId string) map[string]string {
// 1. Look up the cache first.
cCache := options.GetCache()
if cCache != nil {
if container := cCache.Get(sanboxId); container != nil {
return container.Value().Labels
}
}
// 2. On a cache miss, query the Docker API.
inspect, err := dClient.ContainerInspect(context.Background(), sanboxId)
if err != nil {
log.V(1).Infof("Error inspecting container %s: %v\n", sanboxId, err)
return map[string]string{}
}
return inspect.Config.Labels
}
#/tmp/bpftrace4 -lv 'uprobe:/tmp/my-agent:*GetSandboxLabels*'
/tmp/bpftrace4: stat /static-python: No such file or directory
uprobe:/tmp/my-agent:git.com/my-agent/pkg/containers/docker.GetSandboxLabels
struct string sanboxId
map[string]string ~r0
(gdb) b git.com/my-agent/pkg/containers/docker.GetSandboxLabels
Breakpoint 1 at 0x1535ce0: file /Users/alizj/go/src/git.com/my-agent/pkg/containers/docker/docker.go, line 179.
(gdb) disas 0x1535ce0
Dump of assembler code for function git.com/my-agent/pkg/containers/docker.GetSandboxLabels:
0x0000000001535ce0 <+0>: lea -0x68(%rsp),%r12
0x0000000001535ce5 <+5>: cmp 0x10(%r14),%r12
0x0000000001535ce9 <+9>: jbe 0x1535f84 <git.com/my-agent/pkg/containers/docker.GetSandboxLabels+676>
0x0000000001535cef <+15>: push %rbp
0x0000000001535cf0 <+16>: mov %rsp,%rbp
0x0000000001535cf3 <+19>: sub $0xe0,%rsp
0x0000000001535cfa <+26>: mov %rbx,0xf8(%rsp)
0x0000000001535d02 <+34>: mov %rax,0xf0(%rsp)
0x0000000001535d0a <+42>: cmpq $0x0,0x1d2571e(%rip) # 0x325b430 <git.com/my-agent/pkg/containers/options.cache>
0x0000000001535d12 <+50>: jne 0x1535d4c <git.com/my-agent/pkg/containers/docker.GetSandboxLabels+108>
0x0000000001535d14 <+52>: callq 0x101cfc0 <git.com/my-agent/pkg/containers/options.newCache>
0x0000000001535d19 <+57>: cmpl $0x0,0x2294ca0(%rip) # 0x37ca9c0 <runtime.writeBarrier>
0x0000000001535d20 <+64>: je 0x1535d35 <git.com/my-agent/pkg/containers/docker.GetSandboxLabels+85>
0x0000000001535d22 <+66>: callq 0x471140 <runtime.gcWriteBarrier2>
0x0000000001535d27 <+71>: mov %rax,(%r11)
0x0000000001535d2a <+74>: mov 0x1d256ff(%rip),%rdx # 0x325b430 <git.com/my-agent/pkg/containers/options.cache>
0x0000000001535d31 <+81>: mov %rdx,0x8(%r11)
0x0000000001535d35 <+85>: mov %rax,0x1d256f4(%rip) # 0x325b430 <git.com/my-agent/pkg/containers/options.cache>
0x0000000001535d3c <+92>: mov 0xf0(%rsp),%rax
0x0000000001535d44 <+100>: mov 0xf8(%rsp),%rbx
0x0000000001535d4c <+108>: mov 0x1d256dd(%rip),%rdx # 0x325b430 <git.com/my-agent/pkg/containers/options.cache>
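How this code maps to the source:
- Function prologue and stack check
lea -0x68(%rsp),%r12 # r12 = rsp - 0x68, the lowest stack address this frame may need
cmp 0x10(%r14),%r12 # compare with g.stackguard0 (r14 holds the current goroutine g)
jbe 0x1535f84 # not enough stack: jump to the stack-growth (morestack) path
push %rbp # save the caller's frame pointer
mov %rsp,%rbp # set up the new frame pointer
sub $0xe0,%rsp # allocate a 224-byte stack frame
- Spill the argument registers
mov %rbx,0xf8(%rsp) # spill rbx (length of sanboxId)
mov %rax,0xf0(%rsp) # spill rax (data pointer of sanboxId)
- Check whether the cache exists (options.GetCache(), inlined here)
cmpq $0x0,0x1d2571e(%rip) # is options.cache nil?
jne 0x1535d4c # not nil: skip initialization
- Cache not created yet
callq 0x101cfc0 # call options.newCache()
cmpl $0x0,0x2294ca0(%rip) # is the GC write barrier enabled (runtime.writeBarrier)?
je 0x1535d35 # barrier disabled: do a plain store
callq 0x471140 # barrier enabled: runtime.gcWriteBarrier2 reserves two write-barrier buffer slots (returned in r11)
- Store the new cache into the global variable
mov %rax,(%r11) # record the new pointer in the write-barrier buffer
mov 0x1d256ff(%rip),%rdx # load the old value of options.cache
mov %rdx,0x8(%r11) # record the old pointer in the write-barrier buffer
mov %rax,0x1d256f4(%rip) # options.cache = new cache
This assembly corresponds to the following in the source:
- getting the cache instance
options.GetCache()
- checking whether the global cache is nil (if cache == nil inside GetCache, as the go tool objdump -S output above shows)
- creating a new cache if it does not exist yet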
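Notes:
- the Go garbage collector's write barrier is visible in the assembly
- the cache handling reads and updates a global variable
- this is only the beginning of the function; the rest should contain:
  - fetching the container info from the cache
  - calling ContainerInspect
  - error handling
  - returning the label map
This code shows low-level details of how Go handles:
- global variable access
- memory management and garbage collection
- conditional branches
- the function calling convention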
Which argument-passing convention does GetSandboxLabels use?
From the disassembly above:
Dump of assembler code for function GetSandboxLabels:
0x0000000001535ce0 <+0>: lea -0x68(%rsp),%r12
0x0000000001535ce5 <+5>: cmp 0x10(%r14),%r12
0x0000000001535ce9 <+9>: jbe 0x1535f84
0x0000000001535cef <+15>: push %rbp
0x0000000001535cf0 <+16>: mov %rsp,%rbp
0x0000000001535cf3 <+19>: sub $0xe0,%rsp
0x0000000001535cfa <+26>: mov %rbx,0xf8(%rsp)
0x0000000001535d02 <+34>: mov %rax,0xf0(%rsp) # spill rax to the stack
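Given the source signature func GetSandboxLabels(sanboxId string) map[string]string, this is the register-based convention:
- The sanboxId argument is passed in registers: RAX holds the string's data pointer and RBX its length
  - mov %rax,0xf0(%rsp) and mov %rbx,0xf8(%rsp) save both register values to the stack
- The function uses Go's register-based ABI (Application Binary Interface)
  - Since Go 1.17 (amd64), integer arguments are assigned to RAX, RBX, RCX, RDI, ... in order, so the first argument starts at RAX
  - A string is two words, so it occupies RAX and RBX
- The prologue loads no argument from the stack; instead, it stores register values onto the stack
  - This is typical of register-based argument passing
  - The values are saved so they stay available later and survive the calls the function makes
This shows that the function uses the modern Go compiler's register-based convention rather than the old stack-based one.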
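The offsets 0xf0 and 0xf8 can be translated back to the stack pointer at function entry, which is where a uprobe fires. After push %rbp (8 bytes) and sub $0xe0,%rsp, the entry SP equals the current SP + 0xe8. The spill slots are therefore at entry SP+8 and SP+16, just above the return address. That fits the Go ABI rule that the caller reserves spill space for register arguments in its own frame, though here it is inferred from this one prologue only. A small Go check of the arithmetic:

```go
package main

import "fmt"

// entryOffset converts an RSP-relative offset observed after the prologue
//
//	push %rbp        (RSP -= 8)
//	sub  $frame,%rsp (RSP -= frame)
//
// into an offset relative to RSP at function entry.
func entryOffset(off, frame uint64) uint64 {
	return off - (frame + 8)
}

func main() {
	const frame = 0xe0 // from "sub $0xe0,%rsp"
	fmt.Printf("rax spilled at entry SP+%#x\n", entryOffset(0xf0, frame)) // +0x8
	fmt.Printf("rbx spilled at entry SP+%#x\n", entryOffset(0xf8, frame)) // +0x10
}
```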
Parsing an skb with bpftrace #
To compare $ipheader->daddr with 192.168.2.44, convert the four octets to hex: c0 a8 02 2c. daddr is stored in network byte order and read as a little-endian integer on x86, so reverse the bytes to 2c 02 a8 c0 and write $ipheader->daddr == 0x2c02a8c0 (see https://stackoverflow.com/questions/75172893/comparing-ip-addresses-in-bpftrace).
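The conversion can be done mechanically. This Go sketch assumes a little-endian host such as x86-64, where the kernel runs your bpftrace program. It parses a dotted IPv4 address and reinterprets its network-order bytes as a little-endian uint32, which yields the constant to compare daddr or saddr against. The function name daddrConst is hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// daddrConst returns the integer that an IPv4 address stored in network
// byte order reads as on a little-endian host.
func daddrConst(ip string) (uint32, error) {
	v4 := net.ParseIP(ip).To4()
	if v4 == nil {
		return 0, fmt.Errorf("not an IPv4 address: %q", ip)
	}
	return binary.LittleEndian.Uint32(v4), nil
}

func main() {
	v, err := daddrConst("192.168.2.44")
	if err != nil {
		panic(err)
	}
	fmt.Printf("$ipheader->daddr == %#x\n", v) // 0x2c02a8c0
}
```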
root@lima-ebpf-dev:~# bpftrace -lv 'kfunc:ip_finish_output'
kfunc:vmlinux:ip_finish_output
struct net * net
struct sock * sk
struct sk_buff * skb
int retval
# bpftrace can read the system's kernel headers
root@lima-ebpf-dev:~# cat /Users/zhangjun/skb.bt
#include <linux/skbuff.h>
#include <linux/icmp.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/in.h>
kfunc:ip_finish_output {
$skb = (struct sk_buff *)args.skb;
$dev = $skb->dev;
$name = $dev->name;
$ipheader = ((struct iphdr *) ($skb->head + $skb->network_header));
$version = ($ipheader->version) >>4;
if($ipheader->protocol == IPPROTO_ICMP) {
// get ICMP header; see skb_transport_header():
$icmph = (struct icmphdr *)($skb->head + $skb->transport_header);
if ($icmph->type == ICMP_ECHO) {
$id = $icmph->un.echo.id;
$seq = $icmph->un.echo.sequence;
printf("icmp: pid %d, comm: %s, [%d] %d\t%s > %s\n, id: %d, seq: %d, dev: %s\n", pid, comm, $version, $ipheader->protocol,
ntop($ipheader->saddr), ntop($ipheader->daddr), $id, $seq, $name);
}
}
}
root@lima-ebpf-dev:~# bpftrace /Users/zhangjun/skb.bt
Attaching 1 probe...
icmp: pid 206295, comm: ping, [0] 1 192.168.5.1 > 114.114.114.114
, id: 5376, seq: 48640, dev: eth0
icmp: pid 206295, comm: ping, [0] 1 192.168.5.1 > 114.114.114.114
, id: 5376, seq: 48896, dev: eth0
icmp: pid 206295, comm: ping, [0] 1 192.168.5.1 > 114.114.114.114
, id: 5376, seq: 49152, dev: eth0
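Notes on the output above: version prints 0 because version is already a 4-bit bitfield in struct iphdr, so the extra >> 4 shifts the value 4 away (drop the shift to print 4); id and seq are still in network byte order (5376 = 0x1500 is id 21, and 48640, 48896, 49152 are sequence numbers 190, 191, 192); and the \n in the middle of the format string splits each record over two lines.
Another example: https://lwn.net/Articles/793749/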
There is an important capability missing from those one-liners: struct navigation. Here is the function prototype again:
int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
bpftrace provides arg0-argN for kprobe function arguments, simply mapping them to the registers for the calling convention (arg2 becomes %rdx on x86_64, for example). Since bpftrace can read kernel headers, which are often installed on production systems, accessing struct data is possible by including the right header and casting the arguments:
#include <net/sock.h>
[...]
$sk = (struct sock *)arg0;
Here’s an example of a bpftrace tool that prints the address information, size, and return value from tcp_sendmsg(). Example output:
# ./tcp_sendmsg.bt
Attaching 2 probes...
10.0.0.65 49978 -> 52.37.243.173 443 : 63 bytes, retval 63
127.0.0.1 58566 -> 127.0.0.1 22 : 36 bytes, retval 36
127.0.0.1 22 -> 127.0.0.1 58566: 36 bytes, retval 36
[...]
The source of tcp_sendmsg.bt:
#!/usr/local/bin/bpftrace
#include <net/sock.h>
k:tcp_sendmsg
{
@sk[tid] = arg0;
@size[tid] = arg2;
}
kr:tcp_sendmsg
/@sk[tid]/
{
$sk = (struct sock *)@sk[tid];
$size = @size[tid];
$af = $sk->__sk_common.skc_family;
if ($af == AF_INET) {
$daddr = ntop($af, $sk->__sk_common.skc_daddr);
$saddr = ntop($af, $sk->__sk_common.skc_rcv_saddr);
$lport = $sk->__sk_common.skc_num;
$dport = $sk->__sk_common.skc_dport;
$dport = ($dport >> 8) | (($dport << 8) & 0xff00);
printf("%-15s %-5d -> %-15s %-5d: %d bytes, retval %d\n",
$saddr, $lport, $daddr, $dport, $size, retval);
} else {
printf("IPv6...\n");
}
delete(@sk[tid]);
delete(@size[tid]);
}
In the kprobe, sk and size are saved in per-thread-ID maps, so they can be retrieved in the kretprobe when tcp_sendmsg() returns. The kretprobe casts sk and prints out details, if it is an IPv4 message, using the bpftrace function ntop() to convert the address to a string. The destination port is flipped from network to host order. To keep this short I skipped IPv6, but you can add code to handle it too (ntop() does support IPv6 addresses).
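The port flip is a 16-bit byte swap. A minimal Go sketch of the same expression (flip16 is our name for it), cross-checked against encoding/binary. The same swap also converts the ICMP id and seq values printed by the skb.bt example earlier, which are likewise in network byte order.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// flip16 mirrors the bpftrace expression
//
//	($dport >> 8) | (($dport << 8) & 0xff00)
//
// i.e. it swaps the two bytes of a 16-bit value that was stored in network
// byte order and then loaded on a little-endian host.
func flip16(v uint16) uint16 {
	return (v >> 8) | ((v << 8) & 0xff00)
}

func main() {
	// Port 443 on the wire is the byte pair 0x01 0xbb; loaded as a
	// little-endian u16 (what skc_dport looks like to bpftrace) it reads 47873.
	raw := binary.LittleEndian.Uint16([]byte{0x01, 0xbb})
	fmt.Println(raw, "->", flip16(raw)) // 47873 -> 443

	// Cross-check: decoding the same bytes as big-endian gives 443 directly.
	fmt.Println(binary.BigEndian.Uint16([]byte{0x01, 0xbb})) // 443

	// The ICMP id/seq values printed by skb.bt, converted to host order.
	for _, v := range []uint16{5376, 48640, 48896, 49152} {
		fmt.Println(v, "->", flip16(v))
	}
}
```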
There is work underway for bpftrace to use BPF Type Format (BTF) information as well, which brings various advantages including struct definitions that are missing from kernel headers.
bpftrace profiling and flame graphs #
Like perf record, bpftrace can periodically sample the whole system or a specific process:
# bpftrace -v -e 'profile:hz:100 /pid == 1/ { @[ustack(1)] = count(); }'
Profiling data produced by bpftrace can be visualized with the conversion scripts shipped with FlameGraph:
sudo bpftrace -e 'profile:hz:99 { @[kstack] = count(); }' > trace.data
cd FlameGraph
# Convert with stackcollapse-bpftrace.pl
./stackcollapse-bpftrace.pl trace.data > trace.folded
# --inverted draws an icicle graph (stacks hang down from the top)
./flamegraph.pl --inverted trace.folded > traceflamegraph.svg
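stackcollapse-bpftrace.pl converts bpftrace's map dump into FlameGraph's folded format: one line per unique stack, frames from root to leaf joined by ";", followed by the sample count. The sketch below illustrates that step; it is not a port of the Perl script. It assumes bpftrace's default print format for a single stack key (a line ending in "@[", one frame per line with the leaf first, then "]: <count>"), and stripping +offset suffixes is a choice of this sketch.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// fold turns bpftrace map output such as
//
//	@[
//	    func_c+0
//	    main+18
//	]: 5
//
// into folded stacks ("main;func_c" -> 5), summing identical stacks.
func fold(lines []string) (map[string]int, error) {
	counts := map[string]int{}
	var frames []string
	inStack := false
	for _, raw := range lines {
		line := strings.TrimSpace(raw)
		switch {
		case strings.HasPrefix(line, "@") && strings.HasSuffix(line, "["):
			inStack, frames = true, frames[:0] // start of a new stack key
		case inStack && strings.HasPrefix(line, "]:"):
			n, err := strconv.Atoi(strings.TrimSpace(strings.TrimPrefix(line, "]:")))
			if err != nil {
				return nil, fmt.Errorf("bad count line %q: %w", line, err)
			}
			// bpftrace lists the leaf frame first; folded stacks go root -> leaf.
			parts := make([]string, len(frames))
			for i, f := range frames {
				parts[len(frames)-1-i] = f
			}
			counts[strings.Join(parts, ";")] += n
			inStack = false
		case inStack && line != "":
			// Drop the "+offset" suffix so samples in one function merge.
			if i := strings.LastIndexByte(line, '+'); i > 0 {
				line = line[:i]
			}
			frames = append(frames, line)
		}
	}
	return counts, nil
}

func main() {
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	counts, err := fold(lines)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for stack, n := range counts {
		fmt.Printf("%s %d\n", stack, n)
	}
}
```

For example: go run fold.go < trace.data > trace.folded, then pass trace.folded to flamegraph.pl as above (fold.go is a hypothetical file name).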
Tracing kernel call stacks with bpftrace #
Use -e to specify the probe and print kstack:
# bpftrace -e 'kprobe:nf_conntrack_in {printf("%s\n", kstack); }'
nf_conntrack_in+1
nf_hook_slow+61
__ip_local_out+214
ip_local_out+23
ip_send_skb+21
udp_send_skb.isra.43+277
udp_sendmsg+1544
sock_sendmsg+48
___sys_sendmsg+688
__sys_sendmsg+99
do_syscall_64+85
entry_SYSCALL_64_after_hwframe+68
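Tracing user programs with bpftrace #
Note: bpftrace does not support DWARF-based stack unwinding, so the user program must be compiled with frame pointers. Alternatively, use perf record, which does support DWARF unwinding (perf record --call-graph dwarf).
- Run the program under bpftrace (with -c):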
root@lima-ebpf-dev:~# cat test.c
#include <stdio.h>
#include <unistd.h>
void func_d() {
int msec=1;
printf("%s","Hello world from D\n");
usleep(10000*msec);
}
void func_c() {
printf("%s","Hello from C\n");
func_d();
}
void func_b() {
printf("%s","Hello from B\n");
func_c();
}
void func_a() {
printf("%s","Hello from A\n");
func_b();
}
int main() {
func_a();
}
# No -O option is given (the default is -O0), so frame pointers are kept
root@lima-ebpf-dev:~# gcc test.c -o hello
# Confirm that gcc emits the frame-pointer-saving instructions at function entry.
root@lima-ebpf-dev:~# objdump -S hello |grep -A 4 func_c
000000000000119e <func_c>:
119e: f3 0f 1e fa endbr64
11a2: 55 push %rbp # save the caller's frame pointer
11a3: 48 89 e5 mov %rsp,%rbp
11a6: 48 8d 05 6a 0e 00 00 lea 0xe6a(%rip),%rax # 2017 <_IO_stdin_used+0x17>
--
11de: e8 bb ff ff ff call 119e <func_c>
11e3: 90 nop
11e4: 5d pop %rbp
11e5: c3 ret
# Print the user call stack at each call to func_c
root@lima-ebpf-dev:~# bpftrace -e 'uprobe:./hello:func_c {printf("%s", ustack)}' -c ./hello
Attaching 1 probe...
Hello from A
Hello from B
Hello from C
Hello world from D
func_c+0
func_a+33
main+18
__libc_start_call_main+128
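Frame-pointer unwinding is a linked-list walk: after push %rbp; mov %rsp,%rbp, each frame stores the caller's saved rbp at [rbp] and the return address at [rbp+8]. The toy model below (all addresses are made up) walks that chain and reproduces the output above, including why func_b is missing: the uprobe fires at func_c+0, before func_c's own push %rbp has run, so rbp still points at func_b's frame and the first return address found is the one into func_a. The return address into func_b sits at [rsp], which the rbp chain never visits.

```go
package main

import "fmt"

// memory is a toy model of user stack memory: address -> 8-byte word.
type memory map[uint64]uint64

// Made-up return addresses, named after the ustack output above.
var symbols = map[uint64]string{
	0x1000: "func_c+0",
	0x2021: "func_a+33",
	0x3012: "main+18",
	0x4080: "__libc_start_call_main+128",
}

// unwindFP follows the frame-pointer chain: [rbp] holds the caller's saved
// rbp and [rbp+8] the return address into the caller. A saved rbp of 0 ends it.
func unwindFP(mem memory, pc, rbp uint64, maxFrames int) []uint64 {
	frames := []uint64{pc}
	for rbp != 0 && len(frames) < maxFrames {
		ret, ok := mem[rbp+8]
		if !ok {
			break // not a valid frame in the toy memory
		}
		frames = append(frames, ret)
		rbp = mem[rbp]
	}
	return frames
}

// entryProbeStack models the state when a uprobe fires at func_c+0:
// main -> func_a -> func_b -> call func_c, but func_c has not yet executed
// push %rbp, so rbp still points at func_b's frame.
func entryProbeStack() []string {
	mem := memory{
		0x7f00: 0, 0x7f08: 0x4080, // main's frame: end of chain, return into libc
		0x7ef0: 0x7f00, 0x7ef8: 0x3012, // func_a's frame: saved rbp, return into main
		0x7ee0: 0x7ef0, 0x7ee8: 0x2021, // func_b's frame: saved rbp, return into func_a
		0x7ed8: 0x5000, // [rsp]: return into func_b, never reached via rbp
	}
	var out []string
	for _, a := range unwindFP(mem, 0x1000, 0x7ee0, 16) {
		out = append(out, symbols[a])
	}
	return out
}

func main() {
	for _, f := range entryProbeStack() {
		fmt.Println(f)
	}
}
```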
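Tracing a running program by PID:
- The binary must be built with frame pointers for stack unwinding to work.
- With -p, only function calls in that process are probed; otherwise the probe fires in every process on the system that runs the binary.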
root@lima-ebpf-dev:~# apt install bash-dbgsym bash-static-dbgsym
root@lima-ebpf-dev:~# bpftrace -e 'uprobe:/usr/bin/bash:readline {printf("%s", ustack)}' # -p 12446
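Tracing containerized applications with bpftrace #
If the application runs inside a container, tracing it with bpftrace from the host requires some extra information.
Specify the absolute path of the target binary #
Use the binary's absolute path on the host.
For example, to trace the cilium-agent process (itself deployed as a Docker container), first find the absolute path of the cilium-agent binary on the host, using the container ID or name:
- The merged path is the overlay root filesystem used by the container.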
# Check cilium-agent container
$ docker ps | grep cilium-agent
0eb2e76384b3 cilium:test "/usr/bin/cilium-agent ..." 4 hours ago Up 4 hours cilium-agent
# Find the merged path for cilium-agent container
$ docker inspect --format "{{.GraphDriver.Data.MergedDir}}" 0eb2e76384b3
/var/lib/docker/overlay2/a17f868d/merged # a17f868d.. is shortened for better viewing
# The object file we are going to trace
$ ls -ahl /var/lib/docker/overlay2/a17f868d/merged/usr/bin/cilium-agent
-rwxr-xr-x 1 root root 86M /var/lib/docker/overlay2/a17f868d/merged/usr/bin/cilium-agent
Or, more bluntly, just use find:
(node) $ find /var/lib/docker/overlay2/ -name cilium-agent
/var/lib/docker/overlay2/a17f868d/merged/usr/bin/cilium-agent
Then use that absolute path in the uprobe. A Go function must be given by its full package path as a double-quoted string, e.g. "github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerate":
(node) $ bpftrace -e 'uprobe:/var/lib/docker/overlay2/a17f868d/merged/usr/bin/cilium-agent:"github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerate" {printf("%s\n", ustack); }'
Attaching 1 probe...
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerate+0
github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).run.func1+363
sync.(*Once).doSlow+236
github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).run+101
runtime.goexit+1
Use nm or bpftrace to list the traceable symbols (functions) in a Go binary:
$ nm cilium-agent
000000000427d1d0 B bufio.ErrBufferFull
000000000427d1e0 B bufio.ErrFinalToken
0000000001d3e940 T type..hash.github.com/cilium/cilium/pkg/k8s.ServiceID
0000000001f32300 T type..hash.github.com/cilium/cilium/pkg/node/types.Identity
0000000001d05620 T type..hash.github.com/cilium/cilium/pkg/policy/api.FQDNSelector
0000000001d05e80 T type..hash.github.com/cilium/cilium/pkg/policy.PortProto
...
# bpftrace -l 'uprobe:./exec:*'|tail
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.lookupInfoNFC
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.lookupInfoNFKC
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.nextCGJCompose
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.nextCGJDecompose
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.nextComposed
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.nextDecomposed
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.nextDone
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.nextHangul
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.nextMulti
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.nextMultiNorm
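The same listing can also be produced programmatically with Go's standard debug/elf package. A minimal sketch that prints function symbols (type STT_FUNC, roughly nm's T/t entries) whose names contain a substring, together with their addresses, which is the form used by the address-based uprobe later in this section. It needs an unstripped binary, i.e. one that still has a .symtab.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"strings"
)

// isFunc reports whether an ELF symbol's st_info marks it as a function.
func isFunc(info byte) bool {
	return elf.ST_TYPE(info) == elf.STT_FUNC
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: syms <elf-binary> <substring>")
		os.Exit(2)
	}
	f, err := elf.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	// Reads .symtab; fails on stripped binaries (e.g. built with -ldflags "-s").
	syms, err := f.Symbols()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, s := range syms {
		if isFunc(s.Info) && strings.Contains(s.Name, os.Args[2]) {
			fmt.Printf("%016x %s\n", s.Value, s.Name)
		}
	}
}
```

For example (syms is a hypothetical name for the built program): ./syms /cloud/my-agent SaveNetwork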
Specify the target process PID via /proc/<pid>/root #
The binary path is the in-container path, prefixed with /proc/<pid>/root:
$ sudo docker inspect -f '{{.State.Pid}}' cilium-agent
109997
$ bpftrace -e 'uprobe:/proc/109997/root/usr/bin/cilium-agent:"github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerate" {printf("%s\n", ustack); }'
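Building the /proc path is mechanical. A small sketch; hostPath is a hypothetical helper, the PID comes from docker inspect as above, and the existence check typically needs root and a process that is still running:

```go
package main

import (
	"fmt"
	"os"
	"path"
	"strconv"
)

// hostPath maps a path inside a container to the same file as seen from the
// host through the container process's root: /proc/<pid>/root/<path>.
func hostPath(pid int, inContainer string) string {
	return path.Join("/proc", strconv.Itoa(pid), "root", inContainer)
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: hostpath <pid> <path-in-container>")
		os.Exit(2)
	}
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "bad pid:", err)
		os.Exit(2)
	}
	p := hostPath(pid, os.Args[2])
	if _, err := os.Stat(p); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Prints a path usable as uprobe:<path>:"<function>".
	fmt.Println(p)
}
```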
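Debugging Go programs with bpftrace #
Binaries built with go build contain .debug_* DWARF sections and use frame pointers by default, but have no .eh_frame section: Go does not emit .eh_frame, so tools that unwind Go stacks rely on frame pointers or .debug_frame instead.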
alizj@lima-dev2:/Users/alizj/go/src/git.com/my-agent$ GOOS=linux GOARCH=amd64 go build -o my-agent-amd64 ./cmd/
alizj@lima-dev2:/Users/alizj/go/src/git.com/my-agent$ file my-agent-amd64
my-agent-amd64: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=CwNfhKa02_E_0xJyBXTh/YtKaW6xl9F8fuPNB2FmQ/GTJ4i0gXCea-djzz28sm/8HQk6LYzhP01Evpb6hA5, with debug_info, not stripped
alizj@lima-dev2:/Users/alizj/go/src/git.com/my-agent$ go version
go version go1.21.1 linux/arm64
alizj@lima-dev2:/Users/alizj/go/src/git.com/my-agent$ readelf -S ./my-agent-amd64 |grep -E 'debug|eh'
[13] .debug_abbrev PROGBITS 0000000000000000 02e61000
[14] .debug_line PROGBITS 0000000000000000 02e61135
[15] .debug_frame PROGBITS 0000000000000000 0311f672
[16] .debug_gdb_s[...] PROGBITS 0000000000000000 031c7ed2
[17] .debug_info PROGBITS 0000000000000000 031c7eff
[18] .debug_loc PROGBITS 0000000000000000 0368e961
[19] .debug_ranges PROGBITS 0000000000000000 03a55bbf
go run builds with -ldflags '-s -w' by default, which drops the symbol table (-s) and the DWARF debug info (-w), so such binaries cannot be used for debugging.
To make a Go program easier to debug, build it with go build -gcflags=all="-N -l": -N disables optimizations and -l disables inlining, both of which can interfere with debugging.
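To check a binary for these sections without readelf, here is a sketch using debug/elf that reports whether .debug_info (the parameter information bpftrace -lv needs), .debug_frame, and .eh_frame are present. The struct and function names are hypothetical; the section names are the ones in the readelf output above.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// debugInfo summarizes which debugging/unwind sections a binary carries.
type debugInfo struct {
	DebugInfo  bool // .debug_info: DWARF types and parameters
	DebugFrame bool // .debug_frame: DWARF CFI, emitted by the Go toolchain
	EhFrame    bool // .eh_frame: CFI usually emitted by C/C++ toolchains
}

// classify inspects a list of ELF section names.
func classify(names []string) debugInfo {
	var d debugInfo
	for _, n := range names {
		switch n {
		case ".debug_info":
			d.DebugInfo = true
		case ".debug_frame":
			d.DebugFrame = true
		case ".eh_frame":
			d.EhFrame = true
		}
	}
	return d
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: debugsecs <elf-binary>")
		os.Exit(2)
	}
	f, err := elf.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	names := make([]string, 0, len(f.Sections))
	for _, s := range f.Sections {
		names = append(names, s.Name)
	}
	fmt.Printf("%+v\n", classify(names))
}
```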
Use bpftrace -l to list the functions in the Go binary:
# bpftrace -l uprobe:/cloud/my-agent |grep SaveNet
uprobe:/cloud/my-agent:git.com/my-agent/pkg/storage.(*ProcessStore).SaveNetworkLocateInfo
uprobe:/cloud/my-agent:git.com/my-agent/pkg/storage.(*ProcessStore).doSaveNetworkLocateInfo
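If the binary contains DWARF .debug_* sections, recent bpftrace versions can list a uprobe function's parameters with -lv:
- the binary must contain DWARF information;
- bpftrace must be recent enough (the 98E version does not support it; v0.20.4 below does);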
#readelf -S /tmp/my-agent |grep -E 'eh|debug'
[13] .debug_abbrev PROGBITS 0000000000000000 02e5b000
[14] .debug_line PROGBITS 0000000000000000 02e5b135
[15] .debug_frame PROGBITS 0000000000000000 03118b86
[16] .debug_gdb_script PROGBITS 0000000000000000 031c13e6
[17] .debug_info PROGBITS 0000000000000000 031c1413
[18] .debug_loc PROGBITS 0000000000000000 03687edb
[19] .debug_ranges PROGBITS 0000000000000000 03a4f139
#/tmp/bpftrace4 --version
/tmp/bpftrace4: stat /static-python: No such file or directory
bpftrace v0.20.4
#/tmp/bpftrace4 -lv 'uprobe:/tmp/my-agent:*SaveNetworkLocateInfo'
/tmp/bpftrace4: stat /static-python: No such file or directory
uprobe:/tmp/my-agent:git.com/my-agent/pkg/storage.(*ProcessStore).SaveNetworkLocateInfo
git.com/my-agent/pkg/storage.ProcessStore* ps
git.com/my-agent/pkg/storage.NetworkLocateInfo* n
error ~r0
uprobe:/tmp/my-agent:git.com/my-agent/pkg/storage.(*ProcessStore).doSaveNetworkLocateInfo
git.com/my-agent/pkg/storage.ProcessStore* ps
git.com/my-agent/pkg/storage.NetworkInfo* ni
struct string pid
bool isFailed
error ~r0
For kernel probes, bpftrace also supports -lv to print the arguments:
#bpftrace -l 'tracepoint:syscalls:sys_enter_openat' -v
tracepoint:syscalls:sys_enter_openat
int __syscall_nr;
int dfd;
const char * filename;
int flags;
umode_t mode;
C++/Rust/Go function name demangling #
perf report demangles C++ symbol names by default:
--demangle: Demangle symbol names to human readable form. It's enabled by default, disable with --no-demangle.
--demangle-kernel: Demangle kernel symbol names to human readable form (for C++ kernels).
Add support for rust demangling #3688: https://github.com/bpftrace/bpftrace/pull/3688/files
// Note that legacy rust programs use the same C++ mangling convention,
// and therefore will always start with _Z. This is fine, but they also
// include a symbol hash at the end which would normally be stripped
// off. If users have legacy rust programs, they can just use a
// wildcard match against the hash component, and the rust specific
// bits will match against the newer v0 rust mangling convention.
// The legacy mangling scheme for rust actually uses the C++
// demangler with an extra hash at the end. We use the same scheme,
// and users will need to explicitly wildcard against this hash.
// We may choose to parse the v0 mangled symbols defined by:
// https://rust-lang.github.io/rfcs/2603-rust-symbol-name-mangling-v0.html
//
// Or may vendor/link an alternate library to do so.
COMMAND ${CMAKE_COMMAND} -E env "RUSTFLAGS=-C symbol_mangling_version=v0" ${CARGO_EXECUTABLE} build --target-dir ${CMAKE_CURRENT_BINARY_DIR}
Format of a mangled Rust (v0) symbol name: https://rust-lang.github.io/rfcs/2603-rust-symbol-name-mangling-v0.html
_RNvNtCs1234_7mycrate3foo3bar
<>^^^^^<----><------><--><-->
 ||||||   |      |     |   |
 ||||||   |      |     |   +--- "bar" identifier
 ||||||   |      |     +------- "foo" identifier
 ||||||   |      +------------- "mycrate" identifier
 ||||||   +-------------------- disambiguator for "mycrate"
 |||||+------------------------ start-tag for "mycrate"
 ||||+------------------------- namespace tag for "foo"
 |||+-------------------------- start-tag for "foo"
 ||+--------------------------- namespace tag for "bar"
 |+---------------------------- start-tag for "bar"
 +----------------------------- common Rust symbol prefix
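To make the grammar above concrete, here is a deliberately minimal v0 demangler sketch. It handles only the C (crate root) and N (nested path) productions, optional s<base-62>_ disambiguators, and decimal-length identifiers from RFC 2603; generics, impl paths, back-references, punycode identifiers and the instantiating-crate suffix are not supported. For real use, a complete demangler such as the rustc-demangle crate is the safer choice.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parser holds the mangled symbol and the current read position.
type parser struct {
	s string
	i int
}

// path parses a v0 <path>; only C (crate root) and N (nested path) are handled.
func (p *parser) path() (string, error) {
	if p.i >= len(p.s) {
		return "", fmt.Errorf("unexpected end of symbol")
	}
	switch tag := p.s[p.i]; tag {
	case 'C': // crate root: C <identifier>
		p.i++
		return p.ident()
	case 'N': // nested path: N <namespace> <path> <identifier>
		p.i += 2 // skip 'N' and the one-letter namespace tag ('v' value, 't' type, ...)
		parent, err := p.path()
		if err != nil {
			return "", err
		}
		name, err := p.ident()
		if err != nil {
			return "", err
		}
		return parent + "::" + name, nil
	default:
		return "", fmt.Errorf("unsupported path tag %q at offset %d", tag, p.i)
	}
}

// ident parses [s<base-62>_] <decimal-length> [_] <bytes>.
func (p *parser) ident() (string, error) {
	if p.i < len(p.s) && p.s[p.i] == 's' { // disambiguator, e.g. "s1234_"
		end := strings.IndexByte(p.s[p.i:], '_')
		if end < 0 {
			return "", fmt.Errorf("unterminated disambiguator at offset %d", p.i)
		}
		p.i += end + 1
	}
	start := p.i
	for p.i < len(p.s) && p.s[p.i] >= '0' && p.s[p.i] <= '9' {
		p.i++
	}
	n, err := strconv.Atoi(p.s[start:p.i])
	if err != nil {
		return "", fmt.Errorf("expected identifier length at offset %d", start)
	}
	if p.i < len(p.s) && p.s[p.i] == '_' { // separator before names starting with a digit or '_'
		p.i++
	}
	if p.i+n > len(p.s) {
		return "", fmt.Errorf("identifier overruns symbol at offset %d", p.i)
	}
	name := p.s[p.i : p.i+n]
	p.i += n
	return name, nil
}

// demangleV0 turns e.g. "_RNvNtCs1234_7mycrate3foo3bar" into "mycrate::foo::bar".
func demangleV0(sym string) (string, error) {
	if !strings.HasPrefix(sym, "_R") {
		return "", fmt.Errorf("not a v0 Rust symbol: %q", sym)
	}
	p := &parser{s: sym, i: 2}
	return p.path() // anything after the path is ignored by this sketch
}

func main() {
	out, err := demangleV0("_RNvNtCs1234_7mycrate3foo3bar")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // mycrate::foo::bar
}
```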
https://stackoverflow.com/questions/34234354/perf-shows-mangled-function-names
When perf report gives you mangled names like _Z*, _ZN*, _ZL* etc., it means that your perf tool was compiled without access to a demangling function, or with it disabled. There is code in perf's Makefiles to detect the demangler.
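Go function names in uprobes #
bpftrace uprobe function names do not support special characters such as dots and parentheses:
- Newer bpftrace versions support dots: https://github.com/bpftrace/bpftrace/issues/548
- but even the latest version does not support parentheses in unquoted names.
Go function names usually include the full package path, e.g. git.com/my-agent/pkg/storage.(*ProcessStore).doSaveNetworkLocateInfo, which makes bpftrace fail to parse the function name:
- Overly long function names also cause errors. As stated in the issue below: "Yep, we finalized support for big strings on master. BPFTRACE_MAX_STRLEN can go as high as you want now (up to 32k)".
- https://github.com/bpftrace/bpftrace/issues/3617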
#bpftrace --version
bpftrace v0.11.2
#bpftrace -l uprobe:/cloud/my-agent |grep SaveNet
uprobe:/cloud/my-agent:git.com/my-agent/pkg/storage.(*ProcessStore).SaveNetworkLocateInfo
uprobe:/cloud/my-agent:git.com/my-agent/pkg/storage.(*ProcessStore).doSaveNetworkLocateInfo
#bpftrace -e 'uprobe:/cloud/my-agent:git.com/my-agent/pkg/storage.(*ProcessStore).SaveNetworkLocateInfo { @[ustack] = count(); }'
stdin:1:1-97: ERROR: syntax error, unexpected -, expecting {
uprobe:/cloud/my-agent:git.com/my-agent/pkg/storage.(*ProcessStore).SaveNetworkLocateInfo { @[ustack] = count(); }
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
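Workarounds: use 1. the function address, 2. a wildcard pattern for the function name, or 3. the function name as a double-quoted string.
- Function names in bpftrace uprobes support wildcards, so * can be used to match (and skip over) the special characters.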
# OK
#bpftrace -e 'uprobe:/cloud/my-agent:*SaveNetworkLocateInfo { @[ustack] = count(); }'
# OK
#bpftrace -e 'uprobe:/cloud/my-agent:git.com*SaveNetworkLocateInfo { @[ustack] = count(); }'
Attaching 2 probes...
# OK
#bpftrace -e 'uprobe:/cloud/my-agent:git.com*SaveNetwork*Info { @[ustack] = count(); }'
# OK: wrap the full function name in double quotes
#bpftrace -e 'uprobe:/cloud/my-agent:"git.com/my-agent/pkg/storage.(*ProcessStore).SaveNetworkLocateInfo" { @[ustack] = count(); }'
#nm -n /cloud/my-agent |grep SaveNetwork
00000000012e0220 T git.com/my-agent/pkg/storage.(*ProcessStore).SaveNetworkLocateInfo
00000000012e0480 T git.com/my-agent/pkg/storage.(*ProcessStore).doSaveNetworkLocateInfo
# OK: use the function address
# bpftrace -e 'uprobe:/cloud/my-agent:0x12e0220 { @[ustack] = count(); }'
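When probe strings are generated by a script, it is simplest to quote the function name whenever it contains anything other than letters, digits and underscores. A minimal sketch; uprobeSpec is a hypothetical helper, and names that themselves contain a double quote are not handled:

```go
package main

import "fmt"

// plainIdent reports whether name can be used in a probe without quotes.
func plainIdent(name string) bool {
	for _, r := range name {
		isLetter := r >= 'a' && r <= 'z' || r >= 'A' && r <= 'Z'
		if !isLetter && !(r >= '0' && r <= '9') && r != '_' {
			return false
		}
	}
	return name != ""
}

// uprobeSpec returns "uprobe:<binary>:<func>", double-quoting func when needed.
func uprobeSpec(binary, fn string) string {
	if !plainIdent(fn) {
		fn = `"` + fn + `"`
	}
	return "uprobe:" + binary + ":" + fn
}

func main() {
	spec := uprobeSpec("/cloud/my-agent",
		"git.com/my-agent/pkg/storage.(*ProcessStore).SaveNetworkLocateInfo")
	// Single-quote the whole program for the shell, as in the examples above.
	fmt.Printf("bpftrace -e '%s { @[ustack] = count(); }'\n", spec)
}
```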
uretprobe is incompatible with Go #
https://github.com/bpftrace/bpftrace/blob/master/man/adoc/bpftrace.adoc#uprobe-uretprobe
It is important to note that for uretprobes to work the kernel runs a special helper on user-space function entry which overrides the return address on the stack. This can cause issues with languages that have their own runtime like Golang:
// example.go (build with: go build -o test example.go)
package main
import (
	"fmt"
	"time"
)
func myprint(s string) {
	fmt.Printf("Input: %s\n", s)
}
func main() {
	ss := []string{"a", "b", "c"}
	for _, s := range ss {
		go myprint(s)
	}
	time.Sleep(1 * time.Second)
}
bpftrace
# bpftrace -e 'uretprobe:./test:main.myprint { @=count(); }' -c ./test
runtime: unexpected return pc for main.myprint called from 0x7fffffffe000
stack: frame={sp:0xc00008cf60, fp:0xc00008cfd0} stack=[0xc00008c000,0xc00008d000)
fatal error: unknown caller pc
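Inspecting Go function return values through uretprobes is risky. A uretprobe works by overwriting the return address on the stack, which can conflict with Go's own stack management: the runtime walks goroutine stacks by their return addresses and aborts when it finds one it does not recognize, as the "unexpected return pc" fatal error above shows.
Although uretprobes are unsafe in Go programs, plain uprobes can be used safely. Even without uretprobes, return values can still be captured, for example by placing uprobes at the points where the function returns, or at the entry of another function (such as one the returned value is passed to).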
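One way to act on the idea of probing the return points: list the RET instructions of the function with go tool objdump -s 'main.myprint$' ./test and attach a plain uprobe at each address, using the same address form as the uprobe:/cloud/my-agent:0x12e0220 example. The sketch below parses that listing from stdin. It assumes objdump's usual whitespace-separated columns (file:line, address, encoding, mnemonic), which may differ between Go versions, so check the disassembly before trusting the result. At a RET the register ABI's results should still be in registers (RAX first on amd64).

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// retAddrs extracts the addresses of RET instructions from
// "go tool objdump" output lines such as:
//
//	example.go:10	0x47e13c	c3	RET
func retAddrs(lines []string) []string {
	var addrs []string
	for _, l := range lines {
		f := strings.Fields(l)
		if len(f) >= 4 && strings.HasPrefix(f[1], "0x") && f[len(f)-1] == "RET" {
			addrs = append(addrs, f[1])
		}
	}
	return addrs
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: go tool objdump -s 'main.myprint$' ./test | retprobes ./test")
		os.Exit(2)
	}
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, a := range retAddrs(lines) {
		fmt.Printf("uprobe:%s:%s\n", os.Args[1], a)
	}
}
```

Each printed probe (retprobes is a hypothetical program name) can then be passed to bpftrace -e with an action block, in place of the uretprobe.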