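This note covers how to use bpftrace, along with its limitations and known problems.
The kernel itself does not support DWARF; tools such as perf, gdb, and SystemTap do DWARF-based stack unwinding in user space.
bpftrace and bcc currently cannot use DWARF to unwind user-space call stacks and rely on frame pointers (FP) instead: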
- Comparing SystemTap and bpftrace:https://lwn.net/Articles/852112/
- User-space backtrace support for programs built without frame pointers #1744 :https://github.com/iovisor/bpftrace/issues/1744
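For ustack() to work correctly, the traced program must be compiled with frame pointers enabled:
- either build without optimization (no -O option, or -O0);
- or pass -fno-omit-frame-pointer explicitly (or --enable-frame-pointer where the build system provides it).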
Although bpftrace cannot use DWARF for unwinding, it does use DWARF to resolve the parameters of user-space functions. When bpftrace -lv 'uprobe:/bin/bash:readline' lists the parameters of readline, the function name and parameter information come from the debug symbols. If bpftrace cannot find them, it reports: No DWARF found for XX, cannot show parameter info
- Reference: https://github.com/iovisor/bpftrace/blob/master/src/dwarf_parser.cpp
root@lima-ebpf-dev:~# apt install bash-dbgsym bash-static-dbgsym
root@lima-ebpf-dev:~# bpftrace -e 'uprobe:/usr/bin/bash:readline {printf("%s", ustack)}' # -p 12446
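To make the frame-pointer requirement concrete, here is a minimal Python sketch of how an FP-based unwinder follows the saved-rbp chain. The memory contents, the addresses, and the walk_frame_pointers helper are illustrative assumptions, not bpftrace or kernel internals.

```python
def walk_frame_pointers(memory, rbp, rip, max_depth=16):
    """Collect return addresses by following the saved-rbp chain.

    memory maps an address to the 8-byte word stored there. With frame
    pointers enabled, [rbp] holds the caller's saved rbp and [rbp + 8]
    holds the return address into the caller.
    """
    frames = [rip]
    while rbp != 0 and len(frames) < max_depth:
        ret = memory.get(rbp + 8)
        if ret is None:          # unreadable memory: stop unwinding
            break
        frames.append(ret)
        rbp = memory.get(rbp, 0)  # move up to the caller's frame
    return frames


# Hypothetical stack for main -> func_a -> func_b (currently executing).
memory = {
    0x7000: 0x7020, 0x7008: 0x401150,  # func_b frame: saved rbp, return into func_a
    0x7020: 0x7040, 0x7028: 0x401180,  # func_a frame: saved rbp, return into main
    0x7040: 0x0,    0x7048: 0x401000,  # main frame: end of chain, return into libc
}
print([hex(a) for a in walk_frame_pointers(memory, rbp=0x7000, rip=0x401120)])
```

If any function in the chain is compiled without a frame pointer, rbp holds an arbitrary value there and the walk stops early or yields bogus frames, which is why ustack output becomes incomplete for such binaries.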
Installing bpftrace #
Install from RPM/Deb packages:
echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse
deb http://ddebs.ubuntu.com $(lsb_release -cs)-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com $(lsb_release -cs)-proposed main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/ddebs.list
# On Ubuntu, also install the bpftrace-dbgsym package:
apt install bpftrace-dbgsym
bpftrace -e 'BEGIN { printf("hello world\n"); }'
bpftrace supports #include <xx> of kernel header files to obtain kernel struct definitions, so the kernel headers need to be installed.
bpftrace Cheat Sheet: https://www.brendangregg.com/BPF/bpftrace-cheat-sheet.html
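Viewing bpftrace information #
- Build: build-time options, such as libdw support. Only a build with libdw can use -lv to show user-function parameter lists (which come from DWARF);
- Kernel helpers: the eBPF helper functions the kernel supports;
- Kernel features: the eBPF features the kernel supports;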
root@lima-ebpf-dev:~# bpftrace --info
System
OS: Linux 5.15.0-78-generic #85-Ubuntu SMP Fri Jul 7 15:25:09 UTC 2023
Arch: x86_64
Build
version: v0.18.0-97-ge010d
LLVM: 14.0.0
unsafe probe: no
bfd: yes
libdw (DWARF support): yes
Kernel helpers
probe_read: yes
probe_read_str: yes
probe_read_user: yes
probe_read_user_str: yes
probe_read_kernel: yes
probe_read_kernel_str: yes
get_current_cgroup_id: yes
send_signal: yes
override_return: yes
get_boot_ns: yes
dpath: yes
skboutput: yes
get_tai_ns: no
get_func_ip: yes
Kernel features
Instruction limit: 1000000
Loop support: yes
btf: yes
module btf: yes
map batch: yes
uprobe refcount (depends on Build:bcc bpf_attach_uprobe refcount): yes
Map types
hash: yes
percpu hash: yes
array: yes
percpu array: yes
stack_trace: yes
perf_event_array: yes
ringbuf: yes
Probe types
kprobe: yes
tracepoint: yes
perf_event: yes
kfunc: yes
kprobe_multi: no
raw_tp_special: yes
iter: yes
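Listing probes and function parameters #
- bpftrace -l "tracepoint:*": lists the probes matching the given glob pattern.
- bpftrace -lv "tracepoint:syscalls:sys_enter_execve": shows the parameter list of a tracepoint/syscall/kfunc/uprobe function.
For user functions (uprobe), -lv resolves parameters from DWARF data, so the ELF file must contain the .debug_XX sections, or the matching debuginfo package must be installed. On Ubuntu this is usually XX-dbgsym.
- kprobe and similar probe types do not support -lv for parameters.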
# apt install bash-dbgsym
# bpftrace -lv 'uprobe:/bin/bash:readline'
uprobe:/bin/bash:readline
const char* prompt
# bpftrace -lv 'tracepoint:syscalls:sys_enter_write'
tracepoint:syscalls:sys_enter_write
int __syscall_nr
unsigned int fd
const char * buf
size_t count
Tracing kernel functions #
Use -e to specify events such as kprobe, syscall, and tracepoint; the example below prints kstack:
# bpftrace -e 'kprobe:nf_conntrack_in {printf("%s\n", kstack); }'
nf_conntrack_in+1
nf_hook_slow+61
__ip_local_out+214
ip_local_out+23
ip_send_skb+21
udp_send_skb.isra.43+277
udp_sendmsg+1544
sock_sendmsg+48
___sys_sendmsg+688
__sys_sendmsg+99
do_syscall_64+85
entry_SYSCALL_64_after_hwframe+68
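Tracing user functions #
bpftrace does not support DWARF-based user stack unwinding, so the user program must be compiled with frame pointers.
A symbol table for the ELF file is also required; it can be embedded in the ELF itself or provided by the matching debuginfo package.
- Launch the program directly from bpftrace (with -c):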
root@lima-ebpf-dev:~# cat test.c
#include <stdio.h>
#include <unistd.h>
void func_d() {
int msec=1;
printf("%s","Hello world from D\n");
usleep(10000*msec);
}
void func_c() {
printf("%s","Hello from C\n");
func_d();
}
void func_b() {
printf("%s","Hello from B\n");
func_c();
}
void func_a() {
printf("%s","Hello from A\n");
func_b();
}
int main() {
func_a();
}
# No -O optimization option is given, so frame pointers are kept
root@lima-ebpf-dev:~# gcc test.c -o hello
# Confirm that gcc emits the FP-saving instructions at the start of each function.
root@lima-ebpf-dev:~# objdump -S hello |grep -A 4 func_c
000000000000119e <func_c>:
119e: f3 0f 1e fa endbr64
11a2: 55 push %rbp # save the caller's FP
11a3: 48 89 e5 mov %rsp,%rbp
11a6: 48 8d 05 6a 0e 00 00 lea 0xe6a(%rip),%rax # 2017 <_IO_stdin_used+0x17>
--
11de: e8 bb ff ff ff call 119e <func_c>
11e3: 90 nop
11e4: 5d pop %rbp
11e5: c3 ret
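# Print the user call stack on entry to func_c. func_b is missing from the output because
# the probe fires before func_c's "push %rbp", so rbp still points at func_b's frame and
# the FP walk jumps straight to func_b's caller.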
root@lima-ebpf-dev:~# bpftrace -e 'uprobe:./hello:func_c {printf("%s", ustack)}' -c ./hello
Attaching 1 probe...
Hello from A
Hello from B
Hello from C
Hello world from D
func_c+0
func_a+33
main+18
__libc_start_call_main+128
Trace an already-running program by pid (it also needs FP and a symbol table):
root@lima-ebpf-dev:~# apt install bash-dbgsym bash-static-dbgsym
root@lima-ebpf-dev:~# bpftrace -e 'uprobe:/usr/bin/bash:readline {printf("%s", ustack)}' # -p 12446
When printing ustack or kstack you can pass arguments, for example ustack(perf, 3), where perf selects the stack format and 3 limits the user stack to three frames.
[ku]stack([bpftrace|perf|raw]): https://github.com/bpftrace/bpftrace/issues/430#issuecomment-2580126066
# bpftrace -e 'uprobe:/cloud/my-agent:*doSaveNetworkLocateInfo {printf("%s\n", ustack(perf, 2));}'
12e0480 git.com/my-agent/pkg/storage.(*ProcessStore).doSaveNetworkLocateInfo+0 (/cloud/my-agent)
12e557d git.com/my-agent/pkg/network/processor/pidricher.(*pidEnricher).Process+1405 (/cloud/my-agent)
Measuring function latency #
Version 1: uses a global variable, so concurrent calls interfere with each other
#!/usr/bin/bpftrace
uprobe:/usr/bin/dockerd:"github.com/docker/docker/api/server/router/network.(*networkRouter).getNetworksList" {
@start = nsecs;
}
uretprobe:/usr/bin/dockerd:"github.com/docker/docker/api/server/router/network.(*networkRouter).getNetworksList" {
printf("getNetworksList took %d ms\n", (nsecs - @start) / 1000000);
}
Version 2: correct, uses a per-thread variable
#!/usr/bin/bpftrace
uprobe:/usr/bin/dockerd:"github.com/docker/docker/api/server/router/network.(*networkRouter).getNetworksList" {
@start[tid] = nsecs;
}
uretprobe:/usr/bin/dockerd:"github.com/docker/docker/api/server/router/network.(*networkRouter).getNetworksList" {
if (@start[tid] != 0) {
printf("getNetworksList took %d ms\n", (nsecs - @start[tid]) / 1000000);
delete(@start[tid]);
}
}
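The difference between the two versions can be replayed offline. The following Python sketch is a simulation under assumed event data (the measure helper and the timestamps are made up for illustration); it mimics a single global @start versus an @start[tid] map.

```python
def measure(events, per_thread):
    """Replay (timestamp_ns, tid, kind) events, kind being 'entry' or 'return'.

    per_thread=False mimics version 1 (one global start timestamp, never
    deleted); per_thread=True mimics version 2 (one timestamp per tid).
    Returns a list of (tid, latency_ns) tuples.
    """
    start = {}
    out = []
    for ts, tid, kind in events:
        key = tid if per_thread else 0
        if kind == "entry":
            start[key] = ts
        elif key in start:
            out.append((tid, ts - start[key]))
            if per_thread:
                del start[key]   # same role as delete(@start[tid])
    return out


# Thread 2 enters and returns while thread 1 is still inside the function.
events = [
    (0, 1, "entry"),
    (40, 2, "entry"),
    (50, 2, "return"),
    (100, 1, "return"),
]
print(measure(events, per_thread=False))  # thread 1 is measured from thread 2's entry
print(measure(events, per_thread=True))
```

With the global variable, thread 1's latency is computed from thread 2's entry timestamp. Keying by tid fixes that for ordinary threads; for Go programs a goroutine may move between OS threads, so the tid key should be treated with some caution there.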
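Tracing container processes #
A container process lives in its own mount namespace, so its root filesystem is separate from the host's. To resolve symbols and trace functions, you first have to locate the container's binaries and libraries from the host.
There are two ways to find a container's binary on the host:
- /proc/<pid>/root
- the container's MergedDir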
# ls /proc/102366/root/
apsara bin boot dev entrypoint.sh etc home lib lib64 lost+found media mnt nsenter opt proc root run sbin srv sys tmp usr var
# ls -l /proc/102366/exe
lrwxrwxrwx 1 root root 0 Jan 9 16:11 /proc/102366/exe -> /usr/bin/plugin.csi.cloud.com
# ls -l /proc/102366/root/usr/bin/plugin.csi.cloud.com
-rwxr-xr-x 1 root root 82455166 Jan 9 13:23 /proc/102366/root/usr/bin/plugin.csi.cloud.com
$ sudo docker inspect -f '{{.State.Pid}}' cilium-agent
109997
$ bpftrace -e 'uprobe:/proc/109997/root/usr/bin/cilium-agent:"github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerate" {printf("%s\n", ustack); }'
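The /proc/<pid>/root mapping is mechanical, so it can be scripted. Below is a small Python sketch; host_path and uprobe_spec are hypothetical helper names introduced here, and the pid is the one from the example above.

```python
import posixpath


def host_path(pid, container_path):
    """Map an absolute path inside a container to the host view under /proc/<pid>/root."""
    if not container_path.startswith("/"):
        raise ValueError("container_path must be absolute")
    return posixpath.join("/proc", str(pid), "root", container_path.lstrip("/"))


def uprobe_spec(pid, container_path, symbol):
    """Build a bpftrace uprobe attach point, quoting the symbol so Go names parse."""
    return 'uprobe:%s:"%s"' % (host_path(pid, container_path), symbol)


print(uprobe_spec(109997, "/usr/bin/cilium-agent",
                  "github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerate"))
```

The path is only valid while the process with that pid is alive, so the attach point has to be rebuilt after a container restart.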
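Alternatively, use the container's .GraphDriver.Data.MergedDir to map a path inside the container to a path on the host. For example, to trace the cilium-agent process (itself deployed as a docker container), first find the absolute path of the cilium-agent file on the host, using the container ID or name:
- The merged path is the overlay root filesystem the container uses.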
# Check cilium-agent container
$ docker ps | grep cilium-agent
0eb2e76384b3 cilium:test "/usr/bin/cilium-agent ..." 4 hours ago Up 4 hours cilium-agent
# Find the merged path for cilium-agent container
$ docker inspect --format "{{.GraphDriver.Data.MergedDir}}" 0eb2e76384b3
/var/lib/docker/overlay2/a17f868d/merged # a17f868d.. is shortened for better viewing
# The object file we are going to trace
$ ls -ahl /var/lib/docker/overlay2/a17f868d/merged/usr/bin/cilium-agent
-rwxr-xr-x 1 root root 86M /var/lib/docker/overlay2/a17f868d/merged/usr/bin/cilium-agent
Then pass that absolute path to uprobe. A Go function must be given as a string containing its full package path, such as "github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerate":
(node) $ bpftrace -e 'uprobe:/var/lib/docker/overlay2/a17f868d/merged/usr/bin/cilium-agent:"github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerate" {printf("%s\n", ustack); }'
Attaching 1 probe...
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerate+0
github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).run.func1+363
sync.(*Once).doSlow+236
github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).run+101
runtime.goexit+1
Use nm or bpftrace to list the symbols (functions) in a Go binary that can be traced:
$ nm cilium-agent
000000000427d1d0 B bufio.ErrBufferFull
000000000427d1e0 B bufio.ErrFinalToken
0000000001d3e940 T type..hash.github.com/cilium/cilium/pkg/k8s.ServiceID
0000000001f32300 T type..hash.github.com/cilium/cilium/pkg/node/types.Identity
0000000001d05620 T type..hash.github.com/cilium/cilium/pkg/policy/api.FQDNSelector
0000000001d05e80 T type..hash.github.com/cilium/cilium/pkg/policy.PortProto
...
# bpftrace -l 'uprobe:./exec:*'|tail
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.lookupInfoNFC
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.lookupInfoNFKC
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.nextCGJCompose
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.nextCGJDecompose
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.nextComposed
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.nextDecomposed
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.nextDone
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.nextHangul
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.nextMulti
uprobe:./exec:vendor/golang.org/x/text/unicode/norm.nextMultiNorm
Whether a process runs in a container can be judged from the NSpid field:
$ cat /proc/1229/status | grep NSpid
NSpid: 1229
$ cat /proc/11459/status | grep NSpid
NSpid: 11459 1
11459 is the process ID in the host's pid namespace, and 1 is the process ID in the container's own pid namespace.
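The NSpid check is easy to automate. This Python sketch parses the text of a /proc/<pid>/status file; the function names are illustrative, and the sample strings copy the two outputs shown above.

```python
def nspid_chain(status_text):
    """Return the PIDs listed on the NSpid line, outermost namespace first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


def in_nested_pid_ns(status_text):
    """True when the process has a PID in more than one pid namespace."""
    return len(nspid_chain(status_text)) > 1


print(nspid_chain("Name:\tplugin\nNSpid:\t11459\t1\n"))
print(in_nested_pid_ns("Name:\tsystemd\nNSpid:\t1229\n"))
```

A container started with the host's pid namespace would show a single entry as well, so this check is a heuristic rather than a definitive test.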
Sampling and flame graphs #
Like perf record, bpftrace can periodically sample the whole system or a specific process:
# bpftrace -v -e 'profile:hz:100 /pid == 1/ { @[ustack(1)] = count(); }'
The profiling data produced by bpftrace can be visualized with the conversion tool shipped with FlameGraph:
sudo bpftrace -e 'profile:hz:99 { @[kstack] = count(); }' > trace.data
cd FlameGraph
# Convert with the stackcollapse-bpftrace.pl tool
./stackcollapse-bpftrace.pl trace.data > trace.folded
./flamegraph.pl --inverted trace.folded > traceflamegraph.svg
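To show what the folding step does, here is a simplified Python sketch of it. It is not the stackcollapse-bpftrace.pl script; it only handles multi-line `@[ ... ]: N` blocks, and the sample input is invented.

```python
import re


def collapse(bpftrace_output):
    """Fold bpftrace '@[stack]: count' map output into 'root;...;leaf count' lines."""
    folded = {}
    frames = None  # None means we are outside a '@[ ... ]: N' block
    for raw in bpftrace_output.splitlines():
        line = raw.strip()
        if line == "@[":
            frames = []
        elif frames is not None and line.startswith("]:"):
            count = int(line[2:])
            # bpftrace prints the leaf first; folded stacks want the root first.
            key = ";".join(reversed(frames))
            folded[key] = folded.get(key, 0) + count
            frames = None
        elif frames is not None and line:
            frames.append(re.sub(r"\+\d+$", "", line))  # drop the +offset suffix
    return ["%s %d" % (stack, n) for stack, n in sorted(folded.items())]


SAMPLE = """Attaching 1 probe...

@[
    do_idle+10
    cpu_startup_entry+25
]: 3
@[
    do_idle+42
    cpu_startup_entry+25
]: 2
"""
print("\n".join(collapse(SAMPLE)))
```

Stripping the offsets merges samples taken at different instructions of the same function, which is what makes the flame graph frames aggregate by function.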
join #
join reads at most 16 arguments, each up to 1024 bytes long.
// https://github.com/iovisor/bpftrace/blob/0b3392baa881f501ce684637acbd4136f8a29ed3/src/bpftrace.h#L190C1-L191C37
unsigned int join_argnum_ = 16;
unsigned int join_argsize_ = 1024;
// https://github.com/iovisor/bpftrace/blob/0b3392baa881f501ce684637acbd4136f8a29ed3/src/bpftrace.cpp#L463C1-L478C4
else if (printf_id == asyncactionint(AsyncAction::join))
{
uint64_t join_id = (uint64_t) * (static_cast<uint64_t *>(data) + 1);
auto delim = bpftrace->resources.join_args[join_id].c_str();
std::stringstream joined;
for (unsigned int i = 0; i < bpftrace->join_argnum_; i++) {
auto *arg = arg_data + 2*sizeof(uint64_t) + i * bpftrace->join_argsize_;
if (arg[0] == 0)
break;
if (i)
joined << delim;
joined << arg;
}
bpftrace->out_->message(MessageType::join, joined.str());
return;
}
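The effect of these two limits can be modelled in a few lines of Python. This is a sketch of the behaviour implied by the constants above, not a port of bpftrace; the per-argument truncation to 1023 characters assumes each 1024-byte slot keeps a terminating NUL, which is an assumption made here.

```python
JOIN_ARGNUM = 16     # join_argnum_ in the source above
JOIN_ARGSIZE = 1024  # join_argsize_ in the source above


def join(args, delim=" "):
    """Join at most JOIN_ARGNUM arguments, stopping at the first empty one."""
    out = []
    for arg in args[:JOIN_ARGNUM]:
        if not arg:  # mirrors the `arg[0] == 0` early exit
            break
        out.append(arg[:JOIN_ARGSIZE - 1])  # assumed room for the NUL terminator
    return delim.join(out)


print(join(["ls", "-l", "/tmp"]))
print(len(join([str(i) for i in range(20)]).split()))
```

In practice this means a traced execve with more than 16 arguments shows a truncated command line in join(args.argv) output.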
argN/sargN/reg/args/retval #
N starts at 0: arg0 is the function's first argument, arg1 the second, and so on.
- arg0, arg1, …: Arguments to the traced function; assumed to be 64 bits wide
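- Applies to: kprobes, uprobes, usdt
- sarg0, sarg1, …: Arguments to the traced function (for programs that store arguments on the stack); assumed to be 64 bits wide
- Applies to: kprobes, uprobes
If a parameter does not occupy exactly one 64-bit slot (for example a struct passed by value rather than by pointer), it may take several registers. Kernel functions conventionally pass struct pointers, so argN almost always corresponds to the parameter at index N.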
# bpftrace -e 'uprobe:/home/bgregg/func:main.add { printf("%d %d\n", arg0, arg1); }'
Attaching 1 probe...
42 13
# bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Attaching 1 probe...
opening: /proc/cpuinfo
opening: /proc/stat
opening: /proc/diskstats
opening: /proc/stat
opening: /proc/vmstat
[...]
reg(const string name): reg is a bpftrace builtin function that returns the value of the named register, for example ax/bx/cx/sp on amd64 (without the r prefix);
- Applies to: kprobe, uprobe
# bpftrace -e 'uprobe:/home/bgregg/Lang/go/func:main*add { printf("%d %d\n", *(reg("sp") + 8), *(reg("sp") + 16)); }'
Attaching 1 probe...
42 13
args: The struct with all arguments of the traced function. Available in tracepoint, kfunc, and uprobe (with DWARF) probes. Use args.x to access argument x or args to get a record with all arguments.
- https://github.com/iovisor/bpftrace/commit/7e77f6896b1285a6b6eba044e16880c88faa2f44
- Kernel functions (tracepoint, kfunc) need BTF support. User functions (uprobe) need DWARF in the binary;
- args.<name> accesses a parameter by name and supports dereferencing struct types;
root@lima-ebpf-dev:~# bpftrace -lv 'kfunc:vmlinux:__traceiter_net_dev_start_xmit'
kfunc:vmlinux:__traceiter_net_dev_start_xmit
void * __data
const struct sk_buff * skb
const struct net_device * dev
int retval
root@lima-ebpf-dev:~# bpftrace -e 'kfunc:vmlinux:__traceiter_net_dev_start_xmit {printf("%x\n", args.skb->protocol);}'
Attaching 1 probe...
retval: Value returned by the function being traced (kretprobe, uretprobe, fexit). For kretprobe and uretprobe, its type is uint64, but for fexit it depends. You can look up the type using bpftrace -lv
- Applies to: kretprobe, uretprobe, fexit
# bpftrace -e 'kretprobe:do_sys_open { printf("returned: %d\n", retval); }'
Attaching 1 probe...
returned: 8
returned: 21
returned: -2
returned: 21
[...]
Printing struct fields #
- First import the struct definition and cast the argument to a struct xx pointer; only then can print() display it
bpftrace -v -e 'struct Foo { int m; int n; } uprobe:./testprogs/simple_struct:func { $f = *((struct Foo *) arg0); print($f); exit(); }'
bpftrace -v -e 'struct Foo { int m; int n; } u:./testprogs/simple_struct:func { @s = *((struct Foo *)arg0); exit(); }'
bpftrace -v -e 'struct Foo { struct { int m[1] } y; struct { int n } a; } u:./testprogs/simple_struct:func { @s = *((struct Foo *)arg0); exit(); }'
References:
- https://github.com/bpftrace/bpftrace/issues/3036
- https://github.com/bpftrace/bpftrace/commit/059c25c1e4035a1e96adfcd1544c5587084a3d0f
- https://stackoverflow.com/questions/62515301/how-to-use-structure-in-bpftracing-scripting
- https://github.com/bpftrace/bpftrace/commit/ded5b31166219f498362672c2a81f8a5f83a522a
- Include the header that defines the struct; its fields can then be resolved. bpftrace can read system headers, such as the kernel headers below.
# cat path.bt
#include <linux/path.h>
#include <linux/dcache.h>
kprobe:vfs_open
{
printf("open path: %s\n", str(((struct path *)arg0)->dentry->d_name.name));
}
# bpftrace path.bt
Attaching 1 probe...
open path: dev
open path: if_inet6
open path: retrans_time_ms
[...]
- Alternatively, if the kernel has built-in BTF, the fields can be resolved without including any header:
# bpftrace -e 'kprobe:vfs_open { printf("open path: %s\n", str(((struct path *)arg0)->dentry->d_name.name)); }'
Attaching 1 probe...
open path: cmdline
open path: interrupts
[...]
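Argument passing differences between C and Go #
C/C++ follows the AMD64 ABI and passes function arguments in registers: rdi, rsi, rdx, rcx, r8, r9, stack, stack …
bpftrace exposes the values passed in these registers as arg0, arg1, arg2, … argN.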
# bpftrace -e 'uprobe:/home/bgregg/func:main.add { printf("%d %d\n", arg0, arg1); }'
Attaching 1 probe...
42 13
Go versions before 1.17 pass arguments on the stack:
- https://github.com/bpftrace/bpftrace/issues/740
bpftrace uses sarg0, sarg1, sarg2, … sargN to read stack-passed arguments:
sarg0 == *(reg("sp") + 8) sarg1 == *(reg("sp") + 16)
# bpftrace -e 'uprobe:/home/bgregg/Lang/go/func:main*add { printf("%d %d\n", *(reg("sp") + 8), *(reg("sp") + 16)); }'
Attaching 1 probe...
42 13
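Go 1.17 and later pass arguments mainly in registers and fall back to the stack (disassemble the binary to see exactly what happens). The register order is rax, rbx, rcx, rdi, rsi, r8, r9, r10, r11, stack, which differs from the order in C's AMD64 ABI, so bpftrace's argN does not apply to newer Go versions.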
- https://github.com/bpftrace/bpftrace/issues/2547
- https://go.googlesource.com/go/+/refs/heads/dev.regabi/src/cmd/compile/internal-abi.md#amd64-architecture
- Calling convention: https://mechpen.github.io/posts/2022-10-30-golang-bpf/
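The two register orders above can be cross-referenced mechanically. The Python sketch below maps the N-th integer register slot of the Go register ABI to a bpftrace expression; bpftrace_expr is a hypothetical helper, and the register lists are copied from the text above.

```python
# Register order behind bpftrace's arg0..arg5 on x86_64 (C calling convention).
C_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
# Integer argument register order of the Go >= 1.17 ABI, as listed above.
GO_ARG_REGS = ["rax", "rbx", "rcx", "rdi", "rsi", "r8", "r9", "r10", "r11"]


def bpftrace_expr(go_slot):
    """Return a bpftrace expression that reads Go integer register slot `go_slot`."""
    reg = GO_ARG_REGS[go_slot]
    if reg in C_ARG_REGS:
        return "arg%d" % C_ARG_REGS.index(reg)
    # reg() takes legacy register names without the r prefix (ax, bx, ...),
    # while the numbered registers keep their full name (r10, r11).
    name = reg[1:] if reg[1:].isalpha() else reg
    return 'reg("%s")' % name


for slot in range(len(GO_ARG_REGS)):
    print(slot, GO_ARG_REGS[slot], bpftrace_expr(slot))
```

A slot is one 64-bit integer register, not one source-level parameter: a Go string or slice consumes several consecutive slots, and floating-point arguments use a separate register file, so the mapping still has to be checked against the disassembly.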
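Another problem is that some Go types, such as string, are really an address plus a length (uintptr + int64). A parameter of such a type is passed in two registers, so argN no longer necessarily corresponds to the parameter at index N.
For both C and Go functions, the same problem arises when a parameter is a struct rather than a pointer: argN and the function's parameters are no longer in one-to-one correspondence.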
# https://godbolt.org/z/67a4Yde5e
#include <stdint.h>
struct Foo {
uint64_t a;
uint64_t b;
};
void byval(Foo f);
void bar() {
Foo f = {
.a = 1,
.b = 2,
};
byval(f);
}
### Disassembly
bar():
mov edi, 1
mov esi, 2
jmp byval(Foo)
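The disassembly shows the two struct fields arriving in edi and esi, that is, in arg0 and arg1. A rough Python model of this slot assignment follows; it is a deliberately simplified reading of the AMD64 C calling convention (integer-class values only, no floating point, no alignment subtleties), and arg_slots is a name invented for this sketch.

```python
def arg_slots(param_sizes, num_regs=6):
    """Map each parameter (given as a size in bytes) to the argN slots it occupies.

    Simplified rules: a value of up to 16 bytes takes one or two integer
    registers if enough are left; anything else is passed on the stack and
    consumes no register.
    """
    slots = []
    next_reg = 0
    for size in param_sizes:
        words = (size + 7) // 8
        if size <= 16 and next_reg + words <= num_regs:
            slots.append(["arg%d" % i for i in range(next_reg, next_reg + words)])
            next_reg += words
        else:
            slots.append(["stack"])
    return slots


# byval(struct Foo f): one 16-byte struct occupies arg0 and arg1.
print(arg_slots([16]))
# f(struct Foo f, long n): the second parameter lands in arg2, not arg1.
print(arg_slots([16, 8]))
```

The second call illustrates the mismatch described above: the function's second parameter is read through arg2 once the first one is a 16-byte struct passed by value.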
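If a parameter is an address or a struct, the address must be cast to the matching data structure before it can be decoded. For example, a Go string is internally a struct, so when a function parameter has type string it has to be decoded like this: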
struct GoString {
char * str;
int len;
};
uprobe:./string:main.join
{
$p1 = (struct GoString*) sarg0;
printf("arg1[%d]:%s\n", $p1->len, str($p1->str, $p1->len));
$p2 = (struct GoString*) sarg1;
printf("arg2[%d]:%s\n", $p2->len, str($p2->str, $p2->len));
}
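So, to get function arguments reliably in bpftrace, the safest approach is to disassemble the function and check how the arguments are passed and which registers or stack slots are used.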
Debugging Go programs with bpftrace #
Go does not use .eh_frame; it uses FP or .debug_frame. Binaries produced by go build include the .debug_xx debug sections and FP by default:
alizj@lima-dev2:/Users/alizj/go/src/git.com/my-agent$ GOOS=linux GOARCH=amd64 go build -o my-agent-amd64 ./cmd/
alizj@lima-dev2:/Users/alizj/go/src/git.com/my-agent$ file my-agent-amd64
my-agent-amd64: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=CwNfhKa02_E_0xJyBXTh/YtKaW6xl9F8fuPNB2FmQ/GTJ4i0gXCea-djzz28sm/8HQk6LYzhP01Evpb6hA5, with debug_info, not stripped
alizj@lima-dev2:/Users/alizj/go/src/git.com/my-agent$ go version
go version go1.21.1 linux/arm64
alizj@lima-dev2:/Users/alizj/go/src/git.com/my-agent$ readelf -S ./my-agent-amd64 |grep -E 'debug|eh'
[13] .debug_abbrev PROGBITS 0000000000000000 02e61000
[14] .debug_line PROGBITS 0000000000000000 02e61135
[15] .debug_frame PROGBITS 0000000000000000 0311f672
[16] .debug_gdb_s[...] PROGBITS 0000000000000000 031c7ed2
[17] .debug_info PROGBITS 0000000000000000 031c7eff
[18] .debug_loc PROGBITS 0000000000000000 0368e961
[19] .debug_ranges PROGBITS 0000000000000000 03a55bbf
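go run uses -ldflags '-s -w' by default, which removes the symbol table (-s) and the DWARF debug info (-w), so the resulting binary cannot be used for debugging.
To make a Go program easier to debug, build it with go build -gcflags=all="-N -l", which disables optimizations and inlining.
Use bpftrace -l to list the functions in a Go binary: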
# bpftrace -l uprobe:myagentt/current/bin/my-agent/my-agent |grep SaveNet
uprobe:/cloud/my-agent:git.com/my-agent/pkg/storage.(*ProcessStore).SaveNetworkLocateInfo
uprobe:/cloud/my-agent:git.com/my-agent/pkg/storage.(*ProcessStore).doSaveNetworkLocateInfo
If the binary contains the DWARF .debug_xx sections, newer bpftrace versions can show uprobe function parameter lists with -lv:
- The binary must contain DWARF information;
#readelf -S /tmp/my-agent |grep -E 'eh|debug'
[13] .debug_abbrev PROGBITS 0000000000000000 02e5b000
[14] .debug_line PROGBITS 0000000000000000 02e5b135
[15] .debug_frame PROGBITS 0000000000000000 03118b86
[16] .debug_gdb_script PROGBITS 0000000000000000 031c13e6
[17] .debug_info PROGBITS 0000000000000000 031c1413
[18] .debug_loc PROGBITS 0000000000000000 03687edb
[19] .debug_ranges PROGBITS 0000000000000000 03a4f139
#/tmp/bpftrace4 --version
/tmp/bpftrace4: stat /static-python: No such file or directory
bpftrace v0.20.4
#/tmp/bpftrace4 -lv 'uprobe:/tmp/my-agent:*SaveNetworkLocateInfo'
/tmp/bpftrace4: stat /static-python: No such file or directory
uprobe:/tmp/my-agent:git.com/my-agent/pkg/storage.(*ProcessStore).SaveNetworkLocateInfo
git.com/my-agent/pkg/storage.ProcessStore* ps
git.com/my-agent/pkg/storage.NetworkLocateInfo* n
error ~r0
uprobe:/tmp/my-agent:git.com/my-agent/pkg/storage.(*ProcessStore).doSaveNetworkLocateInfo
git.com/my-agent/pkg/storage.ProcessStore* ps
git.com/my-agent/pkg/storage.NetworkInfo* ni
struct string pid
bool isFailed
error ~r0
C++/Rust/Go function name demangling #
bpftrace also supports demangling of C++, Rust, and Go function names:
- Add support for rust demangling #3688: https://github.com/bpftrace/bpftrace/pull/3688/files
// Note that legacy rust programs use the same C++ mangling convention,
// and therefore will always start with _Z. This is fine, but they also
// include a symbol hash at the end which would normally be stripped
// off. If users have legacy rust programs, they can just use a
// wildcard match against the hash component, and the rust specific
// bits will match against the newer v0 rust mangling convention.
// The legacy mangling scheme for rust actually uses the C++
// demangler with an extra hash at the end. We use the same scheme,
// and users will need to explicitly wildcard against this hash.
// We may choose to parse the v0 mangled symbols defined by:
// https://rust-lang.github.io/rfcs/2603-rust-symbol-name-mangling-v0.html
//
// Or may vendor/link an alternate library to do so.
COMMAND ${CMAKE_COMMAND} -E env "RUSTFLAGS=-C symbol_mangling_version=v0" ${CARGO_EXECUTABLE} build --target-dir ${CMAKE_CURRENT_BINARY_DIR}
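Compatibility problems between Go and bpftrace #
uprobe and Go function names #
bpftrace uprobe function names do not support special characters such as dots and parentheses.
- Newer bpftrace versions support dots: https://github.com/bpftrace/bpftrace/issues/548
- Even the latest version still does not support parentheses
Go function names usually include the full Go package path, such as git.com/my-agent/pkg/storage.(*ProcessStore).doSaveNetworkLocateInfo, which makes bpftrace fail when parsing the function name:
- An overly long function name also causes an error; the limit can be raised with the BPFTRACE_MAX_STRLEN environment variable, up to a maximum of 32k
- https://github.com/bpftrace/bpftrace/issues/3617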
#bpftrace --version
bpftrace v0.11.2
#bpftrace -l uprobe:/cloud/my-agent |grep SaveNet
uprobe:myagentt/current/bin/my-agent/my-agent:git.com/my-agent/pkg/storage.(*ProcessStore).SaveNetworkLocateInfo
uprobe:myagentt/current/bin/my-agent/my-agent:git.com/my-agent/pkg/storage.(*ProcessStore).doSaveNetworkLocateInfo
#bpftrace -e 'uprobe:/cloud/my-agent:git.com/my-agent/pkg/storage.(*ProcessStore).SaveNetworkLocateInfo { @[ustack] = count(); }'
stdin:1:1-97: ERROR: syntax error, unexpected -, expecting {
uprobe:/cloud/my-agent:git.com/my-agent/pkg/storage.(*ProcessStore).SaveNetworkLocateInfo { @[ustack] = count(); }
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
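Workarounds: 1. use the function address, 2. use a wildcard in the function name path, 3. pass the function name path as a quoted string
- The function name path after uprobe supports wildcards, so * can be used to match or skip the special characters.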
# OK
#bpftrace -e 'uprobe:/cloud/my-agent:*SaveNetworkLocateInfo { @[ustack] = count(); }'
# OK
#bpftrace -e 'uprobe:/cloud/my-agent:gitlab.comp*SaveNetworkLocateInfo { @[ustack] = count(); }'
Attaching 2 probes...
# OK
#bpftrace -e 'uprobe:/cloud/my-agent:gitlab.comp*SaveNetwork*Info { @[ustack] = count(); }'
# Wrap the function name path in double quotes: OK
#bpftrace -e 'uprobe:/cloud/my-agent:"git.com/my-agent/pkg/storage.(*ProcessStore).SaveNetworkLocateInfo" { @[ustack] = count(); }'
#nm -n /cloud/my-agent |grep SaveNetwork
00000000012e0220 T git.com/my-agent/pkg/storage.(*ProcessStore).SaveNetworkLocateInfo
00000000012e0480 T git.com/my-agent/pkg/storage.(*ProcessStore).doSaveNetworkLocateInfo
# Use the address: OK
# bpftrace -e 'uprobe:/cloud/my-agent:0x12e0220 { @[ustack] = count(); }'
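uretprobe is incompatible with Go #
Inspecting the return value of a Go function through uretprobe can be risky. uretprobe installs its probe by modifying the stack (it overwrites the return address), which can conflict with Go's own stack management.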
// example.go
func myprint(s string) {
fmt.Printf("Input: %s\n", s)
}
func main() {
ss := []string{"a", "b", "c"}
for _, s := range ss {
go myprint(s)
}
time.Sleep(1*time.Second)
}
bpftrace uretprobe fails:
# bpftrace -e 'uretprobe:./test:main.myprint { @=count(); }' -c ./test
runtime: unexpected return pc for main.myprint called from 0x7fffffffe000
stack: frame={sp:0xc00008cf60, fp:0xc00008cfd0} stack=[0xc00008c000,0xc00008d000)
fatal error: unknown caller pc
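Although uretprobe is unsafe for Go programs, uprobe itself can still be used safely. Even without uretprobe there are ways to get the return value, for example by placing a uprobe at the point where the function returns, or at the entry of a function.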
Parsing skb with bpftrace #
If $ipheader->daddr is 192.168.2.44, convert the four numbers to hex bytes, which are c0 a8 02 2c. Reversed, the bytes are 2c 02 a8 c0, so on a little-endian host you can compare against 0x2c02a8c0: https://stackoverflow.com/questions/75172893/comparing-ip-addresses-in-bpftrace
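The byte reversal described above can be computed instead of done by hand. This Python sketch uses only the standard library; daddr_literal is a name chosen here for illustration.

```python
import socket
import struct


def daddr_literal(ip):
    """Return the integer a little-endian host sees when it loads a
    network-order IPv4 address as a 32-bit value."""
    return struct.unpack("<I", socket.inet_aton(ip))[0]


print(hex(daddr_literal("192.168.2.44")))
```

The printed value can then be pasted into a bpftrace filter that compares against $ipheader->daddr; this assumes the tracing host is little-endian, as x86_64 and typical arm64 systems are.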
root@lima-ebpf-dev:~# bpftrace -lv 'kfunc:ip_finish_output'
kfunc:vmlinux:ip_finish_output
struct net * net
struct sock * sk
struct sk_buff * skb
int retval
# bpftrace can read the system's kernel headers
root@lima-ebpf-dev:~# cat /Users/zhangjun/skb.bt
#include <linux/skbuff.h>
#include <linux/icmp.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/in.h>
kfunc:ip_finish_output {
$skb = (struct sk_buff *)args.skb;
$dev = $skb->dev;
$name = $dev->name;
$ipheader = ((struct iphdr *) ($skb->head + $skb->network_header));
$version = $ipheader->version; // already a 4-bit field; the earlier ">> 4" always gave 0, hence "[0]" in the output below
if($ipheader->protocol == IPPROTO_ICMP) {
// get ICMP header; see skb_transport_header():
$icmph = (struct icmphdr *)($skb->head + $skb->transport_header);
if ($icmph->type == ICMP_ECHO) {
$id = $icmph->un.echo.id;
$seq = $icmph->un.echo.sequence;
printf("icmp: pid %d, comm: %s, [%d] %d\t%s > %s\n, id: %d, seq: %d, dev: %s\n", pid, comm, $version, $ipheader->protocol,
ntop($ipheader->saddr), ntop($ipheader->daddr), $id, $seq, $name);
}
}
}
root@lima-ebpf-dev:~# bpftrace /Users/zhangjun/skb.bt
Attaching 1 probe...
icmp: pid 206295, comm: ping, [0] 1 192.168.5.1 > 114.114.114.114
, id: 5376, seq: 48640, dev: eth0
icmp: pid 206295, comm: ping, [0] 1 192.168.5.1 > 114.114.114.114
, id: 5376, seq: 48896, dev: eth0
icmp: pid 206295, comm: ping, [0] 1 192.168.5.1 > 114.114.114.114
, id: 5376, seq: 49152, dev: eth0
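The id and seq values in this output are printed as raw network-order 16-bit fields, which is why seq appears to jump by 256. A short Python sketch of the byte swap (the same expression the tcp_sendmsg.bt script further down applies to $dport) decodes them; swap16 is a helper name introduced here.

```python
def swap16(value):
    """Swap the two bytes of a 16-bit value (network order <-> little-endian host)."""
    return ((value >> 8) | ((value << 8) & 0xff00)) & 0xffff


# Raw values taken from the output above.
for raw_seq in (48640, 48896, 49152):
    print(raw_seq, "->", swap16(raw_seq))
print("id", 5376, "->", swap16(5376))
```

After swapping, the sequence numbers are consecutive (190, 191, 192), as expected for successive ping packets.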
Another example: https://lwn.net/Articles/793749/
There is an important capability missing from those one-liners: struct navigation. Here is the function prototype again: int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
bpftrace provides arg0-argN for kprobe function arguments, simply mapping them to the registers for the calling convention (arg2 becomes %rdx on x86_64, for example). Since bpftrace can read kernel headers, which are often installed on production systems, accessing struct data is possible by including the right header and casting the arguments:
#include <net/sock.h>
[...]
$sk = (struct sock *)arg0;
Here’s an example of a bpftrace tool that prints the address information, size, and return value from tcp_sendmsg(). Example output:
# ./tcp_sendmsg.bt
Attaching 2 probes...
10.0.0.65 49978 -> 52.37.243.173 443 : 63 bytes, retval 63
127.0.0.1 58566 -> 127.0.0.1 22 : 36 bytes, retval 36
127.0.0.1 22 -> 127.0.0.1 58566: 36 bytes, retval 36
[...]
The source of tcp_sendmsg.bt:
#!/usr/local/bin/bpftrace
#include <net/sock.h>
k:tcp_sendmsg
{
@sk[tid] = arg0;
@size[tid] = arg2;
}
kr:tcp_sendmsg
/@sk[tid]/
{
$sk = (struct sock *)@sk[tid];
$size = @size[tid];
$af = $sk->__sk_common.skc_family;
if ($af == AF_INET) {
$daddr = ntop($af, $sk->__sk_common.skc_daddr);
$saddr = ntop($af, $sk->__sk_common.skc_rcv_saddr);
$lport = $sk->__sk_common.skc_num;
$dport = $sk->__sk_common.skc_dport;
$dport = ($dport >> 8) | (($dport << 8) & 0xff00);
printf("%-15s %-5d -> %-15s %-5d: %d bytes, retval %d\n",
$saddr, $lport, $daddr, $dport, $size, retval);
} else {
printf("IPv6...\n");
}
delete(@sk[tid]);
delete(@size[tid]);
}
In the kprobe, sk and size are saved in per-thread-ID maps, so they can be retrieved in the kretprobe when tcp_sendmsg() returns. The kretprobe casts sk and prints out details, if it is an IPv4 message, using the bpftrace function ntop() to convert the address to a string. The destination port is flipped from network to host order. To keep this short I skipped IPv6, but you can add code to handle it too (ntop() does support IPv6 addresses).
There is work underway for bpftrace to use BPF Type Format (BTF) information as well, which brings various advantages including struct definitions that are missing from kernel headers.