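BTF support across distributions:
- Supported Kernels and Distribution Versions
- CentOS 7 supports BPF starting with 7.6.1810 (kernel 3.10.0-957), but has never supported BTF;
- CentOS 8 supports both BPF and BTF starting with 8.2.2004 (kernel 4.18.0-193);
- Ubuntu 20.10 (kernel 5.8.0) is the first release to support both BPF and BTF.
Kernels with in-kernel BTF (available since 5.2, when built with CONFIG_DEBUG_INFO_BTF) expose the raw BTF data at /sys/kernel/btf/vmlinux. bpftool can turn that data into a single header containing the full set of kernel data structures, types, and function signatures, so the individual kernel headers no longer need to be included: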
$ bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
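After that, the eBPF kernel-side C file only needs to include "vmlinux.h" rather than the individual kernel headers.
Note: the generated vmlinux.h does not contain #define values, so you may have to define them yourself or include the relevant kernel headers.
- libbpf's bpf_helpers.h supplies many of the commonly used definitions that are otherwise missing.
In addition, to use the helper functions in libbpf you must include either vmlinux.h or linux/types.h from the kernel headers (shipped in the linux-libc-dev package); otherwise integer types such as __u32 and __u64 (u32 and u64 in vmlinux.h) are undefined.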
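As an illustration of the note about missing #define values, here is a minimal sketch of constants a program might redefine after including vmlinux.h. The choice of these three macros is an assumption made for the example; the numeric values are the standard ones from the kernel uapi headers.

```c
/* Place after #include "vmlinux.h": BTF carries types, not preprocessor macros. */
#ifndef AF_INET
#define AF_INET 2          /* IPv4 address family */
#endif

#ifndef ETH_P_IP
#define ETH_P_IP 0x0800    /* EtherType for IPv4 */
#endif

#ifndef TC_ACT_OK
#define TC_ACT_OK 0        /* tc action: let the packet continue */
#endif
```

The #ifndef guards keep the definitions harmless if a real kernel header that already defines them is included later.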
Generating the kernel BTF and the vmlinux.h header from the kernel's debuginfo package:
$ export version=vmlinux-4.19.91-007
# Install the pahole tool
$ apt install pahole
# Extract a detached BTF file from the debuginfo of vmlinux-4.19.91-007
$ pahole --btf_encode_detached "${version}.btf" "${version}.vmlinux"
# Dump the generated BTF file into a single header of kernel data structure definitions
$ bpftool btf dump file ./${version}.btf format c > ${version}.h
BTF is not really a symbol table; rather, it is type information, like a simpler and more compact DWARF.
Printing the BTF type IDs in vmlinux with bpftool:
- The first field after the type ID (for example INT) is the kind; FUNC, for instance, denotes a function;
- Reference: https://www.kernel.org/doc/html/next/bpf/btf.html
- Both libbpf and cilium/ebpf can parse vmlinux BTF files in either raw or ELF format:
- The BTF file that pahole generates from the kernel debuginfo vmlinux, and the in-kernel /sys/kernel/btf/vmlinux, are both in raw format.
- Raw format parsing: https://github.com/libbpf/libbpf/blob/master/src/btf.c#L1058
- ELF format parsing: https://github.com/libbpf/libbpf/blob/master/src/btf.c#L922
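The raw/ELF distinction above can be made from the first bytes of a file. The sketch below is not libbpf's code; `detect_btf_format` and the enum are hypothetical names. It relies only on the ELF magic (`\x7fELF`) and the BTF magic 0xeB9F.

```c
#include <stddef.h>
#include <string.h>

enum btf_file_format { BTF_FILE_UNKNOWN = 0, BTF_FILE_RAW, BTF_FILE_ELF };

/* Classify a buffer by its leading magic bytes. */
static enum btf_file_format detect_btf_format(const unsigned char *buf, size_t len)
{
    static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len >= 4 && memcmp(buf, elf_magic, 4) == 0)
        return BTF_FILE_ELF;

    /* Raw BTF starts with the 16-bit magic 0xeB9F: stored as 9f eb in
     * little-endian files and as eb 9f in big-endian ones. */
    if (len >= 2 && ((buf[0] == 0x9f && buf[1] == 0xeb) ||
                     (buf[0] == 0xeb && buf[1] == 0x9f)))
        return BTF_FILE_RAW;

    return BTF_FILE_UNKNOWN;
}
```

A loader could read the first four bytes of the file and use the result to pick the raw or the ELF parsing path.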
# vmlinux-4.19.91-007.btf is in raw format, not ELF.
# file vmlinux-4.19.91-007.btf
vmlinux-4.19.91-007.btf: data
# xxd vmlinux-4.19.91-007.btf |head
00000000: 9feb 0100 1800 0000 0000 0000 9882 2100 ..............!.
00000010: 9882 2100 5cbd 1500 0100 0000 0000 0001 ..!.\...........
00000020: 0800 0000 4000 0000 0000 0000 0000 000a ....@...........
00000030: 0100 0000 0000 0000 0000 0009 0100 0000 ................
00000040: 0000 0000 0000 0003 0000 0000 0100 0000 ................
00000050: 1600 0000 0200 0000 1300 0000 0000 0001 ................
00000060: 0800 0000 4000 0000 0000 0000 0000 0002 ....@...........
00000070: 0900 0000 0000 0000 0000 000a 0600 0000 ................
00000080: 1c00 0000 0000 0001 0100 0000 0800 0000 ................
00000090: 0000 0000 0000 000a 0800 0000 2100 0000 ............!...
# bpftool btf dump file vmlinux-4.19.91-007.btf format raw|head
[1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
[2] CONST '(anon)' type_id=1
[3] VOLATILE '(anon)' type_id=1
[4] ARRAY '(anon)' type_id=1 index_type_id=22 nr_elems=2
[5] INT 'sizetype' size=8 bits_offset=0 nr_bits=64 encoding=(none)
[6] PTR '(anon)' type_id=9
[7] CONST '(anon)' type_id=6
[8] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
[9] CONST '(anon)' type_id=8
[10] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)
# bpftool btf dump file vmlinux-4.19.91-007.btf format raw|grep -i function |head
[7087] FUNC 'xen_call_function_single_interrupt' type_id=6734 linkage=static
[7088] FUNC 'xen_call_function_interrupt' type_id=6734 linkage=static
# bpftool btf dump file vmlinux-4.19.91-007.btf format raw|grep -i typedef |head
[12] TYPEDEF '__s8' type_id=13
[14] TYPEDEF '__u8' type_id=15
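The first 24 bytes of the xxd output above are the BTF header. The following sketch decodes a little-endian header by hand; `struct raw_btf_header` is a local mirror of the layout in the kernel's linux/btf.h, and both it and `parse_raw_btf_header` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the 24-byte header defined in linux/btf.h. */
struct raw_btf_header {
    uint16_t magic;    /* 0xeB9F */
    uint8_t  version;
    uint8_t  flags;
    uint32_t hdr_len;  /* size of this header in bytes */
    uint32_t type_off; /* type section offset, relative to the end of the header */
    uint32_t type_len;
    uint32_t str_off;  /* string section offset, relative to the end of the header */
    uint32_t str_len;
};

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode a little-endian raw BTF header. Returns 0 on success, -1 on bad input. */
static int parse_raw_btf_header(const unsigned char *buf, size_t len,
                                struct raw_btf_header *hdr)
{
    if (len < 24)
        return -1;
    hdr->magic = (uint16_t)(buf[0] | (buf[1] << 8));
    if (hdr->magic != 0xeb9f)
        return -1;
    hdr->version  = buf[2];
    hdr->flags    = buf[3];
    hdr->hdr_len  = read_le32(buf + 4);
    hdr->type_off = read_le32(buf + 8);
    hdr->type_len = read_le32(buf + 12);
    hdr->str_off  = read_le32(buf + 16);
    hdr->str_len  = read_le32(buf + 20);
    return 0;
}
```

Applied to the bytes `9feb 0100 1800 0000 0000 0000 9882 2100 9882 2100 5cbd 1500`, this gives version 1, hdr_len 0x18, type_len 0x218298, and str_len 0x15bd5c, with the string section starting exactly where the type section ends.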
A custom vmlinux BTF file passed to libbpf or cilium/ebpf is used only for CO-RE relocation, because that stage is implemented entirely in the user-space loader (libbpf or cilium/ebpf).
struct bpf_object_open_opts {
...
// https://github.com/libbpf/libbpf/blob/master/src/libbpf.h#L136-L141
/* Path to the custom BTF to be used for BPF CO-RE relocations.
* This custom BTF completely replaces the use of vmlinux BTF
* for the purpose of CO-RE relocations.
* NOTE: any other BPF feature (e.g., fentry/fexit programs,
* struct_ops, etc) will need actual kernel BTF at /sys/kernel/btf/vmlinux.
*/
const char *btf_custom_path;
...
}
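However, tp_btf/fentry/fexit/lsm/struct_ops programs also need the in-kernel eBPF verifier to look up and validate the signature of the traced target. The verifier matches on the type kind and name in the in-kernel BTF to obtain the btf_attach_id. If the running kernel has no built-in BTF (available from 5.5), verification fails with the error: libbpf: load bpf program failed: Invalid argument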
- https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/
- In-kernel BTF is generated by CONFIG_DEBUG_INFO_BTF, which is enabled by default starting with 5.5.
// https://elixir.bootlin.com/linux/v5.19.14/source/kernel/bpf/verifier.c#L14563
// The kernel must be built with CONFIG_DEBUG_INFO_BTF
struct btf *bpf_get_btf_vmlinux(void)
{
if (!btf_vmlinux && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
mutex_lock(&bpf_verifier_lock);
if (!btf_vmlinux)
btf_vmlinux = btf_parse_vmlinux();
mutex_unlock(&bpf_verifier_lock);
}
return btf_vmlinux;
}
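Since the kernel only provides btf_vmlinux when CONFIG_DEBUG_INFO_BTF is set, a loader can check for in-kernel BTF before it tries to load a tp_btf/fentry/lsm program. This is a minimal user-space sketch; `kernel_btf_available` is a hypothetical helper, and checking that the sysfs file is readable is an assumed heuristic rather than what libbpf itself does.

```c
#include <unistd.h>

/* Return 1 if the BTF file at `path` is readable, 0 otherwise.
 * Callers would normally pass "/sys/kernel/btf/vmlinux". */
static int kernel_btf_available(const char *path)
{
    return access(path, R_OK) == 0;
}
```

When `kernel_btf_available("/sys/kernel/btf/vmlinux")` returns 0, a tool can fall back to program types that do not need a btf_attach_id, or report a clearer error than "Invalid argument".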
- The BTF attach ID is found by looking up the target that corresponds to <name> in tp_btf/<name>, lsm/<name>, and so on:
- tp_btf/<name>: looks up the typedef btf_trace_<name>
- lsm/<name>: looks up the function bpf_lsm_<name>
- iter/<name>: looks up the function bpf_iter_<name>
- https://lore.kernel.org/lkml/[email protected]/
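The section-to-target mapping listed above can be sketched as a small table lookup. This is an illustration only, not libbpf's implementation; `btf_lookup_name` and `struct sec_prefix` are hypothetical names, and only the three section prefixes from the list are handled.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct sec_prefix {
    const char *sec;        /* SEC() prefix, e.g. "tp_btf/" */
    const char *btf_prefix; /* prefix of the BTF type name to search for */
};

/* Build the BTF type name for a section such as "tp_btf/sched_switch".
 * Returns 0 on success, -1 if the section is unknown or `out` is too small. */
static int btf_lookup_name(const char *section, char *out, size_t out_sz)
{
    static const struct sec_prefix table[] = {
        { "tp_btf/", "btf_trace_" },
        { "lsm/",    "bpf_lsm_"   },
        { "iter/",   "bpf_iter_"  },
    };
    size_t i;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        size_t n = strlen(table[i].sec);
        int ret;

        if (strncmp(section, table[i].sec, n) != 0)
            continue;
        ret = snprintf(out, out_sz, "%s%s", table[i].btf_prefix, section + n);
        /* snprintf reports the untruncated length, so ret >= out_sz means truncation. */
        if (ret < 0 || (size_t)ret >= out_sz)
            return -1;
        return 0;
    }
    return -1;
}
```

For example, "tp_btf/sched_switch" maps to "btf_trace_sched_switch" (searched as a typedef), and "lsm/file_open" maps to "bpf_lsm_file_open" (searched as a function).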
// https://github.com/libbpf/libbpf/blob/1728e3e4bef0e138ea95ffe62163eb9a6ac6fa32/src/libbpf.c#L12394
#define BTF_TRACE_PREFIX "btf_trace_" // SEC("tp_btf/<name>")
#define BTF_LSM_PREFIX "bpf_lsm_" // SEC("lsm/<name>")
#define BTF_ITER_PREFIX "bpf_iter_"
#define BTF_MAX_NAME_SIZE 128
void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type,
const char **prefix, int *kind)
{
switch (attach_type) {
case BPF_TRACE_RAW_TP:
*prefix = BTF_TRACE_PREFIX; // "btf_trace_"
*kind = BTF_KIND_TYPEDEF; // typedef kind, i.e. the function-pointer typedef named btf_trace_<name> in vmlinux
break;
case BPF_LSM_MAC:
case BPF_LSM_CGROUP:
*prefix = BTF_LSM_PREFIX; // "bpf_lsm_"
*kind = BTF_KIND_FUNC; // function kind
break;
case BPF_TRACE_ITER:
*prefix = BTF_ITER_PREFIX; // "bpf_iter_"
*kind = BTF_KIND_FUNC; // function kind
break;
default:
*prefix = "";
*kind = BTF_KIND_FUNC; // function kind
}
}
int bpf_program__set_attach_target(struct bpf_program *prog,
int attach_prog_fd,
const char *attach_func_name)
{
int btf_obj_fd = 0, btf_id = 0, err;
if (!prog || attach_prog_fd < 0)
return libbpf_err(-EINVAL);
if (prog->obj->loaded)
return libbpf_err(-EINVAL);
if (attach_prog_fd && !attach_func_name) {
/* remember attach_prog_fd and let bpf_program__load() find
* BTF ID during the program load
*/
prog->attach_prog_fd = attach_prog_fd;
return 0;
}
if (attach_prog_fd) {
btf_id = libbpf_find_prog_btf_id(attach_func_name, // get the BTF ID from the target prog
attach_prog_fd);
if (btf_id < 0)
return libbpf_err(btf_id);
} else {
if (!attach_func_name)
return libbpf_err(-EINVAL);
/* load btf_vmlinux, if not yet */
err = bpf_object__load_vmlinux_btf(prog->obj, true); // load the running kernel's vmlinux BTF
// from a set of fixed paths; the custom
// vmlinux path passed in is not used here
if (err)
return libbpf_err(err);
err = find_kernel_btf_id(prog->obj, attach_func_name, // get the BTF ID from the loaded kernel BTF
prog->expected_attach_type,
&btf_obj_fd, &btf_id);
if (err)
return libbpf_err(err);
}
prog->attach_btf_id = btf_id;
prog->attach_btf_obj_fd = btf_obj_fd;
prog->attach_prog_fd = attach_prog_fd;
return 0;
}
// https://github.com/libbpf/libbpf/blob/1728e3e4bef0e138ea95ffe62163eb9a6ac6fa32/src/libbpf.c#L9224
static int find_kernel_btf_id(struct bpf_object *obj, const char *attach_name,
enum bpf_attach_type attach_type,
int *btf_obj_fd, int *btf_type_id)
{
int ret, i;
ret = find_attach_btf_id(obj->btf_vmlinux, attach_name, attach_type); // look up attach_name, i.e. <name>,
// in vmlinux according to attach_type
if (ret > 0) {
*btf_obj_fd = 0; /* vmlinux BTF */
*btf_type_id = ret; // BTF type ID matching <name>
return 0;
}
if (ret != -ENOENT)
return ret;
ret = load_module_btfs(obj); // look up <name> in the BTF of loaded kernel modules; needs a kernel with module BTF support (5.11+)
if (ret)
return ret;
for (i = 0; i < obj->btf_module_cnt; i++) {
const struct module_btf *mod = &obj->btf_modules[i];
ret = find_attach_btf_id(mod->btf, attach_name, attach_type);
if (ret > 0) {
*btf_obj_fd = mod->fd;
*btf_type_id = ret;
return 0;
}
if (ret == -ENOENT)
continue;
return ret;
}
return -ESRCH;
}
static inline int find_attach_btf_id(struct btf *btf, const char *name,
enum bpf_attach_type attach_type)
{
const char *prefix;
int kind;
btf_get_kernel_prefix_kind(attach_type, &prefix, &kind); // sets prefix to btf_trace_, bpf_lsm_,
// or bpf_iter_, and kind to
// BTF_KIND_TYPEDEF (tp_btf) or BTF_KIND_FUNC (lsm, iter)
return find_btf_by_prefix_kind(btf, prefix, name, kind);
}
static int find_btf_by_prefix_kind(const struct btf *btf, const char *prefix,
const char *name, __u32 kind)
{
char btf_type_name[BTF_MAX_NAME_SIZE];
int ret;
ret = snprintf(btf_type_name, sizeof(btf_type_name), // build e.g. btf_trace_<name>
"%s%s", prefix, name);
/* snprintf returns the number of characters written excluding the
* terminating null. So, if >= BTF_MAX_NAME_SIZE are written, it
* indicates truncation.
*/
if (ret < 0 || ret >= sizeof(btf_type_name))
return -ENAMETOOLONG;
return btf__find_by_name_kind(btf, btf_type_name, kind); // btf_type_name is btf_trace_<name>,
// bpf_lsm_<name>, or bpf_iter_<name>; kind is
// BTF_KIND_TYPEDEF (tp_btf) or
// BTF_KIND_FUNC (lsm and others)
}
// https://github.com/libbpf/libbpf/blob/1728e3e4bef0e138ea95ffe62163eb9a6ac6fa32/src/btf.c#L780
__s32 btf__find_by_name_kind(const struct btf *btf, const char *type_name,
__u32 kind)
{
return btf_find_by_name_kind(btf, 1, type_name, kind);
}
static __s32 btf_find_by_name_kind(const struct btf *btf, int start_id,
const char *type_name, __u32 kind)
{
__u32 i, nr_types = btf__type_cnt(btf); // number of types in this BTF
if (kind == BTF_KIND_UNKN || !strcmp(type_name, "void"))
return 0;
for (i = start_id; i < nr_types; i++) {
const struct btf_type *t = btf__type_by_id(btf, i);
const char *name;
if (btf_kind(t) != kind) // compare the kind
continue;
name = btf__name_by_offset(btf, t->name_off); // compare the name
if (name && !strcmp(type_name, name))
return i; // BTF type index (ID) whose kind and name both match
}
return libbpf_err(-ENOENT);
}
Cases that require an in-kernel btf_attach_id:
- Program types (resolved against the running kernel's vmlinux): BPF_PROG_TYPE_STRUCT_OPS, BPF_PROG_TYPE_LSM, BPF_PROG_TYPE_TRACING;
// https://github.com/cilium/ebpf/blob/main/link/tracing.go
type TracingOptions struct {
// Program must be of type Tracing with attach type
// AttachTraceFEntry/AttachTraceFExit/AttachModifyReturn or
// AttachTraceRawTp.
Program *ebpf.Program
// Program attach type. Can be one of:
// - AttachTraceFEntry
// - AttachTraceFExit
// - AttachModifyReturn
// - AttachTraceRawTp
// This field is optional.
AttachType ebpf.AttachType
// Arbitrary value that can be fetched from an eBPF program
// via `bpf_get_attach_cookie()`.
Cookie uint64
}
type LSMOptions struct {
// Program must be of type LSM with attach type
// AttachLSMMac.
Program *ebpf.Program
// Arbitrary value that can be fetched from an eBPF program
// via `bpf_get_attach_cookie()`.
Cookie uint64
}
// attachBTFID links all BPF program types (Tracing/LSM) that they attach to a btf_id.
func attachBTFID(program *ebpf.Program, at ebpf.AttachType, cookie uint64) (Link, error) {
if program.FD() < 0 {
return nil, fmt.Errorf("invalid program %w", sys.ErrClosedFd)
}
var (
fd *sys.FD
err error
)
switch at {
case ebpf.AttachTraceFEntry, ebpf.AttachTraceFExit, ebpf.AttachTraceRawTp,
ebpf.AttachModifyReturn, ebpf.AttachLSMMac:
// Attach via BPF link
fd, err = sys.LinkCreateTracing(&sys.LinkCreateTracingAttr{
ProgFd: uint32(program.FD()),
AttachType: sys.AttachType(at),
Cookie: cookie,
})
if err == nil {
break
}
if !errors.Is(err, unix.EINVAL) && !errors.Is(err, sys.ENOTSUPP) {
return nil, fmt.Errorf("create tracing link: %w", err)
}
fallthrough
case ebpf.AttachNone:
// Attach via RawTracepointOpen
if cookie > 0 {
return nil, fmt.Errorf("create raw tracepoint with cookie: %w", ErrNotSupported)
}
fd, err = sys.RawTracepointOpen(&sys.RawTracepointOpenAttr{
ProgFd: uint32(program.FD()),
})
if errors.Is(err, sys.ENOTSUPP) {
// This may be returned by bpf_tracing_prog_attach via bpf_arch_text_poke.
return nil, fmt.Errorf("create raw tracepoint: %w", ErrNotSupported)
}
if err != nil {
return nil, fmt.Errorf("create raw tracepoint: %w", err)
}
default:
return nil, fmt.Errorf("invalid attach type: %s", at.String())
}
raw := RawLink{fd: fd}
info, err := raw.Info()
if err != nil {
raw.Close()
return nil, err
}
if info.Type == RawTracepointType {
// Sadness upon sadness: a Tracing program with AttachRawTp returns
// a raw_tracepoint link. Other types return a tracing link.
return &rawTracepoint{raw}, nil
}
return &tracing{raw}, nil
}
// AttachTracing links a tracing (fentry/fexit/fmod_ret) BPF program or
// a BTF-powered raw tracepoint (tp_btf) BPF Program to a BPF hook defined
// in kernel modules.
func AttachTracing(opts TracingOptions) (Link, error) {
if t := opts.Program.Type(); t != ebpf.Tracing {
return nil, fmt.Errorf("invalid program type %s, expected Tracing", t)
}
switch opts.AttachType {
case ebpf.AttachTraceFEntry, ebpf.AttachTraceFExit, ebpf.AttachModifyReturn,
ebpf.AttachTraceRawTp, ebpf.AttachNone:
default:
return nil, fmt.Errorf("invalid attach type: %s", opts.AttachType.String())
}
return attachBTFID(opts.Program, opts.AttachType, opts.Cookie)
}
// AttachLSM links a Linux security module (LSM) BPF Program to a BPF
// hook defined in kernel modules.
func AttachLSM(opts LSMOptions) (Link, error) {
if t := opts.Program.Type(); t != ebpf.LSM {
return nil, fmt.Errorf("invalid program type %s, expected LSM", t)
}
return attachBTFID(opts.Program, ebpf.AttachLSMMac, opts.Cookie)
}
Right. Libbpf only supports a newer and safer way to attach to kprobes. For your experiments, try to stick to tracepoints and you’ll have a better time.
But it’s another thing I’ve been meaning to add to libbpf for supporting older kernels. I even have code written to do legacy kprobe attachment, just need to find time to send a patch to add it as a fallback for kernels that don’t support new kprobe interface.