
Generating eBPF BTF and vmlinux.h


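BTF support across distributions:

  • Supported Kernels and Distribution Versions
  • CentOS 7 supports BPF starting with kernel 3.10.0-957 in release 7.6.1810, but has never supported BTF;
  • CentOS 8 supports both BPF and BTF starting with kernel 4.18.0-193 in release 8.2.2004;
  • Ubuntu supports both BPF and BTF starting with kernel 5.8.0 in release 20.10;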

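On kernels with in-kernel BTF support (5.2 onward, built with CONFIG_DEBUG_INFO_BTF), the raw BTF data is exposed at /sys/kernel/btf/vmlinux. bpftool can convert it into a single header file that contains every kernel data structure, type, and function signature, so the individual kernel headers no longer have to be included one by one.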

$ bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h

The kernel-side eBPF C file then only needs to include "vmlinux.h" instead of the individual kernel headers.

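Note: the generated vmlinux.h contains no #define values, because BTF records types rather than macros. You may have to define the constants yourself or include the relevant kernel headers.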
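As a minimal sketch of the note above, missing constants can be supplied with guarded #define lines placed after the vmlinux.h include. The three macros below are illustrative picks only (their values match the Linux kernel headers), and struct comm_event is a hypothetical type that exists just to show one of them in use.

```c
/* In a real BPF object these lines would follow: #include "vmlinux.h" */

/* BTF describes types only, so macro constants must be supplied by hand.
 * The #ifndef guards keep the definitions harmless if some other header
 * already provides them. */
#ifndef TASK_COMM_LEN
#define TASK_COMM_LEN 16	/* size of task_struct->comm */
#endif

#ifndef AF_INET
#define AF_INET 2		/* IPv4 address family */
#endif

#ifndef ETH_P_IP
#define ETH_P_IP 0x0800		/* EtherType for IPv4 */
#endif

/* Hypothetical event record with a buffer sized for a process name. */
struct comm_event {
	char comm[TASK_COMM_LEN];
};
```

Which macros are needed depends entirely on the program; only the guard pattern is the point here.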

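In addition, using the helper functions shipped with libbpf still requires either vmlinux.h or the linux/types.h header provided by the linux-libc-dev package, since that is where type definitions such as __u32 and __u64 come from.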

Generating kernel BTF and the vmlinux.h header from the kernel debuginfo package

$ export version=vmlinux-4.19.91-007

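# Install the pahole tool
$ apt install pahole

# Extract type information from the vmlinux-4.19.91-007 debuginfo and encode it as a detached BTF file
$ pahole --btf_encode_detached "${version}.btf" "${version}.vmlinux"

# Dump the generated BTF file as a single header of kernel type definitions (vmlinux.h)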
$ bpftool btf dump file ./${version}.btf format c > ${version}.h

BTF is not really a symbol table; it is type information, like a simpler and more compact DWARF.

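Printing the BTF type IDs in vmlinux with bpftool:

  1. The number in brackets is the type ID. The field after it, such as INT, is the kind; FUNC, for example, denotes a function;
  2. Reference: https://www.kernel.org/doc/html/next/bpf/btf.html
  3. Both libbpf and cilium/ebpf can parse a vmlinux BTF file in either raw or ELF format:
# vmlinux-4.19.91-007.btf is in raw format, not ELF format.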

# file vmlinux-4.19.91-007.btf
vmlinux-4.19.91-007.btf: data

# xxd vmlinux-4.19.91-007.btf |head
00000000: 9feb 0100 1800 0000 0000 0000 9882 2100  ..............!.
00000010: 9882 2100 5cbd 1500 0100 0000 0000 0001  ..!.\...........
00000020: 0800 0000 4000 0000 0000 0000 0000 000a  ....@...........
00000030: 0100 0000 0000 0000 0000 0009 0100 0000  ................
00000040: 0000 0000 0000 0003 0000 0000 0100 0000  ................
00000050: 1600 0000 0200 0000 1300 0000 0000 0001  ................
00000060: 0800 0000 4000 0000 0000 0000 0000 0002  ....@...........
00000070: 0900 0000 0000 0000 0000 000a 0600 0000  ................
00000080: 1c00 0000 0000 0001 0100 0000 0800 0000  ................
00000090: 0000 0000 0000 000a 0800 0000 2100 0000  ............!...
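The first 24 bytes of the dump above are the BTF header. The following is a rough sketch of how they decode, assuming a little-endian file such as one produced on x86_64. The field layout follows struct btf_header from the kernel UAPI header linux/btf.h; the names raw_btf_header, RAW_BTF_MAGIC and parse_raw_btf_header are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define RAW_BTF_MAGIC 0xeb9f

/* Field layout mirrors struct btf_header in include/uapi/linux/btf.h. */
struct raw_btf_header {
	uint16_t magic;
	uint8_t  version;
	uint8_t  flags;
	uint32_t hdr_len;
	uint32_t type_off;	/* type section offset, relative to the end of the header */
	uint32_t type_len;
	uint32_t str_off;	/* string section offset, relative to the end of the header */
	uint32_t str_len;
};

static uint16_t rd_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decodes a little-endian BTF header. Returns 0 on success, -1 if the
 * buffer is shorter than 24 bytes or the magic does not match. */
int parse_raw_btf_header(const uint8_t *buf, size_t len,
			 struct raw_btf_header *out)
{
	if (len < 24)
		return -1;
	out->magic = rd_le16(buf);
	if (out->magic != RAW_BTF_MAGIC)
		return -1;
	out->version  = buf[2];
	out->flags    = buf[3];
	out->hdr_len  = rd_le32(buf + 4);
	out->type_off = rd_le32(buf + 8);
	out->type_len = rd_le32(buf + 12);
	out->str_off  = rd_le32(buf + 16);
	out->str_len  = rd_le32(buf + 20);
	return 0;
}
```

Applied to the bytes shown above, this gives hdr_len = 24, type_off = 0, type_len = 0x218298, str_off = 0x218298 and str_len = 0x15bd5c, so the string section starts immediately after the type section.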

# bpftool btf dump file vmlinux-4.19.91-007.btf format raw|head
[1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
[2] CONST '(anon)' type_id=1
[3] VOLATILE '(anon)' type_id=1
[4] ARRAY '(anon)' type_id=1 index_type_id=22 nr_elems=2
[5] INT 'sizetype' size=8 bits_offset=0 nr_bits=64 encoding=(none)
[6] PTR '(anon)' type_id=9
[7] CONST '(anon)' type_id=6
[8] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
[9] CONST '(anon)' type_id=8
[10] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)
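The kind that bpftool prints comes from the info word of each type record: bits 24 to 28 hold the kind and the low 16 bits hold vlen. In the hex dump, the first record after the 24-byte header has the info word 0x01000000, which decodes to INT, and the second has 0x0a000000, which decodes to CONST, matching type IDs 1 and 2 in the listing. The sketch below shows that decoding; the function names are invented for the example and the kind table covers only the first 14 kinds defined in linux/btf.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Kind numbers as assigned in include/uapi/linux/btf.h (first 14 only). */
static const char *const kind_names[] = {
	"UNKN", "INT", "PTR", "ARRAY", "STRUCT", "UNION", "ENUM",
	"FWD", "TYPEDEF", "VOLATILE", "CONST", "RESTRICT", "FUNC",
	"FUNC_PROTO",
};

/* Bits 24-28 of btf_type.info carry the kind. */
unsigned int info_kind(uint32_t info)
{
	return (info >> 24) & 0x1f;
}

/* Bits 0-15 carry vlen, e.g. the member count of a struct. */
unsigned int info_vlen(uint32_t info)
{
	return info & 0xffff;
}

/* Maps a kind number to its name, or "?" for kinds not in the table. */
const char *kind_name(unsigned int kind)
{
	size_t n = sizeof(kind_names) / sizeof(kind_names[0]);

	return kind < n ? kind_names[kind] : "?";
}
```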

# bpftool btf dump file vmlinux-4.19.91-007.btf format raw|grep -i function |head
[7087] FUNC 'xen_call_function_single_interrupt' type_id=6734 linkage=static
[7088] FUNC 'xen_call_function_interrupt' type_id=6734 linkage=static

# bpftool btf dump file vmlinux-4.19.91-007.btf format raw|grep -i typedef |head
[12] TYPEDEF '__s8' type_id=13
[14] TYPEDEF '__u8' type_id=15

A custom vmlinux BTF file passed to libbpf or cilium/ebpf is used only for CO-RE relocation, because that stage is implemented entirely in the userspace libbpf/cilium client.

struct bpf_object_open_opts {
...
//  https://github.com/libbpf/libbpf/blob/master/src/libbpf.h#L136-L141
	/* Path to the custom BTF to be used for BPF CO-RE relocations.
	 * This custom BTF completely replaces the use of vmlinux BTF
	 * for the purpose of CO-RE relocations.
	 * NOTE: any other BPF feature (e.g., fentry/fexit programs,
	 * struct_ops, etc) will need actual kernel BTF at /sys/kernel/btf/vmlinux.
	 */
	const char *btf_custom_path;
...
}

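However, tp_btf, fentry, fexit, lsm, struct_ops and similar programs additionally need the in-kernel eBPF verifier to look up and validate the function signature of the traced target. The verifier matches on the type kind and name found in the in-kernel BTF to obtain the btf_attach_id (this mechanism is available from kernel 5.5). If the running kernel has no built-in BTF, verification fails with the error: libbpf: load bpf program failed: Invalid argument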
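Since these program types cannot load without in-kernel BTF, a loader may want to probe for it first and report a clearer message than "Invalid argument". One possible check, sketched under the assumption that the file is in host byte order (which holds for the kernel's own BTF); has_btf_at and KERNEL_BTF_PATH are names chosen for this example, not libbpf API.

```c
#include <stdint.h>
#include <stdio.h>

#define KERNEL_BTF_PATH "/sys/kernel/btf/vmlinux"

/* Returns 1 if the file at path can be opened and starts with the BTF
 * magic 0xeb9f in host byte order, otherwise 0. */
int has_btf_at(const char *path)
{
	uint16_t magic = 0;
	size_t n;
	FILE *f = fopen(path, "rb");

	if (!f)
		return 0;
	n = fread(&magic, sizeof(magic), 1, f);
	fclose(f);
	return n == 1 && magic == 0xeb9f;
}
```

A loader could call has_btf_at(KERNEL_BTF_PATH) before opening a tp_btf or fentry object and fall back to a different program type when it returns 0.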

// https://elixir.bootlin.com/linux/v5.19.14/source/kernel/bpf/verifier.c#L14563

// The kernel must be built with CONFIG_DEBUG_INFO_BTF
struct btf *bpf_get_btf_vmlinux(void)
{
	if (!btf_vmlinux && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
		mutex_lock(&bpf_verifier_lock);
		if (!btf_vmlinux)
			btf_vmlinux = btf_parse_vmlinux();
		mutex_unlock(&bpf_verifier_lock);
	}
	return btf_vmlinux;
}
  • The BTF attach ID is the result of looking up the target that <name> refers to in tp_btf/<name>, lsm/<name>, and so on:
    • tp_btf/<name>: looks up the typedef named btf_trace_<name>
    • lsm/<name>: looks up the function named bpf_lsm_<name>
    • iter/<name>: looks up the function named bpf_iter_<name>
  • https://lore.kernel.org/lkml/[email protected]/
// https://github.com/libbpf/libbpf/blob/1728e3e4bef0e138ea95ffe62163eb9a6ac6fa32/src/libbpf.c#L12394

#define BTF_TRACE_PREFIX "btf_trace_" // SEC("tp_btf/<name>")
#define BTF_LSM_PREFIX "bpf_lsm_"  // SEC("lsm/<name>")
#define BTF_ITER_PREFIX "bpf_iter_"
#define BTF_MAX_NAME_SIZE 128


void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type,
				const char **prefix, int *kind)
{
	switch (attach_type) {
	case BPF_TRACE_RAW_TP:
		*prefix = BTF_TRACE_PREFIX; // "btf_trace_"
		*kind = BTF_KIND_TYPEDEF; // typedef kind, i.e. the function pointer type that vmlinux declares via typedef as btf_trace_<name>
		break;
	case BPF_LSM_MAC:
	case BPF_LSM_CGROUP:
		*prefix = BTF_LSM_PREFIX; // "bpf_lsm_"
		*kind = BTF_KIND_FUNC;   // function kind
		break;
	case BPF_TRACE_ITER:
		*prefix = BTF_ITER_PREFIX; // "bpf_iter_"
		*kind = BTF_KIND_FUNC;    // function kind
		break;
	default:
		*prefix = "";
		*kind = BTF_KIND_FUNC; // function kind
	}
}



int bpf_program__set_attach_target(struct bpf_program *prog,
				   int attach_prog_fd,
				   const char *attach_func_name)
{
	int btf_obj_fd = 0, btf_id = 0, err;

	if (!prog || attach_prog_fd < 0)
		return libbpf_err(-EINVAL);

	if (prog->obj->loaded)
		return libbpf_err(-EINVAL);

	if (attach_prog_fd && !attach_func_name) {
		/* remember attach_prog_fd and let bpf_program__load() find
		 * BTF ID during the program load
		 */
		prog->attach_prog_fd = attach_prog_fd;
		return 0;
	}

	if (attach_prog_fd) {
		btf_id = libbpf_find_prog_btf_id(attach_func_name, // get the BTF ID from the target prog
						 attach_prog_fd);
		if (btf_id < 0)
			return libbpf_err(btf_id);
	} else {
		if (!attach_func_name)
			return libbpf_err(-EINVAL);

		/* load btf_vmlinux, if not yet */
		err = bpf_object__load_vmlinux_btf(prog->obj, true);  // loads the running kernel's vmlinux BTF from a few
								      // fixed paths; the custom vmlinux path passed in by
								      // the caller is not consulted here
		if (err)
			return libbpf_err(err);
		err = find_kernel_btf_id(prog->obj, attach_func_name, // get the BTF ID from the loaded kernel BTF
					 prog->expected_attach_type,
					 &btf_obj_fd, &btf_id);
		if (err)
			return libbpf_err(err);
	}

	prog->attach_btf_id = btf_id;
	prog->attach_btf_obj_fd = btf_obj_fd;
	prog->attach_prog_fd = attach_prog_fd;
	return 0;
}

// https://github.com/libbpf/libbpf/blob/1728e3e4bef0e138ea95ffe62163eb9a6ac6fa32/src/libbpf.c#L9224
static int find_kernel_btf_id(struct bpf_object *obj, const char *attach_name,
			      enum bpf_attach_type attach_type,
			      int *btf_obj_fd, int *btf_type_id)
{
	int ret, i;

	ret = find_attach_btf_id(obj->btf_vmlinux, attach_name, attach_type); // look up attach_name, i.e. <name>,
									      // in vmlinux according to attach_type
	if (ret > 0) {
		*btf_obj_fd = 0; /* vmlinux BTF */
		*btf_type_id = ret;  // BTF type ID that matches <name>
		return 0;
	}
	if (ret != -ENOENT)
		return ret;

	ret = load_module_btfs(obj); // look up <name> in the BTF of loaded kernel modules; module BTF needs kernel 5.11 or later
	if (ret)
		return ret;

	for (i = 0; i < obj->btf_module_cnt; i++) {
		const struct module_btf *mod = &obj->btf_modules[i];

		ret = find_attach_btf_id(mod->btf, attach_name, attach_type);
		if (ret > 0) {
			*btf_obj_fd = mod->fd;
			*btf_type_id = ret;
			return 0;
		}
		if (ret == -ENOENT)
			continue;

		return ret;
	}

	return -ESRCH;
}


static inline int find_attach_btf_id(struct btf *btf, const char *name,
				     enum bpf_attach_type attach_type)
{
	const char *prefix;
	int kind;

	btf_get_kernel_prefix_kind(attach_type, &prefix, &kind); // sets prefix and kind: prefix is btf_trace_,
								 // bpf_lsm_ or bpf_iter_; kind is
								 // BTF_KIND_TYPEDEF (tp_btf) or BTF_KIND_FUNC (lsm etc.)
	return find_btf_by_prefix_kind(btf, prefix, name, kind);
}


static int find_btf_by_prefix_kind(const struct btf *btf, const char *prefix,
				   const char *name, __u32 kind)
{
	char btf_type_name[BTF_MAX_NAME_SIZE];
	int ret;

	ret = snprintf(btf_type_name, sizeof(btf_type_name), // builds e.g. btf_trace_<name>
		       "%s%s", prefix, name);
	/* snprintf returns the number of characters written excluding the
	 * terminating null. So, if >= BTF_MAX_NAME_SIZE are written, it
	 * indicates truncation.
	 */
	if (ret < 0 || ret >= sizeof(btf_type_name))
		return -ENAMETOOLONG;
	return btf__find_by_name_kind(btf, btf_type_name, kind); // btf_type_name is btf_trace_<name>,
								 // bpf_lsm_<name> or bpf_iter_<name>; kind is
								 // BTF_KIND_TYPEDEF (tp_btf) or
								 // BTF_KIND_FUNC (lsm etc.)
}

// https://github.com/libbpf/libbpf/blob/1728e3e4bef0e138ea95ffe62163eb9a6ac6fa32/src/btf.c#L780
__s32 btf__find_by_name_kind(const struct btf *btf, const char *type_name,
			     __u32 kind)
{
	return btf_find_by_name_kind(btf, 1, type_name, kind);
}


static __s32 btf_find_by_name_kind(const struct btf *btf, int start_id,
				   const char *type_name, __u32 kind)
{
	__u32 i, nr_types = btf__type_cnt(btf); // number of types in this BTF

	if (kind == BTF_KIND_UNKN || !strcmp(type_name, "void"))
		return 0;

	for (i = start_id; i < nr_types; i++) {
		const struct btf_type *t = btf__type_by_id(btf, i);
		const char *name;

		if (btf_kind(t) != kind) // compare the kind
			continue;
		name = btf__name_by_offset(btf, t->name_off); // compare the name
		if (name && !strcmp(type_name, name))
			return i; // BTF type index (ID) whose kind and name both match
	}
	return libbpf_err(-ENOENT);
}
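The lookup chain above boils down to two steps: join the prefix and <name> into one string, then scan the type table for an entry with that name and the expected kind. The following self-contained sketch reproduces that logic against a small mock table instead of real BTF. The table contents, the mock_ names and the return convention are assumptions made for illustration; only the kind numbers 8 (TYPEDEF) and 12 (FUNC) are taken from linux/btf.h.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum mock_kind { MOCK_KIND_TYPEDEF = 8, MOCK_KIND_FUNC = 12 };

struct mock_type {
	const char *name;
	int kind;
};

/* Type IDs are 1-based because ID 0 is reserved for void,
 * so index 0 is only a placeholder. */
static const struct mock_type mock_btf[] = {
	{ "", 0 },
	{ "btf_trace_sched_switch", MOCK_KIND_TYPEDEF },
	{ "bpf_lsm_file_open", MOCK_KIND_FUNC },
	{ "bpf_iter_task", MOCK_KIND_FUNC },
	{ "do_unlinkat", MOCK_KIND_FUNC },
};

/* Returns the matching type ID (> 0), -ENAMETOOLONG if prefix + name does
 * not fit the buffer, or -ENOENT if no entry has that name and kind. */
int mock_find_attach_id(const char *prefix, const char *name, int kind)
{
	char full[128];
	size_t i, cnt = sizeof(mock_btf) / sizeof(mock_btf[0]);
	int n = snprintf(full, sizeof(full), "%s%s", prefix, name);

	if (n < 0 || (size_t)n >= sizeof(full))
		return -ENAMETOOLONG;

	for (i = 1; i < cnt; i++) {
		if (mock_btf[i].kind != kind)
			continue;
		if (strcmp(mock_btf[i].name, full) == 0)
			return (int)i;
	}
	return -ENOENT;
}
```

With this table, the pair ("btf_trace_", "sched_switch") with the typedef kind resolves to ID 1, while the same name with the function kind yields -ENOENT, which is why a tp_btf program and an lsm program with the same <name> never collide.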

Scenarios that need an in-kernel btf_attach_id:

  1. Program types (resolved against the running kernel's vmlinux): BPF_PROG_TYPE_STRUCT_OPS, BPF_PROG_TYPE_LSM, BPF_PROG_TYPE_TRACING;
// https://github.com/cilium/ebpf/blob/main/link/tracing.go
type TracingOptions struct {
	// Program must be of type Tracing with attach type
	// AttachTraceFEntry/AttachTraceFExit/AttachModifyReturn or
	// AttachTraceRawTp.
	Program *ebpf.Program
	// Program attach type. Can be one of:
	// 	- AttachTraceFEntry
	// 	- AttachTraceFExit
	// 	- AttachModifyReturn
	// 	- AttachTraceRawTp
	// This field is optional.
	AttachType ebpf.AttachType
	// Arbitrary value that can be fetched from an eBPF program
	// via `bpf_get_attach_cookie()`.
	Cookie uint64
}

type LSMOptions struct {
	// Program must be of type LSM with attach type
	// AttachLSMMac.
	Program *ebpf.Program
	// Arbitrary value that can be fetched from an eBPF program
	// via `bpf_get_attach_cookie()`.
	Cookie uint64
}

// attachBTFID links all BPF program types (Tracing/LSM) that they attach to a btf_id.
func attachBTFID(program *ebpf.Program, at ebpf.AttachType, cookie uint64) (Link, error) {
	if program.FD() < 0 {
		return nil, fmt.Errorf("invalid program %w", sys.ErrClosedFd)
	}

	var (
		fd  *sys.FD
		err error
	)
	switch at {
	case ebpf.AttachTraceFEntry, ebpf.AttachTraceFExit, ebpf.AttachTraceRawTp,
		ebpf.AttachModifyReturn, ebpf.AttachLSMMac:
		// Attach via BPF link
		fd, err = sys.LinkCreateTracing(&sys.LinkCreateTracingAttr{
			ProgFd:     uint32(program.FD()),
			AttachType: sys.AttachType(at),
			Cookie:     cookie,
		})
		if err == nil {
			break
		}
		if !errors.Is(err, unix.EINVAL) && !errors.Is(err, sys.ENOTSUPP) {
			return nil, fmt.Errorf("create tracing link: %w", err)
		}
		fallthrough
	case ebpf.AttachNone:
		// Attach via RawTracepointOpen
		if cookie > 0 {
			return nil, fmt.Errorf("create raw tracepoint with cookie: %w", ErrNotSupported)
		}

		fd, err = sys.RawTracepointOpen(&sys.RawTracepointOpenAttr{
			ProgFd: uint32(program.FD()),
		})
		if errors.Is(err, sys.ENOTSUPP) {
			// This may be returned by bpf_tracing_prog_attach via bpf_arch_text_poke.
			return nil, fmt.Errorf("create raw tracepoint: %w", ErrNotSupported)
		}
		if err != nil {
			return nil, fmt.Errorf("create raw tracepoint: %w", err)
		}
	default:
		return nil, fmt.Errorf("invalid attach type: %s", at.String())
	}

	raw := RawLink{fd: fd}
	info, err := raw.Info()
	if err != nil {
		raw.Close()
		return nil, err
	}

	if info.Type == RawTracepointType {
		// Sadness upon sadness: a Tracing program with AttachRawTp returns
		// a raw_tracepoint link. Other types return a tracing link.
		return &rawTracepoint{raw}, nil
	}
	return &tracing{raw}, nil
}

// AttachTracing links a tracing (fentry/fexit/fmod_ret) BPF program or
// a BTF-powered raw tracepoint (tp_btf) BPF Program to a BPF hook defined
// in kernel modules.
func AttachTracing(opts TracingOptions) (Link, error) {
	if t := opts.Program.Type(); t != ebpf.Tracing {
		return nil, fmt.Errorf("invalid program type %s, expected Tracing", t)
	}

	switch opts.AttachType {
	case ebpf.AttachTraceFEntry, ebpf.AttachTraceFExit, ebpf.AttachModifyReturn,
		ebpf.AttachTraceRawTp, ebpf.AttachNone:
	default:
		return nil, fmt.Errorf("invalid attach type: %s", opts.AttachType.String())
	}

	return attachBTFID(opts.Program, opts.AttachType, opts.Cookie)
}

// AttachLSM links a Linux security module (LSM) BPF Program to a BPF
// hook defined in kernel modules.
func AttachLSM(opts LSMOptions) (Link, error) {
	if t := opts.Program.Type(); t != ebpf.LSM {
		return nil, fmt.Errorf("invalid program type %s, expected LSM", t)
	}

	return attachBTFID(opts.Program, ebpf.AttachLSMMac, opts.Cookie)
}

Right. Libbpf only supports a newer and safer way to attach to kprobes. For your experiments, try to stick to tracepoints and you’ll have a better time.

But it’s another thing I’ve been meaning to add to libbpf for supporting older kernels. I even have code written to do legacy kprobe attachment, just need to find time to send a patch to add it as a fallback for kernels that don’t support new kprobe interface.
