eBPF Fundamentals: Building Production-Grade Kernel Observability Without Recompiling Linux


You need to trace network packets at the kernel level to debug a production latency issue, but recompiling the kernel isn’t an option. Traditional approaches like kernel modules risk crashing your system with a single pointer dereference gone wrong—and recovering from that crash means a reboot. You’ve probably seen the blog posts claiming eBPF is “revolutionary” or a “game-changer,” but those marketing terms don’t help when you’re staring at P99 latencies spiking in production.

Here’s what actually matters: eBPF lets you inject custom programs into the Linux kernel that execute on specific events—packet arrivals, system calls, function entries—without modifying kernel source code or loading risky modules. The kernel runs your code in a sandboxed virtual machine with strict safety guarantees enforced by a verifier that analyzes every instruction before execution. If your program tries to dereference an invalid pointer or create an infinite loop, the verifier rejects it before it ever runs. This is the fundamental difference between eBPF and kernel modules: your observability code cannot crash the kernel.

The trade-off is that you work within constraints. The verifier enforces bounded loops, limited stack space, and restricted memory access patterns. You write programs in restricted C, compile to eBPF bytecode, and the kernel JIT-compiles it to native instructions for near-zero overhead execution. Understanding these constraints—and the architecture that enforces them—is what separates proof-of-concept eBPF scripts from production-grade instrumentation that runs safely across kernel versions.

Before diving into verifier mechanics and CO-RE portability, you need to understand what eBPF actually is beyond the marketing hype, starting with its architecture and how it evolved from simple packet filtering into a general-purpose kernel programmability layer.

What eBPF Actually Is: Beyond the Marketing Hype

eBPF (extended Berkeley Packet Filter) is a sandboxed virtual machine built directly into the Linux kernel. It lets you run user-written programs in kernel space without modifying kernel source code or loading kernel modules. Think of it as a safe, verified execution environment where you can observe and interact with kernel events—system calls, network packets, block I/O operations—with near-zero overhead.

Visual: eBPF architecture showing program flow from user space through verifier to kernel execution

From Packet Filtering to Programmable Kernel Instrumentation

The original BPF was designed in 1992 as an efficient packet filtering mechanism for tcpdump. It provided a simple instruction set to match network packets in kernel space, avoiding expensive context switches to user space for every packet. Extended BPF, introduced in 2014, transformed this into a general-purpose kernel programming interface. The instruction set expanded from 2 registers to 11, gained support for maps (shared kernel-user memory), and added helpers for safely accessing kernel data structures.

This evolution matters because eBPF programs can now attach to kprobes (kernel function entry/exit), tracepoints (stable kernel instrumentation points), and perf events—giving you visibility into everything from CPU scheduling decisions to filesystem operations without recompiling your kernel.

The Verifier: Why eBPF Is Production-Safe

Every eBPF program passes through the verifier before execution. This static analysis engine examines your bytecode instruction-by-instruction, ensuring it meets strict safety guarantees: no unbounded loops, no out-of-bounds memory access, no null pointer dereferences, and guaranteed termination. The verifier rejects programs that might crash the kernel or create security vulnerabilities.

This is fundamentally different from kernel modules, which run with full kernel privileges and no safety net. A buggy kernel module can panic your system, corrupt memory, or expose security holes. A buggy eBPF program simply gets rejected at load time. This verification step is why you can deploy eBPF instrumentation to production systems without the operational risk traditionally associated with kernel-level debugging.

Sandboxed Execution Without Kernel Recompilation

When an eBPF program passes verification, it gets JIT-compiled to native machine code and attached to its target hook point. The program runs in a restricted context with controlled access to kernel memory through helper functions. Need to read a network packet header? There’s a helper. Want to write to a trace buffer? Another helper. Direct pointer arithmetic into arbitrary kernel memory? Forbidden.

This sandbox model gives you the observability depth of kernel-level instrumentation with a safety profile approaching that of user-space code. You gain the ability to answer questions like “which process is generating these disk writes” or “what’s the tail latency distribution of this system call” without deploying a custom kernel build or risking system stability.

Now that you understand eBPF’s architecture and safety model, let’s write an actual program to see these concepts in practice.

Your First eBPF Program: Tracing System Calls

The fastest way to understand eBPF is to write a program that solves a real debugging problem: tracking which processes are opening files on your system. This is exactly the kind of visibility that traditionally required kernel modules or system call tracing tools with significant overhead.

Writing the eBPF Program

Create a minimal eBPF program that hooks the openat() syscall, which modern libc uses to implement open(), so nearly every file open goes through it:

trace_openat.bpf.c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Minimal layout of the sys_enter tracepoint context: 8 bytes of
 * common fields, the syscall number, then the six syscall arguments.
 * On BTF-enabled kernels you can include vmlinux.h instead. */
struct trace_event_raw_sys_enter {
    unsigned long long unused;
    long id;
    unsigned long args[6];
};

SEC("tracepoint/syscalls/sys_enter_openat")
int trace_openat(struct trace_event_raw_sys_enter *ctx)
{
    char filename[256];
    const char *filename_ptr = (const char *)ctx->args[1];

    bpf_probe_read_user_str(filename, sizeof(filename), filename_ptr);
    bpf_printk("openat: %s\n", filename);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";

This short program demonstrates eBPF’s core capabilities. The SEC() macro tells the loader where to attach this function—in this case, to the kernel tracepoint that fires whenever a process calls openat(). The verifier inspects every instruction to ensure safety before allowing this code into the kernel.

Notice the bpf_probe_read_user_str() helper function. You cannot simply dereference user-space pointers in eBPF; the verifier prohibits direct memory access. Instead, you must use approved helper functions that perform bounds checking and handle page faults safely. This is the verifier enforcing one of its fundamental rules: all memory access must be provably safe.

The bpf_printk() helper writes to the kernel’s trace buffer, accessible from /sys/kernel/debug/tracing/trace_pipe. In production code, you would use BPF maps instead, but for learning, the trace pipe provides immediate feedback.

Understanding the Verifier’s Constraints

Before you can load this program, you need to understand what the verifier allows and prohibits. The verifier’s primary concern is proving that your program terminates and cannot corrupt kernel memory.

The most visible constraint is the prohibition on unbounded loops. Early eBPF versions banned all backward jumps, making traditional loops impossible. Modern kernels (5.3+) allow bounded loops where the verifier can prove a maximum iteration count, but you must either use a compile-time constant bound or convince the verifier through careful register tracking. Most eBPF programs avoid this complexity by unrolling loops, using the bpf_loop() helper (kernel 5.17+), or restructuring iteration with tail calls.

Memory access restrictions are equally stringent. Every pointer dereference must be proven safe at verification time. When accessing kernel memory, you use helpers like bpf_probe_read_kernel(). For user memory, bpf_probe_read_user() and its string variant handle the complexity of page faults and address translation. The verifier tracks pointer bounds through static analysis—if it cannot prove that an array access stays within bounds, it rejects the program.

Stack space is limited to 512 bytes, enforcing efficient data structure design. This constraint prevents stack overflow and keeps eBPF programs lightweight. For larger data structures, you use BPF maps, which we’ll cover in the next chapter.

Compiling and Loading

Compile the program with Clang, which has native eBPF support:

Terminal window
clang -O2 -g -target bpf -c trace_openat.bpf.c -o trace_openat.bpf.o

The -target bpf flag generates eBPF bytecode instead of x86 or ARM machine code. The resulting object file contains both the compiled eBPF instructions and BTF (BPF Type Format) metadata that the verifier uses to understand data structure layouts.

Load the program with bpftool, the Swiss Army knife of eBPF development:

Terminal window
bpftool prog load trace_openat.bpf.o /sys/fs/bpf/trace_openat autoattach

This command loads the program into the kernel, where the verifier inspects every instruction, and the autoattach keyword (available in recent bpftool releases) attaches it to the tracepoint named in the program’s SEC() annotation. If the verifier rejects your program, it provides detailed error messages explaining which instruction violated which safety constraint—typically reporting the offending instruction and the register state at the point of failure.

The pinning operation (/sys/fs/bpf/trace_openat) creates a persistent reference in the BPF filesystem, allowing the program to remain loaded even after bpftool exits. Without pinning, the program would be unloaded as soon as all file descriptors referencing it close.

Watching System Calls in Real Time

Read the trace output to see file opens as they happen:

Terminal window
cat /sys/kernel/debug/tracing/trace_pipe

You’ll see every file access on the system:

bash-3821 [001] .... 1234.567890: bpf_trace_printk: openat: /etc/ld.so.cache
bash-3821 [001] .... 1234.567901: bpf_trace_printk: openat: /lib/x86_64-linux-gnu/libc.so.6
apt-check-2134 [003] .... 1234.678234: bpf_trace_printk: openat: /var/lib/apt/lists/lock

This is eBPF’s power: you just instrumented the kernel without compiling a module, rebooting, or risking a kernel panic. The verifier guaranteed that your program terminates, doesn’t crash, and can’t corrupt kernel memory. The performance overhead is minimal—tracepoints execute in nanoseconds, and the JIT compiler translates your eBPF bytecode into native machine instructions for maximum efficiency.

The next section examines how the verifier achieves these guarantees and what constraints it imposes on your programs.

The Verifier: Understanding eBPF’s Safety Guarantees

The eBPF verifier is your safety net and your constraint boundary. Every eBPF program passes through a static analyzer before it touches kernel memory, ensuring that no program can crash the kernel, read arbitrary memory, or loop indefinitely. Understanding how the verifier thinks transforms debugging from frustration into systematic problem-solving.

Visual: verifier analysis flow showing register state tracking and safety checks

How the Verifier Analyzes Your Code

The verifier performs a depth-first search through every possible execution path in your program, tracking the state of each register and stack slot. It validates memory accesses, ensures proper initialization of variables, and proves that your program terminates. This isn’t heuristic analysis—the verifier must prove safety for every conceivable code path or reject your program entirely.

The verification process maintains a range of possible values for each register. When you check if (x < 100), the verifier knows that x is bounded by 100 in the true branch. This range tracking enables safe array accesses: bounds-check your index, and the verifier proves the access is safe.

Common Rejection Patterns

Most verifier rejections fall into predictable categories. Unbounded loops fail immediately—the verifier requires a maximum iteration count it can prove at load time. Memory accesses need explicit bounds checking before dereferencing pointers. Reading from the stack requires proving the value was initialized. Accessing map values requires null-checking the pointer returned by the lookup helper.

Helper function constraints are particularly strict. You can only call approved helper functions, and you must pass arguments that match the expected types and constraints. Passing a potentially-null pointer to a helper expecting a valid pointer triggers rejection. The verifier tracks pointer types distinctly: a pointer to a map value is not interchangeable with a pointer to a socket buffer.

💡 Pro Tip: When the verifier rejects your program, read the error message backward from the failing instruction. The verifier shows you its register state tracking, revealing exactly which invariant you violated.

The complexity limit (4,096 instructions in early kernels, one million verified instructions on modern ones) exists because verification time scales with program complexity. Real production programs rarely hit this limit unless they inline large amounts of duplicated logic.

These constraints feel restrictive initially, but they’re the foundation of eBPF’s zero-overhead safety model. No runtime checks. No kernel panics. Just programs that either provably work or don’t load. With this safety foundation established, you need a mechanism to move data between your kernel program and user space analysis tools—that’s where BPF maps come in.

BPF Maps: Sharing Data Between Kernel and User Space

Your eBPF program runs in kernel space, processing thousands of events per second. But that data is worthless if you can’t get it to your user-space monitoring tools. BPF maps are the bridge—they’re shared memory regions that let kernel and user space exchange data efficiently, often aggregating millions of raw events into summary statistics before they ever leave the kernel.

Think of maps as kernel-resident hash tables or arrays that persist across program invocations. When your eBPF program hooks a TCP connect event, it writes to a map. Your user-space daemon reads from that same map. No expensive context switches for every event, no overwhelming your CPU with syscalls.

Map Types and Their Use Cases

eBPF provides several map types optimized for different access patterns:

BPF_MAP_TYPE_HASH stores key-value pairs with O(1) lookup. Use this for tracking per-process statistics or aggregating metrics by arbitrary identifiers. The kernel handles hash collisions and memory management automatically.

BPF_MAP_TYPE_ARRAY provides integer-indexed storage with predictable performance. Perfect for per-CPU counters or fixed-size lookup tables. Arrays pre-allocate all entries at creation, so they’re faster than hash maps but less flexible.

BPF_MAP_TYPE_PERF_EVENT_ARRAY and BPF_MAP_TYPE_RINGBUF stream events to user space. Ring buffers (the newer option) provide better performance and memory efficiency than perf event arrays, with a single shared buffer instead of per-CPU buffers.

BPF_MAP_TYPE_LRU_HASH adds automatic eviction when the map fills up, maintaining the most recently used entries. This prevents memory exhaustion when tracking unbounded keysets like external IP addresses.

Practical Example: TCP Connection Latency Tracking

Let’s build a tool that measures TCP handshake latency by destination port. We’ll track when connections start and complete, computing the delta in-kernel:

tcp_latency.bpf.c
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>   /* PT_REGS_PARM1; compile with -D__TARGET_ARCH_x86 (or your arch) */
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_endian.h>

struct start_info {
    __u64 ts;
    struct sock *sk;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, __u64);
    __type(value, struct start_info);
} start_times SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, __u16);
    __type(value, __u64);
} port_latency SEC(".maps");

SEC("kprobe/tcp_v4_connect")
int trace_connect(struct pt_regs *ctx)
{
    __u64 pid_tgid = bpf_get_current_pid_tgid();
    struct start_info info = {
        .ts = bpf_ktime_get_ns(),
        /* tcp_v4_connect() returns an int, so the socket pointer
         * must be captured here, from the first argument at entry. */
        .sk = (struct sock *)PT_REGS_PARM1(ctx),
    };

    bpf_map_update_elem(&start_times, &pid_tgid, &info, BPF_ANY);
    return 0;
}

SEC("kretprobe/tcp_v4_connect")
int trace_connect_return(struct pt_regs *ctx)
{
    __u64 pid_tgid = bpf_get_current_pid_tgid();
    struct start_info *info = bpf_map_lookup_elem(&start_times, &pid_tgid);

    if (!info)
        return 0;

    __u64 delta = bpf_ktime_get_ns() - info->ts;
    /* skc_dport is stored in network byte order */
    __u16 dport = bpf_ntohs(BPF_CORE_READ(info->sk, __sk_common.skc_dport));

    bpf_map_update_elem(&port_latency, &dport, &delta, BPF_ANY);
    bpf_map_delete_elem(&start_times, &pid_tgid);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";

The start_times map tracks when each thread initiated a connection using its PID/TGID as the key. When the connection completes, we calculate the latency and store it in port_latency indexed by destination port. The user-space program reads port_latency periodically to display results.

Performance Considerations

Map operations are fast but not free. Hash map updates acquire a per-bucket spinlock, and even lock-free lookups touch shared cache lines. For high-frequency hooks, consider using per-CPU arrays to avoid contention—each CPU writes to its own slot, and user space aggregates the results.

Batch operations (bpf_map_lookup_batch, bpf_map_update_batch) reduce syscall overhead when processing many entries from user space. Instead of 1000 syscalls to read 1000 map entries, you make one syscall.

Map size matters. A hash map with 1 million entries consumes significant kernel memory and increases lookup times. Use LRU maps for unbounded keysets, or implement sampling to reduce the data volume. The verifier enforces memory limits, but you’ll hit performance cliffs before you hit those limits.

Understanding map semantics is critical for correctness. Concurrent updates to the same key can race—use BPF_NOEXIST or BPF_EXIST flags to implement atomic check-and-set patterns. For counters, use __sync_fetch_and_add to ensure atomic increments across CPUs.

With maps mastered, you can build sophisticated observability tools. But there’s a portability problem: kernel data structures change between versions, and hardcoded offsets break. Next, we’ll explore CO-RE—Compile Once, Run Everywhere—the technology that makes eBPF programs portable across kernel versions without recompilation.

CO-RE: Making eBPF Programs Portable Across Kernel Versions

One of eBPF’s biggest operational challenges is kernel struct portability. When your program reads struct task_struct to extract process information, it’s accessing kernel memory layouts that change between kernel versions. A field at offset 0x120 in kernel 5.10 might be at 0x150 in kernel 5.15. Traditional eBPF programs hard-code these offsets at compile time, breaking when deployed on different kernels.

This is where CO-RE (Compile Once, Run Everywhere) transforms eBPF from a fragile system hack into production-grade infrastructure. CO-RE lets you write kernel-agnostic programs that adapt to whatever kernel they’re running on—no recompilation required.

How CO-RE Works: BTF and Runtime Relocation

CO-RE relies on BTF (BPF Type Format), a compact metadata format that describes kernel data structures. Modern kernels (5.2+) ship with BTF information embedded in /sys/kernel/btf/vmlinux, providing a complete type map of every struct, union, and enum in the running kernel.

When you compile a CO-RE program with libbpf, it embeds relocation records that say “I need field X from struct Y.” At load time, libbpf reads the target kernel’s BTF, calculates the actual field offsets, and patches your program before passing it to the verifier. Your code runs with correct offsets, regardless of kernel version differences.

Here’s a portable program that reads process credentials across any kernel:

portable_creds.bpf.c
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

struct event {
    u32 pid;
    u32 uid;
    u32 gid;
    char comm[16];
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);
} events SEC(".maps");

SEC("tp/sched/sched_process_exec")
int trace_exec(struct trace_event_raw_sched_process_exec *ctx)
{
    struct task_struct *task = (struct task_struct *)bpf_get_current_task();
    struct event *e;

    e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return 0;

    e->pid = BPF_CORE_READ(task, pid);
    e->uid = BPF_CORE_READ(task, real_cred, uid.val);
    e->gid = BPF_CORE_READ(task, real_cred, gid.val);
    bpf_core_read_str(&e->comm, sizeof(e->comm), &task->comm);

    bpf_ringbuf_submit(e, 0);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";

The BPF_CORE_READ macro is the secret weapon. Instead of writing task->real_cred->uid.val (which hard-codes offset assumptions), it emits CO-RE relocations that libbpf resolves against the running kernel’s BTF at load time. The bpf_core_read_str helper handles string reads with the same portability guarantees.

Writing Portable Programs: Best Practices

The vmlinux.h header file (generated by bpftool btf dump file /sys/kernel/btf/vmlinux format c) contains complete kernel type definitions for your target kernel. Use it instead of manually defining structs—libbpf will relocate field accesses automatically.

Always use CO-RE helpers for kernel memory access:

  • BPF_CORE_READ() for reading fields through pointer chains
  • bpf_core_read() for raw memory copies with relocation
  • bpf_core_field_exists() to check if a field exists (handles kernels where features were added or removed)
  • bpf_core_type_exists() to conditionally compile code based on type availability

This approach works across kernel versions from 5.2 onwards. Your compiled BPF object runs on development laptops with kernel 6.1, production servers on 5.10, and embedded devices on 5.15—identical bytecode, zero recompilation.

💡 Pro Tip: Use bpf_core_field_exists() to handle kernel config variations. Some distributions disable features like CONFIG_CGROUPS, removing related fields from structs. Check existence before accessing to avoid verifier rejections.

CO-RE turns eBPF programs into truly portable kernel instrumentation. With proper BTF usage, you ship a single binary that adapts to any modern Linux kernel, eliminating the distribution nightmare that plagued traditional kernel modules. Next, we’ll examine how to deploy these programs in production without sacrificing performance.

Production Deployment: Observability Without Overhead

The promise of eBPF is comprehensive kernel visibility with negligible performance impact. In production, that promise requires disciplined measurement and intentional design choices.

Quantifying Performance Impact

eBPF programs execute in nanoseconds, but high-frequency events add up. A program attached to tcp_sendmsg on a busy web server fires millions of times per second. Even a 500-nanosecond program creates measurable overhead.

The first production rule: measure your actual impact. Before deploying any eBPF instrumentation, establish baseline CPU metrics. With the kernel.bpf_stats_enabled sysctl turned on, bpftool prog show displays per-program run counts and cumulative execution time. For a running program with ID 42, bpftool prog profile id 42 duration 10 cycles samples the CPU cycles it consumes over 10 seconds.

Calculate overhead percentage explicitly. If your program runs 2 million times per second with an average 400ns execution time, that’s 800ms of CPU time per second per core—80% of a core consumed before you’ve extracted any value from the data.

Strategic Sampling

Production-grade eBPF uses sampling to maintain single-digit overhead percentages. Rather than tracing every packet or system call, sample 1% of events. For most observability use cases, statistical sampling provides sufficient signal to identify anomalies and trends.

Implement sampling directly in your eBPF program using the bpf_get_prandom_u32() helper. Return early for events that don’t meet your sampling threshold—this happens in kernel space before any data copying or map updates occur. A simple check like if (bpf_get_prandom_u32() % 100 > 0) return 0; implements 1% sampling with minimal overhead.

Adjust sampling rates dynamically based on load. During normal operation, sample 1% of TCP connections. When your monitoring system detects elevated error rates, increase sampling to 10% for targeted investigation.

Integration with Observability Stacks

eBPF programs feed data to user-space collectors through BPF maps or ring buffers. These collectors expose metrics in Prometheus format, forward traces to Jaeger, or stream logs to existing SIEM systems.

For Prometheus integration, user-space programs poll BPF maps on a fixed interval (typically 15-60 seconds), convert kernel data structures to Prometheus metrics, and expose them via an HTTP endpoint. This pattern decouples kernel instrumentation from observability infrastructure—your eBPF programs remain simple and focused while user-space handles formatting, aggregation, and integration concerns.

Real-World Deployment Patterns

Network teams deploy eBPF to instrument TCP retransmits, DNS query latency, and connection tracking without adding sidecar proxies. Security teams use eBPF to monitor process execution, file access patterns, and network connections at kernel level—visibility that survives container restarts and namespace isolation.

Performance profiling with eBPF-based CPU flame graphs runs continuously in production with sub-1% overhead. Unlike traditional profilers that require process attachment, eBPF profilers observe the entire system with a single always-on collector.

With performance impact measured and sampling tuned, the final deployment consideration is choosing the right development toolchain for your team’s needs.

Choosing Your Development Path: bpftool, libbpf, or High-Level Frameworks

The eBPF ecosystem spans from bare-metal kernel APIs to fully managed observability platforms. Your choice depends on three factors: how much control you need, how quickly you need results, and whether you’re building reusable infrastructure or solving specific observability problems.

Start with high-level frameworks unless you have specific reasons not to. Tools like bpftrace provide one-liner system introspection with syntax inspired by awk and DTrace—perfect for interactive debugging and ad-hoc performance analysis. For production observability, Cilium’s Tetragon offers runtime security enforcement and process execution tracing with policy-based filtering, while BCC (BPF Compiler Collection) provides a Python-based workflow with pre-built tools for common tasks like TCP connection tracking and disk I/O profiling.

These frameworks handle verifier compliance, map management, and userspace integration. The trade-off is opacity: when the framework’s abstractions don’t match your use case, debugging becomes difficult. BCC’s runtime compilation also introduces dependencies that complicate deployment.

Drop down to libbpf when you need portability and control. The libbpf library gives you direct access to CO-RE while handling low-level details like map creation and program loading. You write C for the kernel side and C/C++/Rust/Go for userspace, maintaining full visibility into what executes in the kernel. This approach requires understanding BPF map types, helper function constraints, and verifier requirements—but produces standalone binaries with minimal runtime dependencies.

Projects like eunomia-bpf bridge this gap by providing WebAssembly-based distribution and JSON configuration while preserving libbpf’s CO-RE benefits. This architecture separates eBPF program development from deployment, enabling non-developers to operate pre-built instrumentation.

Reserve bpftool for debugging and inspection, not development. This CLI utility loads programs, inspects maps, and dumps BTF information—essential for troubleshooting production deployments but too low-level for regular use.

💡 Pro Tip: Start with bpftrace scripts to validate your approach, then reimplement critical paths in libbpf for production deployment. This workflow combines prototyping speed with production reliability.

The Linux kernel’s samples/bpf directory and the libbpf-bootstrap repository provide reference implementations. The eBPF community maintains active documentation at ebpf.io and responsive support channels on the eBPF Slack workspace.

With your development environment chosen, you’re ready to instrument production systems with confidence. The next step is integrating these capabilities into your existing observability stack.

Key Takeaways

  • Start with bpftrace or BCC frameworks to prototype eBPF programs quickly, then move to libbpf for production deployments that need portability
  • Always use CO-RE and BTF-enabled kernels (5.2+) to avoid recompiling programs for each kernel version
  • Measure eBPF program overhead with bpftool prog profile and use sampling on high-frequency hooks to keep CPU impact under 2%
  • Leverage BPF maps to aggregate data in-kernel before sending to user space, reducing context switches and improving performance