
chore(blazesym): bump to v0.2.4, drop pure-Go kallsyms fallback #26

Open
dpsoft wants to merge 1 commit into main from chore/bump-blazesym

@dpsoft dpsoft commented Jun 10, 2026


What

Bumps blazesym to v0.2.4 and removes the pure-Go /proc/kallsyms kernel-symbolization fallback that was added in #25 as a workaround for older blazesym.

Why

#25 added a pure-Go kallsyms fallback because pre-0.2.4 blazesym hard-failed kernel symbolization with BLAZE_ERR_PERMISSION_DENIED whenever /proc/kcore was inaccessible (lockdown=integrity / Secure Boot / no CAP_SYS_RAWIO).

blazesym v0.2.4 (987d36c, "kernel: Don't require /proc/kcore access when only kallsyms is used") fixes the root cause: it only reads /proc/kcore for the KASLR offset when a vmlinux DWARF resolver is actually present. On the common production host (no /boot/vmlinux-* DWARF installed) blazesym resolves kallsyms-only without ever touching kcore, returning the same name+offset the pure-Go fallback gave — so ~1100 lines of fallback machinery are now redundant.
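For readers unfamiliar with what "kallsyms-only, name+offset" resolution means, the following is a minimal Go sketch of the idea, not the deleted fallback or blazesym's implementation. The `ksym` type, the `parseKallsyms` and `resolve` helpers, and the sample addresses are all hypothetical; only the `address type name` line layout of /proc/kallsyms is taken as given.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// ksym is one parsed kallsyms entry (hypothetical type for this sketch).
type ksym struct {
	addr uint64
	name string
}

// parseKallsyms reads "addr type name [module]" lines and returns the
// text symbols (types 't' and 'T') sorted by address. Restricting to
// those two types is a simplification.
func parseKallsyms(text string) []ksym {
	var syms []ksym
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 3 || (f[1] != "t" && f[1] != "T") {
			continue
		}
		addr, err := strconv.ParseUint(f[0], 16, 64)
		if err != nil || addr == 0 {
			continue // zeroed addresses show up when the kernel hides pointers
		}
		syms = append(syms, ksym{addr: addr, name: f[2]})
	}
	sort.Slice(syms, func(i, j int) bool { return syms[i].addr < syms[j].addr })
	return syms
}

// resolve returns "name+0xoff" for the nearest symbol at or below ip.
// It does not bound the symbol size, so an ip past the last symbol is
// attributed to that last symbol.
func resolve(syms []ksym, ip uint64) (string, bool) {
	i := sort.Search(len(syms), func(i int) bool { return syms[i].addr > ip })
	if i == 0 {
		return "", false
	}
	s := syms[i-1]
	return fmt.Sprintf("%s+0x%x", s.name, ip-s.addr), true
}

func main() {
	// Invented addresses; the symbol names are borrowed from the PR text.
	const sample = `ffffffff81000000 T _stext
ffffffff81001000 T do_syscall_64
ffffffff81002000 t __schedule
ffffffff82000000 D some_data
`
	syms := parseKallsyms(sample)
	name, ok := resolve(syms, 0xffffffff81001010)
	fmt.Println(name, ok)
}
```

Because this path needs only the symbol table, nothing in it depends on /proc/kcore, which is the property the v0.2.4 fix relies on.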

Verification

Tested on a real lockdown=integrity host (kcore 0400, binary setcap'd without cap_sys_rawio, no /boot/vmlinux DWARF):

symbolize: batches=372 input_ips=2763 batch_failures=0 raw_addr_frames=0 eperm=0 other_err=0

eperm=0, fallback_engaged=0, kernel frames fully named (do_syscall_64, __schedule, finish_task_switch, __d_lookup, …).

  • make build links clean against v0.2.4
  • unit tests pass (incl. changed CGO pkgs: symbolize, perfagent, pprof, debuginfod, debuginfod/cache)
  • integration suite: 40 pass / 0 fail / 1 skip. TestKernelStackResolution (the test exercising this change) passes; the lone skip is TestOffBoxLibcResolution, unsatisfiable here because glibc-debuginfo isn't published for fc44
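The stats line above can also be checked mechanically. This is a small Go sketch, assuming the `key=value` layout shown in the log line; `parseStatsLine` and `isClean` are hypothetical helpers, not part of perf-agent.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseStatsLine extracts integer key=value pairs from a log line,
// ignoring tokens that are not of that form (such as the "symbolize:" prefix).
func parseStatsLine(line string) map[string]int {
	out := map[string]int{}
	for _, tok := range strings.Fields(line) {
		k, v, ok := strings.Cut(tok, "=")
		if !ok {
			continue
		}
		n, err := strconv.Atoi(v)
		if err != nil {
			continue
		}
		out[k] = n
	}
	return out
}

// isClean reports whether every failure-class counter in the line is zero.
// A counter missing from the line reads as zero, so this is a sketch only.
func isClean(s map[string]int) bool {
	for _, k := range []string{"batch_failures", "raw_addr_frames", "eperm", "other_err"} {
		if s[k] != 0 {
			return false
		}
	}
	return true
}

func main() {
	line := "symbolize: batches=372 input_ips=2763 batch_failures=0 raw_addr_frames=0 eperm=0 other_err=0"
	fmt.Println(isClean(parseStatsLine(line)))
}
```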

Changes

  • remove symbolize/{kallsyms,kallsyms_cache}.go and their tests (pure-Go symbolizer + disk cache + boot-id EPERM marker)
  • simplify symbolize/local_kernel.go: SymbolizeKernel calls blazesym directly; on any error it preserves raw kernel addresses (rawKernelAddrFrames) so kernel context still survives into the pprof
  • drop the now-meaningless KernelFallbackEngaged counter from stats + the /metrics endpoint (keep KernelLockdownEPERM / KernelOtherErr for observability)
  • remove PERFAGENT_FORCE_KERNEL_FALLBACK env + the forced-fallback integration test
  • regenerate eBPF bytecode (.o)

Notes

Pinned to the v0.2.4 tag (the local go.mod uses a path replace, so the bump is just the blazesym checkout). The only case v0.2.4 doesn't cover — /boot/vmlinux-* DWARF installed and lockdown — is handled by post-v0.2.4 _stext KASLR fallback on blazesym main; it's a self-contradictory combo in practice (you don't ship debug vmlinux to a Secure Boot box), and the kept rawKernelAddrFrames backstop degrades it gracefully to raw addresses regardless.

blazesym v0.2.4 (987d36c) no longer reads /proc/kcore for the KASLR
offset unless a vmlinux DWARF resolver is present, so kernel
symbolization works under lockdown=integrity without CAP_SYS_RAWIO.
The pure-Go /proc/kallsyms fallback added in #25 (kallsyms.go +
disk cache + boot-id EPERM marker + sticky ladder + the
PERFAGENT_FORCE_KERNEL_FALLBACK env) is now redundant.

Verified on a lockdown=integrity host (kcore 0400, no cap_sys_rawio,
no /boot/vmlinux DWARF): blazesym resolves kernel frames with
eperm=0, fallback_engaged=0, all frames named.

SymbolizeKernel now calls blazesym directly and, on any error,
preserves raw kernel addresses (rawKernelAddrFrames) so kernel
context still survives into the pprof. Drops the now-meaningless
KernelFallbackEngaged counter from stats + the /metrics endpoint;
keeps KernelLockdownEPERM / KernelOtherErr for observability.

- remove symbolize/{kallsyms,kallsyms_cache}.go and their tests
- simplify symbolize/local_kernel.go (no fallback ladder/seam)
- drop the forced-fallback integration test
- regenerate eBPF bytecode (.o)
@dpsoft dpsoft marked this pull request as ready for review June 10, 2026 23:44
@dpsoft dpsoft requested a review from Copilot June 10, 2026 23:45

Copilot AI left a comment


Pull request overview

This PR updates the kernel symbolization pipeline to rely on blazesym v0.2.4 behavior (no longer requiring /proc/kcore for kallsyms-only resolution in the common case) and removes the now-redundant pure-Go /proc/kallsyms fallback machinery and its observability surface area.

Changes:

  • Remove the pure-Go kallsyms symbolizer + disk cache + EPERM marker logic and simplify LocalKernelSymbolizer to call blazesym directly, falling back only to raw-address frames on error.
  • Remove the forced-fallback integration test and the KernelFallbackEngaged counter/metric, updating metrics output and unit tests accordingly.
  • Update related documentation/comments and schemas to reflect the new “blazesym-only + raw-addr backstop” model.

Reviewed changes

Copilot reviewed 16 out of 26 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| symbolize/local_kernel.go | Removes the fallback ladder and makes kernel symbolization blazesym-only with raw-address backstop on any blazesym error. |
| symbolize/stats.go | Drops KernelFallbackEngaged and updates counter semantics/comments for the simplified flow. |
| symbolize/stats_test.go | Updates stats tests and adds a seam-based stub to exercise raw-address backstop/stats without CGO. |
| perfagent/metrics_endpoint.go | Removes the perf_agent_symbolize_kernel_fallback_engaged metric and updates HELP text for remaining metrics. |
| perfagent/metrics_endpoint_test.go | Updates Prometheus exposition expectations after metric removal. |
| test/integration_test.go | Removes the forced-fallback integration test that depended on the deleted fallback behavior. |
| symbolize/allocs_budget_test.go | Removes allocation-budget tests that were specific to the deleted kallsyms parser/resolver. |
| bench/internal/schema/schema.go | Updates self-metrics commentary to reflect “blazesym broke” rather than “blazesym + fallback broke”. |
| symbolize/kallsyms.go (deleted) | Removes pure-Go /proc/kallsyms symbolizer implementation. |
| symbolize/kallsyms_cache.go (deleted) | Removes kallsyms disk cache + boot-id/EPERM marker logic. |
| symbolize/kallsyms_*test.go (deleted) | Removes unit/bench tests for deleted kallsyms parser/cache/symbolizer. |
| symbolize/local_kernel_fallback_test.go (deleted) | Removes tests that pinned the old fallback ladder behavior. |
| symbolize/blazesym_eperm_marker_test.go (deleted) | Removes tests for the deleted EPERM marker mechanism. |


Comment thread symbolize/stats.go
Comment on lines 38 to 40
// Roadmap #4: lets operators distinguish "lockdown-class host
// (every batch EPERMs once before fallback)" from "blazesym is
// throwing some other error" without re-instrumenting.