chore(blazesym): bump to v0.2.4, drop pure-Go kallsyms fallback #26
Open
dpsoft wants to merge 1 commit into
blazesym v0.2.4 (987d36c) no longer reads /proc/kcore for the KASLR offset unless a vmlinux DWARF resolver is present, so kernel symbolization works under lockdown=integrity without CAP_SYS_RAWIO. The pure-Go /proc/kallsyms fallback added in #25 (kallsyms.go + disk cache + boot-id EPERM marker + sticky ladder + the PERFAGENT_FORCE_KERNEL_FALLBACK env) is now redundant.

Verified on a lockdown=integrity host (kcore 0400, no cap_sys_rawio, no /boot/vmlinux DWARF): blazesym resolves kernel frames with eperm=0, fallback_engaged=0, and all frames named.

SymbolizeKernel now calls blazesym directly and, on any error, preserves raw kernel addresses (rawKernelAddrFrames) so kernel context still survives into the pprof. Drops the now-meaningless KernelFallbackEngaged counter from stats + the /metrics endpoint; keeps KernelLockdownEPERM / KernelOtherErr for observability.

- remove symbolize/{kallsyms,kallsyms_cache}.go and their tests
- simplify symbolize/local_kernel.go (no fallback ladder/seam)
- drop the forced-fallback integration test
- regenerate eBPF bytecode (.o)
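The verification above depends on the host really running in lockdown=integrity. A minimal sketch for confirming that from Go, assuming the lockdown LSM is built in and securityfs is mounted at `/sys/kernel/security`, where the `lockdown` file lists the modes with the active one in brackets (e.g. `none [integrity] confidentiality`). The helper `activeLockdownMode` is hypothetical and not part of perf-agent.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// activeLockdownMode is a hypothetical helper (not part of perf-agent). It
// extracts the bracketed, currently active mode from the contents of
// /sys/kernel/security/lockdown, e.g. "none [integrity] confidentiality".
func activeLockdownMode(contents string) (string, error) {
	for _, field := range strings.Fields(contents) {
		if len(field) > 2 && strings.HasPrefix(field, "[") && strings.HasSuffix(field, "]") {
			return field[1 : len(field)-1], nil
		}
	}
	return "", errors.New("no active lockdown mode found")
}

func main() {
	// Assumption: securityfs is mounted at /sys/kernel/security.
	raw, err := os.ReadFile("/sys/kernel/security/lockdown")
	if err != nil {
		fmt.Println("lockdown state unavailable:", err)
		return
	}
	mode, err := activeLockdownMode(string(raw))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("lockdown mode:", mode)
}
```

Parsing is kept separate from file I/O so the bracket logic can be checked without a lockdown-enabled kernel.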
Pull request overview
This PR updates the kernel symbolization pipeline to rely on blazesym v0.2.4 behavior (no longer requiring /proc/kcore for kallsyms-only resolution in the common case) and removes the now-redundant pure-Go /proc/kallsyms fallback machinery and its observability surface area.
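"kallsyms-only resolution" means mapping a sampled kernel address to the nearest preceding symbol in `/proc/kallsyms` and reporting name+offset. Because `/proc/kallsyms` already lists runtime (KASLR-relocated) addresses, that lookup needs no KASLR offset and therefore no `/proc/kcore` read. A minimal Go sketch of the lookup, assuming a table that has already been parsed and sorted by address (and a reader privileged enough that `kptr_restrict` does not zero the addresses); the `ksym` / `kallsymsTable` types and `resolve` method are hypothetical illustrations, not blazesym's API or the deleted `symbolize/kallsyms.go`.

```go
package main

import (
	"fmt"
	"sort"
)

// ksym is one hypothetical /proc/kallsyms entry: runtime address + name.
type ksym struct {
	addr uint64
	name string
}

// kallsymsTable is a hypothetical symbol table sorted by ascending address.
type kallsymsTable []ksym

// resolve returns the symbol whose start address is the closest one <= addr,
// plus the offset into it. ok is false if addr precedes every symbol.
// Addresses past the last symbol resolve to it; a real resolver would bound
// that using symbol sizes or module ranges.
func (t kallsymsTable) resolve(addr uint64) (name string, off uint64, ok bool) {
	// Index of the first symbol that starts strictly after addr.
	i := sort.Search(len(t), func(i int) bool { return t[i].addr > addr })
	if i == 0 {
		return "", 0, false
	}
	s := t[i-1]
	return s.name, addr - s.addr, true
}

func main() {
	// Illustrative addresses only.
	table := kallsymsTable{
		{0xffffffff81000000, "_stext"},
		{0xffffffff81001000, "do_syscall_64"},
		{0xffffffff81002000, "__schedule"},
	}
	if name, off, ok := table.resolve(0xffffffff81001234); ok {
		fmt.Printf("%s+0x%x\n", name, off) // do_syscall_64+0x234
	}
}
```

A binary search keeps each lookup at O(log n) over the tens of thousands of entries a typical kallsyms table contains.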
Changes:
- Remove the pure-Go kallsyms symbolizer + disk cache + EPERM marker logic and simplify `LocalKernelSymbolizer` to call blazesym directly, falling back only to raw-address frames on error.
- Remove the forced-fallback integration test and the `KernelFallbackEngaged` counter/metric, updating metrics output and unit tests accordingly.
- Update related documentation/comments and schemas to reflect the new “blazesym-only + raw-addr backstop” model.
Reviewed changes
Copilot reviewed 16 out of 26 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `symbolize/local_kernel.go` | Removes the fallback ladder and makes kernel symbolization blazesym-only with raw-address backstop on any blazesym error. |
| `symbolize/stats.go` | Drops `KernelFallbackEngaged` and updates counter semantics/comments for the simplified flow. |
| `symbolize/stats_test.go` | Updates stats tests and adds a seam-based stub to exercise raw-address backstop/stats without CGO. |
| `perfagent/metrics_endpoint.go` | Removes the `perf_agent_symbolize_kernel_fallback_engaged` metric and updates HELP text for remaining metrics. |
| `perfagent/metrics_endpoint_test.go` | Updates Prometheus exposition expectations after metric removal. |
| `test/integration_test.go` | Removes the forced-fallback integration test that depended on the deleted fallback behavior. |
| `symbolize/allocs_budget_test.go` | Removes allocation-budget tests that were specific to the deleted kallsyms parser/resolver. |
| `bench/internal/schema/schema.go` | Updates self-metrics commentary to reflect “blazesym broke” rather than “blazesym + fallback broke”. |
| `symbolize/kallsyms.go` (deleted) | Removes pure-Go `/proc/kallsyms` symbolizer implementation. |
| `symbolize/kallsyms_cache.go` (deleted) | Removes kallsyms disk cache + boot-id/EPERM marker logic. |
| `symbolize/kallsyms_*test.go` (deleted) | Removes unit/bench tests for deleted kallsyms parser/cache/symbolizer. |
| `symbolize/local_kernel_fallback_test.go` (deleted) | Removes tests that pinned the old fallback ladder behavior. |
| `symbolize/blazesym_eperm_marker_test.go` (deleted) | Removes tests for the deleted EPERM marker mechanism. |
Comment on lines 38 to 40

```go
// Roadmap #4: lets operators distinguish "lockdown-class host
// (every batch EPERMs once before fallback)" from "blazesym is
// throwing some other error" without re-instrumenting.
```
What
Bumps blazesym to v0.2.4 and removes the pure-Go `/proc/kallsyms` kernel-symbolization fallback that was added in #25 as a workaround for older blazesym.

Why
#25 added a pure-Go kallsyms fallback because pre-0.2.4 blazesym hard-failed kernel symbolization with `BLAZE_ERR_PERMISSION_DENIED` whenever `/proc/kcore` was inaccessible (lockdown=integrity / Secure Boot / no `CAP_SYS_RAWIO`).

blazesym v0.2.4 (`987d36c`, "kernel: Don't require /proc/kcore access when only kallsyms is used") fixes the root cause: it only reads `/proc/kcore` for the KASLR offset when a vmlinux DWARF resolver is actually present. On the common production host (no `/boot/vmlinux-*` DWARF installed), blazesym resolves kallsyms-only without ever touching kcore and returns the same name+offset the pure-Go fallback gave, so ~1100 lines of fallback machinery are now redundant.

Verification
Tested on a real lockdown=integrity host (kcore `0400`, binary setcap'd without `cap_sys_rawio`, no `/boot/vmlinux` DWARF): `eperm=0`, `fallback_engaged=0`, kernel frames fully named (`do_syscall_64`, `__schedule`, `finish_task_switch`, `__d_lookup`, …).

- `make build` links clean against v0.2.4
- … (`symbolize`, `perfagent`, `pprof`, `debuginfod`, `debuginfod/cache`)
- `TestKernelStackResolution` (the test exercising this change) passes; the lone skip is `TestOffBoxLibcResolution`, unsatisfiable here because `glibc-debuginfo` isn't published for fc44

Changes
- Remove `symbolize/{kallsyms,kallsyms_cache}.go` and their tests (pure-Go symbolizer + disk cache + boot-id EPERM marker)
- Simplify `symbolize/local_kernel.go`: `SymbolizeKernel` calls blazesym directly; on any error it preserves raw kernel addresses (`rawKernelAddrFrames`) so kernel context still survives into the pprof
- Drop the `KernelFallbackEngaged` counter from stats + the `/metrics` endpoint (keep `KernelLockdownEPERM` / `KernelOtherErr` for observability)
- Drop the `PERFAGENT_FORCE_KERNEL_FALLBACK` env + the forced-fallback integration test
- Regenerate eBPF bytecode (`.o`)

Notes
Pinned to the v0.2.4 tag (the local `go.mod` uses a path `replace`, so the bump is just the blazesym checkout). The only case v0.2.4 doesn't cover (`/boot/vmlinux-*` DWARF installed and lockdown) is handled by the post-v0.2.4 `_stext` KASLR fallback on blazesym `main`. It's a self-contradictory combo in practice (you don't ship debug vmlinux to a Secure Boot box), and the kept `rawKernelAddrFrames` backstop degrades it gracefully to raw addresses regardless.
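The `_stext` fallback mentioned in the notes rests on simple arithmetic: the KASLR slide is the runtime address of `_stext` (as listed in `/proc/kallsyms`) minus its link-time address (from the vmlinux ELF symbol table), and subtracting that slide from a sampled address yields the link-time address that vmlinux DWARF is expressed in. A Go sketch of only that general technique; `kaslrSlide` and `toLinkTime` are hypothetical names, and this is not blazesym's (Rust) implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// kaslrSlide computes the KASLR offset from the runtime address of _stext
// (from /proc/kallsyms) and its link-time address (from vmlinux).
func kaslrSlide(runtimeStext, linkStext uint64) (uint64, error) {
	if runtimeStext < linkStext {
		return 0, errors.New("runtime _stext below link-time _stext")
	}
	return runtimeStext - linkStext, nil
}

// toLinkTime maps a sampled runtime kernel address back into the
// link-time address space used by vmlinux DWARF.
func toLinkTime(runtimeAddr, slide uint64) uint64 {
	return runtimeAddr - slide
}

func main() {
	// Illustrative values only.
	slide, err := kaslrSlide(0xffffffff9a000000, 0xffffffff81000000)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Prints: slide=0x19000000 link=0xffffffff81001234
	fmt.Printf("slide=0x%x link=0x%x\n", slide, toLinkTime(0xffffffff9a001234, slide))
}
```

Because both `_stext` addresses come from symbol tables rather than `/proc/kcore`, this route avoids the `CAP_SYS_RAWIO` requirement that lockdown blocks.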