Refactor jsRpcSession span parenting and ownership#6704
Refactor jsRpcSession span parenting and ownership#6704
Conversation
|
UnknownError: ProviderInitError |
|
@Ankcorn Bonk workflow failed. Check the logs for details. View workflow run · To retry, trigger Bonk again. |
There was a problem hiding this comment.
I'm Bonk, and I've done a quick review of your PR.
This lifts the jsRpcSession span out of TraceContext and onto JsRpcSessionCustomEvent, so the span is reachable from the event and available for enrichment in the followup PR. The span lifetime and ownership model looks correct.
One minor note (low severity):
|
Review posted successfully. Here's a summary of my analysis: Overall assessment: The PR is well-structured. The span lifetime model is correct — One minor issue flagged: The new |
04e17f6 to
b28449b
Compare
|
@ask-bonk review this PR |
There was a problem hiding this comment.
The featureFlagsForFl issue from my previous review has been addressed — the uint channel case now populates it via the metadataExtra callback.
On re-review, the span ownership model is sound and the code is correct. One minor nit below on a comment that looks like it has placeholder text left over from editing.
This review was generated by an AI assistant (Bonk) and may contain mistakes.
|
Review posted successfully. Here's a summary of the re-review: Previous feedback status: The Re-review findings: The code is correct. The span ownership model is sound:
One minor nit posted: A comment at line 2414 in |
fdc5681 to
04aa1ca
Compare
|
@ask-bonk review this PR |
There was a problem hiding this comment.
Both issues from prior reviews (missing featureFlagsForFl, truncated comment) are addressed. On re-review the refactoring is correct — no new issues found.
Summary of verification:
- Span lifetime.
SpanBuilderis move-only with RAIIend()in its destructor. Storing it onJsRpcSessionCustomEventgives it session-scoped lifetime, matching the logical span duration. The event is kept alive by.attach()on thecustomEvent()promise. Correct. SpanParentref-counting.SpanParent(SpanBuilder&)callsmapAddRefon the builder's observer (akj::Refcounted).addRef()does the same. Both thesessionSpanbuilder and theSubrequestMetadata::userSpanParenthold independentkj::Ownreferences to the same observer. Correct.- Capture-by-reference safety. All
[&]lambdas inwithSessionSpan/getSubrequestare called synchronously (kj::FunctionParamis non-owning by design). No dangling references. - Scratch path vector.
callImpluses a separatesessionPathvector fortryGetJsRpcSessionClientand onlykj::mvs it intopathon success.JsRpcPropertyappending unconditionally to the scratch vector is harmless since it's discarded on thekj::nonefallback. TheStringPtrs point toJsRpcProperty::namemembers that outlive the synchronousmethodPathconstruction. Correct. getCap()before event move.getCap()moves the promise-side cap out of the event; the cap then lives independently via capnp's refcounting. Moving the event intocustomEvent()afterward is safe — same pattern as the original code.- OutgoingFactory variants. Returning
SpanBuilder(nullptr)for DO/cross-process actor factories correctly skips thejsRpcSessionspan. These factories produce their own outer span (e.g.durable_object_subrequest), so no redundant span is created. Matches prior behavior. - Test. The new
jsRpcInsideEnterSpantest validates thatgetCurrentUserTraceSpan().newChild()picks up theenterSpan-pushedAsyncContextFrame, mirroring the existingfetchInsideEnterSpancase for HTTP. The.wd-testcorrectly adds therpccompat flag and the entrypoint binding.
This review was generated by an AI assistant (Bonk) and may contain mistakes.
|
Review posted successfully. Here's a summary: Both issues from my prior reviews have been addressed ( I verified seven aspects of correctness:
|
04aa1ca to
685b7fa
Compare
Adds Case 7 to tracing-hierarchy-instrumentation-test: an RPC call made inside enterSpan() must produce a jsRpcSession user span whose parent is the enterSpan, not the top-level onset span. This is the RPC analog of the existing fetchInsideEnterSpan case.
The jsRpcSession user span is now created in Fetcher::getJsRpcClient() — a new helper that returns the WorkerInterface plus the span — and transferred to the JsRpcSessionCustomEvent where it lives as a member SpanBuilder until the event is destroyed (i.e. session end). Previously the span was created inside getClientWithTracing() and attached to the WorkerInterface via .attach(). The event itself had no visibility into the span, which made it impossible for callImpl() to reach it for runtime enrichment (e.g. AI Gateway binding span tags). Behavioural notes: - Span created on the caller side before startRequest() so its ID is available for USER_SPAN_CONTEXT_PROPAGATION (the callee's onset event reports this span as its parent). - Direct channel variants (uint, IoOwn<SubrequestChannel>) get a jsRpcSession span. OutgoingFactory variants (DurableObject stubs, cross-process actors) already create their own outer span (e.g. durable_object_subrequest), so jsRpcSession is omitted to avoid redundancy. This matches pre-existing behaviour for those variants. - ioContext.now() (I/O time) is used for the explicit start time so we remain Spectre-safe and deterministic in test mode.
Splits JsRpcClientProvider's single virtual into two paths that match
the two real shapes of providers:
- getClientForOneCall(js, path) — for cap-holders (JsRpcStub,
JsRpcPromise) that already hold a live cap. Defaulted to
KJ_UNIMPLEMENTED so session-creating providers don't have to
pretend to implement it.
- tryGetJsRpcSessionClient(ioContext, path) — for session-creating
providers (Fetcher). Returns the WorkerInterface and jsRpcSession
SpanBuilder so callImpl can construct the JsRpcSessionCustomEvent
itself. Default returns kj::none.
callImpl checks the session path first and falls back to the cap path.
Fetcher no longer constructs JsRpcSessionCustomEvent; it just exposes
the building blocks.
Why: the previous shape made Fetcher (an HTTP-layer concept) responsible
for instantiating an RPC-layer event class. Worse, callImpl needed a
raw SpanBuilder* returned from getClientForOneCall to apply enrichment
to a span owned by an event it didn't construct. Moving event
construction into callImpl puts session lifecycle next to the only code
that depends on it and removes the cross-boundary pointer.
685b7fa to
b367ed8
Compare
The refactor that moved jsRpcSession span ownership into
JsRpcSessionCustomEvent only created the user-facing span via
ioContext.getCurrentUserTraceSpan().newChild(...), but the original
Fetcher::getClient("jsRpcSession") path used IoContext::makeUserTraceSpan
which produces a TraceContext containing BOTH an internal trace span and
a user span.
Without the internal span, the SubrequestMetadata.parentSpan handed to
the callee was an unobserved/empty SpanParent (the default-constructed
TraceContext local in IoContext::getSubrequestNoChecks), so trace
context never propagated to downstream subrequests the callee made
(e.g. the QuickSilver Read.get the dynamic-worker ban check issues).
This regressed edgeworker's DynamicWorkerBan."Dynamic worker loads when
not banned" test, which asserts traceIdLow on the QS request matches
the injected trace ID. Reproduced locally and verified fix passes all
six previously failing edgeworker tests.
Merging this PR will not alter performance
Comparing Footnotes
|
Previously JsRpcClientProvider::getClientForOneCall and
tryGetJsRpcSessionClient took kj::Vector<kj::StringPtr>& path as an
out-parameter. Each override had to know whether to append (only
JsRpcProperty does, because it's the only non-root provider) and not
to double-append, and callImpl needed a scratch path vector to handle
the kj::none-from-Fetcher case without leaking partial state.
Move path into the return value: getClientForOneCall returns
OneCall { client, path }, and tryGetJsRpcSessionClient's
JsRpcSessionClient now carries its own path. JsRpcProperty's overrides
forward to the parent and append to the returned struct's path.
Benefits:
- Forgetting to handle kj::none before touching path is now a
compile error (you have to KJ_IF_SOME the Maybe first), not a
silent runtime corruption.
- Each frame owns its own path vector — no shared state, no
double-append risk.
- Drops the (void)path; / scratch-vector / commit-on-success
ceremony in callImpl.
- Drops several explanatory comments that were warning readers about
the shared-mutation footgun -- the type system now enforces the
invariant, so the comments are unnecessary.
Behaviour is unchanged. All workerd tests pass; edgeworker
DynamicWorkerBan still passes.
Pure relocation -- the function stays a member of Fetcher (declared in
http.h, accessing its private channelOrClientFactory and isInHouse),
just defined in a different .c++. Builds in the same translation-unit
graph; no header or BUILD changes besides one #include.
The whole jsRpcSession lifecycle now lives in one file:
- tryGetJsRpcSessionClient: creates the user + internal spans and
builds the WorkerInterface
- callImpl: takes the user span out of the result, hands it to a
new JsRpcSessionCustomEvent, dispatches the worker
- JsRpcSessionCustomEvent::run: callee side
(and on the AIG branch, the response lambda that applies enrichment
to the user span sits in callImpl right next to where the span was
handed in)
Reviewers can follow the lifecycle without jumping between files.
Trade-off: Fetcher's implementation now spans http.c++ and
worker-rpc.c++, but only one method is split out -- the jsRpc-specific
one.
The base class JsRpcClientProvider::tryGetJsRpcSessionClient already documents the contract; the override doesn't change semantics so copy-pasting the doc adds no information.
Reverts the structural part of 8de09c1 (path-by-value via OneCall + JsRpcSessionClient.path). On reflection the refactor was net-negative for this small, stable hierarchy (4 providers, 1 forwarder): - Every leaf return now spells out an empty path: `{client, {}}`, `{worker, span, {}}`. Three leaves, in perpetuity. - The footgun the refactor was supposed to prevent ("forget to handle kj::none before mutating path") was already prevented by callImpl's scratch-vector pattern -- the failure mode was theoretical, not real. Restored layout: - getClientForOneCall(jsg::Lock&, kj::Vector<kj::StringPtr>& path) - tryGetJsRpcSessionClient(IoContext&, kj::Vector<kj::StringPtr>& path) - callImpl uses a scratch `sessionPath` for the session attempt, commits on success. - JsRpcProperty appends unconditionally (the convention). - Leaves don't touch `path`. Three rules across four places, all simple, no return-value boilerplate at the leaves. The structural improvements that *did* land earlier (span-pair separation, return-value-only struct for JsRpcSessionClient) are preserved.
tryGetJsRpcSessionClient used to create the user-facing jsRpcSession
span and the internal trace span itself, then return them inside
JsRpcSessionClient. That puts span policy in a Fetcher accessor that
shouldn't know about jsRpc semantics, only about channel variants.
Hoist span ownership into callImpl, which is the function that knows
it's about to start a jsRpc session (creating the membrane via
JsRpcSessionCustomEvent). Move maps the code structure onto the future
span hierarchy: callImpl will own jsRpcSession (here, today) and
jsRpcCall / jsRpcTargetCall (followups). Each span lives in the branch
that owns its lifecycle.
To preserve trace context propagation to the callee (the parent IDs
must be present in SubrequestMetadata when getSubrequest runs inside
tryGet), span creation has to happen *before* tryGet is called and
the parents passed in. Two virtuals:
* wouldCreateJsRpcSessionSpan() lets callImpl peek at policy
(true for service bindings / direct channels, false for
OutgoingFactory variants that already produce
durable_object_subrequest etc. and would only get redundant
nesting from a jsRpcSession on top).
* tryGetJsRpcSessionClient() now takes SpanParent internalSpanParent
and SpanParent userSpanParent as inputs, plumbed straight into
SubrequestMetadata. Both may be unobserved on the negative path.
JsRpcSessionClient shrinks to just { worker } -- the WorkerInterface.
callImpl then does:
- peek wouldCreateJsRpcSessionSpan(), conditionally make spans
- call tryGet with the SpanParents
- attach internal span to the worker (lives = session lifetime)
- move user span into JsRpcSessionCustomEvent (also session-scoped,
and reachable from callImpl's response lambda for binding span
enrichment in the AIG followup)
JsRpcProperty's forwarder grows a wouldCreateJsRpcSessionSpan
override that delegates to its parent, same shape as the existing
tryGet forwarder.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6704 +/- ##
==========================================
- Coverage 66.57% 66.56% -0.01%
==========================================
Files 402 402
Lines 115911 115904 -7
Branches 19407 19411 +4
==========================================
- Hits 77163 77147 -16
- Misses 27172 27173 +1
- Partials 11576 11584 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
The previous shape had two virtuals (`wouldCreateJsRpcSessionSpan` and
`tryGetJsRpcSessionClient`) plus a peek-then-act dance in callImpl, all
to keep span construction visually inside callImpl. The cost was two
parallel switches over Fetcher::channelOrClientFactory that had to stay
in sync -- a real maintenance smell.
Collapse to one virtual that returns both the WorkerInterface and the
user-facing jsRpcSession span. For OutgoingFactory variants that don't
want a session span (DurableObject stubs etc. produce their own
durable_object_subrequest), return SpanBuilder(nullptr); moving an
unobserved SpanBuilder into the event is a no-op so callImpl handles
both cases uniformly without any branch.
Result:
- Drops `wouldCreateJsRpcSessionSpan` (base + Fetcher + JsRpcProperty).
- Single switch in Fetcher::tryGetJsRpcSessionClient.
- JsRpcProperty has one forwarder instead of two.
- callImpl no longer peeks or constructs spans -- it just moves what
tryGet returns into the event.
- ~50 lines net.
Dan's principle ("instrumentation in the function with knowledge of
internal semantics") still holds: tryGetJsRpcSessionClient lives in
worker-rpc.c++ and its job is literally to start a jsRpc session.
Constructing the session span at that point is the right place. The
followup PRs (jsRpcCall, jsRpcTargetCall) will still slot cleanly into
callImpl's existing two-branch structure -- those are per-call spans
and naturally belong in callImpl, not here.
Prep for native jsRpcSession span enrichment.
Background
jsRpcSessionis the user span for one RPC session. Today it's created byFetcher::getClientForOneCall(HTTP) and kept alive via.attach()on theWorkerInterface. AIG enrichment needs to write to this span fromcallImpl's response handler — but.attach()makes it unreachable, andFetcherconstructing anRPC-layer event blurs HTTP/RPC layering.
Commit 1 — Test:
jsRpcSessionparented to enclosingenterSpanRPC analog of
fetchInsideEnterSpan. Locks in the contract before moving the span.Commit 2 — Move span ownership into
JsRpcSessionCustomEvent.attach()ordering (side effect); now correct by construction..attach()hides the object; AIG needs a named path.event->jsRpcSessionSpanis that path."jsRpcSession"string moves out of generic HTTP plumbing intoFetcher::getJsRpcClientreturning{worker, span}.OutgoingFactoryvariants pass
SpanBuilder(nullptr)— they already make adurable_object_subrequestspan.Commit 3 — Move event construction into
callImplFetcherproduces{worker, span};callImplnames and uses them.&event->jsRpcSessionSpanincallImpl's response lambda; constructing the event there keeps it in-scope. Otherwisethe AIG PR would need a
ClientForOneCall { client, SpanBuilder* }struct on the provider API.JsRpcClientProviderhad one virtual hiding two shapes. Now two, each defaulted:getClientForOneCall— cap-holders (JsRpcStub,JsRpcPromise).tryGetJsRpcSessionClient— session-creators (Fetcher).Note. Both virtuals append to a shared
pathvector.callImplgives the session attempt a scratch vector and only commits on success, so forwarders canappend unconditionally. (Caught early — 4 pipelining tests failed before the fix.)
Net effect
.attach()-ed toWorkerInterfaceJsRpcSessionCustomEventFetcher(HTTP)callImpl(RPC)JsRpcClientProviderevent->jsRpcSessionSpan