feat(server-utils): Add tracingChannel-to-span binding#21641
Conversation
8c4ca47 to
61a560f
Compare
size-limit report 📦
|
2fbdda3 to
3cb7959
Compare
3cb7959 to
e34e28a
Compare
08b9cbb to
c02e8e1
Compare
|
|
||
| export interface TracingChannelBinding { | ||
| asyncLocalStorage: unknown; | ||
| getStoreWithActiveSpan: (span: Span) => unknown; |
There was a problem hiding this comment.
is it important/necessary what this returns? Seems like it would make it very hard to use this, that this can return completely "random" things? like, having { scope, isolationScope } or an otel context as possible return types seems to me as if it would be more or less impossible to write code just using this...?
There was a problem hiding this comment.
technically, it doesn't matter what it returns as long as it is consistent with the ALS value. For our implementations its the scope/isolationScope, for OTel its the context.
The contents of the value don't really matter for us, it's opaque and is not meant to be inspected upon by the caller. Locking it to specific types would lock the implementations as well and would make it rely on OTel, so I think I should leave it as unknown for now.
|
|
||
| const asyncLocalStorage = binding.asyncLocalStorage as AsyncLocalStorage<TData>; | ||
|
|
||
| channel.start.bindStore(asyncLocalStorage, (data: TracingChannelPayloadWithSpan<TData>) => { |
There was a problem hiding this comment.
Could you add comments throughout this file explaining how these things work, I find it pretty hard to follow what is happening where and why 😅 e.g. how is getStoreWithActiveSpan supposed to work when this is abstract and could return anything etc?
There was a problem hiding this comment.
I added the flow steps there, added a diagram as well and updated the PR description with more info.
| * but should reuse the enclosing span instead of opening (and ending) their own — e.g. an agent | ||
| * loop's per-step events, where ending a freshly opened span would close the parent prematurely. | ||
| */ | ||
| export function bindTracingChannelToSpan<TData extends object>( |
There was a problem hiding this comment.
Not 100% sure because honestly I find all of this a bit hard to follow, but does it make sense to make this a single method for auto/manual mode? Would it not be cleaner to have to methods, e.g. bindTracingChannelToSpan (generic, manual) and one binTracingChannelToSpanWithLifecycle for auto mode? 🤔
There was a problem hiding this comment.
Yea that's fair, I thought the manual one would be too underused to justify it but a clearer API is always better.
There was a problem hiding this comment.
Actually I noticed we wouldn't need to use the manual one at all, it was for the nitro SDK but with the error mechanism customization we don't need it.
Dropping it instead and keeping a one bind____ version, we can add manual variant later.
c02e8e1 to
738418f
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 0e7b31f. Configure here.
| }, | ||
| }; | ||
|
|
||
| handle.channel.subscribe(subscribers); |
There was a problem hiding this comment.
Repeated bind duplicates handlers
High Severity
Each call to bindTracingChannelToSpan adds another channel lifecycle subscription, but only the latest bindStore callback remains. Later traces can invoke every prior error/end/asyncEnd handler for the same payload, causing duplicate captureException calls and repeated beforeSpanEnd work while span.end() may only no-op after the first end.
Reviewed by Cursor Bugbot for commit 0e7b31f. Configure here.
JPeer264
left a comment
There was a problem hiding this comment.
Pretty nice stuff overall.
Just one question about the description. Why exactly is asyncStart not important and is a NOOP?
@JPeer264 Traced sync function
Traced async functionOrder is We mark the start on Traced function that returns a promiseSame lifecycle as the async case. The result lands on So in all three cases |
0e7b31f to
3e93527
Compare
| asyncLocalStorage: unknown; | ||
|
|
||
| /** | ||
| * Activates a span for the tracing channels nested invocations, the return value wi |
| * You can alternatively pass a function that sets the ExclusiveEventHintOrCaptureContext on the captured error. | ||
| * Set `false` for instrumentation that only annotates the span and lets the error be captured at the boundary that owns it (e.g. db spans). | ||
| */ | ||
| captureError?: boolean | ((e: unknown) => ExclusiveEventHintOrCaptureContext); |
There was a problem hiding this comment.
I don't think this should default to true! As usually we do not want to capture these as exceptions always, because a user could handle them, in which case they should not bubble up to sentry. so this should be opt-in, not out IMHO.
There was a problem hiding this comment.
That's fair, I will flip it over right away.
|
👋 @isaacs, @andreiborza — Please review this PR when you get a chance! |
Add `bindTracingChannelToSpan` to generalize Node `diagnostics_channel` TracingChannel instrumentation onto Sentry spans. It binds the active span into async context on `start` and, in the default `auto` lifecycle, ends the span on the canonical terminal event: `end` for synchronous calls (detected via presence of `result`/`error` on the context object) and `asyncEnd` for async calls, recording exceptions on `error`. Backed by a new `getTracingChannelBinding` async context strategy hook wired through core, node-core, opentelemetry, cloudflare, and deno.
…oSpan
`bindTracingChannelToSpan` now returns a `TracingChannelBindingHandle`
`{ channel, unbind }` instead of the channel directly. `unbind()`
unsubscribes the auto lifecycle handlers and unbinds the start store
(idempotent; a no-op when no async context binding is available), so
callers can detach a binding — useful for teardown and test isolation.
…ToSpan + captureError callback `getSpan` may now return `undefined` to opt a channel payload out entirely: nothing is bound, no span is tracked, and the active context is left untouched so nested operations keep parenting to the enclosing span. This lets a single channel carry events that should reuse the enclosing span (e.g. an agent loop's per-step events) without ending it prematurely. The `error` handler likewise no-ops when no span was bound. Also folds in the `captureError` callback form: pass a function to set the ExclusiveEventHintOrCaptureContext on the captured error, so integrations can supply their own mechanism. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop the manual, no-lifecycle `bindTracingChannelToSpan` from the public surface and give its name to the lifecycle-managing variant (formerly `bindTracingChannelToSpanWithLifeCycle`). The bare bind-only logic lives on as a private `bindSpanToChannelStore` helper that the public function builds on. Every consumer (redis, nitro, the planned vercel-ai v7 subscriber) wants lifecycle management; nothing called the manual variant directly, so it was only API surface to maintain. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
3e93527 to
4a4adb2
Compare
…AL_createTracingChannelBinding Collapse the byte-identical getStoreWithActiveSpan binding in the node-core light, cloudflare, and deno AsyncLocalStorage strategies into a single core factory. The ALS instance stays an opaque arg so node:async_hooks never leaks into browser-targeted core. The OTel strategy keeps its own binding since its store is an OTel context rather than a scope pair.


This abstracts away our tracing channels consumption into a utility that:
This util can be consumed uniformly across all of our server-side runtimes, I will create a stacked PR that refactors our code to use it, showcasing Nitro, Redis, and Deno adoption of that util.
How Does it work
By adding a
TracingChannelBindinggetters on our AsyncContextStrategy (ACS) implementations that implements a pair of fields:The exact return type doesn't matter as long as the ACS can use it to properly parent scopes and nested async contexts. We have two implementations so far:
{ scope, isolationScope }pair.To illustrate this more, we can walk the round-trip:
getStoreWithActiveSpan(span)clones the current scope, returns{ scope, isolationScope }.bindTracingChannelToSpantakes that opaque value and hands it tochannel.start.bindStore. Under the hood the channel does `asyncLocalStorage.run(producedValue, () => …the traced operation…). The courier never checks the value, it just drops it into the right ALS.getScopes()which isasyncStorage.getStore()will get the ALS instance the courier bound into. SogetCurrentScope(),getIsolationScope(), andgetActiveSpan()all resolve to the scope carrying our span. Any child span started in that operation parents to it. That's the nesting.So there are three parties, not two:
So it's pretty type safe given the courier never checks the value, which it shouldn't because there is no guarantee what the ACS has put on it. That's also why the OTel strategy can return a Context instead.
flowchart TB subgraph Strategy["async context strategy (per-runtime impl — owns the store shape)"] Producer["PRODUCE<br/>getStoreWithActiveSpan(span)<br/>clone scope + plant span<br/>→ ALS impl: { scope, isolationScope }<br/>→ OTel impl: Context"] Reader["READ<br/>getScopes() / getCurrentScope()<br/>getActiveSpan()<br/>reads store back out"] end ALS["AsyncLocalStorage (binding.asyncLocalStorage)<br/>the shared pipe — same instance on both sides"] Courier["COURIER — bindTracingChannelToSpan<br/>strategy-blind, never inspects (TValue = unknown)<br/>channel.start.bindStore(binding.asyncLocalStorage, data ⇒ getStoreWithActiveSpan(span))"] Op["traced async operation<br/>start child span → parents to bound span ✓"] Producer -->|"produced store"| ALS Courier -->|"als.run(store, op)"| ALS ALS -->|"als.getStore()"| Op Op -.->|"getCurrentScope / getActiveSpan"| Reader Reader -.->|"resolves to scope carrying our span"| OpLifecycle Management
After testing across several node releases (20 -> 26), I locked down the strategy that we can rely on to make this reliable and predictable.
startlifecycle event, this is emitted always and is reliable.errorFor capturing exceptions and setting error span status, always reliable.endis emitted:erroris present on the context object, end the span immediatly, there won’t be any async events coming up, the function threw in the sync part, so there is no async part to execute.resultis present on the context object, end the span immediately, there won’t be any async events coming up. The function returned in the sync part or is sync trace itself. UseinorhasOwnProperty, Do not useundefinedchecks as the function can return nothing.asyncStartis emitted: Do nothing, this event has no value for us.asyncEndis emitted: Just end the span.