-
Notifications
You must be signed in to change notification settings - Fork 126
chore(bridge): Refactor TS-Core bridge #1698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -0,0 +1,418 @@ | |||
# Bridge Layer Coding Norms, Best Practices and Pertinent Knowledge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WIP - I'm leaving this doc here even though it is incomplete (I'll complete at a later time, post-merge). I think it still adds some value in its present form. Feel free to read and comment if curious, but don't feel obliged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looking good. This is absolutely massive so I couldn't give it super detailed scrutiny but tried to focus on the important looking bits.
|
||
//////////////////////////////////////////////////////////////////////////////////////////////////// | ||
|
||
mod config { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these mods should just be put into their own files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, been hesitant, but I think it helps having that in the same file the feature they relate too. If anything, I would move them out of their ::config
modules, into their respective parents. I'll leave it that way for now, that's something that can easily be tuned later.
packages/core-bridge/src/runtime.rs
Outdated
enter_sync!(self); | ||
Ok(BridgeFuture::new(Box::pin(future))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would work better with .instrument()
? I'm surprised this works since the subscriber guard would be dropped immediately after this function exits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The future will execute on one of Tokio’s thread, which are bound with specific tracing contexts immediately after being created.
AFAICT the node thread would not need to access the tracing subscriber once the implementation of the “js_function” returns.
Keeping the guard in scope ensure that the tracing subscriber doesn’t bleed into calls made by other runtimes on the same thread, which has been causing issues in a previous attempt.
* - Always use `null` to indicate an intentionally unspecified optional value | ||
* in TypeScript interfaces. This will be converted to `None` on the Rust side. | ||
* - Explicitly set _every_ properties on objects sent to the native code, | ||
* including optional properties (e.g. `{ prop: input.prop ?? null }`). | ||
* - Never use the "optional property" syntax in TypeScript (i.e. `prop?: T`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
} | ||
|
||
@SymbolBasedInstanceOfError('ServiceError') | ||
export class ServiceError extends Error implements grpc.ServiceError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would ideally merge TransportError and ServiceError, they're different representations of the tonic error.
You can remove this new error type and add the new properties to TransportError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was too lazy to do this as part of my PR but with your from
and into
helpers, it should be fairly straightforward.
packages/worker/src/connection.ts
Outdated
*/ | ||
async withAbortSignal<ReturnType>(abortSignal: AbortSignal, fn: () => Promise<ReturnType>): Promise<ReturnType> { | ||
const cc = this.callContextStorage.getStore(); | ||
return await this.callContextStorage.run({ ...cc, abortSignal }, fn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that this implementation is broken. if there's already an abortSignal on the context, it should be chained with the newly provided one. The implementation is broken in the client package too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, we completely missed that indeed. And probably most devs using abort signals did too.
I'll note for the moment, but won't change the semantic in this PR.
What changed
This is a major refactor of the Core Bridge layer. It strives to reduce complexity, and increase reliability and maintainability of the TS SDK bridge layer, as well as realign it with how more recent Core-based SDKs have been designed. More importantly, this refactor is a stepping stone to increase velocity of further development on the TS SDK, introducing the structure and abstractions required to efficiently implement several upcoming SDK features.
Here are some of the most significant changes introduced by this refactor:
Bridge APIs (i.e. those define in
@temporalio/core-bridge
) now adhere to much stricter rules.In short, types defined by the bridge are now designed as Data Transfer Objects.
Bridge types now align on Core SDK's APIs, rather than corresponding user-facing APIs). That moves concerns such as data conversion, validation, default value handling, type restructuring, etc, to the TS side, rather than relying on unclear and inconsistent expectations between TS and Rust.
All properties on bridge APIs are now non-optional (e.g. can't define a property as
someProp?: someType
);null
(rather thanundefined
) is used to indicate a value that is intentionally left unset (i.e. the equivalent ofNone
in Rust).These new rules ensure that most incoherencies in type definitions between TS and Rust will be caught early, avoiding some recurrent error patterns we have had previously in TS SDK. Several such bugs were in fact uncovered and fixed while working on this PR.
These rules also make the Rust code much simpler, moving complexity back to the TS side, where we have better context and better ability to implement more complex business logic and report back errors. That makes the bridge code more approachable by TS-first developpers, easier to maintain and keep in sync with evolution of the Core SDK, and much more reviewable by our engineers.
Most Rust side APIs (including entrypoint functions and most DTO type definitions) are now defined using macros, which avoid explicit handling of type conversion. That previously resulted in extremely verbose code, drowning actual business logic in data access and conversion plumbing.
Redresses and documents ownership rules for Rust-to-JS objects. The "Opaque Handle" abstraction introduced by this PR provides a clear yet simple to use ownership model for Runtime, Client, Worker, Ephemeral Servers and other native resources that are handed down to the JS world. This model notably guarantees proper clean up of resources, even when they are mistakenly dropped and garbage collected from the JS side (e.g. omitting to call
close
orshutdown
on a resource), or dropped on the Rust side due to unexpected errors.This fixes multiple cases of lifecycle issues that were causing process hang on shutdown. Fixes [Feature Request] Warn when runtime can't shutdown due left over
NativeConnection
#1413 and [Bug] Failure to start ephemeral server prevents shutdown of the process #1443.The bridge now properly handle cases where multiple Runtime coexists. We do not encourage that practice at the moment, and the Runtime class do not provide any API to intentionally do so, but there were previously scenarios that could result in such situations,
Bridge APIs now schedule async tasks directly to the Tokio runtime. The bridge itself no longer need event dispatching threads. This considerably simplify the effort requried to add new features to the SDK, and makes the code more approachable for new developers.
API calls into Rust now return actual JS
Promise
objects, rather than expecting TS side use ofpromissify
and callback functions. Those Promise are backed by RustFuture
.This is a huge benefit from a reliability perspective, as we can now guarantee that a
Promise
will be properly rejected if it is dropped by the Rust without having been settled. This could previously result in various unexpected behaviors including process hangs and suddent process termination. Fixes [Bug] Node discards promises created by calling into Core Bridge with callback #1302.Thanks to the
BridgeResult
,BridgeError
andTryIntoJs
types, async implementation of bridge APIs can directly return appropriate values or errors, without the need for implementer to explicitly handle conversion back into JS values (which can only be done on the Node thread, making this kind of pattern quite messy and verbose to implement).BridgeResult
andBridgeError
also ensure proper handling of Neon'sThrow
errors, indicating that the Node thread is currently in "throw state". This particular state makes it difficult to properly handle errors in Rust/Node code, as any attempt to use the JS Context once it has fall into "throw state" will result in a panic.BridgeResult
also restore the functionality of the Rust's?
suffix operator, by transparently encapsulating Neon'sThrow
intoBridgeError
, and back intoThrow
once theBridgeResult
surfaces back in place from where it can be handled.Introduced other well-defined abstractions for some recurring needs, such as sync and async callback functions into JS, and
AbortController
. Those were previously reimplemented at each call-site.Bridge APIs are now clearly marked as internal only.
Made it very clear that any API defined in the
@temporalio/core-bridge
package is internal APIs, and offers absolutely no stability guarantee. APIs meant for public usage must be defined and exposed by user facing packages exclusively (i.e.@temporalio/worker
and@temporalio/testing
). Reexporting bridge APIs from public facing APIs is also not allowed.Public APIs that were previously exported by
@temporalio/core-bridge
have all been moved to user facing packages. Documentation has been added where missing on public types. Bridge types no longer have typedoc themselves as that would be redundant and unhelpful.Some other note worthy points:
Logs forwarding from Rust to JS is now implemented using a push model rather than pull model. All other Core-based SDKs were already using push-based log forwarding.
The runtime now ensure proper cleanup of the Otel metrics exporter; this could previously cause tons of error messages on runtime shutdown. Fixes [Bug] Reenable OTLP export tests #1495.
Adds support of custom metric bucket sizes. Fixes [Feature Request] Custom metric bucket sizes #1634.