-
Notifications
You must be signed in to change notification settings - Fork 260
feat(pyth-lazer-stk): correctly handle connection drops + other improvements #2737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
7 Skipped Deployments
|
@@ -47,7 +55,7 @@ client.addAllConnectionsDownListener(() => { | |||
}); | |||
|
|||
// Create and remove one or more subscriptions on the fly | |||
await client.subscribe({ | |||
client.subscribe({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember one of the early consumers of the SDK wanted the ability to await
the subscribe()
(i.e. promise is fulfilled when the lazer server acks the subscription.) Is that no longer a requirement? Personally I think it makes for nicer ergonomics to keep it async.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah good call. I'll check it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some suggestions but overall I think the code is OK to ship, especially given you didn't introduce the class constructs in this PR. However, if you don't do it in this PR, I would recommend a follow up where you restructure things to minimize the class / OOP / mutative / procedural code and lean in to enumerating states, functional patterns, designing the interactions as a state machine, etc. I think you'll result in simpler, more easily validated & tested code, better tree-shakeability, stricter typing, etc.
logger: Logger = dummyLogger, | ||
): Promise<PythLazerClient> { | ||
const wsp = await WebSocketPool.create(urls, token, numConnections, logger); | ||
static async create(config: WebSocketPoolConfig): Promise<PythLazerClient> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit of a nit, and not something you added here, but for future reference IMO static methods are an antipattern. Unless you have a good reason why you really need the BinaryResponse.create
syntax, this is equivalent to just exporting a function createBinaryResponse
that isn't scoped to the class. There are lots of downsides to using the static method version:
- Classes are not tree-shakeable
- It implies the code is more OOP than it actually is -- es6 classes aren't Java classes and I don't believe it's healthy to use patterns that make them look like they are.
- It subtly discourages functional coding patterns in favor of procedural code (by nature of encouraging thinking about code in Java OOP terms)
- In Javascript, IMO, modules are the best namespacing construct. Classes should be reserved for state management. This pattern abuses those constructs and turns classes into a namespacing tool instead.
|
||
const HEARTBEAT_TIMEOUT_DURATION = 10_000; | ||
const CONNECTION_TIMEOUT = 5000; | ||
const DEFAULT_HEARTBEAT_TIMEOUT_DURATION_MS = 5000; // 5 seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very minor but whenever I have consts like this, I like to construct them so they're super obvious:
const ONE_SECOND_IN_MS = 1000;
const DEFAULT_HEARTBEAT_TIMEOUT_DURATION_IN_MS = 5 * ONE_SECOND_IN_MS;
logAfterRetryCount?: number; | ||
}; | ||
|
||
export class ResilientWebSocket { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General advice I have (which we've discussed in the past) is to use es6 classes sparingly and only when really needed, and to only use them for managing state. There are better, more functional constructs for everything else. When you use es6 clasess, I strongly advise building the minimum necessary state tracking in a small contained class, and then wrapping that in functional constructs.
In this case, it may be valuable to see if you can reduce this class to a smaller surface area that just handles specific pieces of mutable state, rather than putting the entire resilient websocket implementation in one big class. It looks like the majority of the class is immutable; you really only need to use mutable state for tracking the connection state as far as I can tell.
Another point here is that for finite variations, I recommend using enumerations and tagged unions to define different states. Rather than having a number of mutable fields on a class, using an enumeration of possible states enables a much greater degree of type safety, reduces the amount of mutability happening, and IMO results is much clearer code. I have a very common pattern I follow for defining tagged unions for this kind of thing, see e.g. https://github.com/pyth-network/pyth-crosschain/blob/main/apps/staking/src/hooks/use-async.ts#L28-L45.
In this case, what I'd love to see is an enumeration of possible websocket states, a tagged union of constructors for values that represent those states, and a state machine to transition between possible states.
You can even add functions to the different state constructors, which allows you to do things like statically guarantee that send
is not called on a websocket which is not connected (i.e. you wouldn't have a send
function defined in the disconnected state). For a good example of this, see use-api.ts from the staking app, where we implemented exactly this pattern.
|
||
this.wsFailedAttempts = 0; | ||
this.onError = (error: ErrorEvent) => { | ||
this.logger?.error(error.error); | ||
void error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually recommend against using void
; it's implicit and unclear what the actual intent is -- I'm actually not even sure why it's here and what the intent is TBH; I'm guessing it's to avoid the error about an unused argument and if that's the case, I'd just put in an explicit eslint ignore OR just make this an argument-free function, which should still typecheck correctly.
For context, probably the most common place I see void
is when it's used to silence eslint errors about handling promise errors, i.e.
void doSomethingThatReturnsAPromise();
which tells eslint you don't care if errors happen. In these cases, I will always reach for a no-op function instead since I find it way more obvious and explicit about the intent:
doSomethingThatReturnsAPromise.catch((error: unknown) => { /* no-op on error */ })
} else { | ||
this.logger.warn( | ||
`WebSocket to ${this.endpoint} is not connected. Cannot send message.`, | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to queue these and send them on connect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are doing this in onReconnect
function
} | ||
|
||
// Wait for all connections to be established |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:mild-panic:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much!
Summary
This PR refactors the PythLazerClient and the underlying components, WebSocketPool and ResilientWebSocket.
Rationale
The current client is broken, it crashed at the beginning if it can't connect to all the urls and if one of the connections drops it also crashes.
This caused some of our used to have downtime, even though only 1 endpoint was not working correctly.
How has this been tested?
Manually tested the code with local routers and tested crashing and recovering routers while monitoring the behavior of the client.