Skip to content

feat(pyth-lazer-stk): correctly handle connection drops + other improvements #2737

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
May 28, 2025

Conversation

keyvankhademi
Copy link
Contributor

@keyvankhademi keyvankhademi commented May 27, 2025

Summary

This PR refactors the PythLazerClient and the underlying components, WebSocketPool and ResilientWebSocket.

  • When connecting initially it avoids crashing and retries until it can connect to at east one url
  • When a connection drops, it avoids crashing and keep retrying the connection
  • The config has been improved to get the parameters with hard-coded defaults. (compared hard coded configs)
  • Unnecessary async functions have been changed to sync
  • Logging has been improved to show important events and avoid spamming
  • Increases the default number of connections to 4 to balance the number of connections per url

Rationale

The current client is broken, it crashed at the beginning if it can't connect to all the urls and if one of the connections drops it also crashes.
This caused some of our used to have downtime, even though only 1 endpoint was not working correctly.

How has this been tested?

  • Current tests cover my changes
  • Added new tests
  • Manually tested the code

Manually tested the code with local routers and tested crashing and recovering routers while monitoring the behavior of the client.

@keyvankhademi keyvankhademi requested a review from a team as a code owner May 27, 2025 21:23
Copy link

vercel bot commented May 27, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
proposals ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 28, 2025 11:32pm
7 Skipped Deployments
Name Status Preview Comments Updated (UTC)
api-reference ⬜️ Skipped (Inspect) May 28, 2025 11:32pm
component-library ⬜️ Skipped (Inspect) May 28, 2025 11:32pm
developer-hub ⬜️ Skipped (Inspect) May 28, 2025 11:32pm
entropy-debugger ⬜️ Skipped (Inspect) May 28, 2025 11:32pm
entropy-explorer ⬜️ Skipped (Inspect) May 28, 2025 11:32pm
insights ⬜️ Skipped (Inspect) May 28, 2025 11:32pm
staking ⬜️ Skipped (Inspect) May 28, 2025 11:32pm

@vercel vercel bot temporarily deployed to Preview – api-reference May 27, 2025 21:39 Inactive
@vercel vercel bot temporarily deployed to Preview – developer-hub May 27, 2025 21:39 Inactive
@vercel vercel bot temporarily deployed to Preview – insights May 27, 2025 21:39 Inactive
@vercel vercel bot temporarily deployed to Preview – staking May 27, 2025 21:39 Inactive
@vercel vercel bot temporarily deployed to Preview – entropy-explorer May 27, 2025 21:39 Inactive
@vercel vercel bot temporarily deployed to Preview – entropy-debugger May 27, 2025 21:39 Inactive
@vercel vercel bot temporarily deployed to Preview – component-library May 27, 2025 21:39 Inactive
@vercel vercel bot temporarily deployed to Preview – component-library May 27, 2025 21:51 Inactive
@vercel vercel bot temporarily deployed to Preview – insights May 27, 2025 21:51 Inactive
@vercel vercel bot temporarily deployed to Preview – entropy-debugger May 27, 2025 21:51 Inactive
@vercel vercel bot temporarily deployed to Preview – developer-hub May 27, 2025 21:51 Inactive
@vercel vercel bot temporarily deployed to Preview – staking May 27, 2025 21:51 Inactive
@vercel vercel bot temporarily deployed to Preview – api-reference May 27, 2025 21:51 Inactive
@vercel vercel bot temporarily deployed to Preview – entropy-explorer May 27, 2025 21:51 Inactive
@@ -47,7 +55,7 @@ client.addAllConnectionsDownListener(() => {
});

// Create and remove one or more subscriptions on the fly
await client.subscribe({
client.subscribe({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember one of the early consumers of the SDK wanted the ability to await the subscribe() (i.e. promise is fulfilled when the lazer server acks the subscription.) Is that no longer a requirement? Personally I think it makes for nicer ergonomics to keep it async.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah good call. I'll check it.

@vercel vercel bot temporarily deployed to Preview – entropy-explorer May 27, 2025 22:14 Inactive
@vercel vercel bot temporarily deployed to Preview – staking May 27, 2025 22:14 Inactive
@vercel vercel bot temporarily deployed to Preview – entropy-debugger May 27, 2025 22:14 Inactive
@vercel vercel bot temporarily deployed to Preview – developer-hub May 27, 2025 22:14 Inactive
@vercel vercel bot temporarily deployed to Preview – api-reference May 27, 2025 22:14 Inactive
@vercel vercel bot temporarily deployed to Preview – component-library May 27, 2025 22:14 Inactive
@vercel vercel bot temporarily deployed to Preview – insights May 27, 2025 22:14 Inactive
Copy link
Collaborator

@cprussin cprussin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some suggestions but overall I think the code is OK to ship, especially given you didn't introduce the class constructs in this PR. However, if you don't do it in this PR, I would recommend a follow up where you restructure things to minimize the class / OOP / mutative / procedural code and lean in to enumerating states, functional patterns, designing the interactions as a state machine, etc. I think you'll result in simpler, more easily validated & tested code, better tree-shakeability, stricter typing, etc.

logger: Logger = dummyLogger,
): Promise<PythLazerClient> {
const wsp = await WebSocketPool.create(urls, token, numConnections, logger);
static async create(config: WebSocketPoolConfig): Promise<PythLazerClient> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of a nit, and not something you added here, but for future reference IMO static methods are an antipattern. Unless you have a good reason why you really need the BinaryResponse.create syntax, this is equivalent to just exporting a function createBinaryResponse that isn't scoped to the class. There are lots of downsides to using the static method version:

  1. Classes are not tree-shakeable
  2. It implies the code is more OOP than it actually is -- es6 classes aren't Java classes and I don't believe it's healthy to use patterns that make them look like they are.
  3. It subtly discourages functional coding patterns in favor of procedural code (by nature of encouraging thinking about code in Java OOP terms)
  4. In Javascript, IMO, modules are the best namespacing construct. Classes should be reserved for state management. This pattern abuses those constructs and turns classes into a namespacing tool instead.


const HEARTBEAT_TIMEOUT_DURATION = 10_000;
const CONNECTION_TIMEOUT = 5000;
const DEFAULT_HEARTBEAT_TIMEOUT_DURATION_MS = 5000; // 5 seconds
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor but whenever I have consts like this, I like to construct them so they're super obvious:

const ONE_SECOND_IN_MS = 1000;
const DEFAULT_HEARTBEAT_TIMEOUT_DURATION_IN_MS = 5 * ONE_SECOND_IN_MS;

logAfterRetryCount?: number;
};

export class ResilientWebSocket {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General advice I have (which we've discussed in the past) is to use es6 classes sparingly and only when really needed, and to only use them for managing state. There are better, more functional constructs for everything else. When you use es6 clasess, I strongly advise building the minimum necessary state tracking in a small contained class, and then wrapping that in functional constructs.

In this case, it may be valuable to see if you can reduce this class to a smaller surface area that just handles specific pieces of mutable state, rather than putting the entire resilient websocket implementation in one big class. It looks like the majority of the class is immutable; you really only need to use mutable state for tracking the connection state as far as I can tell.

Another point here is that for finite variations, I recommend using enumerations and tagged unions to define different states. Rather than having a number of mutable fields on a class, using an enumeration of possible states enables a much greater degree of type safety, reduces the amount of mutability happening, and IMO results is much clearer code. I have a very common pattern I follow for defining tagged unions for this kind of thing, see e.g. https://github.com/pyth-network/pyth-crosschain/blob/main/apps/staking/src/hooks/use-async.ts#L28-L45.

In this case, what I'd love to see is an enumeration of possible websocket states, a tagged union of constructors for values that represent those states, and a state machine to transition between possible states.

You can even add functions to the different state constructors, which allows you to do things like statically guarantee that send is not called on a websocket which is not connected (i.e. you wouldn't have a send function defined in the disconnected state). For a good example of this, see use-api.ts from the staking app, where we implemented exactly this pattern.


this.wsFailedAttempts = 0;
this.onError = (error: ErrorEvent) => {
this.logger?.error(error.error);
void error;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually recommend against using void; it's implicit and unclear what the actual intent is -- I'm actually not even sure why it's here and what the intent is TBH; I'm guessing it's to avoid the error about an unused argument and if that's the case, I'd just put in an explicit eslint ignore OR just make this an argument-free function, which should still typecheck correctly.

For context, probably the most common place I see void is when it's used to silence eslint errors about handling promise errors, i.e.

void doSomethingThatReturnsAPromise();

which tells eslint you don't care if errors happen. In these cases, I will always reach for a no-op function instead since I find it way more obvious and explicit about the intent:

doSomethingThatReturnsAPromise.catch((error: unknown) => { /* no-op on error */ })

} else {
this.logger.warn(
`WebSocket to ${this.endpoint} is not connected. Cannot send message.`,
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to queue these and send them on connect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are doing this in onReconnect function

}

// Wait for all connections to be established
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:mild-panic:

Copy link
Collaborator

@ali-behjati ali-behjati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much!

@vercel vercel bot temporarily deployed to Preview – developer-hub May 28, 2025 23:30 Inactive
@vercel vercel bot temporarily deployed to Preview – staking May 28, 2025 23:30 Inactive
@vercel vercel bot temporarily deployed to Preview – insights May 28, 2025 23:30 Inactive
@vercel vercel bot temporarily deployed to Preview – component-library May 28, 2025 23:30 Inactive
@vercel vercel bot temporarily deployed to Preview – entropy-explorer May 28, 2025 23:30 Inactive
@vercel vercel bot temporarily deployed to Preview – entropy-debugger May 28, 2025 23:30 Inactive
@vercel vercel bot temporarily deployed to Preview – api-reference May 28, 2025 23:30 Inactive
@keyvankhademi keyvankhademi merged commit b8ae225 into main May 28, 2025
12 of 13 checks passed
@keyvankhademi keyvankhademi deleted the lazer-js-sdk branch May 28, 2025 23:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants