Skip to content

Conversation

@Shiranuit
Copy link
Contributor

@Shiranuit Shiranuit commented Jul 15, 2021

What does this PR do ?

This PR fixes a Race Condition that was occuring within the Heartbeat causing the SDK to think the connection was lost and making him engage the reconnection protocol which was causing the SDK to create multiple Websocket connection even though the previous connections were fine.

This is a serious problem that could cause applications to be overloaded with Websocket connections and can saturate the Event Loop or the limit of Websocket connections defined by the browsers.

What has been done:

  • Avoid creating a Timeout in the the ping Interval since they had the same amount of time before execution and that caused a race condition, we now only have an Interval that checks if there has been a pong since the last cycle, if not throw an Error, if there was a pong since the last ping cycle do another ping.
  • Correct tests to verify the new behaviour
  • Fixed: the ping Interval was not cleared when websocket.close was called
  • Ensures that the websocket connection is closed on timeout before engaging the reconnection protocol otherwise if this was a false positive because Kuzzle could not respond in time we would have 2 connection opened.
  • Ensures that the previous interval is cleared before creating a new one

How should this be manually tested

Run this script, there should not be any network error messages, and only one active websocket connection to kuzzle.
Unfortunately this doesn't help testing the Race Condition as it's not easy to reproduce the exact conditions needed, but it somehow occurs frequently in one of our project.

const { Kuzzle, WebSocket } = require('./index');

const kuzzle = new Kuzzle(new WebSocket('localhost'));

async function main() {
    kuzzle.on('networkError', (...args) => {
        console.log(...args);
    });
    for (let i  = 0; i < 10; i++) {
        await kuzzle.disconnect();
        await kuzzle.connect();
    }
}

main();

@codecov
Copy link

codecov bot commented Jul 15, 2021

Codecov Report

Merging #651 (2e154fa) into 7-dev (ea54b9f) will increase coverage by 0.38%.
The diff coverage is 61.90%.

Impacted file tree graph

@@            Coverage Diff             @@
##            7-dev     #651      +/-   ##
==========================================
+ Coverage   85.62%   86.00%   +0.38%     
==========================================
  Files          36       36              
  Lines        1635     1644       +9     
  Branches      297      298       +1     
==========================================
+ Hits         1400     1414      +14     
+ Misses        175      173       -2     
+ Partials       60       57       -3     
Impacted Files Coverage Δ
src/protocols/WebSocket.ts 78.57% <50.00%> (-2.11%) ⬇️
src/Kuzzle.ts 86.21% <100.00%> (+3.30%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ab63916...2e154fa. Read the comment docs.

Comment on lines 121 to 124
if (this.client &&
this.client.readyState === this.client.OPEN &&
!this.waitForPong
) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong coding standards here, the && should be at the beginning

Aschen added 2 commits July 16, 2021 16:53
@Aschen Aschen merged commit 273b8d2 into 7-dev Jul 20, 2021
@Aschen Aschen deleted the hotfix-heartbeat-race-condition branch July 20, 2021 07:30
@Aschen Aschen mentioned this pull request Jul 21, 2021
Aschen added a commit that referenced this pull request Jul 21, 2021
# [7.7.2](https://github.com/kuzzleio/sdk-javascript/releases/tag/7.7.2) (2021-07-21)


#### Bug fixes

- [ [#651](#651) ] Hotfix heartbeat race condition   ([Shiranuit](https://github.com/Shiranuit))
- [ [#648](#648) ] Fix the Jwt.expired getter wrong comparison of micro timestamp and timestamp   ([robingrandval](https://github.com/robingrandval))
- [ [#646](#646) ] Fix usage of SearchResult.next with HTTP   ([Aschen](https://github.com/Aschen))
- [ [#644](#644) ] Correctly reject aborted queued requests   ([scottinet](https://github.com/scottinet))

#### Enhancements

- [ [#650](#650) ] Add authenticator function used at reconnection   ([Aschen](https://github.com/Aschen))
---
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants