Skip to content

Fix cursor set race condition #339

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mattheworiordan
Copy link
Member

@mattheworiordan mattheworiordan commented Jun 5, 2025

When calling cursors.set, before the ::$cursors channel is attached, the set calls fail silently and positions are not emitted. This PR fixes that.

Warning:

This PR cannot be merged because the Spaces SDK has too many outdated npm modules causing the npm audit step to fail, output below:

Run npm audit --production
npm warn config production Use `--omit=dev` instead.
# npm audit report
nanoid  <3.3.8
Severity: moderate
Predictable results in nanoid generation when given non-integer values - https://github.com/advisories/GHSA-mwcw-c2x[4](https://github.com/ably/spaces/actions/runs/15477364779/job/43575926597?pr=339#step:5:5)-8c55
fix available via `npm audit fix`

npm audit fix does not work, and fixing the dependencies is outside the scope of this PR. There are currently 12 open PRs auto generated to update dependencies that have not been merged either.

Note:

  • After many hours of painfully trying to get the CLI tests to pass, I discovered the issue was not with the tests or the CLI, but the Spaces SDK!
  • It's a little surprising/bad that setting cursor positions is both a) not using async/await, b) has no means to fail given it just sticks the position onto a queue. Whilst I appreciate cursor positions are indeed lower QoS, obvious failures should still propagate back to the caller.
  • It seems this bug was in fact known, and in the integration tests that failed, instead of fixing the issue, a workaround in the integration test itself was injected. See https://github.com/ably/spaces/blame/main/test/integration/integration.test.ts#L169-L185, introduced in a2b0d08.
  • Seeing that none of the tests are e2e was a bit of a surprise! I suspect stuff like this would have been more obvious with e2e tests. Some high level e2e tests should be added at some point.

@github-actions github-actions bot temporarily deployed to staging/pull/339/typedoc June 5, 2025 21:11 Inactive
@mattheworiordan mattheworiordan requested a review from VeskeR June 5, 2025 21:11
When calling cursors.set, before the ::$cursors channel is attached, the set calls fail silently and positions are not emitted. This PR fixes that.

Note:

- After many hours of painfully trying to get the CLI tests to pass, I discovered the issue was not with the tests or the CLI, but the Spaces SDK!
- It's a little disappointing that setting cursor positions is both a) not using async/await, b) has no means to fail given it just sticks the position onto a queue. Whilst I appreciate cursor positions are indeed lower QoS, obvious failures should still propagate back to the caller.
- Seeing that none of the tests are e2e was a bit of a surprise! I suspect stuff like this would have been more obvious with e2e tests. Some high level e2e tests should be added at some point.
@mattheworiordan mattheworiordan force-pushed the fix-cursor-set-race-condition branch from b1bfd39 to 2bae929 Compare June 5, 2025 21:14
mattheworiordan added a commit to ably/cli that referenced this pull request Jun 6, 2025
Includes a workaround for a bug in Spaces, see ably/spaces#339
@lawrence-forooghian
Copy link
Collaborator

I haven't looked at the PR yet but I was curious about what led me to introduce this workaround in the tests without fixing it or having a follow-up issue. Found #229 (comment) from the original PR, in which Dom said that this was an "expected behaviour". (I think I was pretty new to Spaces at the time so just took him at his word.)

@mattheworiordan
Copy link
Member Author

in which Dom said that this was an "expected behaviour". (I think I was pretty new to Spaces at the time so just took him at his word.)

Ok, thanks for explaining. I suppose it's expected if the API intentionally discards messages, but I'm not sure why that would be the intended behaviour :)

Copy link
Collaborator

@ttypic ttypic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattheworiordan thanks for the update! But I'm a bit concerned about the queuing behavior introduced here. Cursor updates are meant to reflect live activity, and replaying past positions when a channel attaches might cause unexpected visual artifacts or flickering, especially if multiple updates are queued.

In most cases, if we haven't attached to the channel, it's probably best to just drop those updates rather than store and replay them later. I would avoid sending old cursor positions unless we have a strong use case that justifies it.

Curious to hear your thoughts, happy to discuss further!

@mattheworiordan
Copy link
Member Author

Cursor updates are meant to reflect live activity

Ok, but how is this any different to a user publishing a set of positions when their connection has congestion / is disconnected abruptly. We do exactly the same thing at a protocol and SDK level and queue up publishes once they have been accepted by the SDK.

when a channel attaches might cause unexpected visual artifacts or flickering, especially if multiple updates are queued.

Please explain this to me. If I move my cursor 10 times, and I am not yet attached to a channel OR I am attached to a channel but my connection is stalled, when the connectivity resumes you will see the mouse move quickly and immediately reflect the latest position. Why do you think we should treat attaching a channel differently to congestion on the connection, and if this is an issue, surely the only solution we should be advocating for is to improve how cursors are presented with batching & conflation on the channel and in the receiving end?

In most cases, if we haven't attached to the channel, it's probably best to just drop those updates rather than store and replay them later

Please explain this. If a user sends a position, why would we not send it? Why is the fact that the channel is not attached relevant to the user when the user has no control over the attachment state?

Key point here is a) we expose no API to manage channel state for the cursors channel, b) we chose a design that hides this from the user because it's an internal details, c) if a user sends an update we should send it IMHO and not decide that in some situations it's OK but in others it's not - that's just arbitrary to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants