
RUST-1443 Stop executing server monitors after the server has been closed (2.3 backport) #733


Merged (9 commits, Sep 12, 2022)

Conversation

patrickfreed (Contributor):

RUST-1443

This PR fixes a bug where server monitors would continue executing even after a server had been removed from the topology.

/// topology.
#[cfg_attr(feature = "tokio-runtime", tokio::test(flavor = "multi_thread"))]
#[cfg_attr(feature = "async-std-runtime", async_std::test)]
async fn removed_server_monitor_stops() -> crate::error::Result<()> {
patrickfreed (Contributor, Author):

This test will need to be forward-ported to the main branch once RUST-360 lands.

patrickfreed marked this pull request as ready for review on September 1, 2022 at 18:44.
patrickfreed requested review from a team and abr-egn, and removed the review request for the team, on September 1, 2022 at 18:44.
abr-egn (Contributor) left a comment:

SDAM stuff can be subtle, so to make sure I'm following what's going on:

  • Monitor's loop condition changed from topology liveness to server liveness (this is the core fix).
  • To enable is_alive for polling in the above, WorkerHandle[Listener] changed to use watch::Receiver.
  • TopologyWorker pairs WorkerHandle with Server in a map of MonitoredServers; dropping those values is what will terminate the worker.
  • The server map is maintained in sync_hosts;
    • sync_hosts used to take a &mut TopologyState to update, but that state is now rolled directly into TopologyWorker.

Is that a reasonable summary?
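
A minimal sketch of the liveness mechanism described in the summary above, assuming tokio's watch channel (simplified stand-in types, not the driver's actual definitions):

use std::time::Duration;
use tokio::sync::watch;

// Held by the TopologyWorker alongside its Server in the map of
// MonitoredServers; dropping it is what signals the monitor to stop.
struct WorkerHandle {
    _sender: watch::Sender<()>,
}

// Held by the monitor; lets it poll whether its handle still exists.
struct WorkerHandleListener {
    receiver: watch::Receiver<()>,
}

impl WorkerHandleListener {
    // has_changed() returns Err once the sender half has been dropped,
    // which serves as the "server was removed" signal.
    fn is_alive(&self) -> bool {
        self.receiver.has_changed().is_ok()
    }
}

fn handle_pair() -> (WorkerHandle, WorkerHandleListener) {
    let (sender, receiver) = watch::channel(());
    (
        WorkerHandle { _sender: sender },
        WorkerHandleListener { receiver },
    )
}

// The monitor's loop condition then keys off server liveness rather
// than topology liveness:
async fn monitor_loop(listener: WorkerHandleListener, heartbeat_freq: Duration) {
    while listener.is_alive() {
        // perform a heartbeat check here, then wait for the next cycle
        tokio::time::sleep(heartbeat_freq).await;
    }
}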

pub(crate) added_addresses: HashSet<&'a ServerAddress>,
pub(crate) changed_servers:
HashMap<&'a ServerAddress, (&'a ServerDescription, &'a ServerDescription)>,
pub(crate) removed_addresses: Vec<&'a ServerAddress>,
abr-egn (Contributor):

Is the change to Vec purely to allow sorting in tests? It looks like they're still "morally" sets, i.e. there won't be any duplicate keys. If so, it might be clearer to do the sorting when emitting the events in TopologyWorker::process_topology_diff and leave these as HashSet/Map.

patrickfreed (Contributor, Author):

Good idea, done.
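
For reference, a sketch of the suggested shape, with a hypothetical stand-in for ServerAddress: keep the diff field as a set and sort only at the point where events are emitted (per the suggestion, in TopologyWorker::process_topology_diff):

use std::collections::HashSet;

// Hypothetical stand-in for ServerAddress, for illustration only.
type ServerAddress = String;

fn emit_server_closed_events(removed_addresses: &HashSet<ServerAddress>) {
    // Sort here, when emitting, rather than storing the diff as a Vec.
    let mut sorted: Vec<&ServerAddress> = removed_addresses.iter().collect();
    sorted.sort();
    for address in sorted {
        // emit a ServerClosedEvent for `address`
        println!("server closed: {}", address);
    }
}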

patrickfreed (Contributor, Author):

Yep, that's all exactly right. The change from using the TopologyState stored in the watch::channel to one stored on the worker was to avoid a reference cycle: the channel's receivers (which the monitors hold) would keep the MonitoredServer instances alive, so the monitors would never exit.
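
A sketch of the cycle being avoided, again with simplified stand-in types (the real TopologyState/TopologyWorker carry more than this):

use std::collections::HashMap;
use tokio::sync::watch;

struct WorkerHandle; // stand-in: dropping it stops the server's monitor

struct MonitoredServer {
    _handle: WorkerHandle,
}

// Problematic layout: if this state lives inside the watch::channel,
// each monitor's Receiver retains the most recently broadcast
// TopologyState, which keeps its own MonitoredServer (and therefore
// its WorkerHandle) alive, so the monitor can never observe its
// handle being dropped.
struct TopologyState {
    servers: HashMap<String, MonitoredServer>,
}

// Fixed layout: the worker owns the server map directly, and only
// handle-free description data flows through the channel.
struct TopologyDescription; // addresses, server descriptions, etc.

struct TopologyWorker {
    state: TopologyState,
    publisher: watch::Sender<TopologyDescription>,
}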

isabelatkinson (Contributor) left a comment:

one nit but otherwise looks good!

src/sdam/test.rs (outdated)
.sdam_event_handler(handler.clone() as Arc<dyn SdamEventHandler>)
.repl_set_name("foo".to_string())
.build();
options.heartbeat_freq = Some(Duration::from_millis(50));
isabelatkinson (Contributor):

is this assignment necessary? looks like heartbeat_freq is already set by the builder

patrickfreed (Contributor, Author):

good catch, fixed
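
The resolved version presumably just relies on the builder; a sketch, assuming the heartbeat_freq setter the review comment says is already in play (builder fragment only, mirroring the snippet above):

// Sketch only: heartbeat_freq is set once via the builder (per the
// review comment), and the post-build assignment is dropped.
let options = ClientOptions::builder()
    .heartbeat_freq(Duration::from_millis(50))
    .sdam_event_handler(handler.clone() as Arc<dyn SdamEventHandler>)
    .repl_set_name("foo".to_string())
    .build();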

patrickfreed changed the title from "RUST-1443 Stop executing server monitors after the server has been closed" to "RUST-1443 Stop executing server monitors after the server has been closed (2.3 backport)" on Sep 8, 2022.