Skip to content

RUST-1587 Implement server selection tracing events #805

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Feb 2, 2023

Conversation

kmahar
Copy link
Contributor

@kmahar kmahar commented Jan 12, 2023

This implements the server selection tracing/log events/messages as defined in mongodb/specifications#1341.

There is a workaround in here for the fact that we still currently perform server selection to detect sessions support; logging messages in those cases makes these tests flaky. That flakiness is why the spec PR hasn't been merged yet, though I expect for any driver that doesn't log on server selection attempts for sessions support the tests will pass.

We could merge this and remove the workaround as part of RUST-1499, or alternatively we can wait to merge until that work is in; I don't have a strong opinion either way.

Will leave some more comments inline.

@@ -11,7 +11,8 @@ use crate::{

/// A description of the most up-to-date information known about a topology. Further details can
/// be found in the [Server Discovery and Monitoring specification](https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst).
#[derive(Clone)]
#[derive(Clone, derive_more::Display)]
#[display(fmt = "{}", description)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this just means that the output will be the Display impl for description. I took advantage of derive_more::Display in a couple places in this PR for the sake of brevity.

also, it was an intentional choice to leave Display prefixed with derive_more rather than importing it throughout this PR. I found myself confused other places in the code base that we were using an imported derive_more because it is easy to mistake it for the std trait with the same name.

@@ -102,38 +103,3 @@ impl fmt::Debug for TopologyDescription {
.finish()
}
}

impl fmt::Display for TopologyDescription {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this implementation was combined with the existing and more bare-bones implementation of Display on the internal crate::sdam::TopologyDescription type, so that this logic can be reused for the tracing messages, where we only have the internal type available.

@@ -25,7 +25,7 @@ use crate::{
use self::server_selection::IDLE_WRITE_PERIOD;

/// The possible types for a topology.
#[derive(Debug, Clone, Copy, Eq, PartialEq, Deserialize, Serialize)]
#[derive(Debug, Clone, Copy, Eq, PartialEq, Deserialize, Serialize, derive_more::Display)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seemed worth adding Display here for use in TopologyDescription's Display impl.

write!(f, "{{ Type: {:?}, Servers: [ ", self.topology_type)?;
for server_info in self.servers.values().map(ServerInfo::new_borrowed) {
write!(f, "{}, ", server_info)?;
fn fmt(&self, f: &mut fmt::Formatter) -> std::result::Result<(), fmt::Error> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic is largely copied over from the event::sdam::TopologyDescription implementation.

@@ -13,18 +13,20 @@ use crate::{
};

/// Describes which servers are suitable for a given operation.
#[derive(Clone, Derivative)]
#[derive(Clone, Derivative, derive_more::Display)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see unit tests in src/test/spec/trace.rs to see how the SelectionCriteria gets stringified by this logic.

@@ -128,13 +128,16 @@ impl TestCursor {

impl ClientEntity {
pub(crate) fn new(
client: Client,
client_options: ClientOptions,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

related to the issue we discussed in standup/slack, we just pass the options here now so we can make sure the observer exists first.

@@ -281,7 +280,7 @@ impl TestRunner {
self.sync_workers().await;

let all_tracing_events = tracing_subscriber
.collect_events(Duration::from_millis(500), |_| true)
.collect_events(Duration::from_millis(1000), |_| true)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was running into some cases on macOS sharded clusters with SSL/auth/compression where we occasionally wouldn't see all of the tracing events within 500ms. increasing this fixed that

type Representation;

fn tracing_representation(self) -> Self::Representation;
fn tracing_representation(&self) -> Self::Representation;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only implementation of this trait that needed to take ownership of the value was for bson::Document. to save a bunch of unneeded clones of e.g. the topology description, I opted to update this to take self by reference and to remove the implementation of the trait for bson::Document. all document serialization is already currently handled via serialize_command_or_reply so it seemed OK to not rely on the trait and just serialize directly in that method.

@kmahar kmahar marked this pull request as ready for review January 12, 2023 21:01
@kmahar kmahar requested review from a team, JamesKovacs and abr-egn and removed request for a team and JamesKovacs January 12, 2023 21:01
@kmahar kmahar changed the title RUST-1495 Implement server selection tracing event RUST-1495 Implement server selection tracing events Jan 12, 2023
.description
.server_selection_timeout_error_message(criteria),
);
match result {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really unfortunate how much noise this adds to the function - can logging be factored out to helper methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great idea, done. I ended up adding a helper type to store the shared info and avoid having to pass it with each message. this matches nicely with our existing XEventEmitter types for CMAP and command tracing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

@kmahar kmahar changed the title RUST-1495 Implement server selection tracing events RUST-1587 Implement server selection tracing events Jan 31, 2023
Copy link
Contributor

@abr-egn abr-egn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just a few comments that don't require re-review.

@@ -480,13 +481,18 @@ impl Client {
&self,
criteria: Option<&SelectionCriteria>,
) -> Result<ServerAddress> {
let server = self.select_server(criteria).await?;
let server = self.select_server(criteria, "test_select_server").await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names given elsewhere are short phrases rather than a snake_case tag, might be better to be consistent one way or the other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to "Test select server" for consistency.

.description
.server_selection_timeout_error_message(criteria),
);
match result {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

) {
if trace_or_log_enabled!(target: SERVER_SELECTION_TRACING_EVENT_TARGET, TracingOrLogLevel::Debug)
// TODO: RUST-1585 Remove this condition.
&& self.operation_name != "Check sessions support status"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these values are being matched on, they should be named constants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. since the constant needs to be used unconditional of the tracing feature (in get_session_support_status), this couldn't be hidden away in the trace module. I ended up putting it in client/mod.rs for now.

@kmahar
Copy link
Contributor Author

kmahar commented Feb 1, 2023

pushed one more small change: I noticed that the other event emitter types were named XTracingEventEmitter but this one was missing the Tracing bit so I've renamed the type.

Copy link
Contributor

@isabelatkinson isabelatkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@kmahar kmahar merged commit fa952f4 into mongodb:main Feb 2, 2023
@kmahar kmahar deleted the RUST-1495/server-selection-tracing branch February 2, 2023 00:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants