-
Notifications
You must be signed in to change notification settings - Fork 245
DRIVERS-2707 Relax the CMAP documentation requirement for durations in events #1455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.rst
Outdated
Show resolved
Hide resolved
DRIVERS-2707
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 👍
… events Move the rationale to the "Design Rationale" section. DRIVERS-2707
There were conversations in Slack related to this PR, but no consensus was achieved. @baileympearson, we need to move forward here somehow. I am fine with either excluding from 1.a. EXCLUDE the time to synchronously deliver
2.a. INCLUDE the time to synchronously deliver
While it is true that a driver does not provide hard real-time guarantees, the driver is still supposed to do its best to try and meet the specified timeouts. In these examples the driver failed to do its best, because there are ways to implement the functionality such that the user is spared from observing confusing data in executions as specified above. The below options neither have the issues as in the examples above, nor do they have any new issues compared to those examples. Obviously, a driver still does not provide hard real-time guarantees, and a noticeable delay may happen at any point in execution, not only in user code when we synchronously deliver events. But a driver still must do its best, and the options below are better than the options chosen in the examples above: 1.b. EXCLUDE the time to synchronously deliver 2.b. INCLUDE the time to synchronously deliver Both of these options allow a driver to start counting time towards |
As a side note - I don't think it's correct that duration and waitQueueTimeoutMS are related and that they must be aligned. It's already the case that no matter what we do with event delivery, To answer your question from slack (which is in reference to my suggestion that
I interpret the intention behind DRIVERS-2469 as: users have requested a way to determine how long connection checkout takes so they can optimize their application's consumption of driver connection pools. In drivers where events are emitted synchronously, this can impact pool performance. So, any synchronous event process definitely should be included in the new measurement. Standardizing and mandating how drivers handle synchronous emissions of events is bad because:
The simplest way to measure that duration for all drivers is to take the difference between the when drivers construct This doesn't clarify in the spec whether or not the durations of synchronous event emissions should be included or not. However, it does
This also closely matches my intuition of "matching" corresponding checkout started and checkout succeeded events and subtracting timestamps, which I think is how many users might view the new "duration" property as well. And specifically for Node, we actually have a timestamp on every pool event, so a user might actually be confused as to why the Given the options outlined in proceeding comment, I strongly lean towards option 1.b. If we do commit to behavior here, we should test it (probably with an optional prose test for drivers that emit events synchronously).
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 👍
source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.rst
Outdated
Show resolved
Hide resolved
Conceptually, event emission is instantaneous, i.e., one may talk about the instant an event is emitted, | ||
and represents the start of an activity of delivering the event to a subscribed user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed clarifying that durations specifically apply to the point in time directly before the event emission. Can we mention that here instead? Whether or not the event emission is instantaneous, slow, async or sync - it makes no matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two terms: emission, delivery. The first one is already used in the spec, the second one is not. When we talked, I said that I am concerned that an implementor may interpret emission as delivery. Interestingly, this is exactly what happened to you when you wrote
Whether or not the event emission is instantaneous, slow, async or sync - it makes no matter.
It's the delivery that may be slow/sync/async (and, as we discussed, I mentioned neither of those words), and can't be instantaneous, because it's an activity. This activity has the beginning and the end. The beginning of the delivery activity is emission.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay - but what we discussed was specifying that we start measuring the duration immediately before emitting an event because we wanted consistency in when drivers started timing checkout durations relative to event emissions. That is not captured in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what we discussed was specifying that we start measuring the duration immediately before emitting an event because we wanted consistency in when drivers started timing checkout durations relative to event emissions
It may be that each of us was hearing/meaning a different thing when we discussed this. At the moment I do not understand what ambiguity (or a different issue) is in the PR text.
Durations are defined as the time elapsed between emitting event X
and event Y
. Emission is defined as the start of the delivery activity; it is instantaneous, i.e., we can talk about event X
being emitted at instant tx
(note how the word "instantaneous" is defined in the spec right away, leaving no ambiguity of what it means). Thus, the duration is ty - tx
.
Could you tell what undesirable implementation you envision that is allowed by the spec as it currently is in this PR, and what wording added to the spec will forbid that implementation? If you will use the word "immediately" or its derivative, could you also define it?1
1 It is usually fine to use the word "immediately" colloquially (we did use it when we talked), but it is not necessary the right way to express an idea precisely. "immediately" is a useful word when we talk about ordering relations (e.g., "in the total order of natural numbers 1 is followed immediately by 2"), but is much less useful when it is used in the temporal meaning, because the latter is vague and seems to be suitable only for informal usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're saying the same thing, I just didn't understand the phrasing. I also wouldn't be surprised if others don't have the same precise understandings of emission and delivery. When I say immediately, I don't really care about the exact moment temporally. I am okay with the wording, now that it has been clarified but would probably prefer something that is a little less dependent on the exact definitions of phrases and words (of which we don't define in the specification).
Perhaps, to avoid ambiguity and misunderstandings that stem from language, we could just update the pseudocode?
connection = Null
record checkout start time
emit ConnectionCheckOutStartedEvent and equivalent log message
try:
enter WaitQueue
wait until at top of wait queue
# Note that in a lock-based implementation of the wait queue would
# only allow one thread in the following block at a time
while connection is Null:
if a connection is available:
while connection is Null and a connection is available:
connection = next available connection
if connection is perished:
close connection
connection = Null
else if totalConnectionCount < maxPoolSize:
if pendingConnectionCount < maxConnecting:
connection = create connection
else:
# this waiting MUST NOT prevent other threads from checking Connections
# back in to the pool.
wait until pendingConnectionCount < maxConnecting or a connection is available
continue
except pool is "closed":
calculate checkout duration
emit ConnectionCheckOutFailedEvent(reason="poolClosed", duration=duration) and equivalent log message
throw PoolClosedError
except pool is "paused":
calculate checkout duration
emit ConnectionCheckOutFailedEvent(reason="connectionError", duration=duration) and equivalent log message
throw PoolClearedError
except timeout:
calculate checkout duration
emit ConnectionCheckOutFailedEvent(reason="timeout", duration=duration) and equivalent log message
throw WaitQueueTimeoutError
finally:
# This must be done in all drivers
leave wait queue
# If the Connection has not been established yet (TCP, TLS,
# handshake, compression, and auth), it must be established
# before it is returned.
# This MUST NOT block other threads from acquiring connections.
if connection state is "pending":
try:
establish connection
except connection establishment error:
calculate checkout duration
emit ConnectionCheckOutFailedEvent(reason="connectionError", duration=duration) and equivalent log message
decrement totalConnectionCount
throw
finally:
decrement pendingConnectionCount
else:
decrement availableConnectionCount
set connection state to "in use"
# If there is no background thread, the pool MUST ensure that
# there are at least minPoolSize total connections.
do asynchronously:
while totalConnectionCount < minPoolSize:
populate the pool with a connection
calculate checkout duration
emit ConnectionCheckedOutEvent(duration=duration) and equivalent log message
return connection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Updated the pseudocode to reflect the added duration
s. Some lines are quite long. I'll split them if you think the code will be more readable that way.
…-and-pooling.rst Co-authored-by: Bailey Pearson <bailey.pearson@gmail.com>
DRIVERS-2707
DRIVERS-2707
@baileympearson, @matthewdale I added you as reviewers because you approved the PR where I introduced the wording that this PR changes.
Please complete the following before merging: