-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add mutexes to osquery-go #99
Conversation
6c32805
to
ec27307
Compare
ec27307
to
4361887
Compare
} | ||
} | ||
|
||
type ExtensionManager interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why this interface is exported. It seems like it should be an internal interface, since it's really declaring a consumer's expectation for what the server expects a client to look like. But I didn't want to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't remember any good reason for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you left this in draft, but seems good to merge to me if you don't have further changes that you want to get into the same PR.
} | ||
} | ||
|
||
type ExtensionManager interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't remember any good reason for this.
Hrm. One caveat I see here, is that I've added mutexes, but not timeouts. Which strikes me as a somewhat dangerous. |
func (c *ExtensionManagerClient) QueryContext(ctx context.Context, sql string) (*osquery.ExtensionResponse, error) { | ||
c.mu.Lock() | ||
defer c.mu.Unlock() | ||
return c.Client.Query(ctx, sql) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had to use c.Client.Query
directly because we wanted to pass cancellable context and the c.Client is public. After this change API is still public, the calls are still not serialized and can be called concurrently. Just have to remember to switch to the new QueryContext API I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which part isn't serialized here?
I'm thinking about whether it makes sense to drop the mutexes in favor of a rate.Limiter curious what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which part isn't serialized here?
In our code we used c.Client.Query
directly because there were no API with context exposed on ExtensionManagerClient
. With this PR we can switch to QueryContext
and drop our own locking/synchronization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about whether it makes sense to drop the mutexes in favor of a rate.Limiter curious what you think
What property of rate.Limiter would be appealing in this case?
As far as I understand the protocol breaks if there is more than one request in flight. Mutex seems appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning a timeout to the caller. Mutex will hang forever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this makes sense: time out the request at the limiter.Wait
if the context.WithTimeout is used, instead of timing out when the previous request is completed and you get the change to grab the mutex lock.
Also, thinking of the cases where we need to limit the query running time: for example, do not run query for longer than a minute. Currently this is only possible with setting the deadline on the context of the query (context.WithTimeout).
In both cases the context timeout would not be exactly the query running time limit though.
One bad (long running) query can cause the pending queries contexts to timeout. By the time the query gets to run, the context might be already in the context deadline exceeded
state.
Thoughts?
Another question: if we time out through the context, does this cancel the query at the protocol level and the server, meaning the guarantee that no response will be sent back for the timed out query, when we already sent the next query request (not familiar with that part)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been a year, but I've resurrected this over on #108. I think I've found a better idiom for handling timeouts and context cancelation.
I'm not sure what the underlying thrift protocol does if you cancel the context for a running query, I'm not sure this SDK can do much
// NewClient creates a new client communicating to osquery over the socket at | ||
// the provided path. If resolving the address or connecting to the socket | ||
// fails, this function will error. | ||
func NewClient(path string, timeout time.Duration) (*ExtensionManagerClient, error) { | ||
trans, err := transport.Open(path, timeout) | ||
func NewClient(path string, socketOpenTimeout time.Duration) (*ExtensionManagerClient, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
appreciate the name change, it's clearer now
As discussed in #98, osquery's communication is single threaded. This PR adds several mechanisms to help manage that.
Context
version of the underlying thrift library. Thrift documents some amount of retry logic, if contexts have deadlines, though I cannot tell if it works. It's also a little scaffolding to support using limitersWithClient
option toNewExtensionManagerServer
. The intent of this is to allow an application to reuse the same client for the client portions of server communication.I'm probably willing to rework this to use a
rate.Limiter
if people thought that was a better ideaFixes: #98