Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mutexes to osquery-go #99

Closed
wants to merge 4 commits into from

Conversation

directionless
Copy link
Member

@directionless directionless commented Apr 19, 2022

As discussed in #98, osquery's communication is single threaded. This PR adds several mechanisms to help manage that.

  1. Most critically, it adds mutexes to the underlying client calls to enforce that.
  2. It exposes the Context version of the underlying thrift library. Thrift documents some amount of retry logic, if contexts have deadlines, though I cannot tell if it works. It's also a little scaffolding to support using limiters
  3. It adds a WithClient option to NewExtensionManagerServer. The intent of this is to allow an application to reuse the same client for the client portions of server communication.

I'm probably willing to rework this to use a rate.Limiter if people thought that was a better idea

Fixes: #98

}
}

type ExtensionManager interface {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this interface is exported. It seems like it should be an internal interface, since it's really declaring a consumer's expectation for what the server expects a client to look like. But I didn't want to change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember any good reason for this.

Copy link
Member

@zwass zwass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you left this in draft, but seems good to merge to me if you don't have further changes that you want to get into the same PR.

}
}

type ExtensionManager interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember any good reason for this.

@directionless
Copy link
Member Author

Hrm. One caveat I see here, is that I've added mutexes, but not timeouts. Which strikes me as a somewhat dangerous.

func (c *ExtensionManagerClient) QueryContext(ctx context.Context, sql string) (*osquery.ExtensionResponse, error) {
c.mu.Lock()
defer c.mu.Unlock()
return c.Client.Query(ctx, sql)
Copy link

@aleksmaus aleksmaus May 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had to use c.Client.Query directly because we wanted to pass cancellable context and the c.Client is public. After this change API is still public, the calls are still not serialized and can be called concurrently. Just have to remember to switch to the new QueryContext API I guess.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which part isn't serialized here?

I'm thinking about whether it makes sense to drop the mutexes in favor of a rate.Limiter curious what you think

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which part isn't serialized here?

In our code we used c.Client.Query directly because there were no API with context exposed on ExtensionManagerClient. With this PR we can switch to QueryContext and drop our own locking/synchronization.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about whether it makes sense to drop the mutexes in favor of a rate.Limiter curious what you think

What property of rate.Limiter would be appealing in this case?
As far as I understand the protocol breaks if there is more than one request in flight. Mutex seems appropriate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning a timeout to the caller. Mutex will hang forever.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this makes sense: time out the request at the limiter.Wait if the context.WithTimeout is used, instead of timing out when the previous request is completed and you get the change to grab the mutex lock.

Also, thinking of the cases where we need to limit the query running time: for example, do not run query for longer than a minute. Currently this is only possible with setting the deadline on the context of the query (context.WithTimeout).
In both cases the context timeout would not be exactly the query running time limit though.
One bad (long running) query can cause the pending queries contexts to timeout. By the time the query gets to run, the context might be already in the context deadline exceeded state.
Thoughts?

Another question: if we time out through the context, does this cancel the query at the protocol level and the server, meaning the guarantee that no response will be sent back for the timed out query, when we already sent the next query request (not familiar with that part)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a year, but I've resurrected this over on #108. I think I've found a better idiom for handling timeouts and context cancelation.

I'm not sure what the underlying thrift protocol does if you cancel the context for a running query, I'm not sure this SDK can do much

// NewClient creates a new client communicating to osquery over the socket at
// the provided path. If resolving the address or connecting to the socket
// fails, this function will error.
func NewClient(path string, timeout time.Duration) (*ExtensionManagerClient, error) {
trans, err := transport.Open(path, timeout)
func NewClient(path string, socketOpenTimeout time.Duration) (*ExtensionManagerClient, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

appreciate the name change, it's clearer now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should we add mutexs to osquery-go?
3 participants