Speculative query execution #1178
Conversation
Fixed the locking mechanism and Travis is now happy.
Force-pushed from 9d0efa3 to 95e91c2.
Restructured the work. Now it's in 4 specific commits, each with a self-descriptive commit message, and each can be looked at separately.
query_executor.go
Outdated
// if it's not the first attempt, pause
if specExecCounter > 0 {
	<-time.After(sp.Delay())
Reuse a timer instead of using time.After. This should also have a way to cancel; for now, just doing a select on ctx.Done() should be enough, though that would need exposing from the query.
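For illustration, a minimal sketch of that suggestion, assuming the sp policy with Delay()/Executions() used elsewhere in this PR, a ctx exposed from the query, and a hypothetical launch callback that starts one execution:

// Sketch only (not the PR's code): allocate one timer, reuse it between
// attempts, and cancel via the query context instead of using time.After.
timer := time.NewTimer(sp.Delay())
defer timer.Stop()
for specExecCounter := 0; specExecCounter < sp.Executions(); specExecCounter++ {
	launch() // hypothetical: start one execution
	select {
	case <-timer.C:
		// the timer has fired, so it is safe to reset it for the next pause
		timer.Reset(sp.Delay())
	case <-ctx.Done():
		// the query was cancelled; stop scheduling further executions
		return
	}
}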
Will do.
query_executor.go
Outdated
hostIter := q.policy.Pick(qry)
sp := qry.speculativeExecutionPolicy()

results := make(chan queryResponse)
Buffer this to the number of possible executions so goroutines don't leak.
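For illustration, a hedged sketch of the buffered channel, reusing the names from the snippet above (the exact capacity is an assumption):

// Sketch: buffering to the number of possible executions lets a late goroutine
// always deliver its result and exit, even after the caller stops reading.
results := make(chan queryResponse, sp.Executions())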
Thanks, that's actually a great idea.
query_executor.go
Outdated
	RetryableQuery
}

type queryExecutor struct {
	pool   *policyConnPool
	policy HostSelectionPolicy
	specWG sync.WaitGroup
We need one of these per query; define it locally in the execution loop.
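A rough sketch of that suggestion, with the WaitGroup declared per query inside the execution path rather than on the shared struct (the surrounding calls mirror ones visible later in this PR, but the exact shape is an assumption):

// Sketch only: a per-query WaitGroup, so concurrent queries don't share state.
var specWG sync.WaitGroup
specWG.Add(1)
go q.run(qry, &specWG, results, stop) // main execution
// speculative executions would also Add(1) before being launched
specWG.Wait()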
will do.
query_executor.go
Outdated
	res             queryResponse
	specExecCounter int
)
for selectedHost := hostIter(); selectedHost != nil && specExecCounter < sp.Executions(); selectedHost = hostIter() {
I'm not sure I like this logic very much. I was thinking this would look something more like
timer := time.NewTimer(speculation.After())
defer timer.Stop()

ctx, cancel := context.WithCancel(q.Context())
defer cancel()

results := make(chan *Iter, speculation.Attempts())
for i := 0; i < speculation.Attempts(); i++ {
	go queryAttempt(ctx, hostIter, results)

	select {
	case <-timer.C:
		timer.Reset(speculation.After())
	case iter := <-results:
		return iter, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}
And have all retrying logic handled within queryAttempt somehow. This may require adjustments to the hostIter interface to say if we have more hosts to try.
Should query speculation use the same host iterator for each query or a new one?
Considering that each speculative "execution" is defined per node, I think we need the same host iterator in order to go only once over all the endpoints. We don't want a different query to try a node that has already been tried.
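A hedged sketch of that idea (illustrative names; whether the iterator needs extra locking is an assumption, not something decided in this thread):

// Sketch: the iterator is created once per query and shared by all executions,
// so no host is handed out twice; a hypothetical mutex serialises access.
hostIter := q.policy.Pick(qry)
var mu sync.Mutex
nextHost := func() SelectedHost {
	mu.Lock()
	defer mu.Unlock()
	return hostIter()
}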
query_executor.go
Outdated
	}
}

func (q *queryExecutor) executeNormalQuery(qry ExecutableQuery) (*Iter, error) {
I'm not keen on how this requires having 2 implementations of a very similar function to execute a query if it is speculatable.
I totally agree. Just haven't found an elegant way to do it yet.
* Metrics are now split into: hostMetrics - for a list of metrics; queryMetrics - for a map and a locker
* Added functions to perform locked metrics updates/reads
* Locking is private for the metrics only, so should have no performance effects

Signed-off-by: Alex Lourie <alex@instaclustr.com>

* Define the speculative policy
* Add NonSpeculative policy
* Add SimpleSpeculative policy

Signed-off-by: Alex Lourie <alex@instaclustr.com>

Signed-off-by: Alex Lourie <alex@instaclustr.com>

* Refactor executeQuery to execute main code in a separate goroutine
* Handle speculative/non-speculative cases separately
* Add TestSpeculativeExecution test

Signed-off-by: Alex Lourie <alex@instaclustr.com>
Force-pushed from 9d75db9 to 3890f2e.
* Make one code path for all executions
* Simplify the results handling
* Update the tests

Signed-off-by: Alex Lourie <alex@instaclustr.com>
Force-pushed from 3890f2e to 5630214.
@Zariel I think it's much better, clearer, and kind of elegant now. Would love your feedback. Thanks.
I really like this implementation and just have a couple nits.
session.go
Outdated
	q.metrics.l.Lock()
	defer q.metrics.l.Unlock()
	attempts := 0
	for _, metric := range q.metrics {
I'd handle this the same way with the unlock as you did in the other functions below – and not take the defer hit and unlock after the for loop. There's no other code path executed so the defer isn't necessary.
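A sketch of what that could look like (field names such as q.metrics.m and metric.Attempts are assumptions, since only fragments of the metrics struct are visible in this diff):

// Sketch: lock, sum, and unlock explicitly after the loop instead of deferring,
// since there is no other code path that could skip the unlock.
q.metrics.l.Lock()
attempts := 0
for _, metric := range q.metrics.m { // hypothetical map field
	attempts += metric.Attempts
}
q.metrics.l.Unlock()
return attempts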
session.go
Outdated
func (q *Query) getHostMetrics(host *HostInfo) *hostMetrics {
	q.metrics.l.Lock()
	defer q.metrics.l.Unlock()
	hostMetrics, exists := q.metrics[host.ConnectAddress().String()]
See my comment below.
policies.go
Outdated
type NonSpeculativeExecution struct{}

func (sp NonSpeculativeExecution) Attempts() int        { return 0 }
func (sp NonSpeculativeExecution) Delay() time.Duration { return 1 }
Why is the delay 1 in the non-speculative execution policy? 🤔
It's just a delay :-). I can return 0 as well, but it's irrelevant, as the delay is not used in the non-speculative execution flow. It is used in the Ticker creation, though, so it must be positive, so I picked 1.
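For context, time.NewTicker panics when given a non-positive duration, so even the no-op policy has to return something positive; a minimal illustration:

// time.NewTicker panics if d <= 0, which is why Delay() returns 1 (i.e. 1ns)
// rather than 0, even though the value is never meaningfully waited on.
ticker := time.NewTicker(NonSpeculativeExecution{}.Delay())
defer ticker.Stop()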
query_executor.go
Outdated
// Exit if the query was successful
// or no retry policy defined or retry attempts were reached
if rt == nil || iter.err == nil || !rt.Attempt(qry) {
Reorder the statements according to the comment (which is also the logical order to check, even though, yes, it doesn't really matter – a nit is a nit is a nit):
if iter.err == nil || rt == nil || !rt.Attempt(qry) {
Right, came back after I read your comment over in the other PR. I think we should take another stab at pulling #1151, #1164, and #1162 together into a coherent whole so that we can put speculative on top. I'm not harping on your implementation here – it reads really fluidly. What it doesn't do right now is handle the case of a non-idempotent query being executed speculatively, and we could do that if #1162 were implemented, by explicitly stopping retries in cases like write errors. On the other hand… one has to specifically turn on speculative execution on a query, right? 🤔 I may be overthinking this in that case.
@annismckenzie Yea, I was wondering whether you'd get back to this :-) So indeed, it is possible that a non-idempotent query will be retried, but definitely not in a speculative execution. Speculative in this case means that there are multiple executions running at the same time, and the first one to get a response wins. We take care of that at the entrance to executeQuery by forcing the speculative execution policy to NonSpeculative if the query is not idempotent.
Back to the issue: I'm all for fixing the retries to respect idempotence, and would definitely be for taking another stab at error handling. I'm not overly sure these depend on this work; they can be layered in any order. It could be pretty decent in size, especially if we include the retries/idempotence handling and actually wrapping errors in their own types. Do you need a hand with that one?
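A hedged sketch of that guard as described (the IsIdempotent accessor is an assumption about the query API, not quoted from this PR):

// Sketch: fall back to the no-op policy unless the query is explicitly marked
// idempotent, so non-idempotent queries never run speculatively.
sp := qry.speculativeExecutionPolicy()
if !qry.IsIdempotent() { // hypothetical accessor for the idempotent flag
	sp = NonSpeculativeExecution{}
}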
I would really, really, really appreciate that! 🤝 I'll text you via other channels.
👍
query_executor.go
Outdated
	continue
default:
	// Undefined?
	results <- queryResponse{iter: iter, err: iter.err}
We had the same problem over in #1151. Ideally, we'd just want to panic here.
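A sketch of that suggestion, with the switch reconstructed from the snippet above (the message text is illustrative):

switch retryType := rt.GetRetryType(iter.err); retryType {
case Retry:
	continue
default:
	// an unknown retry type is a programming error; fail loudly instead of
	// silently returning the iterator
	panic(fmt.Sprintf("gocql: unknown retry type %v", retryType))
}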
👍
query_executor.go
Outdated
}

func (q *queryExecutor) run(qry ExecutableQuery, specWG *sync.WaitGroup, results chan queryResponse, stop chan struct{}) {

drop this blank line
Force-pushed from 02e3140 to e66f4b7.
* Metric lock improvements
* Style cleanups

Signed-off-by: Alex Lourie <alex@instaclustr.com>
Force-pushed from e66f4b7 to 20bd1a2.
Signed-off-by: Alex Lourie <alex@instaclustr.com>
…iveExecution_1083_new
Force-pushed from 5593eca to b0cfada.
Signed-off-by: Alex Lourie <alex@instaclustr.com>
Force-pushed from b0cfada to 813b288.
@Zariel mind having another look? It's been lying around for a while... Thanks!
	go q.run(qry, &specWG, results, stop)
case <-qry.GetContext().Done():
	// not starting additional executions
	return
Is it possible that the return happens here and specWG.Wait() waits forever if the context is done because the for loop hasn't done all its iterations? Could it, with bad luck (hitting the continues in run), block the whole executeQuery?
No, it won't. We actually already have at least 1 execution added to the waiting group in line 55 (the main execution). Also, the run will only execute on N nodes and then finish, so no blocking should be happening there.
Yep, I see
func (t *testRetryPolicy) GetRetryType(err error) RetryType {
	return Retry
}
Test looks good. It'd be great if we could have:
- A test where the context gets done
- A test with delay zero (I'd like this to run with -race), which I believe is a valid use case
Sure, I'll have a go at adding that.
fixes #1083
This is an attempt to implement speculative query execution. I think it more or less covers the feature, but I clearly could have missed some things.
This is still a bit of a WIP, so any feedback is welcome @Zariel (and @annismckenzie if you have the time).
Thanks!