Fix badger merge-join algorithm to correctly filter indexes #1721

burmanm · 2019-08-08T12:26:35Z

Which problem is this PR solving?

Resolves #1719: the index seeks were not correctly merged and filtered.

Short description of the changes

Make the merge-join correctly update both indices when it encounters equal items. The input to each merge must also be the output of the previous merge. In addition, the ASC-to-DESC reversal now happens after the top-level query filtering, which reduces unnecessary work.

…acing#1719 Signed-off-by: Michael Burman <yak@iki.fi>

codecov · 2019-08-09T06:51:23Z

Codecov Report

Merging #1721 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1721      +/-   ##
==========================================
+ Coverage   98.36%   98.36%   +<.01%     
==========================================
  Files         193      193              
  Lines        9358     9361       +3     
==========================================
+ Hits         9205     9208       +3     
  Misses        119      119              
  Partials       34       34

Impacted Files	Coverage Δ
plugin/storage/badger/spanstore/reader.go	`96.66% <100%> (+0.03%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 98fd69a...4455754.

pavolloffay · 2019-08-14T15:11:21Z

plugin/storage/badger/spanstore/reader.go

@@ -346,53 +345,60 @@ func (r *TraceReader) durationQueries(query *spanstore.TraceQueryParameters, ids
 	return ids
 }

+func mergeJoinIds(left, right [][]byte) [][]byte {


Rename to mergeEqualIds?

Are the ids sorted? Maybe that should be documented somewhere.

It's mentioned at the beginning of the package. Everything is sorted (it's a sorted K/V).

As for the name, the algorithm is called a "sort-merge join" and is used in relational databases. Here the sorting phase happens in the DB and the merge phase in this code. I think the name is pretty descriptive: if someone wants to improve this method, for example by running it in parallel or by sharding across multiple badgers, there are known algorithms for those variations too (which would use this one underneath in any case).

Thanks for the explanation. Now it rings a bell.

objectiser

LGTM. It essentially looks like the same id must exist in each of the id lists supplied. Sorry, I haven't had a chance to dig into the implementation in more detail. Is there a quick explanation of what each id list represents?

objectiser · 2019-08-16T11:58:32Z

plugin/storage/badger/spanstore/read_write_test.go

@@ -200,6 +205,7 @@ func TestIndexSeeks(t *testing.T) {
 		params.OperationName = "operation-1"
 		tags := make(map[string]string)
 		tags["k11"] = "val0"
+		tags["error"] = "true"


Curious why this was added, as it doesn't seem related to the PR?

The devil is in the details. That single line exposes the bug (the test fails with the older version), since it adds another index query against the tags.

As for the id list, it is basically the list of matches for one search query: a form of posting list (of traceIDs), if you think in ES terms.

In relational database terms, it's equivalent to something like: SELECT id FROM dbo.spans WHERE service = 'invoices'

That is, a single id list is equivalent to the result of that one query. Just imagine that each id list comes from one similar query, touching a single index and a single value. It doesn't matter whether the index is the same or not (so one query could be against the service index, one against the tags index, and so on).
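
To make the id-list idea concrete, the sketch below treats each list as the sorted result of one index query and intersects them pairwise, feeding the output of each merge into the next one. Everything here is an assumption for illustration: the names `intersectAll` and `mergeTwo` are hypothetical, the sample ids are made up, and the lists are assumed to be sorted ascending without duplicates.

```go
package main

import (
	"bytes"
	"fmt"
)

// mergeTwo (hypothetical) returns the ids found in both sorted lists.
func mergeTwo(left, right [][]byte) [][]byte {
	var out [][]byte
	for l, r := 0, 0; l < len(left) && r < len(right); {
		switch c := bytes.Compare(left[l], right[r]); {
		case c == 0:
			out = append(out, left[l])
			l++
			r++
		case c < 0:
			l++
		default:
			r++
		}
	}
	return out
}

// intersectAll (hypothetical) folds mergeTwo over every id list: the
// result of each merge becomes the left input of the next merge.
func intersectAll(lists [][][]byte) [][]byte {
	if len(lists) == 0 {
		return nil
	}
	merged := lists[0]
	for _, next := range lists[1:] {
		merged = mergeTwo(merged, next)
	}
	return merged
}

func main() {
	// One made-up id list per index query.
	byService := [][]byte{[]byte("t1"), []byte("t2"), []byte("t3"), []byte("t5")}
	byTagK11 := [][]byte{[]byte("t2"), []byte("t3"), []byte("t4"), []byte("t5")}
	byTagError := [][]byte{[]byte("t3"), []byte("t5"), []byte("t6")}

	for _, id := range intersectAll([][][]byte{byService, byTagK11, byTagError}) {
		fmt.Println(string(id))
	}
}
```

With only two lists a single merge is enough; a third list, like the extra tag in the test above, is the first case where the output of one merge has to be carried into the next.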

Signed-off-by: Michael Burman <yak@iki.fi>

objectiser

@burmanm Thanks for the explanation.

burmanm requested review from black-adder, jpkrohling, objectiser, pavolloffay, tiffon, vprithvi and yurishkuro as code owners August 8, 2019 12:26

burmanm force-pushed the badger_merge_fix branch from 2a3e3d1 to baff4cb Compare August 9, 2019 06:39

Fix merge-join algorithm to correctly filter indexes, closes jaegertr…

baff4cb

…acing#1719 Signed-off-by: Michael Burman <yak@iki.fi>

pavolloffay added the storage/badger Issues related to badger storage label Aug 9, 2019

pavolloffay changed the title ~~Fix badger merge-join algorithm to correctly filter indexes, closes #1719~~ Fix badger merge-join algorithm to correctly filter indexes Aug 14, 2019

pavolloffay reviewed Aug 14, 2019

objectiser reviewed Aug 16, 2019

Address comments

4455754

Signed-off-by: Michael Burman <yak@iki.fi>

objectiser approved these changes Aug 16, 2019

pavolloffay approved these changes Aug 19, 2019

pavolloffay merged commit ecdecd1 into jaegertracing:master Aug 19, 2019
