
Downsampling based on MaxResolutionWindow #68

Merged · 10 commits · Dec 19, 2019
Conversation


@eswdd eswdd commented Dec 9, 2019

Implementation as per the comments in #31: namely, downsampling is only implemented where we've been given a non-zero MaxResolutionWindow.

@eswdd eswdd requested review from dgl and robincw-gr December 9, 2019 13:39

dgl commented Dec 10, 2019

I think this also needs to return the chunks in the right bucket, see Raw: at

Raw: &storepb.Chunk{Type: storepb.Chunk_XOR, Data: c.Bytes()},

and https://github.com/thanos-io/thanos/blob/master/pkg/store/storepb/types.proto#L41
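To illustrate the point being made: the Thanos StoreAPI's AggrChunk message has separate fields for raw data and for each downsampled aggregate, so a store shouldn't put everything in Raw. The sketch below uses simplified local stand-ins for the storepb types (field names follow types.proto, but this is illustrative only, not the PR's actual code).

```go
package main

import "fmt"

// Chunk and AggrChunk are simplified stand-ins for the Thanos
// storepb types referenced above (pkg/store/storepb/types.proto).
type Chunk struct {
	Data []byte
}

type AggrChunk struct {
	Raw   *Chunk // full-resolution data
	Count *Chunk // downsampled aggregates below
	Sum   *Chunk
	Min   *Chunk
	Max   *Chunk
}

// bucketChunk places an encoded chunk into the field matching the
// aggregate it represents, falling back to Raw for full-resolution
// data, rather than always using Raw. (Hypothetical helper name.)
func bucketChunk(aggr string, data []byte) AggrChunk {
	c := &Chunk{Data: data}
	switch aggr {
	case "count":
		return AggrChunk{Count: c}
	case "sum":
		return AggrChunk{Sum: c}
	case "min":
		return AggrChunk{Min: c}
	case "max":
		return AggrChunk{Max: c}
	default:
		return AggrChunk{Raw: c}
	}
}

func main() {
	fmt.Println(bucketChunk("sum", []byte{0x1}).Sum != nil)
}
```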


eswdd commented Dec 10, 2019

Oh, half-a-job Simon, will take a look this eve.


eswdd commented Dec 10, 2019

Ugh, so this now gets mucky, and there are two possible solutions. The challenge is working out which bucket we need to put each resulting time series into. If OpenTSDB supported sending in query ids with sub-queries, which we could then reference on the response, we could maintain a mapping of id to requested aggregate and unpack accordingly, but it doesn't.

So, there are now two options, both of which rely on using the show_query parameter on the query API, for which I've added support in our OpenTSDB client in a new PR (G-Research/opentsdb-goclient#12)

  1. Because there is currently a 1:1 mapping between aggregate type and downsample function, we could parse out the downsample function from the returned results and use this to map back to the bucket.
  2. We could maintain a mapping between sub-query and aggregate type and then use this to map back later. However, if we believe that there won't always be a 1:1 mapping, then we'd potentially want to send multiple query requests, as we might not be able to uniquely map everything back later. I've used this approach with Aardvark and it's non-trivial (made worse there because it's all hidden in JavaScript callbacks).

I'm plumping for option 1 for now as we can always switch later.
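A minimal sketch of what option 1 could look like, assuming show_query echoes back an OpenTSDB downsample spec of the form "interval-function[-fillpolicy]" (e.g. "60s-sum"); the function name and the set of mapped aggregates here are illustrative, not taken from the PR:

```go
package main

import (
	"fmt"
	"strings"
)

// aggregateForDownsample parses the downsample function out of an
// OpenTSDB downsample spec (echoed back via show_query) and maps it
// to the Thanos aggregate bucket it was issued for, relying on the
// 1:1 aggregate-to-downsample-function mapping described above.
func aggregateForDownsample(ds string) (string, error) {
	// Spec shape: "<interval>-<function>[-<fillpolicy>]", e.g. "60s-sum".
	parts := strings.Split(ds, "-")
	if len(parts) < 2 {
		return "", fmt.Errorf("malformed downsample spec %q", ds)
	}
	switch fn := parts[1]; fn {
	case "count", "sum", "min", "max":
		return strings.ToUpper(fn), nil
	default:
		return "", fmt.Errorf("no aggregate maps to downsample function %q", fn)
	}
}

func main() {
	a, err := aggregateForDownsample("60s-sum")
	fmt.Println(a, err)
}
```

The appeal of this approach is that it needs no extra bookkeeping between request and response; the downside, as noted, is that it breaks if the aggregate-to-function mapping ever stops being 1:1.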


eswdd commented Dec 10, 2019

@dgl PTAL

return store, nil
}

func (store *OpenTSDBStore) populateMaps() error {

Given this is not dependent on external data, I'd consider just doing this once in init; then the code doesn't have to pass errors around (just panic, as it will run on startup).


eswdd commented Dec 16, 2019 via email


dgl commented Dec 17, 2019

You don’t have to force anything; a func init in a file will simply get run for you.
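For context, this is the Go mechanism being suggested: the runtime executes every package-level func init before main, so a lookup table built there needs no error plumbing, and any failure becomes a startup panic. A sketch with illustrative names (not the PR's actual code, which kept this state on the store struct instead):

```go
package main

import "strings"

// Package-level table populated once at startup by func init.
// Hypothetical name; the real mapping lives inside OpenTSDBStore.
var downsampleToAggregate = map[string]string{}

func init() {
	// Runs automatically before main; a failure here would be a
	// programming error, so panicking at startup would be acceptable.
	for _, fn := range []string{"count", "sum", "min", "max"} {
		downsampleToAggregate[fn] = strings.ToUpper(fn)
	}
}

func main() {
	println(downsampleToAggregate["sum"])
}
```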


eswdd commented Dec 17, 2019 via email


eswdd commented Dec 19, 2019

I'm not convinced an init() function would be helpful here, as it can only init a package-level variable, not state within the struct; that variable then becomes visible for all to see, rather than hidden from view.

@eswdd eswdd merged commit b6a1fcb into master Dec 19, 2019
@eswdd eswdd deleted the downsampling branch December 19, 2019 20:32