Add util package for timeseries data compression using streamvbyte #7810
Conversation
Force-pushed from 3f54c80 to 984c552
Force-pushed from 984c552 to 9994ff4
I feel like we could simplify this a bit, both the protos and the library. Since we're storing absolute timestamps, we don't need any chunking; you can just slam more data into the array.
Proto API could look like:
message TimeSeries {
  // A streamvbyte delta-encoded list of int64 timestamps in usec since the epoch.
  // Use server/util/timeseries to transcode.
  bytes timestamps = 1;
  // A streamvbyte delta-encoded list of int64 sample values.
  // Use server/util/timeseries to transcode.
  bytes samples = 2;
  // The length of this timeseries. Necessary for decoding.
  int64 length = 3;
}
and then library code could look like:
func Encode(timestamps, samples []int64) *TimeSeries {}
func Decode(ts *TimeSeries) ([]int64, []int64) {}
Then the calling code can be real simple and dumb:
func main() {
	timestamps := make([]int64, 0)
	samples := make([]int64, 0)
	for i := 0; i < 100_000; i++ {
		if len(timestamps) > 1e6 { // whatever max memory you want to use for this.
			break
		}
		timestamps = append(timestamps, time.Now().UnixMicro())
		samples = append(samples, observeSampleValue())
	}
	ts := timeseries.Encode(timestamps, samples)
	task.observedCPU = ts
}
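For concreteness, here is a minimal sketch of what that Encode/Decode pair could look like. It is illustrative only: it uses Go's encoding/binary zigzag varints as a stand-in for the streamvbyte codec the actual package uses, and a plain struct in place of the generated proto message.

package timeseries

import "encoding/binary"

// TimeSeries mirrors the proto sketch above.
type TimeSeries struct {
	Timestamps []byte
	Samples    []byte
	Length     int64
}

// deltaEncode writes each value as a varint of its delta from the
// previous value, so slowly-changing series compress to ~1-2 bytes/point.
func deltaEncode(vals []int64) []byte {
	buf := make([]byte, 0, len(vals))
	tmp := make([]byte, binary.MaxVarintLen64)
	var prev int64
	for _, v := range vals {
		n := binary.PutVarint(tmp, v-prev)
		buf = append(buf, tmp[:n]...)
		prev = v
	}
	return buf
}

// deltaDecode reverses deltaEncode; n is the number of encoded values.
// It assumes well-formed input (no error handling in this sketch).
func deltaDecode(b []byte, n int64) []int64 {
	out := make([]int64, 0, n)
	var prev int64
	for i := int64(0); i < n; i++ {
		d, size := binary.Varint(b)
		prev += d
		out = append(out, prev)
		b = b[size:]
	}
	return out
}

func Encode(timestamps, samples []int64) *TimeSeries {
	return &TimeSeries{
		Timestamps: deltaEncode(timestamps),
		Samples:    deltaEncode(samples),
		Length:     int64(len(timestamps)),
	}
}

func Decode(ts *TimeSeries) ([]int64, []int64) {
	return deltaDecode(ts.Timestamps, ts.Length), deltaDecode(ts.Samples, ts.Length)
}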
@tylerwilliams I did chunking so that the executor can compress the data incrementally, rather than having to keep the full uncompressed dataset in memory while the task is running. Similarly, when the apps decompress the data they can do so in a streaming fashion. I guess we can deal with this by just limiting the number of samples to 100K or something, though?

Re the proto API, I think it's redundant to duplicate the timestamps in each timeseries. The timestamps for memory, IO, etc. are all going to be the same because we scrape all of the cgroup files together. Maybe not a big deal, but it does double the amount of storage required to add a new timeseries. Also, I don't know how we'd support …
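For context, the incremental approach being described could look something like the sketch below (hypothetical names, not the PR's actual API; deltaEncode is the helper sketched above). The point is that only chunkSize raw samples are ever resident at once.

// Illustrative only: flushes a delta-encoded chunk every chunkSize
// samples so the executor never buffers the full raw series.
type chunkedAppender struct {
	chunkSize int      // flush threshold; bounds raw memory use
	pending   []int64  // uncompressed tail, at most chunkSize values
	chunks    [][]byte // completed chunks, each delta-encoded independently
}

func (a *chunkedAppender) Append(v int64) {
	a.pending = append(a.pending, v)
	if len(a.pending) >= a.chunkSize {
		// Every flushed chunk holds exactly chunkSize values, so a
		// decoder knows each chunk's length without storing it.
		a.chunks = append(a.chunks, deltaEncode(a.pending))
		a.pending = a.pending[:0]
	}
}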
Ack. I don't think this is worth the added complexity in the protos or the code right now. It's something we could come back and add if we really need it, but performance- and memory-wise you can keep 3+ hours of 10ms-resolution data in less than 20MB and decompress it at ~GB/sec. Similarly for the timestamps: it's not a ton of data, and if you gzipped the proto it would disappear anyway. I think it's best to keep this simple here.
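As a rough sanity check on those numbers (assuming 10ms resolution means ~100 samples/sec): 3 h × 3,600 s/h × 100 samples/s ≈ 1.08M points. Even stored as raw int64 timestamp/value pairs, that is about 1.08M × 16 bytes ≈ 17MB, and with delta coding the timestamp deltas are a near-constant ~10,000µs that fit in 1-2 bytes each, so the 20MB bound is comfortably conservative.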
@tylerwilliams I reworked this based on your comments and what we chatted about offline - PTAL
Force-pushed from 7fc3707 to 6430173