I am trying to use this as a library, built from the 0.10.6 commit, for a Spark-based Druid re-processor. I use DataFrames and the segment pusher so that it runs as an independent process. When I run the default map-reduce task on one day of data at "fifteen_minute" granularity, I get a 700 MB file, but with this library I get a 4 GB file for the same spec. Is there some compression or configuration I am missing?
Verified that the dimensions and metrics are the same.
Verified that it is the same data being processed.
Verified that the data is actually at fifteen_minute granularity by looking at the footer of the smoosh file.
My guess is that dimension compression is missing, but I am unsure how to confirm that by looking at the smoosh file.
Using Druid 0.10.1.
@Igosuki The issue is caused by sorting. Because the rows are unsorted, aggregation during grouping produces a higher number of rows. We should sort within partitions so that similar rows end up next to each other and aggregate well. I will open a pull request soon.
I tried sortWithinPartitions, but it does not scale well: with one segment's data in a single executor, the sorting phase is very slow. I do not know how to move forward.
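To illustrate the sorting effect described above, here is a toy model in plain Python. It is only a sketch and not Druid's or this library's actual indexing code: it assumes, for illustration, that rollup merges only consecutive rows that share the same (15-minute bucket, dimension) key. The helper names `truncate_to_bucket` and `rollup_adjacent` and the synthetic data are hypothetical.

```python
import random
from itertools import groupby

BUCKET_SECONDS = 900  # 15 minutes, matching "fifteen_minute" granularity


def truncate_to_bucket(ts_seconds, bucket_seconds=BUCKET_SECONDS):
    """Round a timestamp (in seconds) down to the start of its bucket."""
    return ts_seconds - ts_seconds % bucket_seconds


def row_key(row):
    """Rollup key: (time bucket, dimension value)."""
    return (truncate_to_bucket(row[0]), row[1])


def rollup_adjacent(rows):
    """Merge only consecutive rows with the same key, summing the metric.

    This adjacency-only merge is an assumption made for the illustration.
    """
    merged = []
    for (bucket, dim), group in groupby(rows, key=row_key):
        merged.append((bucket, dim, sum(r[2] for r in group)))
    return merged


# Synthetic day of data: (timestamp, dimension, metric), in random order.
random.seed(0)
rows = [(random.randrange(0, 86400), random.choice("abcd"), 1)
        for _ in range(10000)]

unsorted_count = len(rollup_adjacent(rows))

# Sorting by the rollup key puts rows of the same group next to each other.
sorted_rows = sorted(rows, key=row_key)
sorted_count = len(rollup_adjacent(sorted_rows))

print("rows after rollup, unsorted input:", unsorted_count)
print("rows after rollup, sorted input:  ", sorted_count)
```

In this model the sorted input collapses to at most one row per (bucket, dimension) pair, which is at most 96 * 4 = 384 rows here, while the unsorted input stays close to the raw row count. A gap of that kind would be consistent with a much larger segment for the same spec.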