-
Notifications
You must be signed in to change notification settings - Fork 117
Add support for disabling hash aggregate with hll #65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a very quick look. I'll have a more detailed look after the caching part is completed.
src/hll.c
Outdated
RegisterHLLConfigVariables(void) | ||
{ | ||
DefineCustomBoolVariable( | ||
"hll.disable_hashagg", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somehow enable_hashagg
felt better here, postgres also uses it.
b4e251a
to
90e9cbd
Compare
src/hll.c
Outdated
foreach(cacheEntryCell, hllAggregateCache) | ||
{ | ||
HllAggregateCacheEntry *cacheEntry = (HllAggregateCacheEntry *)lfirst(cacheEntryCell); | ||
if (strcmp(aggregateName, cacheEntry->aggregateName) == 0 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sequential scan on a list + text comparison. Could this get us into trouble ? Maybe compare argument count first
src/hll.c
Outdated
oldContext = MemoryContextSwitchTo(CacheMemoryContext); | ||
matchingCacheEntry = palloc0(sizeof(HllAggregateCacheEntry)); | ||
matchingCacheEntry->aggregateId = aggregateId; | ||
matchingCacheEntry->aggregateName = aggregateName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are storing a pointer to char whose lifetime is not know. We should copy instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it appears they are constants, then do we need to store text at all ? can we store an integer perhaps so that we can easily check the id ?
src/hll.c
Outdated
heap_close(rel, AccessShareLock); | ||
|
||
return result; | ||
/* *INDENT-ON* */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
INDENT-ON ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor requests
Please re-evaluate function cache design
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢
spoke too soon Travis fails left and right, you will need to add hll to |
accidentally closed the PR |
a321592
to
d22970b
Compare
src/hll.c
Outdated
HLL_UNION_AGG_ID = 1 | ||
} HllAggregationId; | ||
|
||
bool ForceGroupagg = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ForceGroupagg -> ForceGroupAgg
src/hll.c
Outdated
foreach(cacheEntryCell, hllAggregateCache) | ||
{ | ||
HllAggregateCacheEntry *cacheEntry = (HllAggregateCacheEntry *)lfirst(cacheEntryCell); | ||
if (argumentCount == cacheEntry->argumentCount && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be slightly OCD but can we change the order to be like this;
if (cacheEntry->argumentCount == argumentCount && cacheEntry->hllAggregationId == hllAggregationId)
Personally, I would love to have hllAggregationId comparison first, but with the other comment below, having count comparison first makes more sense.
src/hll.c
Outdated
Oid hllId = get_extension_oid(HLL_EXTENSION_NAME, false); | ||
Oid hllSchemaOid = get_extension_schema(hllId); | ||
const char *hllSchemaName = get_namespace_name(hllSchemaOid); | ||
char *aggregateName = hllAggregationId == HLL_ADD_AGG_ID ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the end, we are still using aggregate names, so I think we don't need an extra level of abstraction (i.e. enum and aggregation ids). Without the enum, the code would be much simpler and only drawback I can see is we had to do string comparison in the loop above, which is mostly OK especially if we first compare argument counts.
src/hll.c
Outdated
// Support disabling hash aggregation functionality for PG > 9.6 | ||
#if PG_VERSION_NUM >= 90600 | ||
|
||
#define HLL_EXTENSION_NAME "hll" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for HLL prefix, this is HLL extension anyway. We don't use HLL keyword anywhere in the codebase except functions exposed to user space.
src/hll.c
Outdated
RelOptInfo *input_rel, RelOptInfo *output_rel) | ||
#endif | ||
{ | ||
Oid hllId = InvalidOid; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hllId -> extension_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
src/hll.c
Outdated
{ | ||
Aggref *aggref = (Aggref *) var; | ||
int argCount = list_length(aggref->args); | ||
Oid addFunctionId = LookupAggregatorOidByName(HLL_ADD_AGG_ID, argCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm starting to think, that it would be better to have two static variables (like add_aggregate_oid[] and union_aggregate_id) for function oids which are populated at pg_init. This would save us from all caching logic.
62615dd
to
535c945
Compare
sql/disable_hashagg.sql
Outdated
HAVING hll_cardinality(hll_union_agg(sd)) > 1; | ||
|
||
-- Test hll_add_agg, first set work_mem to 256MB in order to use hash aggregate | ||
-- befire forcing group aggregate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
befire -> before
535c945
to
a4a5379
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢 after minor changes
-- ---------------------------------------------------------------- | ||
-- Regression tests for disabling hash aggregation | ||
-- ---------------------------------------------------------------- | ||
SELECT hll_set_output_version(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all disable hash_agg expectations look the same. it would be better if we put some sort of PG version information that caused us to make this differentiation
src/hll.c
Outdated
RelOptInfo *input_rel, RelOptInfo *output_rel) | ||
#endif | ||
{ | ||
Oid hllId = InvalidOid; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
src/hll.c
Outdated
|
||
/* Initialize HLL_ADD_AGG with different signatures */ | ||
aggregateName = ADD_AGG_NAME; | ||
for ( ; addAggArgumentCounter < HLL_AGGREGATE_COUNT ; addAggArgumentCounter++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
odd use of for loop
02d95c0
to
9dc4c91
Compare
|
||
if(HllAggregateOid(aggref->aggfnoid)) | ||
{ | ||
path->total_cost = INT_MAX; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
INT_MAX is a pretty arbitrary value.
total_cost
is not an int. If it was then setting it to the max would be a bad idea because it would likely overflow when other costs are added, now it's just arbitrary, which is probably fine?
This PR adds support for disabling hash aggregate for hll aggregate functions. In order to disable cost of the path with hash aggregate is maximized, so the planner is forced to select group aggregate. Client can utilize this functionality by setting
hll.disable_hashagg
to ON at session level.Remaining items: