Update limit behavior to use incrementing index (fix last_by_index aggregate) #2964
Conversation
Force-pushed from 7fc2d4f to 0b83a2b
Force-pushed from bd63e26 to da9fb69
Force-pushed from 394606b to fd271ee
Commit note: "…freeing it with "free" which I believe worked in our case, but is UB I believe." (Signed-off-by: Timothy Bess <timbessmail@gmail.com>)
Force-pushed from fd271ee to b7a4879
Thanks for the PR! Looks good!
You forgot to mention this PR also fixes CI (much appreciated!), which has been failing due to the GHA CI machines running out of HD space during the Python build.
Test coverage looks great and covers last_by_index aggregate behavior specifically.
Benchmarks for the feature change to limit behavior w.r.t. Table.update() are great too. This change does facially increase the allocation size with the new psp_old_key column (I think - there is a lot changed, including the flatten behavior, so I may be wrong here), but it's great to see there is no observable impact within the bounds of our CPU time benchmarking. It is also a good reminder, IMO, that we should implement heap benchmarks!
break;

switch (op) { FOREACH_T_OP(X) }
#undef X
lmao
flattened_old_pkey_col->set_scalar(flattened_idx, old);
flattened_old_pkey_col->set_valid(flattened_idx, true);
LOG_DEBUG("DELETING OLD PKEY: " << old);
// erase(old);
💀
t_uindex m_bidx;
t_uindex m_eidx;
t_uindex m_begin_idx;
t_uindex m_edge_idx;
Thanks!
@@ -404,6 +418,9 @@ t_data_table::flatten_helper_1(FLATTENED_T flattened) const {
std::vector<t_index> edges;
edges.push_back(0);

// | pkey-1 | pkey-1 | pkey-2 | pkey-2 |
// ^ Is this an edge? I think so.
// |-----------------| <- This is a span between those edges.
Yes
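A minimal standalone sketch of the edge/span idea in that comment, not the actual flatten_helper_1 code (the string-based pkey column here is a simplification for illustration): an edge is recorded wherever the pkey changes, and each pair of consecutive edges bounds a span of rows sharing one pkey.

#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

int main() {
    // Flattened rows, already grouped by pkey (simplified to strings here).
    std::vector<std::string> pkeys = {"pkey-1", "pkey-1", "pkey-2", "pkey-2"};

    // An "edge" is the index where a new pkey begins; index 0 is always an edge.
    std::vector<std::size_t> edges;
    edges.push_back(0);
    for (std::size_t i = 1; i < pkeys.size(); ++i) {
        if (pkeys[i] != pkeys[i - 1]) {
            edges.push_back(i);
        }
    }
    edges.push_back(pkeys.size()); // closing edge so the last span is bounded

    // Each span [edges[i], edges[i + 1]) covers the rows of a single pkey.
    for (std::size_t i = 0; i + 1 < edges.size(); ++i) {
        std::cout << pkeys[edges[i]] << ": rows [" << edges[i] << ", "
                  << edges[i + 1] << ")\n";
    }
}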
if (delete_encountered) {
d_pkey_col->push_back(
d_pkey_col->set_nth(
store_idx % limit,
It seems we're applying limit to the flattened input port table. Is this because the size of this table is later assumed to be the lower bound of the new master table size (or similar)? I think this is fine (especially compared to the current state of the limit feature), but it has reminded me we likely have some other limit-related abnormalities in specific aggregates, e.g. AGGTYPE_HIGH_WATER_MARK, which never decreases and thus would calculate differently if a single input update step was larger than the limit and the overwritten range removed the new high water mark value (I think).
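A rough illustration of that concern, in plain C++ rather than perspective's aggregate machinery (all names below are invented for the example): with a limit of 3, writes at idx % limit overwrite older slots, so a high-water mark that only ever increases can diverge from a maximum recomputed over the rows that actually remain.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    const std::size_t limit = 3;
    std::vector<double> rows(limit, 0.0); // the table window under a limit of 3
    double running_high_water = 0.0;      // aggregate that never decreases

    // A single update step larger than the limit; 100.0 lands in slot 1
    // and is later overwritten by 2.0.
    const double updates[] = {5.0, 100.0, 7.0, 1.0, 2.0};
    for (std::size_t idx = 0; idx < 5; ++idx) {
        rows[idx % limit] = updates[idx]; // overwrites the oldest slot once idx >= limit
        running_high_water = std::max(running_high_water, updates[idx]);
    }

    const double recomputed = *std::max_element(rows.begin(), rows.end());
    std::cout << "running high water mark: " << running_high_water << "\n"; // 100
    std::cout << "max over remaining rows: " << recomputed << "\n";         // 7
}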
@@ -26,6 +26,7 @@ namespace perspective {
template <class FUNCTION>
void
parallel_for(int num_tasks, FUNCTION&& func) {
#undef PSP_PARALLEL_FOR
?
@@ -26,6 +26,9 @@ pub fn cmake_build() -> Result<Option<PathBuf>, std::io::Error> {
let profile = std::env::var("PROFILE").unwrap();
dst.always_configure(true);
dst.define("CMAKE_BUILD_TYPE", profile.as_str());
dst.define("ARROW_BUILD_EXAMPLES", "OFF");
dst.define("RAPIDJSON_BUILD_EXAMPLES", "OFF");
dst.define("ARROW_CXX_FLAGS_DEBUG", "-Wno-error");
This is great! IMO we should move this into the CMakeLists.txt, though, so it applies to the wasm builds also (assuming it doesn't already).
if (!check_version_gte(metadata.version, "3.4.3")) {
// Bug with old versions of perspective segfault when you delete
// a table with pending updates.
await table.size();
Nice catch, this is good to fix (fix mentioned in the PR summary)
return cmp(cmpval, 0);
} // namespace perspective
}
;
Is this a clang_format error?
Update limit behavior to use incrementing index (fix last_by_index aggregate)
This PR decouples primary keys from the limit functionality of perspective tables, allowing us to handle aggregates like "last by index" a bit better when users have a limit and want to rely on insert order to window their data.
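As a rough sketch of that idea, not perspective's actual implementation (the Row struct and names below are invented for illustration): if every arriving row carries a monotonically incrementing insert index rather than reusing the primary key, a "last by index" aggregate can still pick the most recently inserted row per group even after the limit window wraps and overwrites older slots.

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct Row {
    std::uint64_t insert_idx; // monotonically increasing, never reused
    std::string group;
    double value;
};

int main() {
    const std::size_t limit = 4;
    std::vector<Row> window(limit);
    std::uint64_t next_idx = 0;

    // Six updates against a limit-4 table: the first two rows get overwritten.
    const std::pair<std::string, double> updates[] = {
        {"a", 1.0}, {"b", 2.0}, {"a", 3.0}, {"b", 4.0}, {"a", 5.0}, {"b", 6.0}};
    for (const auto& u : updates) {
        window[next_idx % limit] = Row{next_idx, u.first, u.second};
        ++next_idx;
    }

    // "last by index": per group, keep the value with the largest insert
    // index among the rows still present in the window.
    std::unordered_map<std::string, Row> last;
    for (const auto& row : window) {
        auto it = last.find(row.group);
        if (it == last.end() || row.insert_idx > it->second.insert_idx) {
            last[row.group] = row;
        }
    }
    for (const auto& [group, row] : last) {
        std::cout << group << " -> " << row.value << "\n"; // a -> 5, b -> 6
    }
}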
Along the way I also fixed some memory safety issues where we had called new to allocate some memory and freed it with free rather than delete. This wasn't breaking anything at the moment, but may have been relying on undefined behavior accidentally.
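A minimal illustration of that new/free mismatch in generic C++, not the code that was actually fixed: memory obtained with new must be released with delete (and new[] with delete[]); handing it to free skips the destructor and is not guaranteed to pair with the allocator behind operator new, so it is undefined behavior even when it appears to work.

#include <cstdlib>
#include <string>

int main() {
    std::string* s = new std::string("hello");

    // Undefined behavior: never runs the destructor, and std::free is not
    // guaranteed to pair with the allocation made by operator new.
    // std::free(s);

    delete s; // correct: runs the destructor and releases via operator delete

    int* arr = new int[8];
    delete[] arr; // the array form must pair with new[]
    return 0;
}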
The new benchmarks also triggered a bug in our polling implementation: if a table was dirtied and then deleted before the poll call, the poll would segfault because we hadn't removed the table from the dirty map in the server, so I fixed that as well.
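A hedged sketch of the shape of that fix (the Server, Table, and member names below are placeholders, not perspective's real API): deleting a table also has to remove it from the server's dirty set so a later poll never touches a dangling entry.

#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>

struct Table {
    void flush_pending_updates() { /* apply queued updates */ }
};

class Server {
public:
    void create_table(const std::string& name) {
        m_tables[name] = std::make_shared<Table>();
    }

    void mark_dirty(const std::string& name) { m_dirty.insert(name); }

    void delete_table(const std::string& name) {
        m_tables.erase(name);
        m_dirty.erase(name); // the fix: forget the dirty entry here as well
    }

    void poll() {
        for (const auto& name : m_dirty) {
            auto it = m_tables.find(name);
            if (it != m_tables.end()) { // guard against tables deleted after being dirtied
                it->second->flush_pending_updates();
            }
        }
        m_dirty.clear();
    }

private:
    std::unordered_map<std::string, std::shared_ptr<Table>> m_tables;
    std::unordered_set<std::string> m_dirty;
};

int main() {
    Server server;
    server.create_table("t1");
    server.mark_dirty("t1");
    server.delete_table("t1"); // previously the dirty entry outlived the table
    server.poll();             // safe: "t1" is gone from both containers
}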