Update `limit` behavior to use incrementing `index` (fix `last_by_index` aggregate) #2964

timbess · 2025-03-28T15:23:29Z

This PR decouples primary keys from the limit functionality of perspective tables, allowing us to handle aggregates like "last by index" a bit better when users have a limit and want to rely on insert order to window their data.

Along the way I also fixed some memory safety issues where we had called new to allocate some memory and free'd it with free rather than delete. This wasn't breaking anything at the moment, but may have been relying on undefined behavior accidentally.

The new benchmarks also triggered a bug in our polling implementation where if a table was dirtied and deleted before the poll call, it would segfault the poll call because we hadn't removed it from the dirty map in the server, so I fixed that as well.

Signed-off-by: Timothy Bess <timbessmail@gmail.com>

…freeing it with "free" which I believe worked in our case, but is UB I believe. Signed-off-by: Timothy Bess <timbessmail@gmail.com>

texodus

Thanks for the PR! Looks good!

You forgot to mentions his PR also fixes CI (much appreciated!) ... which has been failing due to the GHA CI machines running out of HD space during the Python build.

Test coverage looks great and covers last_by_index aggregate behavior specifically.

Benchmarks for feature change to limit behavior wrt Table.update() are great too. This change does facially increase the allocation size with the new psp_old_key column (I think - there is a lot changed including the flatten behavior so I may be wrong here), but its great to see there are no observable impact within the bounds of our CPU time benchmarking. It is also a good reminder IMO we should implement heap benchmarks!

texodus · 2025-04-04T03:06:22Z

cpp/perspective/src/cpp/base.cpp

+        break;
+
+    switch (op) { FOREACH_T_OP(X) }
+#undef X


texodus · 2025-04-04T03:13:19Z

cpp/perspective/src/cpp/gnode_state.cpp

+                    flattened_old_pkey_col->set_scalar(flattened_idx, old);
+                    flattened_old_pkey_col->set_valid(flattened_idx, true);
+                    LOG_DEBUG("DELETING OLD PKEY: " << old);
+                    // erase(old);


texodus · 2025-04-04T03:18:35Z

cpp/perspective/src/include/perspective/data_table.h

-    t_uindex m_bidx;
-    t_uindex m_eidx;
+    t_uindex m_begin_idx;
+    t_uindex m_edge_idx;


texodus · 2025-04-04T03:20:19Z

cpp/perspective/src/include/perspective/data_table.h

@@ -404,6 +418,9 @@ t_data_table::flatten_helper_1(FLATTENED_T flattened) const {
    std::vector<t_index> edges;
    edges.push_back(0);

+    // | pkey-1 | pkey-1 | pkey-2 | pkey-2
+    //                   ^ Is this an edge? I think so.
+    // |-----------------| <- This is a span between those edges.


texodus · 2025-04-04T03:42:24Z

cpp/perspective/src/include/perspective/data_table.h

        if (delete_encountered) {
-            d_pkey_col->push_back(
+            d_pkey_col->set_nth(
+                store_idx % limit,


It seems we're applying limit to the flattened input port table. Is this is because the size of this table is later assumed to be lower bound of the new master table size (or similar)? I think this is fine but (especially compared to the current state of the limit feature), but it has reminded me we likely have some other limit related abnormalities in specific aggregates, e.g. AGGTYPE_HIGH_WATER_MARK, which never decreases and thus would calculate differently if a single input update step was large than the limit, and the overwritten range removed the new high water mark value (I think).

texodus · 2025-04-04T03:43:43Z

cpp/perspective/src/include/perspective/parallel_for.h

@@ -26,6 +26,7 @@ namespace perspective {
 template <class FUNCTION>
 void
 parallel_for(int num_tasks, FUNCTION&& func) {
+#undef PSP_PARALLEL_FOR


texodus · 2025-04-04T03:46:26Z

rust/perspective-server/build/psp.rs

@@ -26,6 +26,9 @@ pub fn cmake_build() -> Result<Option<PathBuf>, std::io::Error> {
    let profile = std::env::var("PROFILE").unwrap();
    dst.always_configure(true);
    dst.define("CMAKE_BUILD_TYPE", profile.as_str());
+    dst.define("ARROW_BUILD_EXAMPLES", "OFF");
+    dst.define("RAPIDJSON_BUILD_EXAMPLES", "OFF");
+    dst.define("ARROW_CXX_FLAGS_DEBUG", "-Wno-error");


This is great! IMO we should move this into the CMakeLists.txt though so it applies to the wasm builds also (assuming it doesn't already).

texodus · 2025-04-04T03:48:09Z

tools/perspective-bench/cross_platform_suite.mjs

+                if (!check_version_gte(metadata.version, "3.4.3")) {
+                    // Bug with old versions of perspective segfault when you delete
+                    // a table with pending updates.
+                    await table.size();


Nice catch, this is good to fix (fix mentioned in the PR summary)

texodus · 2025-04-04T03:49:52Z

cpp/perspective/src/include/perspective/scalar.h

+return cmp(cmpval, 0);
+} // namespace perspective
+}
+;


Is this a clang_format error?

timbess force-pushed the feature/decouple-pkey-from-limit branch from 7fc2d4f to 0b83a2b Compare March 28, 2025 15:23

texodus added the enhancement Feature requests or improvements label Mar 28, 2025

timbess force-pushed the feature/decouple-pkey-from-limit branch 10 times, most recently from bd63e26 to da9fb69 Compare April 2, 2025 19:47

timbess added 2 commits April 2, 2025 15:47

Decouple Primary keys from limit of table.

81b0da7

Signed-off-by: Timothy Bess <timbessmail@gmail.com>

Fix polling referencing deleted tables.

1751722

Signed-off-by: Timothy Bess <timbessmail@gmail.com>

timbess force-pushed the feature/decouple-pkey-from-limit branch 2 times, most recently from 394606b to fd271ee Compare April 3, 2025 15:11

timbess marked this pull request as ready for review April 3, 2025 16:35

Fix memory safety issue. We were accidentally calling "new" and then …

b7a4879

…freeing it with "free" which I believe worked in our case, but is UB I believe. Signed-off-by: Timothy Bess <timbessmail@gmail.com>

timbess force-pushed the feature/decouple-pkey-from-limit branch from fd271ee to b7a4879 Compare April 3, 2025 20:55

texodus approved these changes Apr 4, 2025

View reviewed changes

texodus changed the title ~~Decouple Primary keys from limit of table.~~ Update limit behavior to use incrementing index (fix last_by_index aggregate) Apr 4, 2025

texodus merged commit b0c0177 into finos:master Apr 4, 2025
14 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update `limit` behavior to use incrementing `index` (fix `last_by_index` aggregate) #2964

Update `limit` behavior to use incrementing `index` (fix `last_by_index` aggregate) #2964

timbess commented Mar 28, 2025 •

edited

Loading

texodus left a comment

texodus Apr 4, 2025

texodus Apr 4, 2025

texodus Apr 4, 2025

texodus Apr 4, 2025

texodus Apr 4, 2025

texodus Apr 4, 2025

texodus Apr 4, 2025

texodus Apr 4, 2025

texodus Apr 4, 2025

Update limit behavior to use incrementing index (fix last_by_index aggregate) #2964

Update limit behavior to use incrementing index (fix last_by_index aggregate) #2964

Conversation

timbess commented Mar 28, 2025 • edited Loading

texodus left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Update `limit` behavior to use incrementing `index` (fix `last_by_index` aggregate) #2964

Update `limit` behavior to use incrementing `index` (fix `last_by_index` aggregate) #2964

timbess commented Mar 28, 2025 •

edited

Loading