There are some usecases when you may want to mimic window functions using Arrays - as an optimization step, or to control the memory better / use on-disk spiling, or just if you have old ClickHouse® version.
7
+
There are cases where you may need to mimic window functions using arrays in ClickHouse. This could be for optimization purposes, to better manage memory, or to enable on-disk spilling, especially if you’re working with an older version of ClickHouse that doesn't natively support window functions.
8
8
9
-
## Running difference sample
9
+
Here’s an example demonstrating how to mimic a window function like runningDifference() using arrays:
10
+
11
+
#### Step 1: Create Sample Data
12
+
We’ll start by creating a test table with some sample data:
10
13
11
14
```sql
12
15
DROP TABLE IF EXISTS test_running_difference;
@@ -20,10 +23,8 @@ SELECT
20
23
FROM numbers(100)
21
24
22
25
23
-
SELECT * FROM test_running_difference
24
-
```
26
+
SELECT * FROM test_running_difference;
25
27
26
-
```text
27
28
┌─id─┬──────────────────ts─┬────val─┐
28
29
│ 0 │ 2010-01-01 00:00:00 │ -1209 │
29
30
│ 1 │ 2010-01-01 00:00:00 │ 43 │
@@ -130,13 +131,15 @@ SELECT * FROM test_running_difference
130
131
100 rows in set. Elapsed: 0.003 sec.
131
132
132
133
```
134
+
This table contains IDs, timestamps (ts), and values (val), where each id appears multiple times with different timestamps.
135
+
136
+
#### Step 2: Running Difference Example
137
+
Using runningDifference directly is problematic: it works block by block, expects the data to be ordered already, and does not reset when the group (here, id) changes.
138
+
133
139
134
-
runningDifference works only in blocks & require ordered data & problematic when group changes
135
140
```sql
136
141
select id, val, runningDifference(val) from (select * from test_running_difference order by id, ts);
137
-
```
138
142
139
-
```
140
143
┌─id─┬────val─┬─runningDifference(val)─┐
141
144
│ 0 │ -1209 │ 0 │
142
145
│ 0 │ 66839 │ 68048 │
@@ -244,13 +247,15 @@ select id, val, runningDifference(val) from (select * from test_running_differen
244
247
```
245
248
246
249
247
-
## Arrays !
250
+
The output may look inconsistent because runningDifference requires ordered data within blocks.
248
251
249
-
### 1. Group & Collect the data into array
252
+
#### Step 3: Using Arrays for Grouping and Calculation
253
+
Instead of using runningDifference, we can utilize arrays to group data, sort it, and apply similar logic more efficiently.
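As a compact illustration of the idea, here is a minimal sketch on inline data rather than on test_running_difference. The column names and the numbers(6) source are assumptions made up for this example; only groupArray, arraySort, arrayMap, and arrayDifference are taken from the approach described here.

```sql
-- Hypothetical inline data: two ids, three rows each.
SELECT
    id,
    -- collect (ts, val) pairs per id, sort them by ts,
    -- keep only val, then take consecutive differences
    arrayDifference(
        arrayMap(x -> x.2,
            arraySort(x -> x.1, groupArray((ts, val))))
    ) AS diffs
FROM
(
    SELECT
        number % 2 AS id,
        number AS ts,
        number * number AS val
    FROM numbers(6)
)
GROUP BY id
ORDER BY id;

-- id = 0: vals 0, 4, 16  -> diffs [0,4,12]
-- id = 1: vals 1, 9, 25  -> diffs [0,8,16]
```

Because each id gets its own array, the difference restarts at 0 for every group, regardless of how the rows are split into blocks.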
250
254
255
+
**Grouping Data into Arrays** -
256
+
You can group multiple columns into arrays by using the groupArray function. For example, to collect several columns as arrays of tuples, you can use the following query:
251
257
252
-
you can collect several column by building array of tuples:
Once the data is sorted, you can apply array functions like arrayMap and arrayDifference to calculate differences between values in the arrays:
334
341
335
-
```
342
+
```sql
336
343
WITH
337
344
groupArray(tuple(ts, val)) as window_rows,
338
345
arraySort(x -> x.1, window_rows) as sorted_window_rows,
@@ -343,10 +350,7 @@ SELECT
343
350
sorted_window_rows_val_column_diff
344
351
FROM test_running_difference
345
352
GROUP BY id
346
-
```
347
-
348
353
349
-
```
350
354
┌─id─┬─sorted_window_rows_val_column_diff─┐
351
355
│ 0 │ [0,68048,68243,72389,67860] │
352
356
│ 1 │ [0,19397,17905,16978,18345] │
@@ -376,10 +380,8 @@ GROUP BY id
376
380
You can also do a lot of magic with arrayEnumerate and access different values by their ids.
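For instance, a running difference can be written by hand with arrayEnumerate by addressing each element and its predecessor through their positions. This is only a sketch on a literal array (vals is a made-up name), shown next to arrayDifference for comparison:

```sql
WITH [10, 15, 12, 20] AS vals
SELECT
    -- arrayEnumerate(vals) yields positions [1,2,3,4];
    -- position 1 has no predecessor, so it gets 0
    arrayMap(i -> if(i = 1, 0, vals[i] - vals[i - 1]), arrayEnumerate(vals)) AS manual_diff,
    arrayDifference(vals) AS builtin_diff;

-- both columns: [0,5,-3,8]
```

The same pattern extends to lookups that arrayDifference does not cover, for example comparing each element with the one two positions back via vals[i - 2].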
377
381
378
382
379
-
### Now you can return you arrays back to rows
380
-
381
-
382
-
use arrayJoin
383
+
**Reverting Arrays Back to Rows** -
384
+
You can convert the arrays back into rows using arrayJoin:
383
385
384
386
```sql
385
387
WITH
@@ -394,9 +396,7 @@ SELECT
394
396
FROM test_running_difference
395
397
GROUP BY id
396
398
```
397
-
398
-
399
-
or ARRAY JOIN
399
+
Or use ARRAY JOIN to join the arrays back to the original structure:
400
400
401
401
```sql
402
402
SELECT
@@ -417,8 +417,6 @@ FROM test_running_difference
417
417
GROUP BY id
418
418
) as t1
419
419
ARRAY JOIN sorted_window_rows_val_column_diff as diff, sorted_window_rows_ts_column as ts
420
-
421
420
```
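To see what ARRAY JOIN does with two parallel arrays, here is a minimal sketch on a single hand-written row. The names ts_arr and diff_arr and the values are assumptions for illustration only:

```sql
SELECT id, ts, diff
FROM
(
    -- hypothetical row: one id with two arrays of equal length
    SELECT
        1 AS id,
        [10, 20, 30] AS ts_arr,
        [0, 5, -3] AS diff_arr
) AS t
ARRAY JOIN ts_arr AS ts, diff_arr AS diff;

-- returns three rows: (1, 10, 0), (1, 20, 5), (1, 30, -3)
```

Arrays listed together in one ARRAY JOIN clause are unfolded position by position, so they must have the same length.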
422
421
423
-
424
-
etc.
422
+
This allows you to manipulate and analyze data within arrays effectively, using powerful functions such as arrayMap, arrayDifference, and arrayEnumerate.