Commit f3263bd

Merge pull request #107 from Altinity/ashwini-ahire7-patch-4
Update array-functions-as-window.md
2 parents 972970e + bb72b84 commit f3263bd

1 file changed: +31 -33 lines changed

content/en/altinity-kb-queries-and-syntax/array-functions-as-window.md

Lines changed: 31 additions & 33 deletions
@@ -4,9 +4,12 @@ linkTitle: "Using array functions to mimic window-functions alike behavior"
 weight: 100
 ---

-There are some usecases when you may want to mimic window functions using Arrays - as an optimization step, or to control the memory better / use on-disk spiling, or just if you have old ClickHouse® version.
+There are cases where you may need to mimic window functions using arrays in ClickHouse. This could be for optimization purposes, to better manage memory, or to enable on-disk spilling, especially if you’re working with an older version of ClickHouse that doesn't natively support window functions.

-## Running difference sample
+Here’s an example demonstrating how to mimic a window function like runningDifference() using arrays:
+
+#### Step 1: Create Sample Data
+We’ll start by creating a test table with some sample data:

 ```sql
 DROP TABLE IF EXISTS test_running_difference
@@ -20,10 +23,8 @@ SELECT
 FROM numbers(100)


-SELECT * FROM test_running_difference
-```
+SELECT * FROM test_running_difference;

-```text
 ┌─id─┬──────────────────ts─┬────val─┐
 │  0 │ 2010-01-01 00:00:00 │  -1209 │
 │  1 │ 2010-01-01 00:00:00 │     43 │
@@ -130,13 +131,15 @@ SELECT * FROM test_running_difference
 100 rows in set. Elapsed: 0.003 sec.

 ```
+This table contains IDs, timestamps (ts), and values (val), where each id appears multiple times with different timestamps.
+
+#### Step 2: Running Difference Example
+If you try using runningDifference directly, it works block by block, which can be problematic when the data needs to be ordered or when group changes occur.
+

-runningDifference works only in blocks & require ordered data & problematic when group changes
 ```sql
 select id, val, runningDifference(val) from (select * from test_running_difference order by id, ts);
-```

-```
 ┌─id─┬────val─┬─runningDifference(val)─┐
 │  0 │  -1209 │                      0 │
 │  0 │  66839 │                  68048 │
@@ -244,13 +247,15 @@ select id, val, runningDifference(val) from (select * from test_running_differen
 ```


-## Arrays !
+The output may look inconsistent because runningDifference requires ordered data within blocks.

-### 1. Group & Collect the data into array
+#### Step 3: Using Arrays for Grouping and Calculation
+Instead of using runningDifference, we can utilize arrays to group data, sort it, and apply similar logic more efficiently.

+**Grouping Data into Arrays** -
+You can group multiple columns into arrays by using the groupArray function. For example, to collect several columns as arrays of tuples, you can use the following query:

-you can collect several column by building array of tuples:
-```
+```sql
 SELECT
 id,
 groupArray(tuple(ts, val))
@@ -281,10 +286,9 @@ GROUP BY id
 └────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
 ```

-### Do needed ordering in each array
-
-For example - by second element of tuple:
-```
+**Sorting Arrays** -
+To sort the arrays by a specific element, for example, by the second element of the tuple, you can use the arraySort function:
+```sql
 SELECT
 id,
 arraySort(x -> (x.2), groupArray((ts, val)))
@@ -317,9 +321,11 @@ GROUP BY id
 20 rows in set. Elapsed: 0.004 sec.
 ```

-That can be rewritten like this:
+This sorts each array by the val (second element of the tuple) for each id.

-```
+Simplified Sorting Example - We can rewrite the query in a more concise way using WITH clauses for better readability:
+
+```sql
 WITH
 groupArray(tuple(ts, val)) as window_rows,
 arraySort(x -> x.1, window_rows) as sorted_window_rows
@@ -330,9 +336,10 @@ FROM test_running_difference
 GROUP BY id
 ```

-### Apply needed logic arrayMap / arrayDifference etc
+**Applying Calculations with Arrays** -
+Once the data is sorted, you can apply array functions like arrayMap and arrayDifference to calculate differences between values in the arrays:

-```
+```sql
 WITH
 groupArray(tuple(ts, val)) as window_rows,
 arraySort(x -> x.1, window_rows) as sorted_window_rows,
@@ -343,10 +350,7 @@ SELECT
 sorted_window_rows_val_column_diff
 FROM test_running_difference
 GROUP BY id
-```
-

-```
 ┌─id─┬─sorted_window_rows_val_column_diff─┐
 │  0 │ [0,68048,68243,72389,67860]        │
 │  1 │ [0,19397,17905,16978,18345]        │
@@ -376,10 +380,8 @@ GROUP BY id
 You can do also a lot of magic with arrayEnumerate and accessing different values by their ids.


-### Now you can return you arrays back to rows
-
-
-use arrayJoin
+**Reverting Arrays Back to Rows** -
+You can convert the arrays back into rows using arrayJoin:

 ```sql
 WITH
@@ -394,9 +396,7 @@ SELECT
 FROM test_running_difference
 GROUP BY id
 ```
-
-
-or ARRAY JOIN
+Or use ARRAY JOIN to join the arrays back to the original structure:

 ```sql
 SELECT
@@ -417,8 +417,6 @@ FROM test_running_difference
 GROUP BY id
 ) as t1
 ARRAY JOIN sorted_window_rows_val_column_diff as diff, sorted_window_rows_ts_column as ts
-
 ```

-
-etc.
+This allows you to manipulate and analyze data within arrays effectively, using powerful functions such as arrayMap, arrayDifference, and arrayEnumerate.
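
For reference, the array pipeline that the updated page walks through (group, sort, take differences, flatten back to rows) can be tried without creating the test table. The query below is a minimal sketch and is not part of this commit: the five inline rows and the aliases `sorted_rows`, `ts_arr`, and `diff_arr` are made up for illustration.

```sql
-- Illustrative only: five inline rows stand in for test_running_difference.
SELECT id, ts, diff
FROM
(
    SELECT
        id,
        -- collect (ts, val) pairs per id, then order each array by ts
        arraySort(x -> x.1, groupArray((ts, val))) AS sorted_rows,
        arrayMap(x -> x.1, sorted_rows) AS ts_arr,
        -- difference between consecutive val entries; the first element is 0
        arrayDifference(arrayMap(x -> x.2, sorted_rows)) AS diff_arr
    FROM
    (
        SELECT 1 AS id, 3 AS ts, 30 AS val
        UNION ALL SELECT 1, 1, 10
        UNION ALL SELECT 1, 2, 25
        UNION ALL SELECT 2, 1, 7
        UNION ALL SELECT 2, 2, 4
    )
    GROUP BY id
)
ARRAY JOIN ts_arr AS ts, diff_arr AS diff
ORDER BY id, ts;
```

With these rows the differences for id 1 should come out as 0, 15, 5 and for id 2 as 0, -3. Each group is differenced independently, which is the behaviour runningDifference cannot guarantee across group boundaries.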
