Improve SkipList memory usage tracking (apache#2359)
The problem with the current implementation is that all data to be inserted is counted against memory usage. However, for the aggregation model and some other special cases, not all of the data is actually inserted into the `MemTable`, and that discarded data should not be counted. This change gives the `SkipList` an exclusive `MemPool`, and only data that will actually be inserted into the `SkipList` may use this `MemPool`. In other words, discarded rows are no longer counted by the `SkipList`'s `MemPool`.

To avoid checking twice whether a row already exists in the `SkipList`, this change also modifies the `SkipList` interface (a `Hint` is fetched by `Find()` and then passed to `InsertUseHint()`) and makes the `SkipList` no longer aware of the aggregation logic; a sketch of the resulting insert flow follows the note below.

At present, because the data row generated by the upper layer (`Tuple`) differs from the data row used internally by the engine (`Row`), a row must be copied once when it is inserted into the `MemTable`. If the row then needs to be inserted into the `SkipList`, it is copied again into the `SkipList`'s `MemPool`. Also, the aggregation functions currently only support copying through a `MemPool`, so even data that will not be inserted into the `SkipList` still uses a `MemPool` (in the future this can be replaced with an ordinary `Buffer`). However, we reuse the memory already allocated in that `MemPool`; that is, we do not allocate new memory every time.

Note: due to the characteristics of `MemPool` (once memory is allocated, it cannot be partially freed), the following scenario may still cause multiple flushes. For example, if the aggregation function of a string column is `MAX` and the data is loaded in ascending order, then every data row must request memory from the `SkipList`'s `MemPool`; although the old rows in the `SkipList` are discarded, the memory they occupy is still counted.
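To make the new interface concrete, here is a minimal sketch of the two-step insert flow, assuming the `Find()`/`InsertUseHint()` names described above; the helpers `_convert_tuple_to_row`, `_aggregate_rows`, and `_copy_row_to_pool`, and the `Hint` field accessed in the aggregation branch, are hypothetical and only mark where each copy and each `MemPool` allocation happens:

```cpp
// Sketch only, not the actual Doris implementation: illustrates the
// Find()/InsertUseHint() flow and which MemPool each copy lands in.
void MemTable::insert(const Tuple* tuple) {
    // First copy: convert the upper layer's Tuple into the engine's Row
    // representation, using a reusable buffer rather than the SkipList's
    // MemPool, so a row that ends up discarded costs no tracked memory.
    Row* row = _convert_tuple_to_row(tuple, _row_buf);

    SkipList::Hint hint;
    if (_skip_list->Find(row, &hint)) {
        // Key already present: aggregate into the existing row located
        // via the hint (field name illustrative). Nothing is allocated
        // from the SkipList's exclusive MemPool.
        _aggregate_rows(hint.curr_row, row);
    } else {
        // Second copy: this row will live in the SkipList, so copy it
        // into the SkipList's exclusive MemPool, whose tracked size
        // drives the flush decision, then insert without re-searching.
        Row* owned = _copy_row_to_pool(row, _skiplist_mem_pool.get());
        _skip_list->InsertUseHint(owned, &hint);
    }
}
```

Because `Find()` has already located the insertion position, `InsertUseHint()` can splice the new node in without traversing the list again, which is what removes the duplicate existence check.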
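The note above can also be shown as a small sketch; `MemPool` and `Slice` are used loosely here (construction details omitted), and `ascending_values` is a hypothetical stand-in for rows arriving in ascending key order:

```cpp
// Sketch of the worst case from the note: a string column aggregated
// with MAX, loaded in ascending order. Every incoming value becomes
// the new maximum and must be materialized in the SkipList's MemPool,
// but MemPool cannot free the superseded copies individually.
MemPool pool;  // stands in for the SkipList's exclusive MemPool
Slice current_max;
for (const Slice& v : ascending_values) {
    if (current_max.size == 0 || v.compare(current_max) > 0) {
        char* dst = reinterpret_cast<char*>(pool.allocate(v.size));
        memcpy(dst, v.data, v.size);
        current_max = Slice(dst, v.size);  // old copy stays allocated
    }
}
// The pool's tracked size now reflects every copy ever made, not just
// the final maximum, so the flush threshold is reached earlier.
```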
I did a test on my development machine using `STREAM LOAD`: a table with only one tablet in which all columns are keys. The original data was 1.1G (9318799 rows), and 377745 rows remained after removing duplicates. Both the number of files and the query efficiency are greatly improved; the price paid is only a slight increase in load time.

before:
```
$ ll storage/data/0/10019/1075020655/
total 4540
-rw------- 1 dev dev 393152 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_0_0.dat
-rw------- 1 dev dev 1135 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_0_0.idx
-rw------- 1 dev dev 421660 Dec 2 18:43 0200000000000004f5404b740288294b21e52b0786adf3be_10_0.dat
-rw------- 1 dev dev 1185 Dec 2 18:43 0200000000000004f5404b740288294b21e52b0786adf3be_10_0.idx
-rw------- 1 dev dev 184214 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_1_0.dat
-rw------- 1 dev dev 610 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_1_0.idx
-rw------- 1 dev dev 329181 Dec 2 18:43 0200000000000004f5404b740288294b21e52b0786adf3be_11_0.dat
-rw------- 1 dev dev 935 Dec 2 18:43 0200000000000004f5404b740288294b21e52b0786adf3be_11_0.idx
-rw------- 1 dev dev 343813 Dec 2 18:43 0200000000000004f5404b740288294b21e52b0786adf3be_12_0.dat
-rw------- 1 dev dev 985 Dec 2 18:43 0200000000000004f5404b740288294b21e52b0786adf3be_12_0.idx
-rw------- 1 dev dev 315364 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_2_0.dat
-rw------- 1 dev dev 885 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_2_0.idx
-rw------- 1 dev dev 423806 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_3_0.dat
-rw------- 1 dev dev 1185 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_3_0.idx
-rw------- 1 dev dev 294811 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_4_0.dat
-rw------- 1 dev dev 835 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_4_0.idx
-rw------- 1 dev dev 403241 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_5_0.dat
-rw------- 1 dev dev 1135 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_5_0.idx
-rw------- 1 dev dev 350753 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_6_0.dat
-rw------- 1 dev dev 860 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_6_0.idx
-rw------- 1 dev dev 266966 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_7_0.dat
-rw------- 1 dev dev 735 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_7_0.idx
-rw------- 1 dev dev 451191 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_8_0.dat
-rw------- 1 dev dev 1235 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_8_0.idx
-rw------- 1 dev dev 398439 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_9_0.dat
-rw------- 1 dev dev 1110 Dec 2 18:42 0200000000000004f5404b740288294b21e52b0786adf3be_9_0.idx
{
    "TxnId": 16,
    "Label": "cd9f8392-dfa0-4626-8034-22f7cb97044c",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 9318799,
    "NumberLoadedRows": 9318799,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 1079581477,
    "LoadTimeMs": 46907
}
mysql> select count(*) from xxx_before;
+----------+
| count(*) |
+----------+
|   377745 |
+----------+
1 row in set (0.91 sec)
```

after:
```
$ ll storage/data/0/10013/1075020655/
total 3612
-rw------- 1 dev dev 3328992 Dec 2 18:26 0200000000000003d44e5cc72626f95a0b196b52a05c0f8a_0_0.dat
-rw------- 1 dev dev 8460 Dec 2 18:26 0200000000000003d44e5cc72626f95a0b196b52a05c0f8a_0_0.idx
-rw------- 1 dev dev 350576 Dec 2 18:26 0200000000000003d44e5cc72626f95a0b196b52a05c0f8a_1_0.dat
-rw------- 1 dev dev 985 Dec 2 18:26 0200000000000003d44e5cc72626f95a0b196b52a05c0f8a_1_0.idx
{
    "TxnId": 12,
    "Label": "88f606d5-8095-4f15-b61d-49b7080c16b8",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 9318799,
    "NumberLoadedRows": 9318799,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 1079581477,
    "LoadTimeMs": 48771
}
mysql> select count(*) from xxx_after;
+----------+
| count(*) |
+----------+
|   377745 |
+----------+
1 row in set (0.38 sec)
```