Support INSERT INTO SELECT #557

jiacai2050 · 2023-01-10T15:30:22Z

Describe This Problem

Enhance SQL support.

INSERT INTO SELECT is useful for migrating tables and for benchmarking (quickly generating lots of data).

Proposal

Implement this SQL syntax.

INSERT INTO table2 (column1, column2, column3, ...)
SELECT column1, column2, column3, ...
FROM table1
WHERE condition;
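To make the intended semantics concrete, here is a minimal sketch that runs the proposed statement against SQLite through Python's standard `sqlite3` module. It is not HoraeDB, and it only illustrates what the syntax is expected to do. The table and column names come from the proposal; the column types, the sample rows, and the `WHERE column1 >= 2` condition are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in to show the semantics.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 INTEGER, column2 TEXT, column3 REAL);
    CREATE TABLE table2 (column1 INTEGER, column2 TEXT, column3 REAL);
    INSERT INTO table1 VALUES (1, 'a', 0.5), (2, 'b', 1.5), (3, 'c', 2.5);
""")

# The statement from the proposal, with a hypothetical condition filled in.
conn.execute("""
    INSERT INTO table2 (column1, column2, column3)
    SELECT column1, column2, column3
    FROM table1
    WHERE column1 >= 2
""")

copied = conn.execute(
    "SELECT column1, column2, column3 FROM table2 ORDER BY column1"
).fetchall()
print(copied)  # [(2, 'b', 1.5), (3, 'c', 2.5)]
```

Only the rows matching the condition are copied, which is what makes the statement convenient for table migration and for generating benchmark data from an existing table.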

Additional Context


caicancai · 2023-12-05T08:35:33Z

Hi, I'm interested in this. Let me try, but it may take a little time.
This requirement seems a little difficult for me, but I want to try it.

jiacai2050 · 2023-12-05T09:21:00Z

Thanks, assigned.

You need to know how HoraeDB parses SQL and how a plan is created and executed. This is indeed a non-trivial task; feel free to ask any questions if you run into trouble.

jiacai2050 · 2023-12-29T07:37:05Z

@caicancai Hi, a few weeks have passed. How's it going? Any problems?

caicancai · 2023-12-29T07:51:08Z

> @caicancai Hi, a few weeks have passed. How's it going? Any problems?

There doesn't seem to be any progress. I was busy with work in December. I'm sorry for that. I may start this part of the work on New Year's Day, if this feature is not urgent.

jiacai2050 · 2023-12-29T07:54:06Z

Hi, take your time. I just want to know if there are any problems.

jiacai2050 · 2023-12-29T07:57:17Z

You can join our Slack channel to discuss with us.

https://github.com/apache/incubator-horaedb?tab=readme-ov-file#contributing

caicancai · 2024-01-03T08:57:17Z

@jiacai2050 Hello, I tried it during the holiday. I expect that this feature would take up a lot of my energy, and since I have been busy recently, I may not have that much to spare. I have decided to give up on this feature. I am very sorry for taking up so much time.

jiacai2050 · 2024-01-05T12:03:21Z

Thanks for trying. This task is a little complex: you need to understand how both query and write work.

PS: I removed the good first issue tag.

dracoooooo · 2024-05-16T04:56:37Z

Hi @jiacai2050, I'd like to take this task.

jiacai2050 · 2024-05-16T06:26:26Z

👍 Much appreciated.

## Rationale

Close apache#557.

## Detailed Changes

When generating the insert logical plan, also generate the select logical plan and store it in the insert plan. Then execute the select logical plan in the insert interpreter, convert the result records into a RowGroup, and insert it.

## Test Plan

CI
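The flow described in the change summary can be sketched as a toy model. This is not HoraeDB code: HoraeDB is written in Rust, and every name below (`SelectPlan`, `InsertPlan`, `execute_select`, `to_row_group`, `execute_insert`, and this simplified `RowGroup`) is hypothetical and chosen only to mirror the three steps in the description. Tables are modeled as plain lists of dictionaries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


@dataclass
class SelectPlan:
    """Hypothetical select plan: source table, projected columns, filter."""
    source: str
    columns: List[str]
    predicate: Callable[[Record], bool]


@dataclass
class InsertPlan:
    """Hypothetical insert plan that stores the select plan inside it."""
    target: str
    columns: List[str]
    select: SelectPlan


@dataclass
class RowGroup:
    """Toy stand-in for a row group: column names plus row tuples."""
    columns: List[str]
    rows: List[Tuple[object, ...]]


def execute_select(plan: SelectPlan, tables: Dict[str, List[Record]]) -> List[Record]:
    # Step 1: run the nested select plan and collect the result records.
    return [
        {c: rec[c] for c in plan.columns}
        for rec in tables[plan.source]
        if plan.predicate(rec)
    ]


def to_row_group(records: List[Record], select_columns: List[str],
                 target_columns: List[str]) -> RowGroup:
    # Step 2: convert the records into a row group, mapping columns by position.
    rows = [tuple(rec[c] for c in select_columns) for rec in records]
    return RowGroup(columns=list(target_columns), rows=rows)


def execute_insert(plan: InsertPlan, tables: Dict[str, List[Record]]) -> int:
    # Step 3: the insert "interpreter" executes the select, then writes the rows.
    records = execute_select(plan.select, tables)
    group = to_row_group(records, plan.select.columns, plan.columns)
    for row in group.rows:
        tables[plan.target].append(dict(zip(group.columns, row)))
    return len(group.rows)


tables = {
    "table1": [{"column1": 1, "column2": "a"}, {"column1": 2, "column2": "b"}],
    "table2": [],
}
plan = InsertPlan(
    target="table2",
    columns=["column1", "column2"],
    select=SelectPlan("table1", ["column1", "column2"], lambda rec: rec["column1"] > 1),
)
inserted = execute_insert(plan, tables)
print(inserted, tables["table2"])  # 1 [{'column1': 2, 'column2': 'b'}]
```

This sketch materializes all selected records before inserting them, which matches the description above in spirit only; it says nothing about how the real interpreter batches or buffers rows.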

refactor: partitioned_lock's elaboration (apache#1540)

Extended the `try_new` interface while keeping the old one for compatibility.

* Implemented the `try_new_suggest_cap` method, while changing the old `try_new` method to `try_new_bit_len` to ensure compatibility.
* Modified structs and functions that call old interfaces.
* Added new unit tests
* Passed CI test

Co-authored-by: chunhao.ch <chunhao@antgroup.com>

feat: support INSERT INTO SELECT (apache#1536)

Close apache#557.

When generating the insert logical plan, also generate the select logical plan and store it in the insert plan. Then execute the select logical plan in the insert interpreter, convert the result records into a RowGroup, and insert it.

CI

refactor: insert select to stream mode (apache#1544)

Close apache#1542

Do select and insert procedure in stream way.

CI test.

Co-authored-by: jiacai2050 <dev@liujiacai.net>

fix(comment): update error documentation comment for remote engine service (apache#1548)

Updating an error comment in the code to reflect the correct service name is needed.

No need

refactor: manifest error code (apache#1546)

fix: sequence overflow when dropping a table using a message queue as WAL (apache#1550)

Fix the issue of sequence overflow when dropping a table using a message queue as WAL. close apache#1543

Check the maximum value of sequence to prevent overflow.

CI.

feat: Add a new disk-based WAL implementation for standalone deployment (apache#1552)

1. Added a struct `Segment` responsible for reading and writing segment files, and it records the offset of each record.
2. Add a struct SegmentManager responsible for managing all segments, including:
   1. Reading all segments from the folder upon creation.
   2. Writing only to the segment with the largest ID.
   3. Maintaining a cache where segments not in the cache are closed, while segments in the cache have their files open and are memory-mapped using mmap.
3. Implement the `WalManager` trait.

Unit tests.

chore: upgrade object store version (apache#1541)

The object store version is upgraded to 0.10.1 to prepare for access to opendal

- Impl AsyncWrite for ObjectStoreMultiUpload
- Impl MultipartUpload for ObkvMultiPartUpload
- Adapt new api on query writing path
- Existing tests

Co-authored-by: jiacai2050 <dev@liujiacai.net>

feat: use opendal to access underlying storage (apache#1557)

Use opendal to access the object store, thus unifying the access method of the underlying storage.

- use opendal to access s3/oss/local file
- Existing tests

feat: add metric engine rfc (apache#1558)

RFC for next metric engine.

No need.

chore: update link (apache#1561)

I noticed that the previous repository has been archived, maybe it would be better to update the new link

chore(horaemeta): add building docs (apache#1562)

feat: Implementing cross-segment read/write for WAL based on local disk (apache#1556)

Improving WAL based on local disk. This is a follow-up task for apache#1552.

1. Make MAX_FILE_SIZE configurable.
2. Allocate enough space when creating a segment to avoid remapping when appending to the segment.
3. Add `MultiSegmentLogIterator` to enable cross-segment reading.
4. When writing, if the current segment has insufficient space, create a new segment and write to the new segment.

Unit test.

chore: fix doc links (apache#1565)

fix: disable layered memtable in overwrite mode (apache#1533)

Layered memtable is only designed for append mode table now, and it shouldn't be used in overwrite mode table.

- Make default values in config used.
- Add `enable` field to control layered memtable's on/off.
- Add check to prevent invalid options during table create/alter.
- Add related it cases.

Test manually. Following cases are considered:

Check and intercept the invalid table options during table create/alter
- enable layered memtable but mutable switch threshold is 0
- enable layered memtable for overwrite mode table

Table options new field `layered_enable`'s default value when it is not found in pb
- false, when whole `layered_memtable_options` not exist
- false, when `layered_memtable_options` exist, and `mutable_segment_switch_threshold` == 0
- true, when `layered_memtable_options` exist, and `mutable_segment_switch_threshold` > 0

feat: init metric engine structure (apache#1554)

See apache#1558

Add a new sub directory `horaedb`, all source codes for metric engine are under it.

Add a new ci.

feat: Implement delete operation for WAL based on local storage (apache#1566)

Currently the WAL based on the local disk does not support the delete function. This PR implements that functionality. This is a follow-up task of apache#1552 and apache#1556.

1. For each `Segment`, add a hashmap to record the minimum and maximum sequence numbers of all tables within that segment. During `delete` and `write` operations, this hashmap will be updated. During read operations, logs will be filtered based on this hashmap.
2. During the `delete` operation, based on the aforementioned hashmap, if all logs of all tables in a read-only segment (a segment that is not currently being written to) are marked as deleted, the segment file will be physically deleted from the disk.

Unit test, TSBS and running a script locally that repeatedly inserts data, forcibly kills, and restarts the database process to test persistence.

fix: support to compat the old layered memtable options (apache#1568)

We introduced an explicit flag to control whether layered memtable is enabled, but it has a compatibility problem when upgrading from an old version. This PR adds an option to stay compatible with the old layered memtable on/off control method.

Add an option to stay compatible with the old layered memtable on/off control method.

Manually.

chore: record replay cost in log (apache#1569)

1. Add replay cost in log
2. Remove verbose http log
3. Recover default to shard based, which is faster in most wal implementation.

fix: logs might be missed during RegionBased replay in the WAL based on local disk (apache#1570)

In RegionBased replay, a batch of logs is first scanned from the WAL, and then replayed on various tables using multiple threads. This approach works fine for WALs based on tables, as the logs for each table are clustered together. However, in a WAL based on local disk, the logs for each table may be scattered across different positions within the batch. During multi-threaded replay, it is possible that for a given table, log2 is replayed before log1, resulting in missed logs.

1. Modify `split_log_batch_by_table` function to aggregate all logs for a table together.
2. Modify `tableBatch` struct to change a single range into a `Vec<Range>`.

Manual testing.

fix format.

jiacai2050 added feature New feature or request good first issue Good for newcomers A-SQL Area: SQL layer labels Jan 10, 2023

jiacai2050 assigned jiacai2050 and caicancai and unassigned jiacai2050 Dec 5, 2023

jiacai2050 unassigned caicancai Jan 5, 2024

jiacai2050 added contributor friendly Good for contribution and removed good first issue Good for newcomers labels Jan 5, 2024

jiacai2050 assigned dracoooooo May 16, 2024

dracoooooo mentioned this issue May 29, 2024

feat: support INSERT INTO SELECT #1536

Merged

jiacai2050 closed this as completed in #1536 Jul 15, 2024

jiacai2050 closed this as completed in fa5c286 Jul 15, 2024
