Optimize commit path #3095
Conversation
Pull request overview
This PR optimizes the commit path by implementing asynchronous batch insertion with promise-based tracking and improving row counting logic. The changes focus on reducing synchronization overhead during data insertion operations.
Key Changes:
- Implemented asynchronous batch insertion with promise queuing to reduce blocking operations (see the sketch after this list)
- Replaced synchronous row counting with a cached num_total_rows_ counter for better performance
- Added automatic flush when the batch size reaches 512 rows
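As an orientation aid, here is a minimal, hypothetical sketch of that pattern. The names batch_inserter, flush_threshold, pending_, and the use of std::async are illustrative assumptions; the PR's actual code uses insert_promises_, num_total_rows_, and impl::append_rows, with a 1024-entry cap on pending promises (all visible in the diff hunks below).

```cpp
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical illustration only. The PR's real members are insert_promises_
// and num_total_rows_, and the real async work is done by impl::append_rows.
struct batch_inserter {
    static constexpr std::size_t flush_threshold = 512;   // mirrors the PR's 512-row batch

    std::vector<std::future<void>> pending_;  // queued futures for in-flight inserts
    std::vector<int> batch_;                  // stand-in for buffered rows
    std::size_t total_rows_ = 0;              // cached count; no synchronous recount

    void insert(int row)
    {
        batch_.push_back(row);
        ++total_rows_;
        if (batch_.size() >= flush_threshold) {
            flush();                           // hand the batch off without blocking
        }
    }

    void flush()
    {
        if (batch_.empty()) {
            return;
        }
        // Queue the write as a future; the caller continues immediately.
        // (The PR additionally caps the queue at 1024 pending promises.)
        pending_.push_back(std::async(std::launch::async,
                                      [rows = std::move(batch_)] {
                                          // ... write rows to storage ...
                                          (void)rows;
                                      }));
        batch_.clear();
    }

    void commit()
    {
        flush();
        for (auto& f : pending_) {
            f.get();                           // surface any deferred insert errors
        }
        pending_.clear();
    }
};
```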
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Summary per file:
| File | Description |
|---|---|
| cpp/deeplake_pg/table_storage.cpp | Removed commented-out code for flushing streamers after batch inserts |
| cpp/deeplake_pg/table_data_impl.hpp | Refactored insertion logic to use asynchronous promises with batching and introduced cached row counting |
| cpp/deeplake_pg/table_data.hpp | Added promise queue and cached row counter, removed num_uncommitted_rows() method |
| cpp/deeplake_pg/extension_init.cpp | Disabled shared memory for refresh operations by default |
| cpp/deeplake_pg/duckdb_deeplake_scan.cpp | Eliminated redundant string construction in UUID parsing error path |
```cpp
inline void table_data::create_streamer(int32_t idx, int32_t worker_id)
{
    return;
```
Copilot AI · Dec 24, 2025
Early return at the beginning of create_streamer prevents the entire function from executing. This makes all streamer creation logic unreachable and likely breaks functionality that depends on streamers being created.
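If the early return is meant as a temporary kill switch rather than dead code, one way to address the concern is a guard. This is a sketch only: streamers_enabled_ is a hypothetical flag, not something in the PR.

```cpp
inline void table_data::create_streamer(int32_t idx, int32_t worker_id)
{
    // Hypothetical guard: skip creation only when explicitly disabled,
    // instead of the unconditional return flagged above.
    if (!streamers_enabled_) {
        return;
    }
    // ... existing streamer creation logic, now reachable again ...
}
```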
```cpp
        return;
    }
    num_total_rows_ += nslots;
    const auto num_inserts = insert_rows_.begin()->second.size();
    if (num_inserts >= 512) {
```
Copilot AI · Dec 24, 2025
The magic number 512 for batch flush threshold should be defined as a named constant (e.g., constexpr size_t batch_flush_threshold = 512;) to improve code maintainability and make it easier to tune this parameter.
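Applied to the hunk above, the suggestion would read roughly as follows. The placement of the constant is an assumption; it could equally live at class scope, as the next comment suggests.

```cpp
// Rows buffered per worker before triggering an asynchronous flush.
constexpr size_t batch_flush_threshold = 512;

num_total_rows_ += nslots;
const auto num_inserts = insert_rows_.begin()->second.size();
if (num_inserts >= batch_flush_threshold) {
```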
```cpp
        insert_promises_.push_back(impl::append_rows(get_dataset(), std::move(deeplake_rows), num_inserts));
    }
    try {
        constexpr size_t max_pending_insert_promises = 1024;
```
Copilot AI · Dec 24, 2025
This constant is defined inside the function. Consider moving it to class-level scope or a configuration parameter to improve maintainability and allow easier tuning of this threshold.
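A sketch of the class-scope variant in table_data.hpp. The surrounding class layout is assumed, and the element type of insert_promises_ is a guess; only the constant's name and value come from the diff.

```cpp
class table_data {
public:
    // Upper bound on in-flight insert futures before the commit path waits.
    static constexpr size_t max_pending_insert_promises = 1024;

private:
    std::vector<std::future<void>> insert_promises_;  // element type assumed
};
```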
Force-pushed from 7bc3aaa to 12224db (Compare)
🚀 🚀 Pull Request
Impact
Description
Things to be aware of
Things to worry about
Additional Context