
Conversation

@ashotshakhkyan
Contributor

🚀 🚀 Pull Request

Impact

  • Bug fix (non-breaking change which fixes expected existing functionality)
  • Enhancement/New feature (adds functionality without impacting existing logic)
  • Breaking change (fix or feature that would cause existing functionality to change)

Description

Things to be aware of

Things to worry about

Additional Context

Copilot AI review requested due to automatic review settings December 24, 2025 15:47

Copilot AI left a comment


Pull request overview

This PR optimizes the commit path by implementing asynchronous batch insertion with promise-based tracking and improving row counting logic. The changes focus on reducing synchronization overhead during data insertion operations.

Key Changes:

  • Implemented asynchronous batch insertion with promise queuing to reduce blocking operations
  • Replaced synchronous row counting with a cached num_total_rows_ counter for better performance
  • Added an automatic flush when the batch size reaches 512 rows (a sketch of the overall pattern follows this list)
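
To make the listed changes concrete, here is a minimal, hypothetical C++ sketch of the batching pattern, not the actual deeplake_pg implementation: the identifiers append_rows, insert_promises_, and num_total_rows_ mirror the diff excerpts quoted later in this review, while Row, Dataset, the insert method, and the constant names are illustrative placeholders.

    #include <cstddef>
    #include <deque>
    #include <future>
    #include <iterator>
    #include <utility>
    #include <vector>

    struct Row {};      // placeholder for one inserted row
    struct Dataset {};  // placeholder for the underlying Deep Lake dataset

    // Illustrative stand-in for an asynchronous append: the returned future
    // completes once the batch has been written.
    std::future<void> append_rows(Dataset&, std::vector<Row> rows, std::size_t /*n*/) {
        return std::async(std::launch::async, [batch = std::move(rows)] {
            // ... write `batch` to the dataset ...
        });
    }

    class table_data {
    public:
        void insert(Dataset& ds, std::vector<Row> rows) {
            num_total_rows_ += rows.size();  // cached counter, no re-scan on commit
            pending_.insert(pending_.end(),
                            std::make_move_iterator(rows.begin()),
                            std::make_move_iterator(rows.end()));

            constexpr std::size_t batch_flush_threshold = 512;  // auto-flush point
            if (pending_.size() >= batch_flush_threshold) {
                const std::size_t n = pending_.size();
                insert_promises_.push_back(append_rows(ds, std::move(pending_), n));
                pending_ = {};  // leave the buffer in a known-empty state
            }

            // Bound the number of in-flight batches so memory use stays predictable.
            constexpr std::size_t max_pending_insert_promises = 1024;
            while (insert_promises_.size() > max_pending_insert_promises) {
                insert_promises_.front().wait();
                insert_promises_.pop_front();
            }
        }

        std::size_t num_total_rows() const { return num_total_rows_; }

    private:
        std::deque<std::future<void>> insert_promises_;  // outstanding async batches
        std::vector<Row> pending_;                       // rows buffered for the next batch
        std::size_t num_total_rows_ = 0;                 // cached total row count
    };

On commit, the real code would presumably wait on everything remaining in insert_promises_ and flush any partially filled batch; the sketch only shows the insertion side of that pattern.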

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Summary per file:

  • cpp/deeplake_pg/table_storage.cpp: Removed commented-out code for flushing streamers after batch inserts
  • cpp/deeplake_pg/table_data_impl.hpp: Refactored insertion logic to use asynchronous promises with batching and introduced cached row counting
  • cpp/deeplake_pg/table_data.hpp: Added a promise queue and a cached row counter; removed the num_uncommitted_rows() method
  • cpp/deeplake_pg/extension_init.cpp: Disabled shared memory for refresh operations by default
  • cpp/deeplake_pg/duckdb_deeplake_scan.cpp: Eliminated redundant string construction in the UUID parsing error path



inline void table_data::create_streamer(int32_t idx, int32_t worker_id)
{
return;

Copilot AI Dec 24, 2025


The early return at the beginning of create_streamer prevents the rest of the function from executing. This makes all streamer creation logic unreachable and likely breaks functionality that depends on streamers being created.

Suggested change: remove the return; line so the streamer creation logic below it runs.

}
num_total_rows_ += nslots;
const auto num_inserts = insert_rows_.begin()->second.size();
if (num_inserts >= 512) {

Copilot AI Dec 24, 2025


The magic number 512 for batch flush threshold should be defined as a named constant (e.g., constexpr size_t batch_flush_threshold = 512;) to improve code maintainability and make it easier to tune this parameter.

insert_promises_.push_back(impl::append_rows(get_dataset(), std::move(deeplake_rows), num_inserts));
}
try {
constexpr size_t max_pending_insert_promises = 1024;

Copilot AI Dec 24, 2025


This constant is defined inside the function. Consider moving it to class-level scope or a configuration parameter to improve maintainability and allow easier tuning of this threshold.
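
As a hypothetical illustration of this suggestion (the class name is taken from the diff context above; everything else is assumed), both tuning knobs could live at class scope:

    class table_data {
        // Tuning knobs hoisted to class scope so they are documented in one
        // place and easy to adjust together.
        static constexpr std::size_t batch_flush_threshold = 512;
        static constexpr std::size_t max_pending_insert_promises = 1024;
        // ... rest of the class ...
    };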

@sonarqubecloud

@khustup2 merged commit a0e9696 into main Dec 24, 2025
6 checks passed
@khustup2 deleted the optimize_commit_path branch December 24, 2025 16:19
