Skip to content

Conversation

@swamirishi
Copy link
Contributor

What changes were proposed in this pull request?

In order to rewrite objectId mapping phase in snapshot diff we need tableMergeIterator to avoid randomized gets for each objectId and also removing the requirement for a separate objectIds table to track all the objectIds witnessed while iterating through the keyTable/fileTable/directoryTable.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14053

How was this patch tested?

Existing unit tests should be good since this is already used by SstFileSetReader

@swamirishi swamirishi added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Dec 1, 2025
@adoroszlai adoroszlai changed the title HDDS-14053. Make MinHeapIterator in SstFileSetReader more generic to work with any Autoclosable iterator HDDS-14053. Extract generic MinHeapMergeIterator from SstFileSetReader Dec 2, 2025
@jojochuang jojochuang requested a review from Copilot December 4, 2025 01:06
Copilot finished reviewing on behalf of jojochuang December 4, 2025 01:09
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extracts a generic MinHeapMergeIterator class from SstFileSetReader to enable its reuse in other parts of the codebase, specifically for rewriting the objectId mapping phase in snapshot diff operations. The refactoring generalizes the heap-based iterator merging logic while maintaining backward compatibility with existing SST file reading functionality.

Key changes:

  • Extracted generic MinHeapMergeIterator abstract class that merges multiple sorted iterators using a min-heap
  • Refactored SstFileSetReader.MultipleSstFileIterator to extend the new generic iterator
  • Added DBStore.getMergeIterator() method to merge multiple database tables

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

File Description
MinHeapMergeIterator.java New generic abstract class providing min-heap based merging of multiple sorted iterators with resource management
SstFileSetReader.java Refactored to use the new MinHeapMergeIterator, removing the old HeapEntry inner class
DBStore.java Added default getMergeIterator() method to merge multiple tables into a single iterator
ClosableIterator.java Changed from extending AutoCloseable to Closeable for more specific exception handling

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

for (int i = 0; i < tableValues.size(); i++) {
tableValues.set(i, keysToMerge.getOrDefault(i, defaultNullValue).getValue());
}
return newKeyValue(key, tableValues);
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tableValues list is reused and mutated on every call to merge(). This means all returned Collection<Object> values from the iterator share the same underlying list, and subsequent calls to next() will modify the contents of previously returned collections. This can lead to subtle bugs if callers retain references to these collections. Consider creating a new list for each merge result: return newKeyValue(key, new ArrayList<>(tableValues));

Suggested change
return newKeyValue(key, tableValues);
return newKeyValue(key, new java.util.ArrayList<>(tableValues));

Copilot uses AI. Check for mistakes.
assert minHeap.peek() != null;
// Get current key from heap
K currentKey = minHeap.peek().getCurrentKey();
// Clear the keys list by setting all entries to null.
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment on line 113 says "Clear the keys list by setting all entries to null" but the code actually calls keys.clear() which removes all entries, not sets them to null. The comment should be updated to: "Clear the keys map from the previous iteration."

Suggested change
// Clear the keys list by setting all entries to null.
// Clear the keys map from the previous iteration.

Copilot uses AI. Check for mistakes.
Comment on lines 230 to 235
/**
* The MultipleSstFileIterator class is an abstract base for iterating over multiple SST files.
* It uses a PriorityQueue to merge keys from all files in sorted order.
* Each file's iterator is wrapped in a HeapEntryWithFileIdx object,
* which ensures stable ordering for identical keys by considering the file index.
* @param <T>
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment still mentions "HeapEntryWithFileIdx" and "stable ordering for identical keys by considering the file index", but the current implementation no longer maintains stable ordering by file index. The HeapEntry class was moved to MinHeapMergeIterator and doesn't use file index for comparison. This comment should be updated to reflect the current behavior.

Copilot uses AI. Check for mistakes.
* using a min-heap. The {@code MinHeapMergeIterator} ensures the merged output is in sorted order
* by repeatedly polling the smallest element from the heap of iterators.
*
* @param <K> the type of keys being merged, must be {@link Comparable}
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class javadoc states that K "must be {@link Comparable}" but this is not accurate. The type parameter K doesn't have a Comparable bound, and the class accepts any type with a Comparator. The documentation should be updated to reflect that K needs a Comparator provided via the constructor.

Suggested change
* @param <K> the type of keys being merged, must be {@link Comparable}
* @param <K> the type of keys being merged; a {@link Comparator} for K must be provided via the constructor

Copilot uses AI. Check for mistakes.
* to a collection of values from the tables
*/
default <KEY> ClosableIterator<KeyValue<KEY, Collection<Object>>> getMergeIterator(
Comparator<KEY> keyComparator, KEY prefix, Table<KEY, Object>... table) {
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The varargs parameter table is not validated for null or empty. If an empty array or null is passed, this will cause issues: an empty array will create an iterator that never returns elements but won't fail gracefully, and null will cause a NullPointerException. Consider adding validation at the start of the method.

Suggested change
Comparator<KEY> keyComparator, KEY prefix, Table<KEY, Object>... table) {
Comparator<KEY> keyComparator, KEY prefix, Table<KEY, Object>... table) {
if (table == null || table.length == 0) {
throw new IllegalArgumentException("At least one table must be provided to getMergeIterator.");
}

Copilot uses AI. Check for mistakes.
Comment on lines +143 to +153
IOException exception = null;
for (int idx = 0; idx < iterators.size(); idx++) {
try {
closeItrAtIndex(idx);
} catch (IOException e) {
exception = e;
}
}
if (exception != null) {
throw new UncheckedIOException(exception);
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The close() method only preserves the last IOException that occurred during cleanup. If multiple iterators throw exceptions during close(), only the last one is thrown, potentially hiding other important errors. Consider collecting all exceptions or using a suppressed exception pattern to preserve all error information.

Suggested change
IOException exception = null;
for (int idx = 0; idx < iterators.size(); idx++) {
try {
closeItrAtIndex(idx);
} catch (IOException e) {
exception = e;
}
}
if (exception != null) {
throw new UncheckedIOException(exception);
IOException firstException = null;
for (int idx = 0; idx < iterators.size(); idx++) {
try {
closeItrAtIndex(idx);
} catch (IOException e) {
if (firstException == null) {
firstException = e;
} else {
firstException.addSuppressed(e);
}
}
}
if (firstException != null) {
throw new UncheckedIOException(firstException);

Copilot uses AI. Check for mistakes.
Comment on lines +142 to +155
public void close() {
IOException exception = null;
for (int idx = 0; idx < iterators.size(); idx++) {
try {
closeItrAtIndex(idx);
} catch (IOException e) {
exception = e;
}
}
if (exception != null) {
throw new UncheckedIOException(exception);
}
}
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The close() method doesn't clear the minHeap after closing all iterators. While the iterators themselves are closed and set to null in the list, HeapEntry objects remain in the minHeap, preventing proper garbage collection and potentially causing issues if the iterator is accidentally used after close().

Copilot uses AI. Check for mistakes.
KeyValue<KEY, Object> defaultNullValue = newKeyValue(null, null);
Comparator<KeyValue<KEY, Object>> comparator = Comparator.comparing(KeyValue::getKey, keyComparator);
return new MinHeapMergeIterator<KeyValue<KEY, Object>, Table.KeyValueIterator<KEY, Object>,
KeyValue<KEY, Collection<Object>>>(table.length + 1, comparator) {
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor creates a PriorityQueue with size Math.max(numberOfIterators, 1) but then creates a MinHeapMergeIterator with table.length + 1 on line 202. This results in a heap size one larger than the number of tables, which is wasteful and inconsistent. The size should be table.length not table.length + 1.

Suggested change
KeyValue<KEY, Collection<Object>>>(table.length + 1, comparator) {
KeyValue<KEY, Collection<Object>>>(table.length, comparator) {

Copilot uses AI. Check for mistakes.
iterators.set(idx, null);
}
}

Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method overrides ClosableIterator.close; it is advisable to add an Override annotation.

Suggested change
@Override

Copilot uses AI. Check for mistakes.
Change-Id: I2118dca4d36001dcb80be870e2ac0ff34228892c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant