Move SortKeyCursor and RowIndex into modules, add sort_key_cursor test#2645
Move SortKeyCursor and RowIndex into modules, add sort_key_cursor test#2645alamb merged 7 commits intoapache:masterfrom
SortKeyCursor and RowIndex into modules, add sort_key_cursor test#2645Conversation
| use std::cmp::Ordering; | ||
| use std::sync::Arc; | ||
|
|
||
| /// A `SortKeyCursor` is created from a `RecordBatch`, and a set of |
There was a problem hiding this comment.
This code is moved, and I added stream_id() and batch_idx() accessors
| @@ -0,0 +1,42 @@ | |||
| /// A `RowIndex` identifies a specific row in a logical stream. | |||
| /// | |||
| /// Each stream is identified by an `stream_idx` and is formed from a | |||
There was a problem hiding this comment.
This code is moved into a new module, and I added some comments / ascii art to help my future self not have to read the code the next time
| pub mod sort; | ||
| pub mod sort_preserving_merge; | ||
|
|
||
| /// A `SortKeyCursor` is created from a `RecordBatch`, and a set of |
There was a problem hiding this comment.
This code is just moved
| @@ -0,0 +1,135 @@ | |||
| //! Contains tests for SortKeyCursor | |||
There was a problem hiding this comment.
These tests are new -- I plan to expand them over time
| use datafusion_physical_expr::expressions::col; | ||
|
|
||
| /// Compares [`RowIndex`]es with a vector of strings, the result of | ||
| /// pretty formatting the [`RowIndex`]es. This is a macro so errors |
There was a problem hiding this comment.
I mean unwrap() will give you a stack trace... I'm not a massive fan of macros as they are hard to read, and slow to compile...
|
|
||
| /// Runs the two cursors to completion, sorting them, and | ||
| /// returning the sorted order of rows that would have produced | ||
| fn run(cursor1: &mut SortKeyCursor, cursor2: &mut SortKeyCursor) -> Vec<RowIndex> { |
There was a problem hiding this comment.
This code doesn't seem to be being used by SortPreservingMerge, is this intentional?
There was a problem hiding this comment.
At the moment, yes -- I want to unit test SoryKeyCursor. SortPreservingMerge already has reasonable test coverage in my mind
| let mut cursor2 = | ||
| SortKeyCursor::new(2, 0, &batch2, &sort_key, Arc::clone(&sort_options)).unwrap(); | ||
|
|
||
| let expected = vec![ |
There was a problem hiding this comment.
If you were feeling fancy you could test stability, i.e. ties are broken by the lower stream idx
Which issue does this PR close?
re : #2427
Rationale for this change
SortKeyCursor::compareis the bottleneck for merge performance (details on #2427 (comment))As I prepare to improve the performance of this code (and make it more complicated), I want it more isolated and independent so I can:
What changes are included in this PR?
SortKeyCursorinto its own moduleRowIndexinto its own modulesort_key_cursortestAre there any user-facing changes?
No
Does this PR break compatibility with Ballista?
no