Conversation

@wesm wesm commented May 21, 2020

This patch is a major reworking of our development strategy for implementing array-valued functions and applying them in a query processing setting.

The design was partly inspired by my previous work designing Ibis (https://github.com/ibis-project/ibis -- the "expr" subsystem and the way that operators validate input types and resolve output types). Using only function names and input types, you can determine the output types of each function and resolve the "execute" function that performs a unit of work processing a batch of data. This will allow us to build deferred column expressions and then (eventually) do parallel execution.

There are a ton of details, but one nice thing is that there is now a single API entry point for invoking any function by its name:

```c++
Result<Datum> CallFunction(ExecContext* ctx, const std::string& func_name,
                           const std::vector<Datum>& args,
                           const FunctionOptions* options = NULLPTR);
```

What occurs when you do this:

  • A `Function` instance is looked up in the global `FunctionRegistry`
  • Given the descriptors of `args` (their types and shapes -- array or scalar), the Function searches for a `Kernel` that is able to process those types and shapes. A kernel might be able to do `array[T0], array[T1]` or only `scalar[T0], scalar[T1]`, for example. This permits kernel specialization to treat different type and shape combinations
  • The kernel is executed iteratively against `args` based on what `args` contains -- if there are ChunkedArrays, they will be split into contiguous pieces. Kernels never see ChunkedArray, only Array or Scalar
  • The Executor implementation is able to split contiguous Array inputs into smaller chunks, which is important for parallel execution. See `ExecContext::set_exec_chunksize`

To summarize: the REGISTRY contains FUNCTIONS. A FUNCTION contains KERNELS. A KERNEL is a specific implementation of a function that services a particular type combination.
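
That dispatch flow can be illustrated with a toy sketch (Python here for brevity; this is not the actual C++ API, and every class, function, and signature shape below is invented for illustration):

```python
# Toy sketch of the REGISTRY -> FUNCTION -> KERNEL dispatch described above:
# a function resolves a kernel by the input type-and-shape signature, then
# executes it. All names are illustrative, not the real Arrow API.

class Kernel:
    def __init__(self, signature, exec_fn):
        self.signature = signature  # e.g. (("array", "int64"), ("scalar", "int64"))
        self.exec_fn = exec_fn

class Function:
    def __init__(self, name):
        self.name = name
        self.kernels = []

    def add_kernel(self, signature, exec_fn):
        self.kernels.append(Kernel(signature, exec_fn))

    def dispatch(self, descriptors):
        # Find a kernel whose signature matches the input types and shapes
        for kernel in self.kernels:
            if kernel.signature == tuple(descriptors):
                return kernel
        raise NotImplementedError(
            f"Function {self.name} has no kernel matching input types {descriptors}")

class FunctionRegistry:
    def __init__(self):
        self._functions = {}

    def add_function(self, func):
        self._functions[func.name] = func

    def get_function(self, name):
        return self._functions[name]

def call_function(registry, name, args):
    # In this sketch, args are (shape, type, value) triples
    func = registry.get_function(name)
    descriptors = [(shape, typ) for shape, typ, _ in args]
    kernel = func.dispatch(descriptors)
    return kernel.exec_fn([value for _, _, value in args])

registry = FunctionRegistry()
add = Function("add")
# A kernel specialized for array + scalar inputs
add.add_kernel((("array", "int64"), ("scalar", "int64")),
               lambda values: [x + values[1] for x in values[0]])
registry.add_function(add)

result = call_function(registry, "add",
                       [("array", "int64", [1, 2, 3]), ("scalar", "int64", 10)])
print(result)  # [11, 12, 13]
```

The real implementation additionally handles chunked inputs, implicit casts, and splitting work for parallel execution, all of which this sketch omits.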

An additional effort in this patch is to radically simplify the process of creating kernels that are based on a scalar function. To do this, there is a growing collection of template-based kernel generation classes in compute/kernels/codegen_internal.h that will surely be the topic of much debate. I want to make it a lot easier for people to add new kernels.
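
The kernel-generation idea can be sketched like this (toy Python for brevity; the real mechanism is C++ templates in codegen_internal.h, and the helper names here are invented):

```python
# Toy sketch of kernel generation: given one scalar-valued function, stamp out
# an element-wise array kernel for each supported input type automatically,
# instead of hand-writing every type combination. Illustrative only.

def make_array_kernel(scalar_fn, cast):
    # Wrap a scalar function into an element-wise, null-aware array kernel
    def kernel(values):
        return [None if v is None else cast(scalar_fn(v)) for v in values]
    return kernel

def generate_kernels(scalar_fn, types):
    # One kernel per (type name -> output coercion) entry
    return {t: make_array_kernel(scalar_fn, cast) for t, cast in types.items()}

# A single definition of "negate" yields a kernel per numeric type
negate_kernels = generate_kernels(lambda x: -x, {"int32": int, "float64": float})

print(negate_kernels["int32"]([1, None, 3]))   # [-1, None, -3]
print(negate_kernels["float64"]([1.5, -2.0]))  # [-1.5, 2.0]
```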

There are some other incidental changes in the PR, such as changing the convenience APIs like `Cast` to return `Result`. I'm afraid we may have to live with the API breakage unless someone else wants to add backward compatibility code for the old APIs.

I have to apologize for making such a large PR. I've been working long hours on this for nearly a month and the process of porting all of our existing functionality and making the unit tests pass caused much iteration in the "framework" part of the code, such that it would have been a huge time drain to review incomplete iterations of the framework that had not been proven to capture the functionality that previously existed in the project.

Given the size of this PR and the fact that it completely blocks any work in src/arrow/compute, I don't think we should let this sit unmerged for more than 4 or 5 days, tops. I'm committed to responding to all of your questions and working to address your feedback about the design and improving the documentation and code comments. I tried to leave copious comments to explain my thought process in various places. Feel free to make any and all comments in this PR or in whatever form you like. I don't think that merging should be blocked on stylistic issues.

wesm commented May 21, 2020

For code reviewers:

  • The code diff is a bit deceiving because many kernel files were renamed to make their taxonomy more clear
  • I suggest focusing your energies on the files in the arrow/compute top-level directory. This is almost entirely new code; what was there before is almost all gone

I am aware that there are some obviously messy things that will need to be cleaned up a bit, but we have to start somewhere.

@wesm wesm force-pushed the ARROW-8792-kernels-revamp branch from d1dca6d to eb7f776 Compare May 21, 2020 04:57

wesm commented May 21, 2020

Aside from fixing the builds, I'm going to do a few things to help with this:

  • Add as many comments as possible to the new header files
  • Add some high level explanation to compute/README.md about how kernels work and how to begin to develop a new one
  • Add simple Python bindings for FunctionRegistry, Function, and Kernel so that we can inspect the contents of the registry and the available kernels (and their type signatures) for each function

@wesm wesm left a comment

leaving a handful of cleaning TODOs for myself

@pitrou pitrou left a comment

Some questions that came while reading through the API.


It's weird to have this as part of the cast options.


Yes, I'm aware, but I wasn't able to figure out another way to communicate the output type to the standard kernel APIs

@nealrichardson

The R failure:

  ── 1. Error: filter() on timestamp columns (@test-dataset.R#381)  ──────────────
  NotImplemented: Function >= has no kernel matching input types (array[timestamp[us, tz=UTC]], scalar[timestamp[us, tz=UTC]])
  Backtrace:
    1. arrow:::expect_equivalent(...)
    2. dplyr::filter(., ts >= lubridate::ymd_hms("2015-05-04 03:12:39"))
    2. dplyr::filter(., part == 1)
    2. dplyr::select(., ts)
   10. dplyr::collect(.)
   13. Scanner$create(x)$ToTable()
   16. arrow:::dataset___Scanner__ToTable(self)

I'm happy to help debug but that sounds pretty straightforward.

wesm commented May 21, 2020

@nealrichardson yeah, I will fix (wasn't thinking about timezones when adding the dispatch rules for comparisons)

kou commented May 21, 2020

Port GLib bindings

Can I start working on this on this branch?

wesm commented May 21, 2020

@kou yes, please go ahead. I'm sure you'll be careful about rebasing, since I'm going to push some commits to the branch later today, probably in response to the comments above

kou commented May 21, 2020

OK!

@wesm wesm force-pushed the ARROW-8792-kernels-revamp branch from eb7f776 to b3286d3 Compare May 22, 2020 01:37
wesm commented May 22, 2020

I rebased without pushing any new changes

@emkornfield

CC @brills I'm not sure if this impacts TFX at all.

wesm commented May 22, 2020

@emkornfield @brills no Python APIs are changed (not sure how much of the C++ API TFX uses)


```c++
/// \brief Contains the number of required arguments for the function
struct ARROW_EXPORT FunctionArity {
  static FunctionArity Nullary() { return FunctionArity(0, false); }
```

nit: Void or Null for name?

```c++
  static FunctionArity Unary() { return FunctionArity(1, false); }
  static FunctionArity Binary() { return FunctionArity(2, false); }
  static FunctionArity Ternary() { return FunctionArity(3, false); }
  static FunctionArity Varargs(int min_args = 1) { return FunctionArity(min_args, true); }
```

nit: VarArgs

It also seems like it might be more readable for callers to always have to specify min_args (especially because it isn't clear to me why 1 is the default and not 0 or 2).


You need at least one argument spec for type checking


I'm changing the name, but will leave the default issue for follow-up work
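
For context, the arity rule being discussed -- fixed arities require an exact argument count, while varargs require at least min_args -- can be sketched as follows (toy Python with illustrative names mirroring the FunctionArity factories; not the real implementation):

```python
# Toy sketch of arity checking: fixed arities match an exact argument count,
# varargs accept any count >= min_args. Names are illustrative.

class Arity:
    def __init__(self, num_args, is_varargs=False):
        self.num_args = num_args
        self.is_varargs = is_varargs

    @classmethod
    def unary(cls):
        return cls(1)

    @classmethod
    def binary(cls):
        return cls(2)

    @classmethod
    def varargs(cls, min_args=1):
        # At least one argument spec is needed for type checking,
        # hence the default of 1 discussed above
        return cls(min_args, is_varargs=True)

    def check(self, n):
        if self.is_varargs:
            return n >= self.num_args
        return n == self.num_args

assert Arity.binary().check(2)
assert not Arity.binary().check(3)
assert Arity.varargs(1).check(5)
assert not Arity.varargs(1).check(0)
```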

wesm commented May 22, 2020

I got sucked into some other things today. I'll focus on getting the CI passing tomorrow so we can hopefully go into the weekend with a green build

… bindings for FilterOptions for Python, support to call_function
@wesm wesm force-pushed the ARROW-8792-kernels-revamp branch from 1f9b758 to 877618f Compare May 24, 2020 13:13
wesm commented May 24, 2020

@jorisvandenbossche I fixed the segfault you hit and now have changed the implementation of Array.filter to be

```python
pc = _pc()
options = pc.FilterOptions(null_selection_behavior)
return pc.call_function('filter', [self, mask], options)
```

I also rebased -- I am going to merge this when the build is passing so we do not continue to stack changes that need to be reviewed on this PR. I welcome further code reviews on this PR or follow-up JIRAs

wesm commented May 24, 2020

+1

@wesm wesm closed this in 7ad49ee May 24, 2020
@wesm wesm reopened this May 24, 2020
wesm commented May 24, 2020

The PR has been merged, but the code review feature is still available. Please continue to leave comments and I will address them

pprudhvi pushed a commit to pprudhvi/arrow that referenced this pull request May 26, 2020
…tation and execution framework


Closes apache#7240 from wesm/ARROW-8792-kernels-revamp

Lead-authored-by: Wes McKinney <wesm+git@apache.org>
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
wesm added a commit that referenced this pull request May 28, 2020
…nd outdated language

I also fixed the `Arity` ctor to be explicit, which I thought I'd done in #7240, but it is taken care of here.

Closes #7264 from wesm/ARROW-8926

Authored-by: Wes McKinney <wesm+git@apache.org>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
wesm commented May 31, 2020

I'll leave this PR open until next week sometime to allow more time for comments

wesm commented Jun 7, 2020

Closing the PR again. Please open JIRA issues or review other PRs to provide more feedback.

@wesm wesm closed this Jun 7, 2020
@wesm wesm deleted the ARROW-8792-kernels-revamp branch June 7, 2020 17:13
pitrou pushed a commit to kou/arrow that referenced this pull request Nov 24, 2020
Summary:

  * Deprecate SortToIndices()

  * Add SortIndices() because we use "sort_indices" as kernel name in
    apache#7240

  * Add support for sort indices in descending order on Array

  * Rename existing "sort_indices" kernel to "array_sort_indices" and
    introduce "sort_indices" meta function to support RecordBatch and
    Table

Benchmark:

Summary:

  * No performance regression in existing sort on array

  * Same performance as sort on array when the number of sort keys is 1

  * 1.5x-100x slower than sort on array when the number of sort keys is 2

Before:

    ----------------------------------------------------------------------------------
    Benchmark                                                     Time             CPU
    ----------------------------------------------------------------------------------
    SortToIndicesInt64Count/32768/10000/min_time:1.000        15685 ns        15682 ns
    SortToIndicesInt64Count/32768/100/min_time:1.000          15961 ns        15957 ns
    SortToIndicesInt64Count/32768/10/min_time:1.000           16473 ns        16469 ns
    SortToIndicesInt64Count/32768/2/min_time:1.000            27993 ns        27987 ns
    SortToIndicesInt64Count/32768/1/min_time:1.000             5609 ns         5608 ns
    SortToIndicesInt64Count/32768/0/min_time:1.000            13143 ns        13141 ns
    SortToIndicesInt64Count/1048576/1/min_time:1.000         134695 ns       134670 ns
    SortToIndicesInt64Count/8388608/1/min_time:1.000        1243992 ns      1243260 ns
    SortToIndicesInt64Compare/32768/10000/min_time:1.000     193632 ns       193595 ns
    SortToIndicesInt64Compare/32768/100/min_time:1.000       194876 ns       194837 ns
    SortToIndicesInt64Compare/32768/10/min_time:1.000        182362 ns       182324 ns
    SortToIndicesInt64Compare/32768/2/min_time:1.000         111607 ns       111584 ns
    SortToIndicesInt64Compare/32768/1/min_time:1.000           5642 ns         5641 ns
    SortToIndicesInt64Compare/32768/0/min_time:1.000         190302 ns       190268 ns
    SortToIndicesInt64Compare/1048576/1/min_time:1.000       134743 ns       134718 ns
    SortToIndicesInt64Compare/8388608/1/min_time:1.000      1261404 ns      1249362 ns

After:

    -------------------------------------------------------------------------------------
    Benchmark                                                        Time             CPU
    -------------------------------------------------------------------------------------
    ArraySortIndicesInt64Count/32768/10000/min_time:1.000        14769 ns        14766 ns
    ArraySortIndicesInt64Count/32768/100/min_time:1.000          15207 ns        15204 ns
    ArraySortIndicesInt64Count/32768/10/min_time:1.000           15892 ns        15889 ns
    ArraySortIndicesInt64Count/32768/2/min_time:1.000            30107 ns        30100 ns
    ArraySortIndicesInt64Count/32768/1/min_time:1.000             5509 ns         5508 ns
    ArraySortIndicesInt64Count/32768/0/min_time:1.000            12699 ns        12696 ns
    ArraySortIndicesInt64Count/1048576/1/min_time:1.000         132585 ns       132557 ns
    ArraySortIndicesInt64Count/8388608/1/min_time:1.000        1236596 ns      1235842 ns
    ArraySortIndicesInt64Compare/32768/10000/min_time:1.000     193259 ns       193217 ns
    ArraySortIndicesInt64Compare/32768/100/min_time:1.000       190010 ns       189973 ns
    ArraySortIndicesInt64Compare/32768/10/min_time:1.000        179923 ns       179879 ns
    ArraySortIndicesInt64Compare/32768/2/min_time:1.000         111100 ns       111074 ns
    ArraySortIndicesInt64Compare/32768/1/min_time:1.000           5660 ns         5659 ns
    ArraySortIndicesInt64Compare/32768/0/min_time:1.000         186521 ns       186476 ns
    ArraySortIndicesInt64Compare/1048576/1/min_time:1.000       132863 ns       132832 ns
    ArraySortIndicesInt64Compare/8388608/1/min_time:1.000      1266383 ns      1265765 ns
    TableSortIndicesInt64Count/32768/10000/min_time:1.000        21812 ns        21807 ns
    TableSortIndicesInt64Count/32768/100/min_time:1.000          22494 ns        22490 ns
    TableSortIndicesInt64Count/32768/10/min_time:1.000           17300 ns        17296 ns
    TableSortIndicesInt64Count/32768/2/min_time:1.000            29927 ns        29921 ns
    TableSortIndicesInt64Count/32768/1/min_time:1.000             5877 ns         5875 ns
    TableSortIndicesInt64Count/32768/0/min_time:1.000            20394 ns        20390 ns
    TableSortIndicesInt64Count/1048576/1/min_time:1.000         132904 ns       132871 ns
    TableSortIndicesInt64Count/8388608/1/min_time:1.000        1342693 ns      1341943 ns
    TableSortIndicesInt64Compare/32768/10000/min_time:1.000     203163 ns       203106 ns
    TableSortIndicesInt64Compare/32768/100/min_time:1.000       199531 ns       199477 ns
    TableSortIndicesInt64Compare/32768/10/min_time:1.000        185968 ns       185916 ns
    TableSortIndicesInt64Compare/32768/2/min_time:1.000         113571 ns       113540 ns
    TableSortIndicesInt64Compare/32768/1/min_time:1.000           6251 ns         6249 ns
    TableSortIndicesInt64Compare/32768/0/min_time:1.000         183650 ns       183609 ns
    TableSortIndicesInt64Compare/1048576/1/min_time:1.000       131701 ns       131674 ns
    TableSortIndicesInt64Compare/8388608/1/min_time:1.000      1264413 ns      1263622 ns
    TableSortIndicesInt64Int64/32768/10000/min_time:1.000       313368 ns       313310 ns
    TableSortIndicesInt64Int64/32768/100/min_time:1.000         313425 ns       313361 ns
    TableSortIndicesInt64Int64/32768/10/min_time:1.000          337051 ns       336987 ns
    TableSortIndicesInt64Int64/32768/2/min_time:1.000           402063 ns       401973 ns
    TableSortIndicesInt64Int64/32768/1/min_time:1.000           254660 ns       254612 ns
    TableSortIndicesInt64Int64/32768/0/min_time:1.000           305887 ns       305815 ns
    TableSortIndicesInt64Int64/1048576/1/min_time:1.000       11157920 ns     11155020 ns
    TableSortIndicesInt64Int64/8388608/1/min_time:1.000      106529839 ns    106501576 ns

Follow-up tasks:

  * Improve performance when the number of sort keys is 2 or greater

    * Idea1: Use counting sort for small range data like on array

    * Idea2: Use threading and merge sort

  * Add support for multi-column partition Nth indices on Table
pitrou added a commit that referenced this pull request Nov 24, 2020
Summary:

  * Deprecate SortToIndices()

  * Add SortIndices() because we use "sort_indices" as kernel name in
    #7240

  * Add support for sort indices in descending order on Array

  * Rename existing "sort_indices" kernel to "array_sort_indices" and
    introduce "sort_indices" meta function to support ChunkedArray,
    RecordBatch and Table

  * Require benchmark 1.5.2 or later

Benchmark:

Summary:

  * No performance regression in existing sort on array

  * Same performance as sort on array when the number of sort keys is 1

  * 1.5x-100x slower than sort on array when the number of sort keys is 2

Before:

    ----------------------------------------------------------------------------------
    Benchmark                                                     Time             CPU
    ----------------------------------------------------------------------------------
    SortToIndicesInt64Count/32768/10000/min_time:1.000        16057 ns        16054 ns
    SortToIndicesInt64Count/32768/100/min_time:1.000          16592 ns        16589 ns
    SortToIndicesInt64Count/32768/10/min_time:1.000           15979 ns        15975 ns
    SortToIndicesInt64Count/32768/2/min_time:1.000            28379 ns        28372 ns
    SortToIndicesInt64Count/32768/1/min_time:1.000             5767 ns         5766 ns
    SortToIndicesInt64Count/32768/0/min_time:1.000            12560 ns        12557 ns
    SortToIndicesInt64Count/1048576/100/min_time:1.000       729683 ns       729497 ns
    SortToIndicesInt64Count/8388608/100/min_time:1.000      6696415 ns      6693711 ns
    SortToIndicesInt64Compare/32768/10000/min_time:1.000     190488 ns       190442 ns
    SortToIndicesInt64Compare/32768/100/min_time:1.000       190879 ns       190838 ns
    SortToIndicesInt64Compare/32768/10/min_time:1.000        179156 ns       179115 ns
    SortToIndicesInt64Compare/32768/2/min_time:1.000         109098 ns       109074 ns
    SortToIndicesInt64Compare/32768/1/min_time:1.000           5682 ns         5681 ns
    SortToIndicesInt64Compare/32768/0/min_time:1.000         185022 ns       184984 ns
    SortToIndicesInt64Compare/1048576/100/min_time:1.000    9275270 ns      9273414 ns
    SortToIndicesInt64Compare/8388608/100/min_time:1.000   93126006 ns     93095276 ns

After:

    ------------------------------------------------------------------------------------------
    Benchmark                                                             Time             CPU
    ------------------------------------------------------------------------------------------
    ArraySortIndicesInt64Narrow/32768/10000/min_time:1.000            15722 ns        15721 ns
    ArraySortIndicesInt64Narrow/32768/100/min_time:1.000              16007 ns        16006 ns
    ArraySortIndicesInt64Narrow/32768/10/min_time:1.000               15178 ns        15177 ns
    ArraySortIndicesInt64Narrow/32768/2/min_time:1.000                29886 ns        29885 ns
    ArraySortIndicesInt64Narrow/32768/1/min_time:1.000                 5968 ns         5968 ns
    ArraySortIndicesInt64Narrow/32768/0/min_time:1.000                12681 ns        12681 ns
    ArraySortIndicesInt64Narrow/1048576/100/min_time:1.000           813376 ns       813331 ns
    ArraySortIndicesInt64Narrow/8388608/100/min_time:1.000          6543874 ns      6543621 ns
    ArraySortIndicesInt64Wide/32768/10000/min_time:1.000             189957 ns       189956 ns
    ArraySortIndicesInt64Wide/32768/100/min_time:1.000               190269 ns       190267 ns
    ArraySortIndicesInt64Wide/32768/10/min_time:1.000                175425 ns       175422 ns
    ArraySortIndicesInt64Wide/32768/2/min_time:1.000                 107976 ns       107975 ns
    ArraySortIndicesInt64Wide/32768/1/min_time:1.000                   5941 ns         5941 ns
    ArraySortIndicesInt64Wide/32768/0/min_time:1.000                 184858 ns       184857 ns
    ArraySortIndicesInt64Wide/1048576/100/min_time:1.000            9355194 ns      9354942 ns
    ArraySortIndicesInt64Wide/8388608/100/min_time:1.000           94101796 ns     94099551 ns
    TableSortIndicesInt64Narrow/1048576/100/16/32/min_time:1.000 1386231938 ns   1386176541 ns
    TableSortIndicesInt64Narrow/1048576/0/16/32/min_time:1.000   1350031141 ns   1349982623 ns
    TableSortIndicesInt64Narrow/1048576/100/8/32/min_time:1.000  1401741018 ns   1401632081 ns
    TableSortIndicesInt64Narrow/1048576/0/8/32/min_time:1.000    1373174328 ns   1373145968 ns
    TableSortIndicesInt64Narrow/1048576/100/2/32/min_time:1.000  1035377805 ns   1035362806 ns
    TableSortIndicesInt64Narrow/1048576/0/2/32/min_time:1.000    1035218085 ns   1035201824 ns
    TableSortIndicesInt64Narrow/1048576/100/1/32/min_time:1.000     6859319 ns      6859251 ns
    TableSortIndicesInt64Narrow/1048576/0/1/32/min_time:1.000       6847314 ns      6847048 ns
    TableSortIndicesInt64Narrow/1048576/100/16/4/min_time:1.000   626909516 ns    626904696 ns
    TableSortIndicesInt64Narrow/1048576/0/16/4/min_time:1.000     591632035 ns    591602144 ns
    TableSortIndicesInt64Narrow/1048576/100/8/4/min_time:1.000    613197155 ns    613182727 ns
    TableSortIndicesInt64Narrow/1048576/0/8/4/min_time:1.000      595568690 ns    595554829 ns
    TableSortIndicesInt64Narrow/1048576/100/2/4/min_time:1.000    424472984 ns    424453915 ns
    TableSortIndicesInt64Narrow/1048576/0/2/4/min_time:1.000      397339109 ns    397335604 ns
    TableSortIndicesInt64Narrow/1048576/100/1/4/min_time:1.000      7027310 ns      7027241 ns
    TableSortIndicesInt64Narrow/1048576/0/1/4/min_time:1.000        6891364 ns      6891272 ns
    TableSortIndicesInt64Narrow/1048576/100/16/1/min_time:1.000   516832749 ns    516823293 ns
    TableSortIndicesInt64Narrow/1048576/0/16/1/min_time:1.000     495273237 ns    495269975 ns
    TableSortIndicesInt64Narrow/1048576/100/8/1/min_time:1.000    515550360 ns    515531501 ns
    TableSortIndicesInt64Narrow/1048576/0/8/1/min_time:1.000      496670125 ns    496663084 ns
    TableSortIndicesInt64Narrow/1048576/100/2/1/min_time:1.000    340060863 ns    340053172 ns
    TableSortIndicesInt64Narrow/1048576/0/2/1/min_time:1.000      331281499 ns    331277004 ns
    TableSortIndicesInt64Narrow/1048576/100/1/1/min_time:1.000      6691062 ns      6690924 ns
    TableSortIndicesInt64Narrow/1048576/0/1/1/min_time:1.000        6384382 ns      6384323 ns
    TableSortIndicesInt64Wide/1048576/100/16/32/min_time:1.000    622544467 ns    622531557 ns
    TableSortIndicesInt64Wide/1048576/0/16/32/min_time:1.000      619193843 ns    619155597 ns
    TableSortIndicesInt64Wide/1048576/100/8/32/min_time:1.000     615885889 ns    615884764 ns
    TableSortIndicesInt64Wide/1048576/0/8/32/min_time:1.000       589917731 ns    589912005 ns
    TableSortIndicesInt64Wide/1048576/100/2/32/min_time:1.000     600951149 ns    600947256 ns
    TableSortIndicesInt64Wide/1048576/0/2/32/min_time:1.000       592304437 ns    592286953 ns
    TableSortIndicesInt64Wide/1048576/100/1/32/min_time:1.000      98781239 ns     98777123 ns
    TableSortIndicesInt64Wide/1048576/0/1/32/min_time:1.000        94136230 ns     94085863 ns
    TableSortIndicesInt64Wide/1048576/100/16/4/min_time:1.000     232323308 ns    232319347 ns
    TableSortIndicesInt64Wide/1048576/0/16/4/min_time:1.000       223708791 ns    223700834 ns
    TableSortIndicesInt64Wide/1048576/100/8/4/min_time:1.000      221975360 ns    221968035 ns
    TableSortIndicesInt64Wide/1048576/0/8/4/min_time:1.000        218214843 ns    218210306 ns
    TableSortIndicesInt64Wide/1048576/100/2/4/min_time:1.000      224756430 ns    224745751 ns
    TableSortIndicesInt64Wide/1048576/0/2/4/min_time:1.000        220809969 ns    220809044 ns
    TableSortIndicesInt64Wide/1048576/100/1/4/min_time:1.000       96159427 ns     96156899 ns
    TableSortIndicesInt64Wide/1048576/0/1/4/min_time:1.000         92307749 ns     92304526 ns
    TableSortIndicesInt64Wide/1048576/100/16/1/min_time:1.000     166390841 ns    166382427 ns
    TableSortIndicesInt64Wide/1048576/0/16/1/min_time:1.000       164576492 ns    164570581 ns
    TableSortIndicesInt64Wide/1048576/100/8/1/min_time:1.000      165724919 ns    165718584 ns
    TableSortIndicesInt64Wide/1048576/0/8/1/min_time:1.000        164048003 ns    164046222 ns
    TableSortIndicesInt64Wide/1048576/100/2/1/min_time:1.000      170131788 ns    170124950 ns
    TableSortIndicesInt64Wide/1048576/0/2/1/min_time:1.000        170874129 ns    170871245 ns
    TableSortIndicesInt64Wide/1048576/100/1/1/min_time:1.000       92314674 ns     92312953 ns
    TableSortIndicesInt64Wide/1048576/0/1/1/min_time:1.000         92023019 ns     92022643 ns

Follow-up tasks:

  * Improve performance when the number of sort keys is 2 or greater

    * Idea1: Use counting sort for small range data like on array

    * Idea2: Use threading for radix sort

  * Add support for multi-column partition Nth indices on Table

Closes #8612 from kou/cpp-compute-sort-indices-table

Lead-authored-by: Sutou Kouhei <kou@clear-code.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>