Added Arrow Interop Benchmarks #17194

Open
wants to merge 24 commits into base: branch-24.12

lamarrr
Contributor

@lamarrr lamarrr commented Oct 28, 2024

Description

This pull request adds benchmarks for the Arrow interop APIs:

  • from_arrow_host
  • to_arrow_host
  • from_arrow_device
  • to_arrow_device

Closes #17104

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions github-actions bot added the libcudf and CMake labels Oct 28, 2024
@lamarrr lamarrr added the feature request and non-breaking labels Oct 28, 2024
@lamarrr lamarrr changed the title from "Arrow Interop benchmarks" to "Arrow Interop Benchmarks" Oct 28, 2024
@lamarrr lamarrr changed the title from "Arrow Interop Benchmarks" to "Added Arrow Interop Benchmarks" Oct 28, 2024
@vyasr
Contributor

vyasr commented Oct 31, 2024

Could you post some results? I expect to_arrow_device to be almost free and to_arrow_host to be essentially bounded by memcpy.

@lamarrr lamarrr marked this pull request as ready for review November 4, 2024 18:07
@lamarrr lamarrr requested a review from a team as a code owner November 4, 2024 18:07
@lamarrr lamarrr requested a review from ttnghia November 4, 2024 18:07
@ttnghia
Contributor

ttnghia commented Nov 4, 2024

Can you please post the comparison output from the NVBench compare tool? We cannot tell much from raw numbers like these.

@lamarrr
Contributor Author

lamarrr commented Nov 4, 2024

> Can you please post the comparison output from the NVBench compare tool? We cannot tell much from raw numbers like these.

Sure!

@lamarrr lamarrr marked this pull request as draft November 4, 2024 18:18
@lamarrr lamarrr marked this pull request as ready for review November 6, 2024 03:31
@GregoryKimball
Contributor

I think this work is ready to merge. Nice work!

Some follow-ups I would consider:

  • Why is LIST slower than flat types in from_arrow_device?
  • Is it possible (and reasonable) to add support for DECIMAL32 and DECIMAL64?

void BM_to_arrow_device(nvbench::state& state, nvbench::type_list<nvbench::enum_type<data_type>>)
{
  auto const num_rows = static_cast<cudf::size_type>(state.get_int64("num_rows"));
  int32_t const num_columns = 1;
Contributor

Recommend making this an axis parameter with a default value of 1.
Example:

 .add_int64_axis("num_cols", {1});

Then it could be modified from the command-line without changing the source code.

Contributor Author

I didn't add support for multiple columns initially. I've added that now and also updated the benchmarks with numbers for DECIMAL32 and DECIMAL64. Thanks!

cc: @GregoryKimball

lamarrr and others added 2 commits November 15, 2024 10:31
Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>
@GregoryKimball
Contributor

@karthikeyann would you please share your review?

@PointKernel
Member

Including extensive benchmark results in the PR description may not be ideal, as they will become part of the squash commit message.

Also, @lamarrr could you please update the PR description to outline the changes introduced in this PR?

Member

@PointKernel PointKernel left a comment

Looks good. Will approve it once the PR description is updated.

cpp/benchmarks/interop/interop.cpp (outdated, resolved)
Co-authored-by: Yunsong Wang <yunsongw@nvidia.com>
Comment on lines +40 to +42
std::vector<cudf::type_id> types;

std::fill_n(std::back_inserter(types), num_columns, data_type);
Contributor

nit: similar suggestion at other places too.

Suggested change
- std::vector<cudf::type_id> types;
- std::fill_n(std::back_inserter(types), num_columns, data_type);
+ std::vector<cudf::type_id> types(num_columns, data_type);

is this better than using fill_n with back_inserter?

Contributor Author

I don't trust std::vector's constructors; they easily become ambiguous and do the wrong thing.
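
The comment does not say which ambiguity is meant. A common one, assumed here, is the difference between parenthesized and braced initialization of std::vector. The sketch below uses a stand-in enum (hypothetical, so that it compiles without libcudf) and shows that the parenthesized count-and-value form from the suggested change behaves as intended, while the braced form with integers selects a different constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cudf::type_id so the sketch builds without libcudf.
enum class type_id : int { INT8 = 1, INT32 = 3 };

// Count-and-value construction, written with parentheses as in the suggested change.
inline std::vector<type_id> make_types(std::size_t num_columns, type_id value)
{
  return std::vector<type_id>(num_columns, value);
}

inline void vector_constructor_demo()
{
  // Parentheses select the (count, value) constructor: three copies of 5.
  std::vector<int> paren(3, 5);
  assert(paren.size() == 3 && paren[0] == 5);

  // Braces prefer the initializer_list constructor: the two elements 3 and 5.
  std::vector<int> brace{3, 5};
  assert(brace.size() == 2 && brace[0] == 3);

  // With an enum element type and parentheses, the count is never read as an element.
  auto const types = make_types(4, type_id::INT32);
  assert(types.size() == 4 && types[3] == type_id::INT32);
}
```

Under this assumption, the one-line form in the suggestion is safe as long as it keeps the parentheses; the fill_n version avoids the question entirely at the cost of an extra statement.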

Comment on lines +105 to +109
std::vector<cudf::column_metadata> children_metadata;
std::fill_n(std::back_inserter(children_metadata),
table->get_column(column).num_children(),
cudf::column_metadata{""});
column_metadata.children_meta = children_metadata;
Contributor

Similar suggestion here to use vector constructor.

Suggested change
- std::vector<cudf::column_metadata> children_metadata;
- std::fill_n(std::back_inserter(children_metadata),
-             table->get_column(column).num_children(),
-             cudf::column_metadata{""});
- column_metadata.children_meta = children_metadata;
+ auto num_children = table->get_column(column).num_children();
+ column_metadata.children_meta = std::vector<cudf::column_metadata>(num_children, cudf::column_metadata{""});

Comment on lines +199 to +234
static char const* stringify_type(cudf::type_id value)
{
  switch (value) {
    case cudf::type_id::INT8: return "INT8";
    case cudf::type_id::INT16: return "INT16";
    case cudf::type_id::INT32: return "INT32";
    case cudf::type_id::INT64: return "INT64";
    case cudf::type_id::UINT8: return "UINT8";
    case cudf::type_id::UINT16: return "UINT16";
    case cudf::type_id::UINT32: return "UINT32";
    case cudf::type_id::UINT64: return "UINT64";
    case cudf::type_id::FLOAT32: return "FLOAT32";
    case cudf::type_id::FLOAT64: return "FLOAT64";
    case cudf::type_id::BOOL8: return "BOOL8";
    case cudf::type_id::TIMESTAMP_DAYS: return "TIMESTAMP_DAYS";
    case cudf::type_id::TIMESTAMP_SECONDS: return "TIMESTAMP_SECONDS";
    case cudf::type_id::TIMESTAMP_MILLISECONDS: return "TIMESTAMP_MILLISECONDS";
    case cudf::type_id::TIMESTAMP_MICROSECONDS: return "TIMESTAMP_MICROSECONDS";
    case cudf::type_id::TIMESTAMP_NANOSECONDS: return "TIMESTAMP_NANOSECONDS";
    case cudf::type_id::DURATION_DAYS: return "DURATION_DAYS";
    case cudf::type_id::DURATION_SECONDS: return "DURATION_SECONDS";
    case cudf::type_id::DURATION_MILLISECONDS: return "DURATION_MILLISECONDS";
    case cudf::type_id::DURATION_MICROSECONDS: return "DURATION_MICROSECONDS";
    case cudf::type_id::DURATION_NANOSECONDS: return "DURATION_NANOSECONDS";
    case cudf::type_id::DICTIONARY32: return "DICTIONARY32";
    case cudf::type_id::STRING: return "STRING";
    case cudf::type_id::LIST: return "LIST";
    case cudf::type_id::DECIMAL32: return "DECIMAL32";
    case cudf::type_id::DECIMAL64: return "DECIMAL64";
    case cudf::type_id::DECIMAL128: return "DECIMAL128";
    case cudf::type_id::STRUCT: return "STRUCT";
    default: return "unknown";
  }
}

NVBENCH_DECLARE_ENUM_TYPE_STRINGS(cudf::type_id, stringify_type, stringify_type)
Contributor

This might be used by other benchmarks too.
Please consider moving these to benchmark common header file or common utilities.

Contributor Author

Yeah. Which header would you suggest moving it to? The same header as type_id?

Contributor

@ttnghia ttnghia Nov 19, 2024

I think that is a good idea. Many times, I have had to implement this function for my own debugging. Let's make a follow-up PR that moves this into types.hpp (the same header as type_id) or cudf_test/debug_utilities.hpp.

Member

we have a similar file here: https://github.com/rapidsai/cudf/blob/branch-24.12/cpp/benchmarks/io/nvbench_helpers.hpp
Worth considering a new file in https://github.com/rapidsai/cudf/tree/branch-24.12/cpp/benchmarks/common.

Declaring an NVBench-specific utility in a non-cuDF namespace within types.hpp is inappropriate, as it also forces types.hpp to include an NVBench header, which is something we want to avoid.

Contributor

I agree with a new header in the benchmarks/common.
We could refactor some of the elements from io/nvbench_helpers.hpp with the new header as well.
Though the refactor may be too much for this PR.

Member

> That utility function will be used not just in benchmark but also testing/debugging.

Could you elaborate on this further? How does this feature specifically benefit unit tests?

Benchmarks and unit tests are treated as "external" users of libcudf, which is why they are not included within the cudf:: namespace. The cudf/utilities directory is reserved for use cases within libcudf or those under the cudf:: namespace.

Contributor

@ttnghia ttnghia Nov 19, 2024

> How does this feature specifically benefit unit tests?

I didn't mean it will benefit unit tests, but meant it will benefit developers in debugging/testing.

I usually debug cudf APIs by printing out the type id of the input column. For example:

printf("input type: %s\n", type_to_string(input.type().id()));

However, most of the time, for fast iteration on small issues, I just print the integer value of the type id and then look up the enum to identify the type from that value. For example:

printf("input type: %d\n",(int)input.type().id());

Output: input type: 23 

(Looking at the enum class type_id and counting from the first entry to the 23rd, I can tell that type 23 is STRING.)

As such, having a type_to_string utility like this is very helpful. The debug print statement can be inserted anywhere. Most of the time I insert it into the middle of a cudf API. Sometimes I use it in unit tests too. It would be most beneficial to add it to libcudf's utilities module. We can use it either for debugging libcudf APIs or externally in benchmarks and unit tests.

Member

I see what you mean. It's a different feature request.

You are referring to the fact that stringify_type can be a common utility for both benchmarks and tests. A host-device print utility like print_type_id together with stringify_type can be added to https://github.com/rapidsai/cudf/blob/branch-24.12/cpp/include/cudf_test/print_utilities.cuh. Once it's done, this file can simply include that print utility header and add a single line for nvbench readability enhancement:

NVBENCH_DECLARE_ENUM_TYPE_STRINGS(cudf::type_id, cudf::test::print::stringify_type, stringify_type)

I’m inclined to leave this refactoring for a separate PR.

Member

Opened #17376 to track this issue.

Contributor

Thanks for filing the issue 👍

@lamarrr
Contributor Author

lamarrr commented Nov 19, 2024

Benchmark Results

to_arrow_host

[0] NVIDIA RTX A6000

data_type num_rows num_columns num_elements Samples CPU Time Noise (CPU) GPU Time Noise (GPU) Elem/s GlobalMem BW BWUtil
INT8 10000 1 10000 28000x 21.914 us 40.04% 17.866 us 20.88% 559.714M 1.259 GB/s 0.16%
INT8 100000 1 100000 14608x 38.287 us 21.73% 34.256 us 12.14% 2.919G 6.568 GB/s 0.86%
INT8 1000000 1 1000000 3248x 158.927 us 7.88% 154.582 us 3.84% 6.469G 14.555 GB/s 1.89%
INT8 10000000 1 10000000 1456x 965.120 us 1.61% 960.873 us 1.38% 10.407G 23.416 GB/s 3.05%
INT16 10000 1 10000 25632x 23.523 us 37.82% 19.517 us 20.31% 512.368M 2.178 GB/s 0.28%
INT16 100000 1 100000 10640x 51.106 us 19.14% 47.042 us 10.47% 2.126G 9.035 GB/s 1.18%
INT16 1000000 1 1000000 2192x 233.574 us 4.88% 229.306 us 2.81% 4.361G 18.534 GB/s 2.41%
INT16 10000000 1 10000000 2128x 1.730 ms 1.89% 1.725 ms 1.80% 5.796G 24.635 GB/s 3.21%
INT32 10000 1 10000 26016x 23.167 us 36.52% 19.226 us 21.19% 520.120M 4.291 GB/s 0.56%
INT32 100000 1 100000 7552x 70.496 us 14.93% 66.290 us 7.31% 1.509G 12.445 GB/s 1.62%
INT32 1000000 1 1000000 1328x 382.046 us 2.74% 377.926 us 2.17% 2.646G 21.830 GB/s 2.84%
INT32 10000000 1 10000000 800x 5.899 ms 1.51% 5.895 ms 1.51% 1.696G 13.996 GB/s 1.82%
INT64 10000 1 10000 18832x 30.535 us 28.88% 26.555 us 16.07% 376.580M 6.119 GB/s 0.80%
INT64 100000 1 100000 4176x 124.024 us 9.38% 119.743 us 4.06% 835.123M 13.571 GB/s 1.77%
INT64 1000000 1 1000000 1568x 684.974 us 2.12% 680.640 us 1.63% 1.469G 23.875 GB/s 3.11%
INT64 10000000 1 10000000 544x 11.239 ms 1.48% 11.235 ms 1.48% 890.086M 14.464 GB/s 1.88%
UINT8 10000 1 10000 29824x 20.745 us 42.90% 16.772 us 25.29% 596.216M 1.341 GB/s 0.17%
UINT8 100000 1 100000 15872x 35.519 us 24.44% 31.511 us 13.71% 3.174G 7.140 GB/s 0.93%
UINT8 1000000 1 1000000 3168x 162.406 us 6.89% 158.153 us 3.69% 6.323G 14.227 GB/s 1.85%
UINT8 10000000 1 10000000 560x 963.161 us 2.04% 958.645 us 1.60% 10.431G 23.471 GB/s 3.06%
UINT16 10000 1 10000 31152x 20.030 us 45.67% 16.056 us 25.27% 622.812M 2.647 GB/s 0.34%
UINT16 100000 1 100000 9792x 55.103 us 16.32% 51.079 us 10.51% 1.958G 8.321 GB/s 1.08%
UINT16 1000000 1 1000000 2128x 240.248 us 5.78% 235.793 us 3.25% 4.241G 18.024 GB/s 2.35%
UINT16 10000000 1 10000000 688x 1.731 ms 1.49% 1.726 ms 1.21% 5.794G 24.623 GB/s 3.21%
UINT32 10000 1 10000 24848x 24.123 us 39.23% 20.125 us 21.55% 496.900M 4.099 GB/s 0.53%
UINT32 100000 1 100000 7424x 71.440 us 11.10% 67.416 us 6.74% 1.483G 12.237 GB/s 1.59%
UINT32 1000000 1 1000000 1328x 381.840 us 3.95% 377.172 us 2.08% 2.651G 21.873 GB/s 2.85%
UINT32 10000000 1 10000000 816x 5.883 ms 1.82% 5.879 ms 1.82% 1.701G 14.034 GB/s 1.83%
UINT64 10000 1 10000 17584x 32.369 us 26.24% 28.437 us 16.19% 351.648M 5.714 GB/s 0.74%
UINT64 100000 1 100000 4272x 121.385 us 8.80% 117.174 us 4.54% 853.428M 13.868 GB/s 1.81%
UINT64 1000000 1 1000000 784x 686.690 us 3.39% 681.295 us 1.99% 1.468G 23.852 GB/s 3.11%
UINT64 10000000 1 10000000 1168x 11.283 ms 1.83% 11.278 ms 1.82% 886.668M 14.408 GB/s 1.88%
FLOAT32 10000 1 10000 25648x 23.500 us 40.61% 19.499 us 21.61% 512.834M 4.231 GB/s 0.55%
FLOAT32 100000 1 100000 7632x 69.737 us 14.71% 65.567 us 6.89% 1.525G 12.583 GB/s 1.64%
FLOAT32 1000000 1 1000000 1344x 380.176 us 2.70% 376.093 us 2.17% 2.659G 21.936 GB/s 2.86%
FLOAT32 10000000 1 10000000 640x 5.877 ms 1.68% 5.873 ms 1.68% 1.703G 14.047 GB/s 1.83%
FLOAT64 10000 1 10000 18320x 31.289 us 29.81% 27.300 us 16.38% 366.306M 5.952 GB/s 0.77%
FLOAT64 100000 1 100000 4176x 124.345 us 8.78% 120.091 us 3.98% 832.701M 13.531 GB/s 1.76%
FLOAT64 1000000 1 1000000 752x 677.249 us 2.27% 672.707 us 1.50% 1.487G 24.156 GB/s 3.14%
FLOAT64 10000000 1 10000000 944x 11.235 ms 1.46% 11.231 ms 1.46% 890.394M 14.469 GB/s 1.88%
BOOL8 10000 1 10000 16192x 34.912 us 27.73% 30.906 us 18.29% 323.558M 728.006 MB/s 0.09%
BOOL8 100000 1 100000 16080x 35.133 us 27.70% 31.114 us 17.26% 3.214G 7.231 GB/s 0.94%
BOOL8 1000000 1 1000000 7360x 72.175 us 16.37% 68.059 us 11.40% 14.693G 33.060 GB/s 4.30%
BOOL8 10000000 1 10000000 1648x 412.586 us 4.67% 408.103 us 3.34% 24.504G 55.133 GB/s 7.18%
TIMESTAMP_SECONDS 10000 1 10000 19632x 29.413 us 27.02% 25.485 us 14.85% 392.394M 6.376 GB/s 0.83%
TIMESTAMP_SECONDS 100000 1 100000 4288x 120.946 us 8.64% 116.699 us 4.44% 856.909M 13.925 GB/s 1.81%
TIMESTAMP_SECONDS 1000000 1 1000000 1088x 676.752 us 2.34% 672.273 us 1.64% 1.487G 24.172 GB/s 3.15%
TIMESTAMP_SECONDS 10000000 1 10000000 720x 11.249 ms 1.60% 11.245 ms 1.60% 889.289M 14.451 GB/s 1.88%
TIMESTAMP_MILLISECONDS 10000 1 10000 18736x 30.742 us 32.91% 26.691 us 15.96% 374.653M 6.088 GB/s 0.79%
TIMESTAMP_MILLISECONDS 100000 1 100000 4272x 121.663 us 9.75% 117.411 us 4.45% 851.712M 13.840 GB/s 1.80%
TIMESTAMP_MILLISECONDS 1000000 1 1000000 1136x 676.368 us 2.66% 671.699 us 1.62% 1.489G 24.192 GB/s 3.15%
TIMESTAMP_MILLISECONDS 10000000 1 10000000 1248x 11.226 ms 1.60% 11.221 ms 1.60% 891.170M 14.482 GB/s 1.89%
TIMESTAMP_MICROSECONDS 10000 1 10000 18192x 31.493 us 28.56% 27.508 us 16.22% 363.531M 5.907 GB/s 0.77%
TIMESTAMP_MICROSECONDS 100000 1 100000 4352x 119.086 us 7.95% 114.966 us 3.96% 869.826M 14.135 GB/s 1.84%
TIMESTAMP_MICROSECONDS 1000000 1 1000000 1040x 676.785 us 1.98% 672.482 us 1.52% 1.487G 24.164 GB/s 3.15%
TIMESTAMP_MICROSECONDS 10000000 1 10000000 1327x 11.291 ms 2.10% 11.286 ms 2.09% 886.069M 14.399 GB/s 1.87%
TIMESTAMP_NANOSECONDS 10000 1 10000 19072x 30.219 us 31.30% 26.225 us 15.40% 381.313M 6.196 GB/s 0.81%
TIMESTAMP_NANOSECONDS 100000 1 100000 4368x 118.972 us 9.72% 114.649 us 4.10% 872.228M 14.174 GB/s 1.85%
TIMESTAMP_NANOSECONDS 1000000 1 1000000 864x 677.270 us 3.10% 671.873 us 1.49% 1.488G 24.186 GB/s 3.15%
TIMESTAMP_NANOSECONDS 10000000 1 10000000 1120x 11.238 ms 1.77% 11.234 ms 1.77% 890.186M 14.466 GB/s 1.88%
DURATION_SECONDS 10000 1 10000 19088x 30.214 us 29.13% 26.214 us 15.47% 381.480M 6.199 GB/s 0.81%
DURATION_SECONDS 100000 1 100000 4304x 120.746 us 8.84% 116.549 us 4.48% 858.011M 13.943 GB/s 1.82%
DURATION_SECONDS 1000000 1 1000000 784x 678.339 us 2.85% 673.494 us 1.55% 1.485G 24.128 GB/s 3.14%
DURATION_SECONDS 10000000 1 10000000 800x 11.261 ms 1.39% 11.257 ms 1.39% 888.360M 14.436 GB/s 1.88%
DURATION_MILLISECONDS 10000 1 10000 18032x 31.778 us 30.86% 27.746 us 16.51% 360.408M 5.857 GB/s 0.76%
DURATION_MILLISECONDS 100000 1 100000 4384x 118.576 us 8.69% 114.371 us 4.00% 874.347M 14.208 GB/s 1.85%
DURATION_MILLISECONDS 1000000 1 1000000 768x 679.275 us 2.78% 674.324 us 1.54% 1.483G 24.098 GB/s 3.14%
DURATION_MILLISECONDS 10000000 1 10000000 576x 11.232 ms 1.41% 11.228 ms 1.41% 890.638M 14.473 GB/s 1.88%
DURATION_MICROSECONDS 10000 1 10000 19792x 29.281 us 31.57% 25.276 us 15.06% 395.628M 6.429 GB/s 0.84%
DURATION_MICROSECONDS 100000 1 100000 4288x 120.949 us 8.56% 116.737 us 4.32% 856.624M 13.920 GB/s 1.81%
DURATION_MICROSECONDS 1000000 1 1000000 1232x 680.710 us 2.81% 675.880 us 1.78% 1.480G 24.043 GB/s 3.13%
DURATION_MICROSECONDS 10000000 1 10000000 624x 11.256 ms 1.56% 11.251 ms 1.55% 888.809M 14.443 GB/s 1.88%
DURATION_NANOSECONDS 10000 1 10000 18848x 30.482 us 28.84% 26.531 us 16.84% 376.920M 6.125 GB/s 0.80%
DURATION_NANOSECONDS 100000 1 100000 4288x 120.801 us 8.42% 116.654 us 4.47% 857.233M 13.930 GB/s 1.81%
DURATION_NANOSECONDS 1000000 1 1000000 768x 685.292 us 2.70% 680.604 us 1.65% 1.469G 23.876 GB/s 3.11%
DURATION_NANOSECONDS 10000000 1 10000000 1024x 11.265 ms 1.74% 11.260 ms 1.73% 888.063M 14.431 GB/s 1.88%
STRING 10000 1 10000 10704x 50.779 us 15.91% 46.781 us 9.66% 213.763M 8.486 GB/s 1.10%
STRING 100000 1 100000 2192x 242.103 us 4.55% 237.863 us 2.43% 420.410M 16.651 GB/s 2.17%
STRING 1000000 1 1000000 768x 1.622 ms 1.34% 1.617 ms 1.15% 618.364M 24.494 GB/s 3.19%
STRING 10000000 1 10000000 547x 27.419 ms 1.37% 27.414 ms 1.37% 364.772M 14.459 GB/s 1.88%
LIST 10000 1 10000 1344x 1.037 ms 1.96% 1.032 ms 1.51% 9.689M 23.003 GB/s 2.99%
LIST 100000 1 100000 768x 16.066 ms 1.46% 16.061 ms 1.46% 6.226M 14.829 GB/s 1.93%
LIST 1000000 1 1000000 94x 161.224 ms 1.12% 161.219 ms 1.12% 6.203M 14.771 GB/s 1.92%
DECIMAL32 10000 1 10000 9520x 56.694 us 19.40% 52.581 us 11.50% 190.184M 1.569 GB/s 0.20%
DECIMAL32 100000 1 100000 2624x 195.617 us 6.85% 191.439 us 4.37% 522.359M 4.309 GB/s 0.56%
DECIMAL32 1000000 1 1000000 768x 1.333 ms 1.61% 1.329 ms 1.30% 752.613M 6.209 GB/s 0.81%
DECIMAL32 10000000 1 10000000 672x 22.308 ms 1.29% 22.303 ms 1.29% 448.367M 3.699 GB/s 0.48%
DECIMAL64 10000 1 10000 9376x 57.513 us 21.44% 53.345 us 13.80% 187.458M 3.046 GB/s 0.40%
DECIMAL64 100000 1 100000 2624x 195.778 us 5.83% 191.629 us 3.18% 521.842M 8.480 GB/s 1.10%
DECIMAL64 1000000 1 1000000 768x 1.326 ms 1.36% 1.322 ms 1.22% 756.635M 12.295 GB/s 1.60%
DECIMAL64 10000000 1 10000000 656x 22.253 ms 1.25% 22.249 ms 1.25% 449.462M 7.304 GB/s 0.95%
DECIMAL128 10000 1 10000 11232x 48.661 us 18.26% 44.547 us 8.84% 224.480M 7.239 GB/s 0.94%
DECIMAL128 100000 1 100000 2800x 183.396 us 6.10% 179.229 us 2.20% 557.945M 17.994 GB/s 2.34%
DECIMAL128 1000000 1 1000000 688x 1.286 ms 1.24% 1.282 ms 1.02% 780.196M 25.161 GB/s 3.28%
DECIMAL128 10000000 1 10000000 656x 21.931 ms 1.34% 21.926 ms 1.33% 456.080M 14.709 GB/s 1.91%
STRUCT 10000 1 10000 2976x 173.018 us 12.32% 168.624 us 9.96% 59.303M 3.357 GB/s 0.44%
STRUCT 100000 1 100000 1168x 458.507 us 7.80% 454.242 us 6.44% 220.147M 12.448 GB/s 1.62%
STRUCT 1000000 1 1000000 800x 2.489 ms 1.31% 2.484 ms 1.21% 402.564M 22.766 GB/s 2.96%
STRUCT 10000000 1 10000000 381x 39.430 ms 1.34% 39.426 ms 1.34% 253.639M 14.350 GB/s 1.87%
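
The Elem/s column in the table above appears to be num_elements divided by GPU Time. This is an inference from the rows themselves, not something stated in the thread. A minimal sketch of that arithmetic, with hypothetical helper names, is:

```cpp
#include <cmath>

// Elements processed per second, given the element count and the measured GPU time.
inline double elements_per_second(double num_elements, double gpu_time_seconds)
{
  return num_elements / gpu_time_seconds;
}

// Relative comparison, needed because the table values are rounded for display.
inline bool roughly_equal(double a, double b, double rel_tol = 1e-3)
{
  return std::fabs(a - b) <= rel_tol * std::fabs(b);
}
```

For the INT8 row with 10000 rows and a GPU time of 17.866 us, this gives roughly 559.7M elements per second, in line with the reported 559.714M. The INT64 row with 10000000 rows and 11.235 ms gives roughly 890.1M, in line with the reported 890.086M.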

to_arrow_device

[0] NVIDIA RTX A6000

data_type num_rows num_columns num_elements Samples CPU Time Noise (CPU) GPU Time Noise (GPU) Elem/s GlobalMem BW BWUtil
INT8 10000 1 10000 186944x 6.488 us 206.16% 2.675 us 36.93% 3.739G 8.412 GB/s 1.10%
INT8 100000 1 100000 187152x 6.203 us 207.04% 2.672 us 39.95% 37.428G 84.213 GB/s 10.96%
INT8 1000000 1 1000000 187424x 6.368 us 206.30% 2.668 us 38.30% 374.827G 843.361 GB/s 109.80%
INT8 10000000 1 10000000 187088x 6.414 us 197.75% 2.673 us 37.23% 3.742T 8.419 TB/s 1096.06%
INT16 10000 1 10000 187072x 6.515 us 215.20% 2.673 us 30.95% 3.741G 15.901 GB/s 2.07%
INT16 100000 1 100000 187136x 6.220 us 208.01% 2.672 us 27.73% 37.425G 159.057 GB/s 20.71%
INT16 1000000 1 1000000 186832x 6.406 us 218.32% 2.676 us 46.98% 373.644G 1.588 TB/s 206.74%
INT16 10000000 1 10000000 187120x 6.449 us 199.08% 2.672 us 32.73% 3.742T 15.905 TB/s 2070.68%
INT32 10000 1 10000 187168x 6.501 us 209.75% 2.671 us 31.97% 3.743G 30.882 GB/s 4.02%
INT32 100000 1 100000 187072x 6.188 us 197.98% 2.673 us 46.91% 37.411G 308.644 GB/s 40.18%
INT32 1000000 1 1000000 187104x 6.375 us 201.94% 2.672 us 42.21% 374.205G 3.087 TB/s 401.93%
INT32 10000000 1 10000000 187088x 6.451 us 209.20% 2.673 us 36.52% 3.741T 30.867 TB/s 4018.65%
INT64 10000 1 10000 187168x 6.495 us 211.62% 2.672 us 40.25% 3.743G 60.825 GB/s 7.92%
INT64 100000 1 100000 186720x 6.203 us 200.64% 2.678 us 30.99% 37.344G 606.834 GB/s 79.00%
INT64 1000000 1 1000000 186896x 6.392 us 207.95% 2.675 us 31.42% 373.766G 6.074 TB/s 790.75%
INT64 10000000 1 10000000 186672x 6.455 us 205.12% 2.679 us 37.36% 3.733T 60.668 TB/s 7898.47%
UINT8 10000 1 10000 186896x 6.494 us 208.16% 2.676 us 34.23% 3.738G 8.410 GB/s 1.09%
UINT8 100000 1 100000 187392x 6.197 us 201.80% 2.668 us 25.95% 37.477G 84.324 GB/s 10.98%
UINT8 1000000 1 1000000 187296x 6.368 us 196.84% 2.670 us 27.85% 374.583G 842.812 GB/s 109.73%
UINT8 10000000 1 10000000 187168x 6.455 us 212.08% 2.672 us 35.89% 3.743T 8.422 TB/s 1096.49%
UINT16 10000 1 10000 187200x 6.497 us 211.71% 2.671 us 31.17% 3.744G 15.911 GB/s 2.07%
UINT16 100000 1 100000 187136x 6.206 us 199.27% 2.672 us 43.72% 37.424G 159.052 GB/s 20.71%
UINT16 1000000 1 1000000 187296x 6.386 us 198.54% 2.670 us 30.22% 374.584G 1.592 TB/s 207.26%
UINT16 10000000 1 10000000 187392x 6.462 us 209.42% 2.668 us 27.34% 3.748T 15.928 TB/s 2073.69%
UINT32 10000 1 10000 187088x 6.496 us 208.91% 2.673 us 36.79% 3.742G 30.869 GB/s 4.02%
UINT32 100000 1 100000 187424x 6.195 us 208.76% 2.668 us 29.99% 37.482G 309.225 GB/s 40.26%
UINT32 1000000 1 1000000 187376x 6.380 us 207.19% 2.669 us 31.29% 374.734G 3.092 TB/s 402.50%
UINT32 10000000 1 10000000 187184x 6.450 us 216.97% 2.671 us 38.42% 3.744T 30.884 TB/s 4020.87%
UINT64 10000 1 10000 187296x 6.490 us 211.25% 2.670 us 29.39% 3.746G 60.870 GB/s 7.92%
UINT64 100000 1 100000 186592x 6.205 us 201.74% 2.680 us 39.04% 37.318G 606.414 GB/s 78.95%
UINT64 1000000 1 1000000 186864x 6.393 us 204.87% 2.676 us 33.75% 373.699G 6.073 TB/s 790.61%
UINT64 10000000 1 10000000 186592x 6.474 us 216.20% 2.680 us 31.76% 3.732T 60.639 TB/s 7894.77%
FLOAT32 10000 1 10000 186928x 6.203 us 201.41% 2.675 us 34.50% 3.738G 30.842 GB/s 4.02%
FLOAT32 100000 1 100000 187104x 6.205 us 206.79% 2.673 us 31.74% 37.418G 308.697 GB/s 40.19%
FLOAT32 1000000 1 1000000 187360x 6.378 us 205.80% 2.669 us 36.39% 374.689G 3.091 TB/s 402.45%
FLOAT32 10000000 1 10000000 187056x 6.455 us 206.74% 2.673 us 35.59% 3.741T 30.862 TB/s 4017.94%
FLOAT64 10000 1 10000 187296x 6.199 us 201.49% 2.670 us 35.74% 3.746G 60.867 GB/s 7.92%
FLOAT64 100000 1 100000 187296x 6.203 us 198.67% 2.670 us 30.77% 37.459G 608.707 GB/s 79.25%
FLOAT64 1000000 1 1000000 187200x 6.392 us 209.06% 2.671 us 36.01% 374.394G 6.084 TB/s 792.08%
FLOAT64 10000000 1 10000000 187168x 6.457 us 209.93% 2.672 us 36.00% 3.743T 60.825 TB/s 7918.98%
BOOL8 10000 1 10000 24496x 24.236 us 33.85% 20.421 us 13.73% 489.695M 1.102 GB/s 0.14%
BOOL8 100000 1 100000 23840x 24.967 us 37.62% 20.979 us 16.34% 4.767G 10.725 GB/s 1.40%
BOOL8 1000000 1 1000000 17440x 32.745 us 23.50% 28.693 us 10.05% 34.852G 78.417 GB/s 10.21%
BOOL8 10000000 1 10000000 4832x 107.886 us 10.16% 103.626 us 3.57% 96.501G 217.127 GB/s 28.27%
TIMESTAMP_SECONDS 10000 1 10000 187488x 6.177 us 190.08% 2.667 us 36.24% 3.749G 60.929 GB/s 7.93%
TIMESTAMP_SECONDS 100000 1 100000 186464x 6.210 us 202.12% 2.682 us 35.07% 37.292G 605.991 GB/s 78.90%
TIMESTAMP_SECONDS 1000000 1 1000000 186416x 6.397 us 205.75% 2.682 us 44.20% 372.811G 6.058 TB/s 788.73%
TIMESTAMP_SECONDS 10000000 1 10000000 186736x 6.452 us 204.33% 2.678 us 28.03% 3.735T 60.689 TB/s 7901.22%
TIMESTAMP_MILLISECONDS 10000 1 10000 186544x 6.202 us 195.34% 2.680 us 45.22% 3.731G 60.625 GB/s 7.89%
TIMESTAMP_MILLISECONDS 100000 1 100000 187040x 6.192 us 195.37% 2.673 us 39.49% 37.406G 607.851 GB/s 79.14%
TIMESTAMP_MILLISECONDS 1000000 1 1000000 187200x 6.378 us 199.89% 2.671 us 33.00% 374.377G 6.084 TB/s 792.04%
TIMESTAMP_MILLISECONDS 10000000 1 10000000 187328x 6.451 us 203.34% 2.669 us 36.10% 3.747T 60.881 TB/s 7926.27%
TIMESTAMP_MICROSECONDS 10000 1 10000 187296x 6.194 us 210.56% 2.670 us 35.39% 3.746G 60.869 GB/s 7.92%
TIMESTAMP_MICROSECONDS 100000 1 100000 187104x 6.181 us 193.40% 2.673 us 39.77% 37.418G 608.042 GB/s 79.16%
TIMESTAMP_MICROSECONDS 1000000 1 1000000 187072x 6.368 us 195.34% 2.673 us 28.30% 374.124G 6.080 TB/s 791.50%
TIMESTAMP_MICROSECONDS 10000000 1 10000000 187248x 6.453 us 211.65% 2.670 us 38.39% 3.745T 60.855 TB/s 7922.82%
TIMESTAMP_NANOSECONDS 10000 1 10000 187024x 6.217 us 199.77% 2.676 us 40.00% 3.737G 60.724 GB/s 7.91%
TIMESTAMP_NANOSECONDS 100000 1 100000 187120x 6.217 us 202.55% 2.672 us 35.15% 37.421G 608.091 GB/s 79.17%
TIMESTAMP_NANOSECONDS 1000000 1 1000000 187184x 6.384 us 213.85% 2.671 us 43.16% 374.364G 6.083 TB/s 792.01%
TIMESTAMP_NANOSECONDS 10000000 1 10000000 187408x 6.448 us 212.66% 2.668 us 39.82% 3.748T 60.903 TB/s 7929.14%
DURATION_SECONDS 10000 1 10000 187120x 6.203 us 204.62% 2.672 us 32.73% 3.742G 60.810 GB/s 7.92%
DURATION_SECONDS 100000 1 100000 186752x 6.206 us 200.53% 2.677 us 34.99% 37.349G 606.927 GB/s 79.02%
DURATION_SECONDS 1000000 1 1000000 186752x 6.394 us 206.13% 2.677 us 35.21% 373.493G 6.069 TB/s 790.17%
DURATION_SECONDS 10000000 1 10000000 186720x 6.487 us 207.46% 2.678 us 33.49% 3.734T 60.679 TB/s 7899.99%
DURATION_MILLISECONDS 10000 1 10000 186944x 6.203 us 199.94% 2.675 us 26.49% 3.739G 60.755 GB/s 7.91%
DURATION_MILLISECONDS 100000 1 100000 187136x 6.197 us 200.23% 2.672 us 42.73% 37.425G 608.152 GB/s 79.18%
DURATION_MILLISECONDS 1000000 1 1000000 187360x 6.368 us 197.82% 2.669 us 32.31% 374.697G 6.089 TB/s 792.72%
DURATION_MILLISECONDS 10000000 1 10000000 187280x 6.449 us 208.78% 2.670 us 31.03% 3.746T 60.865 TB/s 7924.13%
DURATION_MICROSECONDS 10000 1 10000 187104x 6.193 us 199.09% 2.672 us 45.05% 3.742G 60.805 GB/s 7.92%
DURATION_MICROSECONDS 100000 1 100000 187280x 6.204 us 208.30% 2.670 us 47.31% 37.454G 608.633 GB/s 79.24%
DURATION_MICROSECONDS 1000000 1 1000000 187472x 6.393 us 208.68% 2.667 us 31.01% 374.923G 6.092 TB/s 793.19%
DURATION_MICROSECONDS 10000000 1 10000000 186960x 6.469 us 210.12% 2.674 us 43.62% 3.739T 60.760 TB/s 7910.50%
DURATION_NANOSECONDS 10000 1 10000 186736x 6.348 us 290.98% 2.678 us 46.50% 3.735G 60.688 GB/s 7.90%
DURATION_NANOSECONDS 100000 1 100000 186400x 6.410 us 309.45% 2.683 us 54.53% 37.277G 605.753 GB/s 78.86%
DURATION_NANOSECONDS 1000000 1 1000000 186736x 6.588 us 308.04% 2.678 us 44.19% 373.444G 6.068 TB/s 790.07%
DURATION_NANOSECONDS 10000000 1 10000000 186640x 6.639 us 307.77% 2.679 us 55.49% 3.733T 60.657 TB/s 7897.10%
STRING 10000 1 10000 66432x 11.717 us 152.18% 7.528 us 15.86% 1.328G 52.731 GB/s 6.87%
STRING 100000 1 100000 65696x 11.789 us 158.73% 7.612 us 14.50% 13.138G 520.333 GB/s 67.74%
STRING 1000000 1 1000000 67424x 11.805 us 164.61% 7.416 us 20.92% 134.835G 5.341 TB/s 695.34%
STRING 10000000 1 10000000 59776x 12.834 us 146.60% 8.366 us 36.27% 1.195T 47.380 TB/s 6168.45%
LIST 10000 1 10000 182544x 6.392 us 294.85% 2.739 us 89.81% 3.651G 8.667 TB/s 1128.40%
LIST 100000 1 100000 182768x 6.414 us 310.58% 2.736 us 105.47% 36.551G 87.055 TB/s 11333.92%
LIST 1000000 1 1000000 185424x 6.569 us 302.90% 2.697 us 76.93% 370.814G 883.058 TB/s 114967.12%
DECIMAL32 10000 1 10000 57792x 12.898 us 177.62% 8.654 us 103.61% 1.156G 9.534 GB/s 1.24%
DECIMAL32 100000 1 100000 41936x 16.280 us 151.55% 11.923 us 95.10% 8.387G 69.191 GB/s 9.01%
DECIMAL32 1000000 1 1000000 11072x 49.480 us 49.45% 45.217 us 36.81% 22.116G 182.455 GB/s 23.75%
DECIMAL32 10000000 1 10000000 2592x 376.033 us 15.63% 371.364 us 14.59% 26.928G 222.154 GB/s 28.92%
DECIMAL64 10000 1 10000 58336x 12.801 us 179.94% 8.572 us 102.23% 1.167G 18.956 GB/s 2.47%
DECIMAL64 100000 1 100000 42240x 16.261 us 146.74% 11.839 us 79.50% 8.447G 137.260 GB/s 17.87%
DECIMAL64 1000000 1 1000000 11568x 47.370 us 49.52% 43.250 us 38.10% 23.121G 375.724 GB/s 48.92%
DECIMAL64 10000000 1 10000000 2672x 358.663 us 14.30% 352.807 us 11.95% 28.344G 460.592 GB/s 59.97%
DECIMAL128 10000 1 10000 186896x 6.616 us 305.00% 2.675 us 53.05% 3.738G 120.546 GB/s 15.69%
DECIMAL128 100000 1 100000 186320x 6.393 us 301.97% 2.684 us 48.75% 37.262G 1.202 TB/s 156.45%
DECIMAL128 1000000 1 1000000 186032x 6.406 us 303.71% 2.688 us 55.13% 372.037G 11.998 TB/s 1562.07%
DECIMAL128 10000000 1 10000000 186112x 6.558 us 305.41% 2.687 us 54.35% 3.722T 120.040 TB/s 15628.30%
STRUCT 10000 1 10000 58960x 13.012 us 136.72% 8.481 us 12.33% 1.179G 66.736 GB/s 8.69%
STRUCT 100000 1 100000 58096x 12.855 us 138.37% 8.609 us 13.59% 11.616G 656.837 GB/s 85.51%
STRUCT 1000000 1 1000000 59040x 12.681 us 134.54% 8.469 us 14.12% 118.074G 6.677 TB/s 869.35%
STRUCT 10000000 1 10000000 58976x 12.989 us 143.89% 8.479 us 14.10% 1.179T 66.724 TB/s 8686.89%

from_arrow_host

[0] NVIDIA RTX A6000

data_type num_rows num_columns num_elements Samples CPU Time Noise (CPU) GPU Time Noise (GPU) Elem/s GlobalMem BW BWUtil
INT8 10000 1 10000 139840x 8.014 us 261.37% 3.576 us 27.89% 2.797G 6.293 GB/s 0.82%
INT8 100000 1 100000 25104x 24.501 us 93.20% 19.919 us 62.06% 5.020G 11.296 GB/s 1.47%
INT8 1000000 1 1000000 4336x 120.692 us 14.31% 115.587 us 2.16% 8.652G 19.466 GB/s 2.53%
INT8 10000000 1 10000000 1056x 986.494 us 3.68% 977.318 us 0.79% 10.232G 23.022 GB/s 3.00%
INT16 10000 1 10000 110608x 8.864 us 207.78% 4.521 us 31.46% 2.212G 9.400 GB/s 1.22%
INT16 100000 1 100000 15744x 36.676 us 62.01% 31.759 us 40.29% 3.149G 13.382 GB/s 1.74%
INT16 1000000 1 1000000 2480x 206.413 us 6.16% 201.637 us 1.83% 4.959G 21.078 GB/s 2.74%
INT16 10000000 1 10000000 2112x 1.846 ms 1.67% 1.840 ms 1.24% 5.435G 23.097 GB/s 3.01%
INT32 10000 1 10000 75568x 10.845 us 171.90% 6.617 us 60.41% 1.511G 12.469 GB/s 1.62%
INT32 100000 1 100000 9776x 56.004 us 43.20% 51.215 us 27.93% 1.953G 16.108 GB/s 2.10%
INT32 1000000 1 1000000 1376x 375.345 us 3.44% 370.464 us 1.26% 2.699G 22.269 GB/s 2.90%
INT32 10000000 1 10000000 944x 3.569 ms 1.10% 3.562 ms 0.83% 2.808G 23.163 GB/s 3.02%
INT64 10000 1 10000 34096x 19.360 us 133.87% 14.666 us 77.77% 681.830M 11.080 GB/s 1.44%
INT64 100000 1 100000 5840x 93.968 us 25.63% 89.579 us 19.68% 1.116G 18.140 GB/s 2.36%
INT64 1000000 1 1000000 1056x 717.536 us 3.46% 710.367 us 0.99% 1.408G 22.876 GB/s 2.98%
INT64 10000000 1 10000000 672x 6.970 ms 0.71% 6.963 ms 0.66% 1.436G 23.338 GB/s 3.04%
UINT8 10000 1 10000 139520x 8.044 us 257.70% 3.584 us 40.28% 2.790G 6.278 GB/s 0.82%
UINT8 100000 1 100000 24864x 24.947 us 99.78% 20.111 us 62.96% 4.972G 11.188 GB/s 1.46%
UINT8 1000000 1 1000000 4336x 120.604 us 14.59% 115.417 us 2.79% 8.664G 19.495 GB/s 2.54%
UINT8 10000000 1 10000000 1008x 987.518 us 4.13% 977.434 us 0.95% 10.231G 23.019 GB/s 3.00%
UINT16 10000 1 10000 110832x 8.822 us 214.10% 4.512 us 32.51% 2.216G 9.420 GB/s 1.23%
UINT16 100000 1 100000 15760x 36.474 us 62.43% 31.737 us 42.78% 3.151G 13.391 GB/s 1.74%
UINT16 1000000 1 1000000 2496x 206.715 us 6.77% 201.591 us 2.22% 4.961G 21.082 GB/s 2.74%
UINT16 10000000 1 10000000 1664x 1.845 ms 1.35% 1.839 ms 0.82% 5.439G 23.114 GB/s 3.01%
UINT32 10000 1 10000 75776x 10.836 us 173.45% 6.599 us 52.41% 1.515G 12.502 GB/s 1.63%
UINT32 100000 1 100000 9616x 56.694 us 38.87% 52.064 us 28.63% 1.921G 15.846 GB/s 2.06%
UINT32 1000000 1 1000000 1360x 376.378 us 4.63% 370.762 us 1.38% 2.697G 22.251 GB/s 2.90%
UINT32 10000000 1 10000000 656x 3.582 ms 1.06% 3.575 ms 0.85% 2.797G 23.079 GB/s 3.00%
UINT64 10000 1 10000 33520x 19.397 us 131.01% 14.923 us 78.10% 670.100M 10.889 GB/s 1.42%
UINT64 100000 1 100000 5488x 97.201 us 23.47% 92.609 us 17.91% 1.080G 17.547 GB/s 2.28%
UINT64 1000000 1 1000000 1600x 720.439 us 5.34% 709.711 us 1.07% 1.409G 22.897 GB/s 2.98%
UINT64 10000000 1 10000000 864x 6.963 ms 0.69% 6.957 ms 0.66% 1.437G 23.356 GB/s 3.04%
FLOAT32 10000 1 10000 75936x 10.831 us 173.28% 6.586 us 46.08% 1.518G 12.527 GB/s 1.63%
FLOAT32 100000 1 100000 9760x 55.810 us 37.35% 51.256 us 29.94% 1.951G 16.096 GB/s 2.10%
FLOAT32 1000000 1 1000000 1472x 376.142 us 4.43% 370.801 us 1.33% 2.697G 22.249 GB/s 2.90%
FLOAT32 10000000 1 10000000 656x 3.574 ms 1.01% 3.566 ms 0.79% 2.804G 23.133 GB/s 3.01%
FLOAT64 10000 1 10000 32240x 20.017 us 129.93% 15.514 us 80.85% 644.593M 10.475 GB/s 1.36%
FLOAT64 100000 1 100000 5360x 97.988 us 26.14% 93.311 us 19.89% 1.072G 17.415 GB/s 2.27%
FLOAT64 1000000 1 1000000 912x 716.796 us 3.59% 709.875 us 0.90% 1.409G 22.891 GB/s 2.98%
FLOAT64 10000000 1 10000000 848x 6.960 ms 0.72% 6.953 ms 0.64% 1.438G 23.371 GB/s 3.04%
BOOL8 10000 1 10000 47872x 14.683 us 169.28% 10.448 us 95.90% 957.147M 2.154 GB/s 0.28%
BOOL8 100000 1 100000 40560x 16.990 us 141.73% 12.332 us 87.29% 8.109G 18.246 GB/s 2.38%
BOOL8 1000000 1 1000000 9792x 55.473 us 45.77% 51.070 us 35.75% 19.581G 44.057 GB/s 5.74%
BOOL8 10000000 1 10000000 2912x 329.898 us 14.26% 325.543 us 13.82% 30.718G 69.115 GB/s 9.00%
TIMESTAMP_SECONDS 10000 1 10000 34208x 19.115 us 128.69% 14.617 us 84.77% 684.143M 11.117 GB/s 1.45%
TIMESTAMP_SECONDS 100000 1 100000 5600x 94.130 us 26.40% 89.473 us 20.56% 1.118G 18.162 GB/s 2.36%
TIMESTAMP_SECONDS 1000000 1 1000000 1680x 719.785 us 4.93% 709.833 us 0.99% 1.409G 22.893 GB/s 2.98%
TIMESTAMP_SECONDS 10000000 1 10000000 832x 6.965 ms 0.74% 6.959 ms 0.71% 1.437G 23.350 GB/s 3.04%
TIMESTAMP_MILLISECONDS 10000 1 10000 34544x 19.083 us 143.38% 14.475 us 90.75% 690.866M 11.227 GB/s 1.46%
TIMESTAMP_MILLISECONDS 100000 1 100000 5312x 98.903 us 21.70% 94.276 us 16.36% 1.061G 17.237 GB/s 2.24%
TIMESTAMP_MILLISECONDS 1000000 1 1000000 816x 716.060 us 2.49% 710.401 us 1.02% 1.408G 22.874 GB/s 2.98%
TIMESTAMP_MILLISECONDS 10000000 1 10000000 912x 6.963 ms 0.76% 6.955 ms 0.66% 1.438G 23.364 GB/s 3.04%
TIMESTAMP_MICROSECONDS 10000 1 10000 34448x 18.948 us 129.74% 14.516 us 84.23% 688.899M 11.195 GB/s 1.46%
TIMESTAMP_MICROSECONDS 100000 1 100000 5392x 97.438 us 25.19% 92.774 us 19.15% 1.078G 17.516 GB/s 2.28%
TIMESTAMP_MICROSECONDS 1000000 1 1000000 1392x 719.066 us 4.46% 710.365 us 1.05% 1.408G 22.876 GB/s 2.98%
TIMESTAMP_MICROSECONDS 10000000 1 10000000 704x 6.962 ms 0.71% 6.956 ms 0.63% 1.438G 23.363 GB/s 3.04%
TIMESTAMP_NANOSECONDS 10000 1 10000 34336x 19.042 us 133.04% 14.567 us 81.02% 686.503M 11.156 GB/s 1.45%
TIMESTAMP_NANOSECONDS 100000 1 100000 5360x 97.897 us 26.38% 93.317 us 20.79% 1.072G 17.414 GB/s 2.27%
TIMESTAMP_NANOSECONDS 1000000 1 1000000 896x 716.365 us 2.48% 710.618 us 0.95% 1.407G 22.867 GB/s 2.98%
TIMESTAMP_NANOSECONDS 10000000 1 10000000 1088x 6.964 ms 0.68% 6.958 ms 0.65% 1.437G 23.353 GB/s 3.04%
DURATION_SECONDS 10000 1 10000 34352x 19.052 us 135.61% 14.560 us 81.14% 686.809M 11.161 GB/s 1.45%
DURATION_SECONDS 100000 1 100000 5632x 93.949 us 23.86% 89.648 us 19.84% 1.115G 18.126 GB/s 2.36%
DURATION_SECONDS 1000000 1 1000000 1248x 718.485 us 4.64% 709.293 us 0.81% 1.410G 22.910 GB/s 2.98%
DURATION_SECONDS 10000000 1 10000000 73x 6.944 ms 0.32% 6.940 ms 0.31% 1.441G 23.415 GB/s 3.05%
DURATION_MILLISECONDS 10000 1 10000 35408x 18.734 us 138.38% 14.121 us 84.26% 708.159M 11.508 GB/s 1.50%
DURATION_MILLISECONDS 100000 1 100000 5328x 97.849 us 22.06% 93.862 us 18.70% 1.065G 17.313 GB/s 2.25%
DURATION_MILLISECONDS 1000000 1 1000000 896x 717.464 us 3.48% 710.477 us 1.00% 1.408G 22.872 GB/s 2.98%
DURATION_MILLISECONDS 10000000 1 10000000 832x 6.964 ms 0.68% 6.958 ms 0.63% 1.437G 23.355 GB/s 3.04%
DURATION_MICROSECONDS 10000 1 10000 36192x 18.661 us 139.87% 13.818 us 81.09% 723.690M 11.760 GB/s 1.53%
DURATION_MICROSECONDS 100000 1 100000 5568x 94.341 us 24.54% 89.974 us 20.08% 1.111G 18.061 GB/s 2.35%
DURATION_MICROSECONDS 1000000 1 1000000 1424x 721.001 us 6.09% 710.274 us 1.20% 1.408G 22.878 GB/s 2.98%
DURATION_MICROSECONDS 10000000 1 10000000 784x 6.968 ms 0.74% 6.961 ms 0.68% 1.437G 23.343 GB/s 3.04%
DURATION_NANOSECONDS 10000 1 10000 36144x 18.743 us 138.64% 13.835 us 80.03% 722.802M 11.746 GB/s 1.53%
DURATION_NANOSECONDS 100000 1 100000 5280x 99.660 us 22.54% 94.989 us 17.51% 1.053G 17.107 GB/s 2.23%
DURATION_NANOSECONDS 1000000 1 1000000 864x 716.360 us 3.30% 709.397 us 1.02% 1.410G 22.907 GB/s 2.98%
DURATION_NANOSECONDS 10000000 1 10000000 800x 6.969 ms 0.69% 6.963 ms 0.66% 1.436G 23.337 GB/s 3.04%
STRING 10000 1 10000 19648x 30.333 us 81.66% 25.451 us 53.16% 392.915M 15.597 GB/s 2.03%
STRING 100000 1 100000 3344x 191.122 us 12.58% 186.708 us 10.99% 535.596M 21.213 GB/s 2.76%
STRING 1000000 1 1000000 736x 1.709 ms 1.69% 1.701 ms 0.81% 587.816M 23.284 GB/s 3.03%
STRING 10000000 1 10000000 30x 16.869 ms 0.37% 16.864 ms 0.37% 592.968M 23.504 GB/s 3.06%
LIST 10000 1 10000 5376x 100.355 us 25.51% 96.085 us 23.64% 104.074M 14.951 GB/s 1.95%
LIST 100000 1 100000 1824x 668.759 us 9.29% 663.740 us 9.01% 150.661M 21.708 GB/s 2.83%
LIST 1000000 1 1000000 1456x 6.188 ms 0.92% 6.184 ms 0.89% 161.711M 23.310 GB/s 3.03%
LIST 10000000 1 10000000 11x 61.200 ms 0.32% 61.196 ms 0.32% 163.410M 23.544 GB/s 3.07%
DECIMAL32 10000 1 10000 21312x 28.082 us 86.28% 23.469 us 54.12% 426.097M 3.515 GB/s 0.46%
DECIMAL32 100000 1 100000 4064x 172.039 us 16.45% 167.489 us 14.66% 597.053M 4.926 GB/s 0.64%
DECIMAL32 1000000 1 1000000 1008x 1.401 ms 2.15% 1.392 ms 0.81% 718.587M 5.928 GB/s 0.77%
DECIMAL32 10000000 1 10000000 300x 13.708 ms 0.52% 13.700 ms 0.50% 729.923M 6.022 GB/s 0.78%
DECIMAL64 10000 1 10000 19072x 31.073 us 72.24% 26.235 us 49.01% 381.177M 6.194 GB/s 0.81%
DECIMAL64 100000 1 100000 4208x 171.974 us 16.68% 167.114 us 14.30% 598.393M 9.724 GB/s 1.27%
DECIMAL64 1000000 1 1000000 992x 1.399 ms 2.38% 1.389 ms 0.69% 719.817M 11.697 GB/s 1.52%
DECIMAL64 10000000 1 10000000 784x 13.704 ms 0.53% 13.698 ms 0.52% 730.018M 11.863 GB/s 1.54%
DECIMAL128 10000 1 10000 19072x 31.115 us 76.67% 26.217 us 47.79% 381.437M 12.301 GB/s 1.60%
DECIMAL128 100000 1 100000 3632x 171.550 us 14.85% 166.542 us 12.42% 600.448M 19.364 GB/s 2.52%
DECIMAL128 1000000 1 1000000 912x 1.399 ms 2.22% 1.390 ms 0.64% 719.602M 23.207 GB/s 3.02%
DECIMAL128 10000000 1 10000000 44x 13.700 ms 0.50% 13.691 ms 0.50% 730.408M 23.556 GB/s 3.07%
STRUCT 10000 1 10000 5264x 122.561 us 28.39% 118.037 us 25.62% 84.719M 4.753 GB/s 0.62%
STRUCT 100000 1 100000 2224x 354.619 us 18.69% 349.998 us 18.09% 285.716M 16.013 GB/s 2.08%
STRUCT 1000000 1 1000000 2384x 2.548 ms 2.41% 2.544 ms 2.40% 393.069M 22.033 GB/s 2.87%
STRUCT 10000000 1 10000000 21x 24.207 ms 0.46% 24.192 ms 0.45% 413.366M 23.180 GB/s 3.02%
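
For the large from_arrow_host cases, GPU time grows almost linearly with row count, so a two-point fit can separate a fixed per-call overhead from the per-row copy cost. The sketch below uses the INT64 rows at 1,000,000 and 10,000,000 elements from the table above. The function name `two_point_fit` is hypothetical, and reading the slope as a one-way host-to-device copy rate is an assumption, not something stated in this PR.

```python
# Sketch: split from_arrow_host GPU time into fixed overhead and per-row cost.
# Inputs are the INT64 rows from the table above; the linear model and the
# interpretation of the slope as a one-way copy rate are assumptions.

def two_point_fit(n1, t1, n2, t2):
    """Return (seconds_per_row, fixed_overhead_seconds) for t = a * n + b."""
    slope = (t2 - t1) / (n2 - n1)
    intercept = t1 - slope * n1
    return slope, intercept

# (num_rows, GPU time in seconds) for INT64
slope, overhead = two_point_fit(1_000_000, 710.367e-6, 10_000_000, 6.963e-3)

INT64_BYTES = 8
payload_rate = INT64_BYTES / slope  # bytes of column data moved per second

print(f"per-row cost:   {slope * 1e9:.3f} ns")
print(f"fixed overhead: {overhead * 1e6:.1f} us")
print(f"payload rate:   {payload_rate / 1e9:.1f} GB/s")
```

Under these assumptions the payload rate comes out at roughly half of the GlobalMem BW column, which would be consistent with nvbench counting both the read and the write of each byte.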

from_arrow_device

[0] NVIDIA RTX A6000

data_type num_rows num_columns num_elements Samples CPU Time Noise GPU Time Noise Elem/s GlobalMem BW BWUtil
INT8 10000 1 10000 245962x 5.244 us 581.43% 1.320 us 51.18% 7.577G 17.049 GB/s 2.22%
INT8 100000 1 100000 246023x 5.294 us 593.38% 1.319 us 63.42% 75.795G 170.538 GB/s 22.20%
INT8 1000000 1 1000000 246541x 5.218 us 578.91% 1.317 us 48.83% 759.157G 1.708 TB/s 222.38%
INT8 10000000 1 10000000 246337x 5.227 us 567.74% 1.322 us 52.60% 7.567T 17.025 TB/s 2216.57%
INT16 10000 1 10000 246063x 5.267 us 603.68% 1.319 us 49.73% 7.580G 32.217 GB/s 4.19%
INT16 100000 1 100000 246336x 5.273 us 600.29% 1.316 us 47.20% 76.000G 323.002 GB/s 42.05%
INT16 1000000 1 1000000 246458x 5.221 us 604.25% 1.318 us 50.04% 758.782G 3.225 TB/s 419.85%
INT16 10000000 1 10000000 246429x 5.215 us 586.21% 1.317 us 48.96% 7.591T 32.262 TB/s 4200.22%
INT32 10000 1 10000 245998x 5.248 us 594.06% 1.317 us 48.77% 7.596G 62.666 GB/s 8.16%
INT32 100000 1 100000 246232x 5.304 us 593.85% 1.317 us 47.64% 75.957G 626.644 GB/s 81.58%
INT32 1000000 1 1000000 246473x 5.214 us 579.86% 1.315 us 47.31% 760.405G 6.273 TB/s 816.74%
INT32 10000000 1 10000000 246508x 5.238 us 591.51% 1.318 us 50.42% 7.585T 62.573 TB/s 8146.52%
INT64 10000 1 10000 246016x 5.267 us 590.19% 1.320 us 50.89% 7.575G 123.102 GB/s 16.03%
INT64 100000 1 100000 246137x 5.294 us 596.01% 1.318 us 49.95% 75.899G 1.233 TB/s 160.57%
INT64 1000000 1 1000000 246558x 5.224 us 578.84% 1.318 us 55.54% 758.816G 12.331 TB/s 1605.37%
INT64 10000000 1 10000000 246224x 5.240 us 585.21% 1.323 us 54.42% 7.556T 122.781 TB/s 15985.13%
UINT8 10000 1 10000 246092x 5.260 us 576.27% 1.318 us 49.99% 7.585G 17.066 GB/s 2.22%
UINT8 100000 1 100000 246284x 5.291 us 596.25% 1.319 us 50.42% 75.829G 170.616 GB/s 22.21%
UINT8 1000000 1 1000000 245824x 5.280 us 640.78% 1.316 us 58.42% 759.723G 1.709 TB/s 222.55%
UINT8 10000000 1 10000000 246455x 5.210 us 587.70% 1.317 us 48.93% 7.591T 17.081 TB/s 2223.76%
UINT16 10000 1 10000 239437x 5.954 us 1089.05% 1.339 us 44.57% 7.466G 31.730 GB/s 4.13%
UINT16 100000 1 100000 245986x 5.312 us 585.31% 1.324 us 61.46% 75.521G 320.964 GB/s 41.79%
UINT16 1000000 1 1000000 245883x 5.248 us 578.67% 1.323 us 54.91% 756.031G 3.213 TB/s 418.32%
UINT16 10000000 1 10000000 246113x 5.255 us 578.30% 1.323 us 66.78% 7.556T 32.114 TB/s 4181.00%
UINT32 10000 1 10000 245627x 5.288 us 586.52% 1.326 us 55.06% 7.541G 62.213 GB/s 8.10%
UINT32 100000 1 100000 245715x 5.321 us 588.84% 1.325 us 55.62% 75.454G 622.492 GB/s 81.04%
UINT32 1000000 1 1000000 246073x 5.248 us 568.53% 1.322 us 51.73% 756.496G 6.241 TB/s 812.54%
UINT32 10000000 1 10000000 245915x 5.272 us 602.89% 1.323 us 54.37% 7.557T 62.343 TB/s 8116.57%
UINT64 10000 1 10000 245628x 5.287 us 584.85% 1.327 us 57.54% 7.535G 122.445 GB/s 15.94%
UINT64 100000 1 100000 245936x 5.331 us 608.98% 1.322 us 52.49% 75.671G 1.230 TB/s 160.09%
UINT64 1000000 1 1000000 246077x 5.260 us 589.97% 1.325 us 54.97% 755.001G 12.269 TB/s 1597.29%
UINT64 10000000 1 10000000 246047x 5.254 us 584.35% 1.324 us 55.42% 7.551T 122.706 TB/s 15975.37%
FLOAT32 10000 1 10000 245713x 5.276 us 571.60% 1.321 us 52.17% 7.569G 62.446 GB/s 8.13%
FLOAT32 100000 1 100000 245793x 5.332 us 602.81% 1.325 us 62.84% 75.447G 622.441 GB/s 81.04%
FLOAT32 1000000 1 1000000 246096x 5.251 us 586.81% 1.324 us 62.38% 755.425G 6.232 TB/s 811.39%
FLOAT32 10000000 1 10000000 245817x 5.256 us 589.69% 1.324 us 54.99% 7.551T 62.298 TB/s 8110.77%
FLOAT64 10000 1 10000 245433x 5.291 us 584.88% 1.322 us 51.11% 7.567G 122.961 GB/s 16.01%
FLOAT64 100000 1 100000 245836x 5.301 us 587.23% 1.321 us 53.24% 75.723G 1.231 TB/s 160.20%
FLOAT64 1000000 1 1000000 246022x 5.251 us 580.16% 1.322 us 54.95% 756.184G 12.288 TB/s 1599.80%
FLOAT64 10000000 1 10000000 245903x 5.251 us 565.87% 1.322 us 54.35% 7.565T 122.936 TB/s 16005.27%
BOOL8 10000 1 10000 53552x 13.601 us 181.70% 9.339 us 98.72% 1.071G 2.409 GB/s 0.31%
BOOL8 100000 1 100000 52576x 13.859 us 172.67% 9.513 us 102.30% 10.512G 23.653 GB/s 3.08%
BOOL8 1000000 1 1000000 37232x 17.849 us 144.82% 13.433 us 87.10% 74.443G 167.496 GB/s 21.81%
BOOL8 10000000 1 10000000 9776x 55.692 us 44.67% 51.183 us 34.05% 195.375G 439.595 GB/s 57.23%
TIMESTAMP_SECONDS 10000 1 10000 245711x 5.296 us 608.87% 1.327 us 57.92% 7.538G 122.490 GB/s 15.95%
TIMESTAMP_SECONDS 100000 1 100000 245069x 5.352 us 628.42% 1.327 us 57.48% 75.365G 1.225 TB/s 159.44%
TIMESTAMP_SECONDS 1000000 1 1000000 245850x 5.244 us 585.07% 1.322 us 59.75% 756.609G 12.295 TB/s 1600.70%
TIMESTAMP_SECONDS 10000000 1 10000000 242869x 5.508 us 830.51% 1.332 us 52.58% 7.508T 122.008 TB/s 15884.47%
TIMESTAMP_MILLISECONDS 10000 1 10000 246137x 5.168 us 430.76% 1.326 us 58.57% 7.540G 122.528 GB/s 15.95%
TIMESTAMP_MILLISECONDS 100000 1 100000 246200x 5.209 us 443.51% 1.325 us 56.61% 75.456G 1.226 TB/s 159.64%
TIMESTAMP_MILLISECONDS 1000000 1 1000000 246369x 5.125 us 431.44% 1.327 us 71.57% 753.731G 12.248 TB/s 1594.61%
TIMESTAMP_MILLISECONDS 10000000 1 10000000 246473x 5.122 us 414.39% 1.328 us 58.64% 7.532T 122.394 TB/s 15934.67%
TIMESTAMP_MICROSECONDS 10000 1 10000 246029x 5.155 us 421.77% 1.324 us 55.58% 7.552G 122.713 GB/s 15.98%
TIMESTAMP_MICROSECONDS 100000 1 100000 246341x 5.174 us 409.84% 1.324 us 56.52% 75.539G 1.228 TB/s 159.81%
TIMESTAMP_MICROSECONDS 1000000 1 1000000 246374x 5.114 us 417.67% 1.322 us 52.68% 756.497G 12.293 TB/s 1600.46%
TIMESTAMP_MICROSECONDS 10000000 1 10000000 246408x 5.111 us 419.38% 1.321 us 53.00% 7.568T 122.982 TB/s 16011.31%
TIMESTAMP_NANOSECONDS 10000 1 10000 245946x 5.162 us 431.89% 1.323 us 53.34% 7.558G 122.819 GB/s 15.99%
TIMESTAMP_NANOSECONDS 100000 1 100000 246334x 5.177 us 417.66% 1.321 us 52.16% 75.697G 1.230 TB/s 160.15%
TIMESTAMP_NANOSECONDS 1000000 1 1000000 246137x 5.124 us 421.97% 1.324 us 54.43% 755.012G 12.269 TB/s 1597.32%
TIMESTAMP_NANOSECONDS 10000000 1 10000000 246291x 5.132 us 424.32% 1.324 us 54.60% 7.554T 122.752 TB/s 15981.28%
DURATION_SECONDS 10000 1 10000 245911x 5.160 us 427.29% 1.322 us 53.08% 7.566G 122.948 GB/s 16.01%
DURATION_SECONDS 100000 1 100000 245953x 5.196 us 438.05% 1.321 us 52.61% 75.698G 1.230 TB/s 160.15%
DURATION_SECONDS 1000000 1 1000000 246365x 5.120 us 432.50% 1.323 us 53.64% 755.650G 12.279 TB/s 1598.67%
DURATION_SECONDS 10000000 1 10000000 246384x 5.116 us 418.73% 1.320 us 57.20% 7.575T 123.098 TB/s 16026.41%
DURATION_MILLISECONDS 10000 1 10000 245691x 5.182 us 487.48% 1.324 us 54.52% 7.552G 122.712 GB/s 15.98%
DURATION_MILLISECONDS 100000 1 100000 246119x 5.188 us 422.06% 1.327 us 57.38% 75.361G 1.225 TB/s 159.44%
DURATION_MILLISECONDS 1000000 1 1000000 246416x 5.119 us 421.60% 1.324 us 54.90% 755.242G 12.273 TB/s 1597.81%
DURATION_MILLISECONDS 10000000 1 10000000 246094x 5.128 us 423.20% 1.325 us 56.34% 7.546T 122.619 TB/s 15964.03%
DURATION_MICROSECONDS 10000 1 10000 245963x 5.171 us 425.81% 1.329 us 59.56% 7.525G 122.280 GB/s 15.92%
DURATION_MICROSECONDS 100000 1 100000 246279x 5.187 us 418.26% 1.325 us 56.80% 75.449G 1.226 TB/s 159.62%
DURATION_MICROSECONDS 1000000 1 1000000 246442x 5.129 us 437.43% 1.323 us 55.00% 755.772G 12.281 TB/s 1598.93%
DURATION_MICROSECONDS 10000000 1 10000000 246361x 5.133 us 429.49% 1.324 us 56.01% 7.553T 122.733 TB/s 15978.91%
DURATION_NANOSECONDS 10000 1 10000 246111x 5.177 us 442.13% 1.327 us 56.82% 7.537G 122.473 GB/s 15.94%
DURATION_NANOSECONDS 100000 1 100000 246222x 5.201 us 435.23% 1.326 us 57.56% 75.394G 1.225 TB/s 159.50%
DURATION_NANOSECONDS 1000000 1 1000000 246239x 5.136 us 438.96% 1.323 us 54.31% 755.725G 12.281 TB/s 1598.83%
DURATION_NANOSECONDS 10000000 1 10000000 246324x 5.133 us 425.18% 1.322 us 53.89% 7.567T 122.960 TB/s 16008.39%
STRING 10000 1 10000 245942x 5.163 us 427.16% 1.327 us 58.32% 7.538G 299.213 GB/s 38.96%
STRING 100000 1 100000 246127x 5.185 us 424.29% 1.327 us 64.37% 75.367G 2.985 TB/s 388.62%
STRING 1000000 1 1000000 246093x 5.130 us 426.44% 1.327 us 57.28% 753.808G 29.859 TB/s 3887.37%
STRING 10000000 1 10000000 246100x 5.143 us 432.19% 1.325 us 63.56% 7.548T 299.174 TB/s 38950.11%
LIST 10000 1 10000 22704x 26.171 us 39.25% 22.029 us 20.44% 453.957M 65.215 GB/s 8.49%
LIST 100000 1 100000 16992x 33.539 us 32.87% 29.433 us 20.77% 3.397G 489.524 GB/s 63.73%
LIST 1000000 1 1000000 5856x 89.472 us 9.47% 85.430 us 4.96% 11.706G 1.687 TB/s 219.67%
LIST 10000000 1 10000000 1824x 654.550 us 1.12% 650.588 us 0.93% 15.371G 2.215 TB/s 288.32%
DECIMAL32 10000 1 10000 246072x 5.170 us 438.18% 1.328 us 58.70% 7.530G 62.121 GB/s 8.09%
DECIMAL32 100000 1 100000 246208x 5.195 us 433.22% 1.326 us 59.45% 75.428G 622.278 GB/s 81.02%
DECIMAL32 1000000 1 1000000 246201x 5.132 us 441.19% 1.325 us 56.88% 754.638G 6.226 TB/s 810.54%
DECIMAL32 10000000 1 10000000 246156x 5.145 us 444.84% 1.323 us 54.85% 7.558T 62.350 TB/s 8117.44%
DECIMAL64 10000 1 10000 245988x 5.174 us 442.65% 1.327 us 55.33% 7.536G 122.467 GB/s 15.94%
DECIMAL64 100000 1 100000 246212x 5.201 us 439.45% 1.323 us 53.09% 75.604G 1.229 TB/s 159.95%
DECIMAL64 1000000 1 1000000 246304x 5.118 us 419.80% 1.322 us 55.79% 756.215G 12.288 TB/s 1599.86%
DECIMAL64 10000000 1 10000000 246096x 5.151 us 446.96% 1.323 us 54.89% 7.561T 122.867 TB/s 15996.34%
DECIMAL128 10000 1 10000 246164x 5.163 us 426.44% 1.321 us 53.03% 7.571G 244.154 GB/s 31.79%
DECIMAL128 100000 1 100000 246029x 5.189 us 432.17% 1.322 us 54.48% 75.644G 2.440 TB/s 317.60%
DECIMAL128 1000000 1 1000000 246327x 5.106 us 414.28% 1.323 us 59.73% 756.080G 24.384 TB/s 3174.55%
DECIMAL128 10000000 1 10000000 246394x 5.124 us 419.56% 1.325 us 56.61% 7.546T 243.346 TB/s 31681.72%
STRUCT 10000 1 10000 246144x 5.166 us 424.91% 1.325 us 54.34% 7.547G 423.408 GB/s 55.12%
STRUCT 100000 1 100000 246200x 5.202 us 433.64% 1.326 us 60.02% 75.439G 4.228 TB/s 550.44%
STRUCT 1000000 1 1000000 246449x 5.128 us 415.98% 1.325 us 62.90% 754.534G 42.294 TB/s 5506.36%
STRUCT 10000000 1 10000000 246420x 5.135 us 409.66% 1.326 us 74.23% 7.542T 422.912 TB/s 55059.79%
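
The from_arrow_device GPU times stay near 1.3 us regardless of row count for most types, so the Elem/s, GlobalMem BW and BWUtil columns describe nominal bytes rather than data actually moved. The sketch below re-derives those columns for one row to show where the values above 100% come from. It assumes that the benchmark counts `2 * (element size + 1/8 byte of validity mask)` bytes per element (inferred from the ratio of the BW and Elem/s columns, not confirmed by the PR) and that the RTX A6000 peak bandwidth is about 768 GB/s. `derived_columns` is a hypothetical helper.

```python
# Sketch: re-derive nvbench's summary columns for one from_arrow_device row.
# Assumptions (not stated in the PR): bytes counted per element are
# 2 * (element size + 1/8 byte of validity mask), and peak bandwidth is ~768 GB/s.

PEAK_BW_BYTES_PER_S = 768e9  # assumed RTX A6000 peak memory bandwidth


def derived_columns(num_elements, elem_size_bytes, gpu_time_s):
    """Return (elements/s, bytes/s, fraction of peak bandwidth)."""
    elems_per_s = num_elements / gpu_time_s
    # Nominal read + write traffic, including one validity bit per element.
    bytes_counted = 2 * num_elements * (elem_size_bytes + 1 / 8)
    bandwidth = bytes_counted / gpu_time_s
    return elems_per_s, bandwidth, bandwidth / PEAK_BW_BYTES_PER_S


# INT64, 10,000,000 rows, GPU time 1.323 us (taken from the table above).
elems, bw, util = derived_columns(10_000_000, 8, 1.323e-6)
print(f"{elems / 1e12:.3f}T elem/s, {bw / 1e12:.1f} TB/s, {util:.0%} of peak")
```

If the assumptions hold, the result lands close to the reported 7.556T elem/s and 122.781 TB/s. A figure far above the hardware peak indicates that the call is not copying the column data, so GPU time is the more meaningful column for this benchmark.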

Successfully merging this pull request may close these issues.

[FEA] Add nvbench benchmarks for arrow interop functions