[C++][Gandiva] Enhance random data generation #38569

Open
1 task
llama90 opened this issue Nov 3, 2023 · 8 comments · May be fixed by #38526

Comments

@llama90
Contributor

llama90 commented Nov 3, 2023

Describe the enhancement requested

Refactor random data generation to use random.h instead of generate_data.h.

This addresses the issue.

Improvement

  • Code reusability
  • Facilitates additional tests for various data types.

Remaining tasks

The following issues still need to be resolved.

  • Large Decimal for DecimalAdd2Large and DecimalAdd3Large

Question

  • Some metric values (Time, CPU) differ noticeably between the as-is and to-be benchmark runs below. Is this acceptable?
    • TimedTestAllocs, TimedTestOutputStringAllocs
    • DecimalAdd2LeadingZeroes, DecimalAdd2LeadingZeroesWithDiv, DecimalAdd3LeadingZeroes, DecimalAdd3LeadingZeroesWithDiv
as-is
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
2023-11-03T20:27:46+09:00
Running /Users/lama/workspace/arrow-build-test/cpp/cmake-build-debug/debug/gandiva-micro-benchmarks
Run on (10 X 24.0942 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 4.01, 3.35, 3.58
***WARNING*** Library was built as DEBUG. Timings may be affected.
/Users/lama/workspace/arrow-build-test/cpp/src/gandiva/cache.cc:50: Creating gandiva cache with capacity of 500
/Users/lama/workspace/arrow-build-test/cpp/src/gandiva/engine.cc:129: Detected CPU Name : apple-m1
/Users/lama/workspace/arrow-build-test/cpp/src/gandiva/engine.cc:130: Detected CPU Features:
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
TimedTestAdd3/min_time:1.000                         2740 us         2730 us          508
TimedTestBigNested/min_time:1.000                    9750 us         9685 us          149
TimedTestExtractYear/min_time:1.000                  8294 us         8210 us          172
TimedTestFilterAdd2/min_time:1.000                   4181 us         4172 us          334
TimedTestFilterLike/min_time:1.000                  13706 us        13669 us          102
TimedTestCastFloatFromString/min_time:1.000         71072 us        70941 us           20
TimedTestCastIntFromString/min_time:1.000           48323 us        42640 us           35
TimedTestAllocs/min_time:1.000                     140487 us       137767 us           10
TimedTestOutputStringAllocs/min_time:1.000         228228 us       226211 us            6
TimedTestMultiOr/min_time:1.000                     12905 us        12853 us          102
TimedTestInExpr/min_time:1.000                      23907 us        23854 us           58
DecimalAdd2Fast/min_time:1.000                       3868 us         3848 us          370
DecimalAdd2LeadingZeroes/min_time:1.000              7332 us         7252 us          195
DecimalAdd2LeadingZeroesWithDiv/min_time:1.000      26231 us        26121 us           54
DecimalAdd2Large/min_time:1.000                    126812 us       126515 us           11
DecimalAdd3Fast/min_time:1.000                       4282 us         4266 us          334
DecimalAdd3LeadingZeroes/min_time:1.000             10651 us        10635 us          131
DecimalAdd3LeadingZeroesWithDiv/min_time:1.000      64148 us        63833 us           22
DecimalAdd3Large/min_time:1.000                    253900 us       251054 us            6

to-be
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
2023-11-03T20:28:41+09:00
Running /Users/lama/workspace/arrow-latest/cpp/cmake-build-debug/debug/gandiva-micro-benchmarks
Run on (10 X 24.0028 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 4.57, 3.62, 3.67
***WARNING*** Library was built as DEBUG. Timings may be affected.
/Users/lama/workspace/arrow-latest/cpp/src/gandiva/cache.cc:50: Creating gandiva cache with capacity of 500
/Users/lama/workspace/arrow-latest/cpp/src/gandiva/engine.cc:129: Detected CPU Name : apple-m1
/Users/lama/workspace/arrow-latest/cpp/src/gandiva/engine.cc:130: Detected CPU Features:
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
TimedTestAdd3/min_time:1.000                         3232 us         2958 us          487
TimedTestBigNested/min_time:1.000                    6359 us         6327 us          217
TimedTestExtractYear/min_time:1.000                  8252 us         8228 us          172
TimedTestFilterAdd2/min_time:1.000                   5819 us         5810 us          241
TimedTestFilterLike/min_time:1.000                  14109 us        14092 us           99
TimedTestCastFloatFromString/min_time:1.000         79837 us        79717 us           17
TimedTestCastIntFromString/min_time:1.000           45557 us        45439 us           31
TimedTestAllocs/min_time:1.000                     243130 us       242760 us            6
TimedTestOutputStringAllocs/min_time:1.000         332357 us       331799 us            4
TimedTestMultiOr/min_time:1.000                     11269 us        10963 us          118
TimedTestInExpr/min_time:1.000                      24069 us        23862 us           57
DecimalAdd2Fast/min_time:1.000                       3771 us         3757 us          371
DecimalAdd2LeadingZeroes/min_time:1.000             40692 us        40636 us           34
DecimalAdd2LeadingZeroesWithDiv/min_time:1.000     110656 us       110515 us           13
DecimalAdd2Large/min_time:1.000                    112098 us       111152 us           13
DecimalAdd3Fast/min_time:1.000                       4151 us         4137 us          324
DecimalAdd3LeadingZeroes/min_time:1.000             78732 us        78590 us           18
DecimalAdd3LeadingZeroesWithDiv/min_time:1.000     236894 us       236280 us            6
DecimalAdd3Large/min_time:1.000                    235912 us       235529 us            6
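
To put a number on how far the flagged benchmarks moved between the as-is and to-be runs, the relative change can be computed directly from the Time column. The helper below is only an illustrative sketch; `PercentChange` is a hypothetical name and not part of Arrow or Google Benchmark.

```cpp
#include <cassert>

// Hypothetical helper for illustration: relative change of `after` with
// respect to `before`, expressed in percent. Positive means slower when the
// inputs are timings.
inline double PercentChange(double before, double after) {
  assert(before != 0.0);  // a zero baseline has no meaningful relative change
  return (after - before) / before * 100.0;
}
```

For example, `PercentChange(140487.0, 243130.0)` for TimedTestAllocs gives roughly +73%, and `PercentChange(7332.0, 40692.0)` for DecimalAdd2LeadingZeroes gives roughly +455%. Both runs are DEBUG builds on a loaded machine, so these figures are indicative only.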

Component(s)

C++ - Gandiva

@llama90
Contributor Author

llama90 commented Nov 3, 2023

For TimedTestAllocs and TimedTestOutputStringAllocs, the difference in metric values is related to the character range of the generated strings.

  • generate_data.h: 'a' - 'z'
  • random.h: 'A' - 'z'
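
To make the difference between these two ranges concrete, the sketch below classifies every code point in an inclusive ASCII range. `RangeStats` and `ClassifyAsciiRange` are hypothetical names used for illustration; they are not Arrow APIs.

```cpp
#include <cassert>

// Hypothetical summary of what an inclusive ASCII range contains.
struct RangeStats {
  int total = 0;  // number of code points in [lo, hi]
  int upper = 0;  // 'A'..'Z'
  int lower = 0;  // 'a'..'z'
  int other = 0;  // anything that is not an ASCII letter
};

// Walks the inclusive range [lo, hi] and counts each kind of code point.
inline RangeStats ClassifyAsciiRange(unsigned char lo, unsigned char hi) {
  assert(lo <= hi);
  RangeStats stats;
  for (int c = lo; c <= hi; ++c) {
    ++stats.total;
    if (c >= 'A' && c <= 'Z') {
      ++stats.upper;
    } else if (c >= 'a' && c <= 'z') {
      ++stats.lower;
    } else {
      ++stats.other;
    }
  }
  return stats;
}
```

Called as `ClassifyAsciiRange('A', 'z')`, this reports 58 code points: 26 uppercase letters, 26 lowercase letters, and the 6 punctuation characters that sit between 'Z' and 'a' in ASCII. `ClassifyAsciiRange('a', 'z')` reports 26 lowercase letters only.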

When random.h generates characters in the range 'a' - 'z':

result
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
TimedTestAdd3/min_time:1.000                         4176 us         3081 us          437
TimedTestBigNested/min_time:1.000                    6810 us         6538 us          214
TimedTestExtractYear/min_time:1.000                  8837 us         8465 us          164
TimedTestFilterAdd2/min_time:1.000                   5976 us         5955 us          234
TimedTestFilterLike/min_time:1.000                  14700 us        14581 us           95
TimedTestCastFloatFromString/min_time:1.000         81841 us        81529 us           17
TimedTestCastIntFromString/min_time:1.000           46860 us        46665 us           30
+ TimedTestAllocs/min_time:1.000                     128267 us       127612 us           11
+ TimedTestOutputStringAllocs/min_time:1.000         223284 us       222095 us            6
TimedTestMultiOr/min_time:1.000                     10881 us        10841 us          125
TimedTestInExpr/min_time:1.000                      24227 us        24171 us           57
DecimalAdd2Fast/min_time:1.000                       3989 us         3968 us          353
DecimalAdd2LeadingZeroes/min_time:1.000             41976 us        41825 us           34
DecimalAdd2LeadingZeroesWithDiv/min_time:1.000     114398 us       113747 us           12
DecimalAdd2Large/min_time:1.000                    118118 us       114793 us           12
DecimalAdd3Fast/min_time:1.000                       4461 us         4411 us          322
DecimalAdd3LeadingZeroes/min_time:1.000             81679 us        81066 us           17
DecimalAdd3LeadingZeroesWithDiv/min_time:1.000     239198 us       238394 us            6
DecimalAdd3Large/min_time:1.000                    240675 us       239581 us            6

@kou
Member

kou commented Nov 4, 2023

Why is the range important in them?
Is the upper function used in them related?

@llama90
Contributor Author

llama90 commented Nov 5, 2023

To conclude, it seems you are correct: the upper function called within micro_benchmarks is the cause. When the test strings mix uppercase and lowercase characters (A-z, Z-a), the results differ.

Since the upper function is being called, it seems right for the test strings to consist of lowercase letters only. What do you think?

Thank you. Your points helped me understand the cause more clearly.


I observed a performance difference for AsciiLower and AsciiUpper when mixed-case strings were involved. (SplitPattern and IsAlphaNumericUnicode also differed, but I ignore them here because the difference was not consistent across all tests.)
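
The thread does not pin down why mixed-case input is slower. One plausible factor, stated here as an assumption rather than a finding, is that a per-byte conversion takes its "convert" path unpredictably on mixed-case data, whereas on single-case data it takes that path either always or never. The sketch below is a simplified stand-in and not Arrow's actual kernel; `AsciiUpperByte`, `AsciiUpperSketch`, and `CountConverted` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified per-byte ASCII uppercase conversion (illustration only).
inline char AsciiUpperByte(char c) {
  return (c >= 'a' && c <= 'z') ? static_cast<char>(c - ('a' - 'A')) : c;
}

// Applies the per-byte conversion to a whole string.
inline std::string AsciiUpperSketch(const std::string& input) {
  std::string out(input.size(), '\0');
  for (std::size_t i = 0; i < input.size(); ++i) {
    out[i] = AsciiUpperByte(input[i]);
  }
  assert(out.size() == input.size());  // ASCII case mapping never changes length
  return out;
}

// Counts the bytes for which the conversion path is actually taken.
inline std::size_t CountConverted(const std::string& input) {
  std::size_t converted = 0;
  for (char c : input) {
    if (c >= 'a' && c <= 'z') ++converted;
  }
  return converted;
}
```

With this sketch, every byte of an all-lowercase string is converted, no byte of an all-uppercase string is, and only some bytes of a mixed-case string are. Whether this explains the measured gap would need profiling on an optimized build.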

I reviewed the following to understand why the range affects the benchmark:

To verify this clearly, I ran arrow-compute-scalar-string-benchmark, which uses random.cc. Random strings for the String type are generated by the following function, which varies only the string length and draws characters from a fixed range (A - z). I ran the tests while altering that range.

template <typename TypeClass, typename offset_type = typename TypeClass::offset_type>
static std::shared_ptr<Array> GenerateBinaryArray(
    RandomArrayGenerator* gen, int64_t size, int32_t min_length, int32_t max_length,
    double null_probability, std::optional<int64_t> max_data_buffer_length,
    int64_t alignment, MemoryPool* memory_pool) {
  using BuilderType = typename TypeTraits<TypeClass>::BuilderType;
  using OffsetArrowType = typename CTypeTraits<offset_type>::ArrowType;
  using OffsetArrayType = typename TypeTraits<OffsetArrowType>::ArrayType;

  if (null_probability < 0 || null_probability > 1) {
    ABORT_NOT_OK(Status::Invalid("null_probability must be between 0 and 1"));
  }

  auto lengths = std::dynamic_pointer_cast<OffsetArrayType>(gen->Numeric<OffsetArrowType>(
      size, min_length, max_length, null_probability, alignment, memory_pool));

  // Visual Studio does not implement uniform_int_distribution for char types.
  using GenOpt = GenerateOptions<uint8_t, std::uniform_int_distribution<uint16_t>>;
  GenOpt options(gen->seed(), static_cast<uint8_t>('A'), static_cast<uint8_t>('z'),
                 /*null_probability=*/0);

  std::vector<uint8_t> str_buffer(max_length);
  BuilderType builder{memory_pool, alignment};
  if constexpr (std::is_base_of_v<BinaryViewType, TypeClass>) {
    if (max_data_buffer_length) {
      builder.SetBlockSize(*max_data_buffer_length);
    }
  }

  for (int64_t i = 0; i < size; ++i) {
    if (lengths->IsValid(i)) {
      options.GenerateData(str_buffer.data(), lengths->Value(i));
      ABORT_NOT_OK(builder.Append(str_buffer.data(), lengths->Value(i)));
    } else {
      ABORT_NOT_OK(builder.AppendNull());
    }
  }

  std::shared_ptr<Array> result;
  ABORT_NOT_OK(builder.Finish(&result));
  return result;
}
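
For experimenting with character ranges outside of Arrow, the core idea of the function above can be reduced to a few lines of standard C++. This is a minimal sketch under stated assumptions: `GenerateStringsSketch` is a hypothetical name, it produces `std::string` values instead of an Arrow array, it has no null handling, and it uses `std::mt19937` rather than Arrow's own generator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified analogue of GenerateBinaryArray: random-length
// strings whose bytes are drawn uniformly from [min_char, max_char].
inline std::vector<std::string> GenerateStringsSketch(std::uint32_t seed,
                                                      std::size_t count,
                                                      int min_length, int max_length,
                                                      char min_char, char max_char) {
  assert(0 <= min_length && min_length <= max_length);
  assert(min_char <= max_char);

  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> length_dist(min_length, max_length);
  // uniform_int_distribution is not specified for char types, so draw a
  // wider integer and narrow it, as the original code does.
  std::uniform_int_distribution<std::uint16_t> char_dist(
      static_cast<std::uint8_t>(min_char), static_cast<std::uint8_t>(max_char));

  std::vector<std::string> result;
  result.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    std::string value(static_cast<std::size_t>(length_dist(rng)), '\0');
    for (char& c : value) {
      c = static_cast<char>(char_dist(rng));
    }
    result.push_back(std::move(value));
  }
  return result;
}
```

Switching between the ranges tested in this thread is then a matter of changing the last two arguments, for example `GenerateStringsSketch(42, 100, 2, 8, 'a', 'z')` versus `GenerateStringsSketch(42, 100, 2, 8, 'A', 'z')`.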

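  • Ran arrow-compute-scalar-string-benchmark with the character range in random.cc (random.h) set to each of the following:
    • A-z (52 letters; the contiguous ASCII range also includes the 6 punctuation characters between 'Z' and 'a', 58 code points in total)
    • A-Z (26 letters)
    • a-z (26 letters)
    • a single character: A, a, B, or b (1 character each)
    • Z-a, A-B, a-b (2 letters each; the contiguous ASCII range Z-a also spans the 6 punctuation characters in between)
      • These check the overlap between uppercase and lowercase.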

Benchmark results:

The following comparisons are the most useful:

  • A - z vs. A - Z and a - z
  • A - z vs. Z - a vs. A - B and a - b
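
----------------------------------------------------------------------------
Range      AsciiLower (bytes_per_second)      AsciiUpper (bytes_per_second)
----------------------------------------------------------------------------
A - z                             155M/s                             154M/s
A - Z                             344M/s                             403M/s
a - z                             331M/s                             344M/s
Z - a                             237M/s                             274M/s
A - B                             323M/s                             401M/s
a - b                             329M/s                             350M/s
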
A - z
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
2023-11-05T12:51:50+09:00
Running /Users/lama/workspace/arrow-build-test/cpp/cmake-build-debug/debug/arrow-compute-scalar-string-benchmark
Run on (10 X 24.1204 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 19.62, 13.09, 21.27
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------
AsciiLower                            126596351 ns    102234857 ns            7 bytes_per_second=155.014M/s items_per_second=10.2565M/s
AsciiUpper                            108163111 ns    102306667 ns            6 bytes_per_second=154.905M/s items_per_second=10.2493M/s
IsAlphaNumericAscii                    99619232 ns     98232429 ns            7 bytes_per_second=161.33M/s items_per_second=10.6744M/s
MatchSubstring                        265327153 ns    257578000 ns            3 bytes_per_second=61.5264M/s items_per_second=4.07091M/s
SplitPattern                          349064062 ns    346818000 ns            2 bytes_per_second=45.695M/s items_per_second=3.02342M/s
TrimSingleAscii                       112687937 ns    112247500 ns            6 bytes_per_second=141.187M/s items_per_second=9.34164M/s
TrimManyAscii                         135451817 ns    135040600 ns            5 bytes_per_second=117.356M/s items_per_second=7.76489M/s
MatchLike                             118694500 ns    118310167 ns            6 bytes_per_second=133.952M/s items_per_second=8.86294M/s
MatchLikeSubstring                    254574667 ns    253288000 ns            3 bytes_per_second=62.5685M/s items_per_second=4.13986M/s
MatchLikePrefix                        56798872 ns     56452833 ns           12 bytes_per_second=280.727M/s items_per_second=18.5744M/s
MatchLikeSuffix                        59460042 ns     57332083 ns           12 bytes_per_second=276.422M/s items_per_second=18.2895M/s
Utf8Lower                             124983305 ns    122885833 ns            6 bytes_per_second=128.964M/s items_per_second=8.53293M/s
Utf8Upper                             122891783 ns    121961000 ns            5 bytes_per_second=129.942M/s items_per_second=8.59763M/s
IsAlphaNumericUnicode                 242970972 ns    224661333 ns            3 bytes_per_second=70.541M/s items_per_second=4.66736M/s
TrimSingleUtf8                        105753631 ns    105216143 ns            7 bytes_per_second=150.622M/s items_per_second=9.96592M/s
TrimManyUtf8                          146198633 ns    145622200 ns            5 bytes_per_second=108.828M/s items_per_second=7.20066M/s
BinaryJoinArrayScalar                    513846 ns       512393 ns         1370 bytes_per_second=225.339M/s
BinaryJoinArrayArray                     595282 ns       592727 ns         1190 bytes_per_second=194.798M/s
BinaryJoinElementWiseArrayScalar/2      3774192 ns      3764715 ns          186 bytes_per_second=61.8489M/s
BinaryJoinElementWiseArrayScalar/8      8379259 ns      8346762 ns           84 bytes_per_second=111.905M/s
BinaryJoinElementWiseArrayScalar/64    30608786 ns     30485435 ns           23 bytes_per_second=245.379M/s
BinaryJoinElementWiseArrayScalar/128   48874914 ns     48757286 ns           14 bytes_per_second=306.722M/s
BinaryJoinElementWiseArrayArray/2       2496855 ns      2491751 ns          281 bytes_per_second=93.4457M/s
BinaryJoinElementWiseArrayArray/8       6885788 ns      6866119 ns          101 bytes_per_second=136.037M/s
BinaryJoinElementWiseArrayArray/64     29454262 ns     29395833 ns           24 bytes_per_second=254.474M/s
BinaryJoinElementWiseArrayArray/128    47788347 ns     47710533 ns           15 bytes_per_second=313.451M/s
BinaryRepeat                          119525042 ns    119285333 ns            6 bytes_per_second=132.857M/s items_per_second=8.79049M/s
A - Z
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
2023-11-05T13:29:17+09:00
Running /Users/lama/workspace/arrow-build-test/cpp/cmake-build-debug/debug/arrow-compute-scalar-string-benchmark
Run on (10 X 24.1206 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 21.22, 8.88, 7.84
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------
AsciiLower                             50217964 ns     46028933 ns           15 bytes_per_second=344.302M/s items_per_second=22.7808M/s
AsciiUpper                             39421998 ns     39257765 ns           17 bytes_per_second=403.687M/s items_per_second=26.71M/s
IsAlphaNumericAscii                   120946236 ns    120427833 ns            6 bytes_per_second=131.596M/s items_per_second=8.70709M/s
MatchSubstring                        252679167 ns    248814000 ns            3 bytes_per_second=63.6935M/s items_per_second=4.2143M/s
SplitPattern                          299022229 ns    298483000 ns            2 bytes_per_second=53.0946M/s items_per_second=3.51302M/s
TrimSingleAscii                       112433847 ns    111774000 ns            6 bytes_per_second=141.785M/s items_per_second=9.38122M/s
TrimManyAscii                         149166142 ns    140845000 ns            5 bytes_per_second=112.52M/s items_per_second=7.44489M/s
MatchLike                             120614361 ns    120173500 ns            6 bytes_per_second=131.875M/s items_per_second=8.72552M/s
MatchLikeSubstring                    250514431 ns    249707333 ns            3 bytes_per_second=63.4657M/s items_per_second=4.19922M/s
MatchLikePrefix                        56792788 ns     56595000 ns           12 bytes_per_second=280.022M/s items_per_second=18.5277M/s
MatchLikeSuffix                        56761417 ns     56597917 ns           12 bytes_per_second=280.008M/s items_per_second=18.5268M/s
Utf8Lower                             123031313 ns    122381000 ns            6 bytes_per_second=129.496M/s items_per_second=8.56813M/s
Utf8Upper                             122140590 ns    121736167 ns            6 bytes_per_second=130.182M/s items_per_second=8.61351M/s
IsAlphaNumericUnicode                 410607062 ns    409221000 ns            2 bytes_per_second=38.7269M/s items_per_second=2.56237M/s
TrimSingleUtf8                        111638805 ns    111270000 ns            6 bytes_per_second=142.427M/s items_per_second=9.42371M/s
TrimManyUtf8                          145255867 ns    144850200 ns            5 bytes_per_second=109.408M/s items_per_second=7.23904M/s
BinaryJoinArrayScalar                    517262 ns       515486 ns         1347 bytes_per_second=223.987M/s
BinaryJoinArrayArray                     599597 ns       597625 ns         1172 bytes_per_second=193.202M/s
BinaryJoinElementWiseArrayScalar/2      3839304 ns      3825164 ns          183 bytes_per_second=60.8715M/s
BinaryJoinElementWiseArrayScalar/8      8526469 ns      8401663 ns           83 bytes_per_second=111.174M/s
BinaryJoinElementWiseArrayScalar/64    30989161 ns     30885739 ns           23 bytes_per_second=242.198M/s
BinaryJoinElementWiseArrayScalar/128   51342012 ns     51161357 ns           14 bytes_per_second=292.309M/s
BinaryJoinElementWiseArrayArray/2       2711756 ns      2560080 ns          275 bytes_per_second=90.9516M/s
BinaryJoinElementWiseArrayArray/8       7016266 ns      6995290 ns          100 bytes_per_second=133.525M/s
BinaryJoinElementWiseArrayArray/64     30042234 ns     29939261 ns           23 bytes_per_second=249.855M/s
BinaryJoinElementWiseArrayArray/128    48674205 ns     48520286 ns           14 bytes_per_second=308.22M/s
BinaryRepeat                          117393298 ns    117199833 ns            6 bytes_per_second=135.221M/s items_per_second=8.94691M/s
a - z
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
2023-11-05T12:55:36+09:00
Running /Users/lama/workspace/arrow-build-test/cpp/cmake-build-debug/debug/arrow-compute-scalar-string-benchmark
Run on (10 X 24.1203 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 17.92, 12.78, 19.09
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------
AsciiLower                             68504931 ns     47875600 ns           15 bytes_per_second=331.021M/s items_per_second=21.9021M/s
AsciiUpper                             46894272 ns     46014600 ns           15 bytes_per_second=344.409M/s items_per_second=22.7879M/s
IsAlphaNumericAscii                   109179389 ns    108756000 ns            6 bytes_per_second=145.719M/s items_per_second=9.64155M/s
MatchSubstring                        262322180 ns    261283333 ns            3 bytes_per_second=60.6539M/s items_per_second=4.01318M/s
SplitPattern                          399403687 ns    398345500 ns            2 bytes_per_second=39.7842M/s items_per_second=2.63233M/s
TrimSingleAscii                       113099604 ns    112723167 ns            6 bytes_per_second=140.591M/s items_per_second=9.30222M/s
TrimManyAscii                         137971083 ns    137643200 ns            5 bytes_per_second=115.137M/s items_per_second=7.61807M/s
MatchLike                             119181104 ns    118868333 ns            6 bytes_per_second=133.323M/s items_per_second=8.82132M/s
MatchLikeSubstring                    262609319 ns    261861333 ns            3 bytes_per_second=60.52M/s items_per_second=4.00432M/s
MatchLikePrefix                        57234142 ns     57110167 ns           12 bytes_per_second=277.496M/s items_per_second=18.3606M/s
MatchLikeSuffix                        57068629 ns     56907000 ns           12 bytes_per_second=278.487M/s items_per_second=18.4261M/s
Utf8Lower                             122208132 ns    121717000 ns            6 bytes_per_second=130.202M/s items_per_second=8.61487M/s
Utf8Upper                             152733583 ns    123730500 ns            6 bytes_per_second=128.084M/s items_per_second=8.47468M/s
IsAlphaNumericUnicode                 412612563 ns    409943000 ns            2 bytes_per_second=38.6586M/s items_per_second=2.55786M/s
TrimSingleUtf8                         93089458 ns     92609000 ns            8 bytes_per_second=171.126M/s items_per_second=11.3226M/s
TrimManyUtf8                          129182483 ns    127529000 ns            5 bytes_per_second=124.269M/s items_per_second=8.22226M/s
BinaryJoinArrayScalar                    521784 ns       518275 ns         1342 bytes_per_second=222.782M/s
BinaryJoinArrayArray                     635901 ns       609174 ns         1164 bytes_per_second=189.539M/s
BinaryJoinElementWiseArrayScalar/2      4034207 ns      3929483 ns          180 bytes_per_second=59.2555M/s
BinaryJoinElementWiseArrayScalar/8      8474070 ns      8427341 ns           82 bytes_per_second=110.835M/s
BinaryJoinElementWiseArrayScalar/64    31594422 ns     31024435 ns           23 bytes_per_second=241.116M/s
BinaryJoinElementWiseArrayScalar/128   53196176 ns     51912143 ns           14 bytes_per_second=288.082M/s
BinaryJoinElementWiseArrayArray/2       2562493 ns      2550762 ns          273 bytes_per_second=91.2839M/s
BinaryJoinElementWiseArrayArray/8       7039630 ns      6992545 ns           99 bytes_per_second=133.577M/s
BinaryJoinElementWiseArrayArray/64     29908516 ns     29812565 ns           23 bytes_per_second=250.917M/s
BinaryJoinElementWiseArrayArray/128    48417548 ns     48254357 ns           14 bytes_per_second=309.919M/s
BinaryRepeat                          116653729 ns    116475667 ns            6 bytes_per_second=136.061M/s items_per_second=9.00253M/s
A
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
2023-11-05T16:50:40+09:00
Running /Users/lama/workspace/arrow-build-test/cpp/cmake-build-debug/debug/arrow-compute-scalar-string-benchmark
Run on (10 X 24.1206 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 24.58, 10.63, 7.73
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------
AsciiLower                             59260619 ns     45435200 ns           15 bytes_per_second=348.801M/s items_per_second=23.0785M/s
AsciiUpper                             54146987 ns     40933000 ns           16 bytes_per_second=387.165M/s items_per_second=25.6169M/s
IsAlphaNumericAscii                   137384979 ns    130968167 ns            6 bytes_per_second=121.005M/s items_per_second=8.00634M/s
MatchSubstring                        258750625 ns    252653667 ns            3 bytes_per_second=62.7256M/s items_per_second=4.15025M/s
SplitPattern                          299539709 ns    298784000 ns            2 bytes_per_second=53.0411M/s items_per_second=3.50948M/s
TrimSingleAscii                       111507625 ns    111294667 ns            6 bytes_per_second=142.395M/s items_per_second=9.42162M/s
TrimManyAscii                         203941069 ns    203470000 ns            3 bytes_per_second=77.8879M/s items_per_second=5.15347M/s
MatchLike                             119806375 ns    119497500 ns            6 bytes_per_second=132.621M/s items_per_second=8.77488M/s
MatchLikeSubstring                    248376431 ns    247746000 ns            3 bytes_per_second=63.9681M/s items_per_second=4.23246M/s
MatchLikePrefix                        56100764 ns     55995583 ns           12 bytes_per_second=283.019M/s items_per_second=18.726M/s
MatchLikeSuffix                        56816561 ns     56097538 ns           13 bytes_per_second=282.505M/s items_per_second=18.692M/s
Utf8Lower                             122156556 ns    121713833 ns            6 bytes_per_second=130.206M/s items_per_second=8.61509M/s
Utf8Upper                             122179403 ns    121698667 ns            6 bytes_per_second=130.222M/s items_per_second=8.61617M/s
IsAlphaNumericUnicode                 410486979 ns    409009000 ns            2 bytes_per_second=38.7469M/s items_per_second=2.5637M/s
TrimSingleUtf8                        111251986 ns    110930667 ns            6 bytes_per_second=142.863M/s items_per_second=9.45253M/s
TrimManyUtf8                          304767895 ns    303984500 ns            2 bytes_per_second=52.1337M/s items_per_second=3.44944M/s
BinaryJoinArrayScalar                    516491 ns       514441 ns         1359 bytes_per_second=224.442M/s
BinaryJoinArrayArray                     597666 ns       595395 ns         1174 bytes_per_second=193.925M/s
BinaryJoinElementWiseArrayScalar/2      3828987 ns      3817743 ns          183 bytes_per_second=60.9898M/s
BinaryJoinElementWiseArrayScalar/8      8424908 ns      8400627 ns           83 bytes_per_second=111.188M/s
BinaryJoinElementWiseArrayScalar/64    36747569 ns     32232043 ns           23 bytes_per_second=232.082M/s
BinaryJoinElementWiseArrayScalar/128   58483125 ns     54018571 ns           14 bytes_per_second=276.848M/s
BinaryJoinElementWiseArrayArray/2       2692385 ns      2591653 ns          265 bytes_per_second=89.8436M/s
BinaryJoinElementWiseArrayArray/8       7018530 ns      7000495 ns           99 bytes_per_second=133.426M/s
BinaryJoinElementWiseArrayArray/64     29864237 ns     29787478 ns           23 bytes_per_second=251.128M/s
BinaryJoinElementWiseArrayArray/128    49300289 ns     49028643 ns           14 bytes_per_second=305.024M/s
BinaryRepeat                          120564430 ns    120146000 ns            6 bytes_per_second=131.905M/s items_per_second=8.72751M/s
a
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
2023-11-05T16:52:57+09:00
Running /Users/lama/workspace/arrow-build-test/cpp/cmake-build-debug/debug/arrow-compute-scalar-string-benchmark
Run on (10 X 24.1246 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 37.83, 16.60, 10.30
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------
AsciiLower                             60577072 ns     46872800 ns           15 bytes_per_second=338.103M/s items_per_second=22.3707M/s
AsciiUpper                             48606701 ns     41226706 ns           17 bytes_per_second=384.407M/s items_per_second=25.4344M/s
IsAlphaNumericAscii                   119629299 ns    119414000 ns            6 bytes_per_second=132.713M/s items_per_second=8.78101M/s
MatchSubstring                        257597486 ns    256771000 ns            3 bytes_per_second=61.7197M/s items_per_second=4.0837M/s
SplitPattern                          300296125 ns    299404500 ns            2 bytes_per_second=52.9312M/s items_per_second=3.50221M/s
TrimSingleAscii                       112323972 ns    111810167 ns            6 bytes_per_second=141.739M/s items_per_second=9.37818M/s
TrimManyAscii                         203778403 ns    203120000 ns            3 bytes_per_second=78.0221M/s items_per_second=5.16235M/s
MatchLike                             119820972 ns    119458833 ns            6 bytes_per_second=132.664M/s items_per_second=8.77772M/s
MatchLikeSubstring                    256275458 ns    255825333 ns            3 bytes_per_second=61.9479M/s items_per_second=4.0988M/s
MatchLikePrefix                        56712590 ns     56558500 ns           12 bytes_per_second=280.203M/s items_per_second=18.5397M/s
MatchLikeSuffix                        56969875 ns     56785167 ns           12 bytes_per_second=279.084M/s items_per_second=18.4657M/s
Utf8Lower                             121201930 ns    121057167 ns            6 bytes_per_second=130.912M/s items_per_second=8.66183M/s
Utf8Upper                             122345653 ns    121983833 ns            6 bytes_per_second=129.918M/s items_per_second=8.59602M/s
IsAlphaNumericUnicode                 413550229 ns    411320500 ns            2 bytes_per_second=38.5292M/s items_per_second=2.54929M/s
TrimSingleUtf8                        113653875 ns    113339000 ns            6 bytes_per_second=139.827M/s items_per_second=9.25168M/s
TrimManyUtf8                          304111230 ns    291555500 ns            2 bytes_per_second=54.3562M/s items_per_second=3.59649M/s
BinaryJoinArrayScalar                    515313 ns       512924 ns         1366 bytes_per_second=225.106M/s
BinaryJoinArrayArray                     592849 ns       591778 ns         1170 bytes_per_second=195.111M/s
BinaryJoinElementWiseArrayScalar/2      3849662 ns      3802697 ns          185 bytes_per_second=61.2311M/s
BinaryJoinElementWiseArrayScalar/8      8359060 ns      8343250 ns           84 bytes_per_second=111.952M/s
BinaryJoinElementWiseArrayScalar/64    30720440 ns     30650478 ns           23 bytes_per_second=244.057M/s
BinaryJoinElementWiseArrayScalar/128   49606673 ns     49499643 ns           14 bytes_per_second=302.122M/s
BinaryJoinElementWiseArrayArray/2       2527602 ns      2521259 ns          278 bytes_per_second=92.352M/s
BinaryJoinElementWiseArrayArray/8       6932439 ns      6916228 ns          101 bytes_per_second=135.051M/s
BinaryJoinElementWiseArrayArray/64     29666691 ns     29603042 ns           24 bytes_per_second=252.693M/s
BinaryJoinElementWiseArrayArray/128    48173015 ns     48073286 ns           14 bytes_per_second=311.086M/s
BinaryRepeat                          120577007 ns    120241167 ns            6 bytes_per_second=131.8M/s items_per_second=8.72061M/s

B
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
2023-11-05T16:55:46+09:00
Running /Users/lama/workspace/arrow-build-test/cpp/cmake-build-debug/debug/arrow-compute-scalar-string-benchmark
Run on (10 X 24.3912 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 21.24, 15.90, 11.03
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------
AsciiLower                             66595109 ns     45477750 ns           16 bytes_per_second=348.475M/s items_per_second=23.0569M/s
AsciiUpper                             67166255 ns     41177118 ns           17 bytes_per_second=384.87M/s items_per_second=25.465M/s
IsAlphaNumericAscii                   162978063 ns    129192000 ns            6 bytes_per_second=122.669M/s items_per_second=8.11642M/s
MatchSubstring                        254909944 ns    251145667 ns            3 bytes_per_second=63.1022M/s items_per_second=4.17517M/s
SplitPattern                          306430188 ns    302196000 ns            2 bytes_per_second=52.4423M/s items_per_second=3.46985M/s
TrimSingleAscii                       112436854 ns    112052667 ns            6 bytes_per_second=141.432M/s items_per_second=9.35789M/s
TrimManyAscii                         208635278 ns    205369333 ns            3 bytes_per_second=77.1675M/s items_per_second=5.10581M/s
MatchLike                             120709285 ns    120153000 ns            6 bytes_per_second=131.897M/s items_per_second=8.72701M/s
MatchLikeSubstring                    252236653 ns    250617333 ns            3 bytes_per_second=63.2352M/s items_per_second=4.18397M/s
MatchLikePrefix                        56806969 ns     56557917 ns           12 bytes_per_second=280.206M/s items_per_second=18.5399M/s
MatchLikeSuffix                        59267385 ns     57512250 ns           12 bytes_per_second=275.556M/s items_per_second=18.2322M/s
Utf8Lower                             122572944 ns    122257333 ns            6 bytes_per_second=129.627M/s items_per_second=8.57679M/s
Utf8Upper                             122658451 ns    122209000 ns            6 bytes_per_second=129.678M/s items_per_second=8.58019M/s
IsAlphaNumericUnicode                 410729333 ns    409563000 ns            2 bytes_per_second=38.6945M/s items_per_second=2.56023M/s
TrimSingleUtf8                        111459104 ns    111060000 ns            6 bytes_per_second=142.696M/s items_per_second=9.44153M/s
TrimManyUtf8                          305294500 ns    304472000 ns            2 bytes_per_second=52.0502M/s items_per_second=3.44392M/s
BinaryJoinArrayScalar                    521329 ns       518370 ns         1341 bytes_per_second=222.741M/s
BinaryJoinArrayArray                     604294 ns       601321 ns         1166 bytes_per_second=192.014M/s
BinaryJoinElementWiseArrayScalar/2      3846224 ns      3830492 ns          183 bytes_per_second=60.7868M/s
BinaryJoinElementWiseArrayScalar/8      8438597 ns      8407434 ns           83 bytes_per_second=111.097M/s
BinaryJoinElementWiseArrayScalar/64    32029102 ns     31324273 ns           22 bytes_per_second=238.808M/s
BinaryJoinElementWiseArrayScalar/128   49774158 ns     49610786 ns           14 bytes_per_second=301.445M/s
BinaryJoinElementWiseArrayArray/2       2551518 ns      2541444 ns          275 bytes_per_second=91.6186M/s
BinaryJoinElementWiseArrayArray/8       7059063 ns      7008170 ns          100 bytes_per_second=133.279M/s
BinaryJoinElementWiseArrayArray/64     30176525 ns     29983304 ns           23 bytes_per_second=249.488M/s
BinaryJoinElementWiseArrayArray/128    48993896 ns     48834929 ns           14 bytes_per_second=306.234M/s
BinaryRepeat                          120613792 ns    120028000 ns            6 bytes_per_second=132.035M/s items_per_second=8.73609M/s
a
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
2023-11-05T17:06:50+09:00
Running /Users/lama/workspace/arrow-build-test/cpp/cmake-build-debug/debug/arrow-compute-scalar-string-benchmark
Run on (10 X 24.2828 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 22.22, 12.26, 11.78
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------
AsciiLower                             69841256 ns     46085733 ns           15 bytes_per_second=343.877M/s items_per_second=22.7527M/s
AsciiUpper                             56986850 ns     46607067 ns           15 bytes_per_second=340.031M/s items_per_second=22.4982M/s
IsAlphaNumericAscii                   150377437 ns    117138667 ns            6 bytes_per_second=135.291M/s items_per_second=8.95158M/s
MatchSubstring                        505749417 ns    472646500 ns            2 bytes_per_second=33.53M/s items_per_second=2.21852M/s
SplitPattern                         1847715750 ns   1804689000 ns            1 bytes_per_second=8.78148M/s items_per_second=581.029k/s
TrimSingleAscii                       206645833 ns    205643000 ns            3 bytes_per_second=77.0648M/s items_per_second=5.09901M/s
TrimManyAscii                         207149347 ns    205301333 ns            3 bytes_per_second=77.1931M/s items_per_second=5.1075M/s
MatchLike                             131151854 ns    122571500 ns            6 bytes_per_second=129.295M/s items_per_second=8.55481M/s
MatchLikeSubstring                    459364917 ns    456693500 ns            2 bytes_per_second=34.7013M/s items_per_second=2.29602M/s
MatchLikePrefix                        57800965 ns     57385417 ns           12 bytes_per_second=276.165M/s items_per_second=18.2725M/s
MatchLikeSuffix                        58057163 ns     57391083 ns           12 bytes_per_second=276.138M/s items_per_second=18.2707M/s
Utf8Lower                             127273701 ns    123126167 ns            6 bytes_per_second=128.712M/s items_per_second=8.51627M/s
Utf8Upper                             129544535 ns    123194833 ns            6 bytes_per_second=128.64M/s items_per_second=8.51153M/s
IsAlphaNumericUnicode                 440119375 ns    415563000 ns            2 bytes_per_second=38.1358M/s items_per_second=2.52327M/s
TrimSingleUtf8                        286792500 ns    279280667 ns            3 bytes_per_second=56.7452M/s items_per_second=3.75456M/s
TrimManyUtf8                          281064542 ns    278604000 ns            3 bytes_per_second=56.883M/s items_per_second=3.76368M/s
BinaryJoinArrayScalar                    520501 ns       517758 ns         1359 bytes_per_second=223.004M/s
BinaryJoinArrayArray                     609256 ns       601919 ns         1152 bytes_per_second=191.824M/s
BinaryJoinElementWiseArrayScalar/2      3831887 ns      3817383 ns          183 bytes_per_second=60.9956M/s
BinaryJoinElementWiseArrayScalar/8      8460215 ns      8416759 ns           83 bytes_per_second=110.974M/s
BinaryJoinElementWiseArrayScalar/64    30925404 ns     30820304 ns           23 bytes_per_second=242.713M/s
BinaryJoinElementWiseArrayScalar/128   51981167 ns     51192000 ns           10 bytes_per_second=292.134M/s
BinaryJoinElementWiseArrayArray/2       2568378 ns      2559942 ns          274 bytes_per_second=90.9565M/s
BinaryJoinElementWiseArrayArray/8       7007568 ns      6984750 ns          100 bytes_per_second=133.726M/s
BinaryJoinElementWiseArrayArray/64     29844184 ns     29741000 ns           24 bytes_per_second=251.521M/s
BinaryJoinElementWiseArrayArray/128    49258946 ns     48964357 ns           14 bytes_per_second=305.425M/s
BinaryRepeat                          123376778 ns    120404167 ns            6 bytes_per_second=131.622M/s items_per_second=8.7088M/s
Z-a
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
2023-11-05T16:59:32+09:00
Running /Users/lama/workspace/arrow-build-test/cpp/cmake-build-debug/debug/arrow-compute-scalar-string-benchmark
Run on (10 X 24.1209 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 26.56, 20.79, 14.29
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------
AsciiLower                             86322557 ns     66717455 ns           11 bytes_per_second=237.537M/s items_per_second=15.7167M/s
AsciiUpper                             60021861 ns     57794750 ns           12 bytes_per_second=274.209M/s items_per_second=18.1431M/s
IsAlphaNumericAscii                    20754428 ns     20625882 ns           34 bytes_per_second=768.347M/s items_per_second=50.8379M/s
MatchSubstring                        289741979 ns    288741500 ns            2 bytes_per_second=54.8859M/s items_per_second=3.63154M/s
SplitPattern                          600066833 ns    597824000 ns            1 bytes_per_second=26.5092M/s items_per_second=1.75399M/s
TrimSingleAscii                       115720187 ns    115507333 ns            6 bytes_per_second=137.202M/s items_per_second=9.078M/s
TrimManyAscii                         116993021 ns    116707167 ns            6 bytes_per_second=135.792M/s items_per_second=8.98468M/s
MatchLike                             123895333 ns    121384167 ns            6 bytes_per_second=130.559M/s items_per_second=8.63849M/s
MatchLikeSubstring                    290632938 ns    290035000 ns            2 bytes_per_second=54.6411M/s items_per_second=3.61534M/s
MatchLikePrefix                        57807208 ns     57681333 ns           12 bytes_per_second=274.748M/s items_per_second=18.1788M/s
MatchLikeSuffix                        57836149 ns     57712833 ns           12 bytes_per_second=274.598M/s items_per_second=18.1689M/s
Utf8Lower                             125754833 ns    125529400 ns            5 bytes_per_second=126.248M/s items_per_second=8.35323M/s
Utf8Upper                             125341347 ns    125139000 ns            6 bytes_per_second=126.642M/s items_per_second=8.37929M/s
IsAlphaNumericUnicode                  39587384 ns     39492222 ns           18 bytes_per_second=401.29M/s items_per_second=26.5515M/s
TrimSingleUtf8                        114925729 ns    114602333 ns            6 bytes_per_second=138.286M/s items_per_second=9.14969M/s
TrimManyUtf8                          114509111 ns    114236333 ns            6 bytes_per_second=138.729M/s items_per_second=9.17901M/s
BinaryJoinArrayScalar                    510474 ns       509409 ns         1365 bytes_per_second=226.659M/s
BinaryJoinArrayArray                     590959 ns       589773 ns         1183 bytes_per_second=195.774M/s
BinaryJoinElementWiseArrayScalar/2      4062842 ns      3789581 ns          186 bytes_per_second=61.4431M/s
BinaryJoinElementWiseArrayScalar/8      8409721 ns      8381446 ns           83 bytes_per_second=111.442M/s
BinaryJoinElementWiseArrayScalar/64    30689185 ns     30588696 ns           23 bytes_per_second=244.55M/s
BinaryJoinElementWiseArrayScalar/128   49766131 ns     49647214 ns           14 bytes_per_second=301.224M/s
BinaryJoinElementWiseArrayArray/2       2467776 ns      2464942 ns          277 bytes_per_second=94.462M/s
BinaryJoinElementWiseArrayArray/8       6784096 ns      6775165 ns          103 bytes_per_second=137.863M/s
BinaryJoinElementWiseArrayArray/64     29109743 ns     29048958 ns           24 bytes_per_second=257.513M/s
BinaryJoinElementWiseArrayArray/128    47438644 ns     47354667 ns           15 bytes_per_second=315.807M/s
BinaryRepeat                          114823000 ns    114782167 ns            6 bytes_per_second=138.069M/s items_per_second=9.13536M/s
A-B
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
2023-11-05T17:18:08+09:00
Running /Users/lama/workspace/arrow-build-test/cpp/cmake-build-debug/debug/arrow-compute-scalar-string-benchmark
Run on (10 X 24.0567 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 21.43, 12.73, 11.72
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------
AsciiLower                             71967039 ns     49053800 ns           15 bytes_per_second=323.071M/s items_per_second=21.376M/s
AsciiUpper                             39829296 ns     39489500 ns           18 bytes_per_second=401.318M/s items_per_second=26.5533M/s
IsAlphaNumericAscii                   122967611 ns    121176500 ns            6 bytes_per_second=130.783M/s items_per_second=8.65329M/s
MatchSubstring                        251616319 ns    250154667 ns            3 bytes_per_second=63.3522M/s items_per_second=4.19171M/s
SplitPattern                          300475188 ns    299781000 ns            2 bytes_per_second=52.8647M/s items_per_second=3.49781M/s
TrimSingleAscii                       112082438 ns    111340667 ns            6 bytes_per_second=142.337M/s items_per_second=9.41773M/s
TrimManyAscii                         203422375 ns    202994333 ns            3 bytes_per_second=78.0704M/s items_per_second=5.16554M/s
MatchLike                             119299465 ns    119062167 ns            6 bytes_per_second=133.106M/s items_per_second=8.80696M/s
MatchLikeSubstring                    248962139 ns    248359333 ns            3 bytes_per_second=63.8101M/s items_per_second=4.22201M/s
MatchLikePrefix                        56438840 ns     56303750 ns           12 bytes_per_second=281.47M/s items_per_second=18.6236M/s
MatchLikeSuffix                        56384566 ns     56263333 ns           12 bytes_per_second=281.673M/s items_per_second=18.6369M/s
Utf8Lower                             122655479 ns    122189667 ns            6 bytes_per_second=129.699M/s items_per_second=8.58154M/s
Utf8Upper                             121992771 ns    121751667 ns            6 bytes_per_second=130.165M/s items_per_second=8.61242M/s
IsAlphaNumericUnicode                 409912125 ns    408886000 ns            2 bytes_per_second=38.7586M/s items_per_second=2.56447M/s
TrimSingleUtf8                        111437299 ns    111046667 ns            6 bytes_per_second=142.713M/s items_per_second=9.44266M/s
TrimManyUtf8                          279231319 ns    278481667 ns            3 bytes_per_second=56.908M/s items_per_second=3.76533M/s
BinaryJoinArrayScalar                    521854 ns       520399 ns         1336 bytes_per_second=221.873M/s
BinaryJoinArrayArray                     600541 ns       599092 ns         1168 bytes_per_second=192.729M/s
BinaryJoinElementWiseArrayScalar/2      3832028 ns      3821415 ns          183 bytes_per_second=60.9312M/s
BinaryJoinElementWiseArrayScalar/8      8415257 ns      8385398 ns           83 bytes_per_second=111.389M/s
BinaryJoinElementWiseArrayScalar/64    31034620 ns     30932609 ns           23 bytes_per_second=241.831M/s
BinaryJoinElementWiseArrayScalar/128   51910233 ns     50558200 ns           10 bytes_per_second=295.796M/s
BinaryJoinElementWiseArrayArray/2       2543797 ns      2536232 ns          276 bytes_per_second=91.8068M/s
BinaryJoinElementWiseArrayArray/8       7002950 ns      6978510 ns          100 bytes_per_second=133.846M/s
BinaryJoinElementWiseArrayArray/64     30062612 ns     29991696 ns           23 bytes_per_second=249.418M/s
BinaryJoinElementWiseArrayArray/128    48614661 ns     48454429 ns           14 bytes_per_second=308.639M/s
BinaryRepeat                          117739181 ns    117578000 ns            6 bytes_per_second=134.786M/s items_per_second=8.91813M/s
a-b
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
2023-11-05T17:13:17+09:00
Running /Users/lama/workspace/arrow-build-test/cpp/cmake-build-debug/debug/arrow-compute-scalar-string-benchmark
Run on (10 X 24.0135 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 26.45, 12.28, 11.53
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------
AsciiLower                             93316692 ns     48147667 ns           15 bytes_per_second=329.151M/s items_per_second=21.7783M/s
AsciiUpper                             46178678 ns     45217267 ns           15 bytes_per_second=350.482M/s items_per_second=23.1897M/s
IsAlphaNumericAscii                   119479292 ns    112159667 ns            6 bytes_per_second=141.297M/s items_per_second=9.34896M/s
MatchSubstring                        423356271 ns    420812000 ns            2 bytes_per_second=37.6601M/s items_per_second=2.49179M/s
SplitPattern                         1397767917 ns   1383606000 ns            1 bytes_per_second=11.454M/s items_per_second=757.857k/s
TrimSingleAscii                       189197188 ns    176411500 ns            4 bytes_per_second=89.8345M/s items_per_second=5.94392M/s
TrimManyAscii                         224340180 ns    208932667 ns            3 bytes_per_second=75.8514M/s items_per_second=5.01873M/s
MatchLike                             265022208 ns    257496667 ns            3 bytes_per_second=61.5458M/s items_per_second=4.07219M/s
MatchLikeSubstring                    421082313 ns    418003000 ns            2 bytes_per_second=37.9132M/s items_per_second=2.50854M/s
MatchLikePrefix                        65303189 ns     64958000 ns           11 bytes_per_second=243.971M/s items_per_second=16.1424M/s
MatchLikeSuffix                        65367530 ns     64915545 ns           11 bytes_per_second=244.13M/s items_per_second=16.1529M/s
Utf8Lower                             123366854 ns    122773167 ns            6 bytes_per_second=129.082M/s items_per_second=8.54076M/s
Utf8Upper                             123336771 ns    122494500 ns            6 bytes_per_second=129.376M/s items_per_second=8.56019M/s
IsAlphaNumericUnicode                 430325105 ns    411080500 ns            2 bytes_per_second=38.5517M/s items_per_second=2.55078M/s
TrimSingleUtf8                        153566483 ns    152935200 ns            5 bytes_per_second=103.625M/s items_per_second=6.85634M/s
TrimManyUtf8                          282400486 ns    280322667 ns            3 bytes_per_second=56.5343M/s items_per_second=3.7406M/s
BinaryJoinArrayScalar                    523523 ns       519276 ns         1349 bytes_per_second=222.353M/s
BinaryJoinArrayArray                     611899 ns       605256 ns         1154 bytes_per_second=190.766M/s
BinaryJoinElementWiseArrayScalar/2      3893473 ns      3867033 ns          182 bytes_per_second=60.2124M/s
BinaryJoinElementWiseArrayScalar/8      8498821 ns      8422482 ns           83 bytes_per_second=110.899M/s
BinaryJoinElementWiseArrayScalar/64    36238366 ns     31760478 ns           23 bytes_per_second=235.528M/s
BinaryJoinElementWiseArrayScalar/128   50431283 ns     50079100 ns           10 bytes_per_second=298.626M/s
BinaryJoinElementWiseArrayArray/2       2631463 ns      2579765 ns          272 bytes_per_second=90.2576M/s
BinaryJoinElementWiseArrayArray/8       7075472 ns      7027343 ns           99 bytes_per_second=132.916M/s
BinaryJoinElementWiseArrayArray/64     30109264 ns     29964913 ns           23 bytes_per_second=249.641M/s
BinaryJoinElementWiseArrayArray/128    50799277 ns     49716357 ns           14 bytes_per_second=300.805M/s
BinaryRepeat                          118798889 ns    118317833 ns            6 bytes_per_second=133.943M/s items_per_second=8.86237M/s

@llama90
Contributor Author

llama90 commented Nov 6, 2023

I am confused about how to handle the Decimal type.

The scale is the same as before, but the precision is different. In previous benchmarks, the precision range was set to randomly generate values between low and high, while in random.cc, the precision is fixed for all generated values at a specified value (negative values are also included).

Using random.cc seems to require specifying a particular precision value, and it is unclear to me what value would be appropriate.

What would be the appropriate approach in terms of benchmark?
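
To make the "fixed precision, signed values" behaviour concrete, here is a minimal sketch of a generator that takes the precision up front and draws values that never exceed it. This is not Arrow's random.cc implementation: `MaxUnscaled` and `RandomUnscaled` are hypothetical names, and the sketch uses `int64_t` (so precision is limited to 18) instead of a 128-bit value to stay self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Largest unscaled magnitude that fits in `precision` decimal digits,
// i.e. 10^precision - 1. Limited to precision <= 18 so it fits in int64_t.
inline int64_t MaxUnscaled(int precision) {
  assert(precision >= 1 && precision <= 18);
  int64_t limit = 1;
  for (int i = 0; i < precision; ++i) limit *= 10;
  return limit - 1;
}

// Hypothetical helper: draw a signed unscaled value whose digit count never
// exceeds `precision`. Both positive and negative values can occur.
inline int64_t RandomUnscaled(int precision, std::mt19937_64& gen) {
  const int64_t limit = MaxUnscaled(precision);
  std::uniform_int_distribution<int64_t> dist(-limit, limit);
  return dist(gen);
}
```

With a generator like this, the precision chosen for the type directly bounds the magnitude of every value, so the benchmark input changes whenever the precision argument changes.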

@kou
Member

kou commented Nov 8, 2023

Thanks for summarizing the behavior.

String: Are you using https://github.com/apache/arrow/pull/38526/files#diff-b440faf74bbde4937a0a476511319f0c1cc255fbf0fb7372277c5f465df7a970R222-R229 ? If so, a String() argument seems wrong:

diff --git a/cpp/src/gandiva/tests/micro_benchmarks.cc b/cpp/src/gandiva/tests/micro_benchmarks.cc
index 4bd4e8d51..88dd5a14c 100644
--- a/cpp/src/gandiva/tests/micro_benchmarks.cc
+++ b/cpp/src/gandiva/tests/micro_benchmarks.cc
@@ -318,7 +318,7 @@ static void TimedTestAllocs(benchmark::State& state) {
   for (int i = 0; i < NUM_BATCHES; i++) {
     for (int col = 0; col < num_fields; col++) {
       arrays[col * NUM_BATCHES + i] =
-          std::make_shared<ArrayPtr>(rag.String(num_batches, 0, 64, 0));
+          std::make_shared<ArrayPtr>(rag.String(num_batches, 64, 64, 0));
     }
   }
 
@@ -351,7 +351,7 @@ static void TimedTestOutputStringAllocs(benchmark::State& state) {
   for (int i = 0; i < NUM_BATCHES; i++) {
     for (int col = 0; col < num_fields; col++) {
       arrays[col * NUM_BATCHES + i] =
-          std::make_shared<ArrayPtr>(rag.String(num_batches, 0, 64, 0));
+          std::make_shared<ArrayPtr>(rag.String(num_batches, 64, 64, 0));
     }
   }
 

BTW, why did you compare AsciiLower/AsciiUpper performance with multiple inputs? They are Arrow's compute kernels not Gandiva's functions. We are working on Gandiva's benchmark not Arrow's compute kernels, right?

What is the important point in TimedTestAllocs/TimedTestOutputStringAllocs? They seem to focus on memory allocation rather than upper performance, because they have Allocs in their names. If memory allocation performance is the important point and the data change doesn't increase the allocation cost, we don't need to care about it.

Decimal: Could you explain more?
Here is a sample generated array with #38526:

diff --git a/cpp/src/gandiva/tests/micro_benchmarks.cc b/cpp/src/gandiva/tests/micro_benchmarks.cc
index 4bd4e8d51..02aed4c71 100644
--- a/cpp/src/gandiva/tests/micro_benchmarks.cc
+++ b/cpp/src/gandiva/tests/micro_benchmarks.cc
@@ -525,6 +526,7 @@ static void DoDecimalAdd2(benchmark::State& state, int32_t precision, int32_t sc
     for (int col = 0; col < num_fields; col++) {
       arrays[col * NUM_BATCHES + i] =
           std::make_shared<ArrayPtr>(rag.Decimal128(decimal_type, num_batches, 0, 64, 0));
+      std::cout << **arrays[col * NUM_BATCHES + i] << std::endl;
     }
   }
 
[
  30767549570.000000002566854084,
  -87923973068817.656751945692017159,
  -91928206713617.619506354929822755,
  60821288362815.346528371379023392,
  -64112645956997.970772781684908311,
  63074099127972.553382781455698009,
  -16941184082774.182190282352712117,
  -67409212744940.964364408478716538,
  -36369925434737.491950329130210850,
  37440631996063.195363909449835822,
  ...
  91737090187668.612368195042828097,
  -83035962690737.078487654169938906,
  -42192441527807.394459688661789590,
  41830067640897.065604367136779604,
  -65275487302948.826419045409244118,
  33681673668716.936339078316272999,
  -9118956089337.854439920962868218,
  -19542009636242.217679573976922312,
  -72636625294767.978416906406436654,
  -40247733243916.629646124384741677
]

It seems that all of these values have different precision. (The 000000002566854084 part in 30767549570.000000002566854084 is precision, right?)

@js8544
Collaborator

js8544 commented Nov 8, 2023

In previous benchmarks, the precision range was set to randomly generate values between low and high

I think in both cases precision is fixed. In DoDecimalAdd2 it's given as a function argument, isn't it? The Decimal128DataGenerator only randomizes the physical content of the decimal (i.e. the int128). Their precision and scale are specified in the Decimal128Type.
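
As an illustration of that point, the sketch below renders the same unscaled integer under different scales. `FormatDecimal` is a hypothetical helper written for this comment, not an Arrow function, and it uses `int64_t` rather than a 128-bit integer to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Render an unscaled integer as a decimal string with `scale` fractional
// digits. The integer is the "physical content"; the scale only decides
// where the decimal point goes.
inline std::string FormatDecimal(int64_t unscaled, int scale) {
  assert(scale >= 0);
  const bool negative = unscaled < 0;
  // Take the magnitude in unsigned arithmetic so INT64_MIN does not overflow.
  const uint64_t magnitude = negative
                                 ? uint64_t{0} - static_cast<uint64_t>(unscaled)
                                 : static_cast<uint64_t>(unscaled);
  std::string digits = std::to_string(magnitude);
  const std::size_t frac = static_cast<std::size_t>(scale);
  // Left-pad with zeros so at least one digit remains before the point.
  if (digits.size() <= frac) {
    digits.insert(0, frac + 1 - digits.size(), '0');
  }
  if (frac > 0) digits.insert(digits.size() - frac, ".");
  return negative ? "-" + digits : digits;
}
```

For example, `FormatDecimal(123456, 3)` gives `123.456` and `FormatDecimal(123456, 6)` gives `0.123456`: the random integer is identical, and only the type's scale changes how it reads.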

@llama90
Contributor Author

llama90 commented Nov 8, 2023

Thank you for your response.

For string

BTW, why did you compare AsciiLower/AsciiUpper performance with multiple inputs? They are Arrow's compute kernels not Gandiva's functions. We are working on Gandiva's benchmark not Arrow's compute kernels, right?

You are right. We are working on Gandiva's benchmark.

In summary, it appears that:

  • We should focus on Allocs rather than upper.
  • The mix of uppercase and lowercase characters in string generation is not significant.
  • Previously, strings of lowercase characters of varying lengths from 1 to 64 were generated, but for benchmark consistency, we now maintain a length of 64 when generating strings.
previous generated string data
[
    "bcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrs",
    "tuvwxyzabc",
    "defghijklm",
    "nopqrstuvwxyzabcdefghijklmnopqrstuvwxy",
    "zabcdefghijklmnopqrstuvwx",
    "yzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc",
    "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghij",
    "klmnopqrstu",
    "vwxyzabcdefghijklmnopqrstuvwxy",
    "z",
    ...
    "ijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrs",
    "tuvwxyzabcdefgh",
    "ijklmnopqrstuvwxyzabcdefghijklmnopqrs",
    "tuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopq",
    "rstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopq",
    "rstuvwxyzabcdefghijklmn",
    "opqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklm",
    "nopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqr",
    "stuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrst",
    "uvwxyzabcdefghijklmnopqrstuvwxyzabc"
  ]
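
The sample above suggests the old generator walked cyclically through a-z, with each string continuing where the previous one stopped. The following sketch reproduces that pattern under that assumption. It is reconstructed from the output only, not copied from generate_data.h, and `CyclicLowercaseStrings` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Produce `count` strings whose characters walk cyclically through 'a'..'z',
// continuing where the previous string stopped. Each length is drawn
// uniformly from [min_length, max_length]; pass min == max for a fixed length.
inline std::vector<std::string> CyclicLowercaseStrings(int count, int min_length,
                                                       int max_length,
                                                       uint64_t seed) {
  assert(count >= 0 && min_length >= 0 && min_length <= max_length);
  std::mt19937_64 gen(seed);
  std::uniform_int_distribution<int> length_dist(min_length, max_length);
  std::vector<std::string> out;
  out.reserve(static_cast<std::size_t>(count));
  int next = 0;  // index of the next letter in the alphabet
  for (int i = 0; i < count; ++i) {
    const int length = length_dist(gen);
    std::string s;
    s.reserve(static_cast<std::size_t>(length));
    for (int j = 0; j < length; ++j) {
      s.push_back(static_cast<char>('a' + next));
      next = (next + 1) % 26;
    }
    out.push_back(std::move(s));
  }
  return out;
}
```

Calling it with lengths 1 to 64 mimics the old varying-length data, while passing 64 for both bounds mimics the fixed-length setup.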

To briefly explain why we conducted the test, I've observed that the previous benchmarks selected characters within the a-z range to generate data. However, when using random.h, characters are selected from the A - z range, which led to questions when I saw differences in benchmark times.
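
To show what the wider range contains, this small sketch classifies every ASCII code point between two characters. `ClassifyRange` and `RangeCounts` are hypothetical names used only for illustration.

```cpp
#include <cassert>

struct RangeCounts {
  int upper = 0;  // 'A'..'Z'
  int lower = 0;  // 'a'..'z'
  int other = 0;  // anything else, e.g. '[' or '_'
};

// Classify every ASCII code point in the inclusive range [first, last].
inline RangeCounts ClassifyRange(char first, char last) {
  assert(first <= last);
  RangeCounts counts;
  for (int c = first; c <= last; ++c) {
    if (c >= 'A' && c <= 'Z') {
      ++counts.upper;
    } else if (c >= 'a' && c <= 'z') {
      ++counts.lower;
    } else {
      ++counts.other;
    }
  }
  return counts;
}
```

In ASCII, `ClassifyRange('A', 'z')` covers 58 code points: 26 uppercase, 26 lowercase, and the 6 punctuation characters between `Z` and `a`. `ClassifyRange('a', 'z')` covers only the 26 lowercase letters.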

# previous
- TimedTestAllocs/min_time:1.000                     140487 us       137767 us           10
- TimedTestOutputStringAllocs/min_time:1.000         228228 us       226211 us            6
# random.h
+ TimedTestAllocs/min_time:1.000                     243130 us       242760 us            6
+ TimedTestOutputStringAllocs/min_time:1.000         332357 us       331799 us            4

Upon further inspection, particularly of the arrow-compute-scalar-string-benchmark, I noticed that AsciiLower and AsciiUpper perform differently depending on the range of characters in the string, which is why I mentioned the upper aspect.

My understanding of Gandiva is limited, but I believe that mixing uppercase and lowercase could have a similar impact.

For decimal

In this case, I think you should check the precision and scale of the Decimal Type first. Was I searching for the wrong thing? I was understanding scale and precision in reverse.

And for the existing decimal generation code, it was implemented like this

class Decimal128DataGenerator : public DataGenerator<arrow::Decimal128> {
 public:
  explicit Decimal128DataGenerator(bool large) : large_(large) {}

  arrow::Decimal128 GenerateData() {
    uint64_t low = random_.next();
    int64_t high = random_.next();
    if (large_) {
      high += (1ull << 62);
    }
    return arrow::Decimal128(high, low);
  }

 protected:
  bool large_;
  Random random_;
};
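
The following standalone sketch mirrors the shape of that generator without depending on Arrow. The `Random` class is not shown in this thread, so the sketch assumes each draw is a non-negative 31-bit integer. That assumption, along with the names `Int128Parts` and `SketchDecimalGenerator`, is mine and not taken from the source.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <random>

// Two 64-bit halves of a 128-bit value: value = high * 2^64 + low.
struct Int128Parts {
  int64_t high;
  uint64_t low;
};

// Stand-in for the generator above. Assumption: each draw is a non-negative
// 31-bit integer; the real Random class is not shown in this thread.
class SketchDecimalGenerator {
 public:
  SketchDecimalGenerator(bool large, uint32_t seed) : large_(large), gen_(seed) {}

  Int128Parts Generate() {
    const uint64_t low = static_cast<uint64_t>(draw_(gen_));
    int64_t high = draw_(gen_);
    if (large_) {
      // Setting bit 62 of the high word forces the value to at least 2^126.
      high += (int64_t{1} << 62);
    }
    return {high, low};
  }

 private:
  bool large_;
  std::mt19937 gen_;
  std::uniform_int_distribution<int32_t> draw_{
      0, std::numeric_limits<int32_t>::max()};
};
```

Under that assumption, the non-large variant only ever produces non-negative values with a small high word, which would leave many leading zeroes at a high precision. The large variant always sets bit 62 and so fills most of the available digits.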

Anyway, I'm having a hard time figuring out how to fit this in because for random.h, the precision and scale values are the same as before, but the random values generated are different.

previous generated DecimalAdd2LeadingZeroes data
[
    16351795034091788378275.759672,
    27854373977842001526319.734684,
    6061446293668872071695.356134,
    31215470323596391714672.813452,
    15533332405516061115471.066480,
    15401288582294600038721.016490,
    30274851179048759105303.625351,
    6564021121342245976114.232879,
    36171304665527520454124.155732,
    3546578283628563437121.241384,
    ...
    14860910085674466129132.927835,
    6131533904566936343671.748478,
    36528723632058120161272.517328,
    3964543116664404486145.446787,
    20606324915198138790682.388805,
    20631004666520402209109.905293,
    29878891319444493588791.347655,
    13823654056734867583462.044256,
    30786726283235031078880.608496,
    620129083913073530975.895978
  ]
`random.h` generated DecimalAdd2LeadingZeroes data
[
  -10038291425340000002078631.818296,
  20434255526017535581463554076021.370910,
  45129537519062609815007557527197.169527,
  -62154291670684288853156612735793.477627,
  87103160282621257590199953623276.740180,
  79969063163511904595223170720469.422729,
  -86583636649126666385105002765466.142621,
  92352037140996988957666339677514.598197,
  25931328664987260071252581126967.305358,
  91215577756571866841817287196691.182509,
  ...
  52281930080182271842503590963377.447177,
  4372942953952971927671269281255.305966,
  96763102312104951696190306530776.653371,
  -18467695819003138567360094105422.436093,
  -9792209172207529275819289504154.021690,
  61852766762170189161854285754108.616433,
  25571897876364008995405425598827.463597,
  64701056156004809966065180476482.250553,
  -63257829389611554338056688929518.067776,
  67455720423980831654534847333775.750429
]
Previously generated DecimalAdd2LeadingZeroesWithDiv data:
[
    7587380203.400195060290629070,
    26540763051.629767310192345708,
    7223529576.634111708006082596,
    6365242807.233231030439853103,
    30019804134.302409408324050740,
    25340645699.884287035225737289,
    10696019445.722928448197246826,
    949466737.960961850748143665,
    10100302893.597466582610415885,
    3997196768.260434038917129614,
    ...
    32227537378.843572088950643055,
    8537777136.877817787746240968,
    30551609561.954519025705399073,
    26680430508.348522923995981443,
    33569096800.981186984459717330,
    3351111436.781940463508363016,
    24558127179.325481340697631536,
    22734710467.783675740496138638,
    20732616951.383696862539113282,
    23868794600.064308290332457814
  ]
`random.h`-generated DecimalAdd2LeadingZeroesWithDiv data:
[
  14208508617760.000001420629929750,
  34494975892204083434.054431421823503934,
  -59808716875461356024.747039105565396418,
  -39719453848714764419.682550443910720831,
  -87317155083825605821.703637655099976619,
  -65044845346732651787.465211119366632301,
  -88009445715660277207.144934805781121915,
  95937137753543307790.043195390657972343,
  20590051896038797936.563280809743603093,
  49223527283939584300.218543938584200541,
  ...
  19342162338087482443.922956656008521477,
  -48849509529657253769.022422810864683008,
  -22694641636271599896.031597585758119284,
  -40556169347524762115.197985730587632297,
  12188092371682991218.322898910856250055,
  85478353248271990406.317595833539039857,
  75882328331451585423.885909771843576864,
  53687092121591423454.148705140146222481,
  -44309181005291397260.084537547533448259,
  -95059575541611513750.967476857639700305
]
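
To make the difference concrete, a small sketch like the following counts the digits in the printed values above. `CountDigits` and `DigitCount` are hypothetical names introduced here for illustration only; the sketch does not touch Arrow or Gandiva and simply inspects the decimal strings as printed.

```cpp
#include <cctype>
#include <string>

// Hypothetical result type: digits on each side of the decimal point.
struct DigitCount {
  int integer_digits;   // digits before the decimal point
  int fraction_digits;  // digits after the decimal point (the scale as printed)
};

// Counts digits in a printed decimal such as "-123.4500".
// The sign and the decimal point are skipped; leading zeros are counted,
// which is fine for the samples in this thread.
inline DigitCount CountDigits(const std::string& text) {
  DigitCount count{0, 0};
  bool after_point = false;
  for (char c : text) {
    if (c == '.') {
      after_point = true;
    } else if (std::isdigit(static_cast<unsigned char>(c))) {
      if (after_point) {
        ++count.fraction_digits;
      } else {
        ++count.integer_digits;
      }
    }
  }
  return count;
}
```

For example, the previous value `16351795034091788378275.759672` has 23 + 6 = 29 digits, while the `random.h` value `20434255526017535581463554076021.370910` has 32 + 6 = 38. The printed scale matches in both cases, but the number of significant digits does not.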

@js8544
Collaborator

js8544 commented Nov 8, 2023

Oh I see your point now. Their precisions are indeed different.
