Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable reading StringViewArray by default from Parquet #12092

Closed
wants to merge 3 commits into from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Aug 20, 2024

Draft as it builds on:

Which issue does this PR close?

Closes #11682

Rationale for this change

Reading data as StringViewArray is significantly faster than StringArray. We have been testing this behind a feature flag but it is now stable enough to enable by default.

See blog post #11603:

Benchmark Results

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_enable_string_view_by_def… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.66ms │                           0.67ms │     no change │
│ QQuery 1     │    72.07ms │                          70.10ms │     no change │
│ QQuery 2     │   119.92ms │                         124.84ms │     no change │
│ QQuery 3     │   128.29ms │                         134.03ms │     no change │
│ QQuery 4     │   964.03ms │                         974.14ms │     no change │
│ QQuery 5     │  1088.23ms │                        1060.44ms │     no change │
│ QQuery 6     │    66.53ms │                          64.39ms │     no change │
│ QQuery 7     │    83.84ms │                          80.44ms │     no change │
│ QQuery 8     │  1484.01ms │                        1466.81ms │     no change │
│ QQuery 9     │  1366.17ms │                        1378.33ms │     no change │
│ QQuery 10    │   465.55ms │                         362.07ms │ +1.29x faster │
│ QQuery 11    │   514.19ms │                         398.94ms │ +1.29x faster │
│ QQuery 12    │  1195.47ms │                        1098.46ms │ +1.09x faster │
│ QQuery 13    │  2183.91ms │                        2083.70ms │     no change │
│ QQuery 14    │  1641.36ms │                        1551.03ms │ +1.06x faster │
│ QQuery 15    │  1126.56ms │                        1132.80ms │     no change │
│ QQuery 16    │  3050.64ms │                        3088.31ms │     no change │
│ QQuery 17    │  2746.65ms │                        2791.27ms │     no change │
│ QQuery 18    │  5845.46ms │                        6064.71ms │     no change │
│ QQuery 19    │   121.17ms │                         123.79ms │     no change │
│ QQuery 20    │  1628.64ms │                        1387.74ms │ +1.17x faster │
│ QQuery 21    │  2048.94ms │                        1738.74ms │ +1.18x faster │
│ QQuery 22    │  4994.39ms │                        4252.83ms │ +1.17x faster │
│ QQuery 23    │ 11893.20ms │                        9996.52ms │ +1.19x faster │
│ QQuery 24    │   775.43ms │                         669.76ms │ +1.16x faster │
│ QQuery 25    │   684.66ms │                         601.04ms │ +1.14x faster │
│ QQuery 26    │   873.53ms │                         738.14ms │ +1.18x faster │
│ QQuery 27    │  2576.02ms │                        2098.18ms │ +1.23x faster │
│ QQuery 28    │ 15637.06ms │                       14166.05ms │ +1.10x faster │
│ QQuery 29    │   565.58ms │                         551.64ms │     no change │
│ QQuery 30    │  1305.23ms │                        1209.25ms │ +1.08x faster │
│ QQuery 31    │  1389.60ms │                        1265.74ms │ +1.10x faster │
│ QQuery 32    │  4741.08ms │                        4853.34ms │     no change │
│ QQuery 33    │  5306.91ms │                        4193.22ms │ +1.27x faster │
│ QQuery 34    │  5167.11ms │                        4222.33ms │ +1.22x faster │
│ QQuery 35    │  1863.80ms │                        1859.61ms │     no change │
│ QQuery 36    │   317.80ms │                         291.40ms │ +1.09x faster │
│ QQuery 37    │   216.31ms │                         184.00ms │ +1.18x faster │
│ QQuery 38    │   200.83ms │                         182.21ms │ +1.10x faster │
│ QQuery 39    │  1038.73ms │                         851.30ms │ +1.22x faster │
│ QQuery 40    │    86.40ms │                          88.35ms │     no change │
│ QQuery 41    │    80.99ms │                          78.34ms │     no change │
│ QQuery 42    │    93.00ms │                          97.02ms │     no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)                             │ 87749.93ms │
│ Total Time (alamb_enable_string_view_by_default)   │ 79626.03ms │
│ Average Time (main_base)                           │  2040.70ms │
│ Average Time (alamb_enable_string_view_by_default) │  1851.77ms │
│ Queries Faster                                     │         21 │
│ Queries Slower                                     │          0 │
│ Queries with No Change                             │         22 │
└────────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_enable_string_view_by_def… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.37ms │                           1.97ms │ +1.21x faster │
│ QQuery 1     │    38.68ms │                          36.78ms │     no change │
│ QQuery 2     │    95.07ms │                          94.11ms │     no change │
│ QQuery 3     │    98.75ms │                         100.86ms │     no change │
│ QQuery 4     │   927.07ms │                         931.67ms │     no change │
│ QQuery 5     │   964.71ms │                        1043.29ms │  1.08x slower │
│ QQuery 6     │    36.40ms │                          33.21ms │ +1.10x faster │
│ QQuery 7     │    42.34ms │                          40.83ms │     no change │
│ QQuery 8     │  1440.57ms │                        1434.71ms │     no change │
│ QQuery 9     │  1343.13ms │                        1345.13ms │     no change │
│ QQuery 10    │   357.26ms │                         432.73ms │  1.21x slower │
│ QQuery 11    │   404.40ms │                         478.21ms │  1.18x slower │
│ QQuery 12    │  1094.88ms │                        1057.96ms │     no change │
│ QQuery 13    │  1829.88ms │                        2013.06ms │  1.10x slower │
│ QQuery 14    │  1519.03ms │                        1516.29ms │     no change │
│ QQuery 15    │  1085.44ms │                        1068.30ms │     no change │
│ QQuery 16    │  2905.66ms │                        3086.20ms │  1.06x slower │
│ QQuery 17    │  2744.69ms │                        2810.55ms │     no change │
│ QQuery 18    │  5845.13ms │                        6043.80ms │     no change │
│ QQuery 19    │    95.15ms │                          89.50ms │ +1.06x faster │
│ QQuery 20    │  1751.84ms │                        2165.42ms │  1.24x slower │
│ QQuery 21    │  2019.88ms │                        2379.20ms │  1.18x slower │
│ QQuery 22    │  5167.42ms │                        5642.45ms │  1.09x slower │
│ QQuery 23    │ 10400.41ms │                       14124.07ms │  1.36x slower │
│ QQuery 24    │   584.30ms │                         656.67ms │  1.12x slower │
│ QQuery 25    │   498.02ms │                         519.50ms │     no change │
│ QQuery 26    │   651.84ms │                         719.41ms │  1.10x slower │
│ QQuery 27    │  2525.75ms │                        2694.77ms │  1.07x slower │
│ QQuery 28    │ 14905.83ms │                       26662.02ms │  1.79x slower │
│ QQuery 29    │   529.85ms │                         524.36ms │     no change │
│ QQuery 30    │  1086.70ms │                        1212.89ms │  1.12x slower │
│ QQuery 31    │  1163.36ms │                        1291.25ms │  1.11x slower │
│ QQuery 32    │  4764.64ms │                        4782.92ms │     no change │
│ QQuery 33    │  5149.48ms │                        4480.69ms │ +1.15x faster │
│ QQuery 34    │  5162.88ms │                        4481.70ms │ +1.15x faster │
│ QQuery 35    │  1811.72ms │                        1812.12ms │     no change │
│ QQuery 36    │   282.31ms │                         268.12ms │ +1.05x faster │
│ QQuery 37    │   121.31ms │                         104.96ms │ +1.16x faster │
│ QQuery 38    │   147.57ms │                         176.74ms │  1.20x slower │
│ QQuery 39    │   975.45ms │                         848.76ms │ +1.15x faster │
│ QQuery 40    │    61.35ms │                          57.33ms │ +1.07x faster │
│ QQuery 41    │    50.15ms │                          47.45ms │ +1.06x faster │
│ QQuery 42    │    63.67ms │                          62.35ms │     no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)                             │ 82746.33ms │
│ Total Time (alamb_enable_string_view_by_default)   │ 99374.32ms │
│ Average Time (main_base)                           │  1924.33ms │
│ Average Time (alamb_enable_string_view_by_default) │  2311.03ms │
│ Queries Faster                                     │         10 │
│ Queries Slower                                     │         16 │
│ Queries with No Change                             │         17 │
└────────────────────────────────────────────────────┴────────────┘

What changes are included in this PR?

  1. Set schema_force_view_types to true

Are these changes tested?

Yes, by CI tests

Are there any user-facing changes?

  1. Faster reading of data from Parquet files

If you see an error related to StringView use, you can disable this feature using the schema_force_string_view option

> set datafusion.execution.parquet.schema_force_view_types = false;
0 row(s) fetched.
Elapsed 0.000 seconds.

Context

@XiangpengHao debugged these tests previously using #11862

@alamb alamb changed the title Enable reading string view by default from Parquet Enable reading StringViewArray by default from Parquet Aug 20, 2024
@github-actions github-actions bot added sql SQL Planner physical-expr Physical Expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) substrait common Related to common crate proto Related to proto crate functions labels Aug 20, 2024
comphead

This comment was marked as outdated.

@alamb

This comment was marked as outdated.

@alamb alamb force-pushed the alamb/enable_string_view_by_default branch 2 times, most recently from bc5d7f7 to 27f7d1e Compare August 22, 2024 19:43
@alamb

This comment was marked as outdated.

@alamb alamb force-pushed the alamb/enable_string_view_by_default branch from 27f7d1e to ce5470d Compare September 12, 2024 19:44
@github-actions github-actions bot added core Core DataFusion crate and removed sql SQL Planner physical-expr Physical Expressions core Core DataFusion crate proto Related to proto crate functions labels Sep 12, 2024
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Sep 12, 2024
@alamb

This comment was marked as outdated.

@alamb alamb force-pushed the alamb/enable_string_view_by_default branch from 2275216 to e8f7384 Compare September 13, 2024 14:17
@alamb

This comment was marked as outdated.

@alamb

This comment was marked as outdated.

@alamb
Copy link
Contributor Author

alamb commented Oct 5, 2024

Benchmarks results

clickbench_1:

Is slower due to #12771

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_enable_string_view_by_def… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.66ms │                           0.69ms │     no change │
│ QQuery 1     │    71.33ms │                          75.13ms │  1.05x slower │
│ QQuery 2     │   131.03ms │                         127.11ms │     no change │
│ QQuery 3     │   126.47ms │                         130.56ms │     no change │
│ QQuery 4     │   980.91ms │                         979.41ms │     no change │
│ QQuery 5     │  1039.82ms │                        1074.08ms │     no change │
│ QQuery 6     │    63.36ms │                          65.10ms │     no change │
│ QQuery 7     │    81.92ms │                          81.78ms │     no change │
│ QQuery 8     │  1349.24ms │                        1307.04ms │     no change │
│ QQuery 9     │  1385.82ms │                        1389.54ms │     no change │
│ QQuery 10    │   451.48ms │                         343.72ms │ +1.31x faster │
│ QQuery 11    │   490.86ms │                         399.73ms │ +1.23x faster │
│ QQuery 12    │  1185.68ms │                        1105.26ms │ +1.07x faster │
│ QQuery 13    │  1807.96ms │                        2124.67ms │  1.18x slower │
│ QQuery 14    │  1316.22ms │                        1544.26ms │  1.17x slower │
│ QQuery 15    │  1134.52ms │                        1127.19ms │     no change │
│ QQuery 16    │  2585.31ms │                        3063.20ms │  1.18x slower │
│ QQuery 17    │  2377.87ms │                        2806.52ms │  1.18x slower │
│ QQuery 18    │  5039.50ms │                        5931.83ms │  1.18x slower │
│ QQuery 19    │   123.28ms │                         118.74ms │     no change │
│ QQuery 20    │  1620.67ms │                        1387.02ms │ +1.17x faster │
│ QQuery 21    │  2066.28ms │                        1716.94ms │ +1.20x faster │
│ QQuery 22    │  4534.60ms │                        4257.65ms │ +1.07x faster │
│ QQuery 23    │ 11757.43ms │                       10161.49ms │ +1.16x faster │
│ QQuery 24    │   751.56ms │                         677.31ms │ +1.11x faster │
│ QQuery 25    │   661.76ms │                         583.94ms │ +1.13x faster │
│ QQuery 26    │   827.91ms │                         778.19ms │ +1.06x faster │
│ QQuery 27    │  2542.23ms │                        2090.62ms │ +1.22x faster │
│ QQuery 28    │ 15396.30ms │                       14334.92ms │ +1.07x faster │
│ QQuery 29    │   570.84ms │                         572.35ms │     no change │
│ QQuery 30    │  1188.89ms │                        1142.40ms │     no change │
│ QQuery 31    │  1243.82ms │                        1173.04ms │ +1.06x faster │
│ QQuery 32    │  4336.24ms │                        4174.83ms │     no change │
│ QQuery 33    │  5266.84ms │                        4255.56ms │ +1.24x faster │
│ QQuery 34    │  5350.35ms │                        4244.46ms │ +1.26x faster │
│ QQuery 35    │  1781.95ms │                        1788.24ms │     no change │
│ QQuery 36    │   319.41ms │                         284.84ms │ +1.12x faster │
│ QQuery 37    │   207.99ms │                         183.65ms │ +1.13x faster │
│ QQuery 38    │   194.13ms │                         184.99ms │     no change │
│ QQuery 39    │   768.26ms │                         816.16ms │  1.06x slower │
│ QQuery 40    │    84.86ms │                          85.84ms │     no change │
│ QQuery 41    │    84.45ms │                          79.66ms │ +1.06x faster │
│ QQuery 42    │    96.44ms │                          97.44ms │     no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)                             │ 83396.44ms │
│ Total Time (alamb_enable_string_view_by_default)   │ 78867.11ms │
│ Average Time (main_base)                           │  1939.45ms │
│ Average Time (alamb_enable_string_view_by_default) │  1834.12ms │
│ Queries Faster                                     │         18 │
│ Queries Slower                                     │          7 │
│ Queries with No Change                             │         18 │
└────────────────────────────────────────────────────┴────────────┘
--------------------

@alamb
Copy link
Contributor Author

alamb commented Oct 5, 2024

clickbench_extended

Looking good here

--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_enable_string_view_by_def… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  2741.98ms │                        2616.63ms │     no change │
│ QQuery 1     │   759.57ms │                         681.31ms │ +1.11x faster │
│ QQuery 2     │  1625.61ms │                        1398.16ms │ +1.16x faster │
│ QQuery 3     │   680.35ms │                         683.42ms │     no change │
│ QQuery 4     │ 12760.38ms │                       12498.84ms │     no change │
│ QQuery 5     │ 19106.95ms │                       19157.16ms │     no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘

@alamb
Copy link
Contributor Author

alamb commented Oct 5, 2024

clickbench_partitioned

Slowing down also due to #12509 / #12788

--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_enable_string_view_by_def… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.20ms │                           2.23ms │     no change │
│ QQuery 1     │    40.39ms │                          40.67ms │     no change │
│ QQuery 2     │    96.55ms │                          98.55ms │     no change │
│ QQuery 3     │    99.47ms │                          99.17ms │     no change │
│ QQuery 4     │   924.46ms │                         922.44ms │     no change │
│ QQuery 5     │   950.04ms │                         968.54ms │     no change │
│ QQuery 6     │    34.09ms │                          33.63ms │     no change │
│ QQuery 7     │    43.25ms │                          43.24ms │     no change │
│ QQuery 8     │  1366.70ms │                        1350.65ms │     no change │
│ QQuery 9     │  1328.80ms │                        1344.05ms │     no change │
│ QQuery 10    │   348.25ms │                         314.41ms │ +1.11x faster │
│ QQuery 11    │   395.56ms │                         360.36ms │ +1.10x faster │
│ QQuery 12    │  1096.67ms │                         968.45ms │ +1.13x faster │
│ QQuery 13    │  1749.70ms │                        1747.58ms │     no change │
│ QQuery 14    │  1209.17ms │                        1406.79ms │  1.16x slower │
│ QQuery 15    │  1085.42ms │                        1077.92ms │     no change │
│ QQuery 16    │  2491.88ms │                        2904.94ms │  1.17x slower │
│ QQuery 17    │  2297.53ms │                        2705.51ms │  1.18x slower │
│ QQuery 18    │  5009.23ms │                        5787.71ms │  1.16x slower │
│ QQuery 19    │    94.74ms │                          90.64ms │     no change │
│ QQuery 20    │  1697.79ms │                        1821.09ms │  1.07x slower │
│ QQuery 21    │  1997.89ms │                        1907.50ms │     no change │
│ QQuery 22    │  4709.43ms │                        4818.95ms │     no change │
│ QQuery 23    │ 10513.97ms │                       10145.30ms │     no change │
│ QQuery 24    │   583.88ms │                         486.20ms │ +1.20x faster │
│ QQuery 25    │   488.55ms │                         408.51ms │ +1.20x faster │
│ QQuery 26    │   657.96ms │                         566.02ms │ +1.16x faster │
│ QQuery 27    │  2529.00ms │                        2311.10ms │ +1.09x faster │
│ QQuery 28    │ 14698.76ms │                       26994.08ms │  1.84x slower │
│ QQuery 29    │   521.52ms │                         544.05ms │     no change │
│ QQuery 30    │  1043.00ms │                        1014.79ms │     no change │
│ QQuery 31    │  1111.27ms │                        1057.91ms │     no change │
│ QQuery 32    │  4308.74ms │                        4254.27ms │     no change │
│ QQuery 33    │  5276.47ms │                        4134.58ms │ +1.28x faster │
│ QQuery 34    │  5342.06ms │                        4150.13ms │ +1.29x faster │
│ QQuery 35    │  1905.15ms │                        1857.24ms │     no change │
│ QQuery 36    │   276.11ms │                         238.65ms │ +1.16x faster │
│ QQuery 37    │   120.85ms │                          87.17ms │ +1.39x faster │
│ QQuery 38    │   144.61ms │                         152.75ms │  1.06x slower │
│ QQuery 39    │   739.13ms │                         759.31ms │     no change │
│ QQuery 40    │    56.56ms │                          55.03ms │     no change │
│ QQuery 41    │    48.43ms │                          50.81ms │     no change │
│ QQuery 42    │    63.36ms │                          63.27ms │     no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘

@alamb
Copy link
Contributor Author

alamb commented Oct 7, 2024

Current status: #11682 (comment)

@alamb alamb force-pushed the alamb/enable_string_view_by_default branch from 6500361 to 5b67afe Compare October 9, 2024 18:24
@github-actions github-actions bot removed physical-expr Physical Expressions functions labels Oct 9, 2024
@alamb
Copy link
Contributor Author

alamb commented Oct 13, 2024

My plan is to pull the changes from the following PRs into this PR and rerun the overall perf test

Hopefully that will show the performance we need. Then we just need to get the various PRs actually merged and we can do this 😅

@alamb
Copy link
Contributor Author

alamb commented Oct 14, 2024

Here are the current performance results: 🚀 Basically 10% faster across all queries and no slow downs

👏 @Rachelint @jayzhan211 @XiangpengHao and many many others.

Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_enable_string_view_by_def… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.70ms │                           0.67ms │     no change │
│ QQuery 1     │    75.35ms │                          69.82ms │ +1.08x faster │
│ QQuery 2     │   123.15ms │                         128.37ms │     no change │
│ QQuery 3     │   129.02ms │                         128.14ms │     no change │
│ QQuery 4     │   947.97ms │                         967.71ms │     no change │
│ QQuery 5     │  1086.19ms │                        1060.35ms │     no change │
│ QQuery 6     │    65.49ms │                          66.43ms │     no change │
│ QQuery 7     │    80.72ms │                          84.38ms │     no change │
│ QQuery 8     │  1330.08ms │                        1332.41ms │     no change │
│ QQuery 9     │  1345.21ms │                        1344.89ms │     no change │
│ QQuery 10    │   443.35ms │                         335.50ms │ +1.32x faster │
│ QQuery 11    │   493.80ms │                         372.10ms │ +1.33x faster │
│ QQuery 12    │  1205.69ms │                        1101.01ms │ +1.10x faster │
│ QQuery 13    │  1921.43ms │                        1617.54ms │ +1.19x faster │
│ QQuery 14    │  1355.42ms │                        1146.40ms │ +1.18x faster │
│ QQuery 15    │  1106.21ms │                        1129.07ms │     no change │
│ QQuery 16    │  2546.11ms │                        2434.61ms │     no change │
│ QQuery 17    │  2358.85ms │                        2269.62ms │     no change │
│ QQuery 18    │  4944.14ms │                        5101.45ms │     no change │
│ QQuery 19    │   121.48ms │                         124.59ms │     no change │
│ QQuery 20    │  1637.28ms │                        1377.14ms │ +1.19x faster │
│ QQuery 21    │  2106.73ms │                        1716.14ms │ +1.23x faster │
│ QQuery 22    │  5050.72ms │                        4231.71ms │ +1.19x faster │
│ QQuery 23    │ 12149.30ms │                        9998.89ms │ +1.22x faster │
│ QQuery 24    │   793.52ms │                         683.81ms │ +1.16x faster │
│ QQuery 25    │   700.98ms │                         587.52ms │ +1.19x faster │
│ QQuery 26    │   860.57ms │                         725.23ms │ +1.19x faster │
│ QQuery 27    │  2540.34ms │                        2104.51ms │ +1.21x faster │
│ QQuery 28    │ 14015.68ms │                       14587.90ms │     no change │
│ QQuery 29    │   571.20ms │                         554.32ms │     no change │
│ QQuery 30    │  1239.94ms │                        1118.25ms │ +1.11x faster │
│ QQuery 31    │  1276.58ms │                        1189.58ms │ +1.07x faster │
│ QQuery 32    │  4230.09ms │                        4204.57ms │     no change │
│ QQuery 33    │  5223.33ms │                        4283.46ms │ +1.22x faster │
│ QQuery 34    │  5328.12ms │                        4264.81ms │ +1.25x faster │
│ QQuery 35    │  1754.78ms │                        1797.85ms │     no change │
│ QQuery 36    │   306.94ms │                         287.75ms │ +1.07x faster │
│ QQuery 37    │   220.70ms │                         182.40ms │ +1.21x faster │
│ QQuery 38    │   192.66ms │                         188.30ms │     no change │
│ QQuery 39    │   790.63ms │                         551.89ms │ +1.43x faster │
│ QQuery 40    │    86.56ms │                          84.86ms │     no change │
│ QQuery 41    │    83.08ms │                          77.75ms │ +1.07x faster │
│ QQuery 42    │    93.76ms │                          94.03ms │     no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)                             │ 82933.83ms │
│ Total Time (alamb_enable_string_view_by_default)   │ 75707.73ms │
│ Average Time (main_base)                           │  1928.69ms │
│ Average Time (alamb_enable_string_view_by_default) │  1760.64ms │
│ Queries Faster                                     │         22 │
│ Queries Slower                                     │          0 │
│ Queries with No Change                             │         21 │
└────────────────────────────────────────────────────┴────────────┘

Also for partitioned:

--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_enable_string_view_by_def… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.24ms │                           2.22ms │     no change │
│ QQuery 1     │    37.96ms │                          38.15ms │     no change │
│ QQuery 2     │    95.79ms │                          94.43ms │     no change │
│ QQuery 3     │   102.41ms │                         100.80ms │     no change │
│ QQuery 4     │   927.15ms │                         930.56ms │     no change │
│ QQuery 5     │   973.96ms │                         951.87ms │     no change │
│ QQuery 6     │    34.33ms │                          33.58ms │     no change │
│ QQuery 7     │    42.82ms │                          41.29ms │     no change │
│ QQuery 8     │  1364.76ms │                        1344.57ms │     no change │
│ QQuery 9     │  1309.61ms │                        1365.89ms │     no change │
│ QQuery 10    │   346.30ms │                         307.70ms │ +1.13x faster │
│ QQuery 11    │   394.60ms │                         347.22ms │ +1.14x faster │
│ QQuery 12    │  1097.07ms │                        1007.94ms │ +1.09x faster │
│ QQuery 13    │  1636.51ms │                        1602.93ms │     no change │
│ QQuery 14    │  1197.28ms │                        1069.49ms │ +1.12x faster │
│ QQuery 15    │  1068.82ms │                        1077.73ms │     no change │
│ QQuery 16    │  2485.04ms │                        2398.93ms │     no change │
│ QQuery 17    │  2275.37ms │                        2212.18ms │     no change │
│ QQuery 18    │  4952.47ms │                        5114.44ms │     no change │
│ QQuery 19    │    96.01ms │                          92.62ms │     no change │
│ QQuery 20    │  1719.75ms │                        1254.03ms │ +1.37x faster │
│ QQuery 21    │  2001.12ms │                        1481.53ms │ +1.35x faster │
│ QQuery 22    │  5169.50ms │                        2649.07ms │ +1.95x faster │
│ QQuery 23    │ 10586.32ms │                        8619.20ms │ +1.23x faster │
│ QQuery 24    │   594.74ms │                         517.28ms │ +1.15x faster │
│ QQuery 25    │   495.22ms │                         427.98ms │ +1.16x faster │
│ QQuery 26    │   660.64ms │                         565.71ms │ +1.17x faster │
│ QQuery 27    │  2557.47ms │                        1846.03ms │ +1.39x faster │
│ QQuery 28    │ 13399.41ms │                       13583.18ms │     no change │
│ QQuery 29    │   535.81ms │                         524.10ms │     no change │
│ QQuery 30    │  1034.30ms │                         964.20ms │ +1.07x faster │
│ QQuery 31    │  1101.14ms │                        1035.81ms │ +1.06x faster │
│ QQuery 32    │  4202.55ms │                        4049.53ms │     no change │
│ QQuery 33    │  5136.86ms │                        3916.87ms │ +1.31x faster │
│ QQuery 34    │  5104.18ms │                        3932.13ms │ +1.30x faster │
│ QQuery 35    │  1909.94ms │                        1797.84ms │ +1.06x faster │
│ QQuery 36    │   264.07ms │                         225.04ms │ +1.17x faster │
│ QQuery 37    │   121.76ms │                          92.57ms │ +1.32x faster │
│ QQuery 38    │   147.92ms │                         131.16ms │ +1.13x faster │
│ QQuery 39    │   749.30ms │                         477.60ms │ +1.57x faster │
│ QQuery 40    │    56.82ms │                          55.17ms │     no change │
│ QQuery 41    │    47.54ms │                          45.12ms │ +1.05x faster │
│ QQuery 42    │    63.14ms │                          64.81ms │     no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)                             │ 78099.98ms │
│ Total Time (alamb_enable_string_view_by_default)   │ 68390.51ms │
│ Average Time (main_base)                           │  1816.28ms │
│ Average Time (alamb_enable_string_view_by_default) │  1590.48ms │
│ Queries Faster                                     │         22 │
│ Queries Slower                                     │          0 │
│ Queries with No Change                             │         21 │
└────────────────────────────────────────────────────┴────────────┘

@alamb
Copy link
Contributor Author

alamb commented Oct 14, 2024

Next steps are to get the various PRs merged, get this one ready, and then write all about it.

@alamb
Copy link
Contributor Author

alamb commented Oct 24, 2024

I made a new PR as this one has lots of now irrelevant historical context

New PR here: #13101

@alamb alamb closed this Oct 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate core Core DataFusion crate documentation Improvements or additions to documentation proto Related to proto crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable reading StringView by default from Parquet (schema_force_string_view) by default
4 participants