Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce new 'unsigned_long' numeric field type support #6237

Merged
merged 3 commits into from
May 4, 2023

Conversation

reta
Copy link
Collaborator

@reta reta commented Feb 8, 2023

Signed-off-by: Andriy Redko andriy.redko@aiven.io

Description

Introduce new 'unsigned_long' numeric field type support: unsigned 64-bit integer in range of [0, 2^64-1].

Limitations: the unsigned_long could not be used as index sort fields (sort.field index setting). The limitation also applies to the scenario when search is done over multiple indices and sorted by the field that has unsigned_long type in (at least) one of the indices but different numeric type(s) in others. The error will be returned in such cases.

  • unsigned_long ( if store is set to true, it is stored and returned as String).

    {
        "_index": "users",
        "_id": "ByXWN4YBvI_qqxSVoIuE",
        "_score": 1.0,
        "_source": {
            "user_name": "tom",
            "age": 35,
            "counter": 0
        }
    },
    {
        "_index": "users",
        "_id": "CCXWN4YBvI_qqxSVoIvR",
        "_score": 1.0,
        "_source": {
            "user_name": "bob",
            "age": 35,
            "counter": 18446744073709551615
        }
    },
    {
        "_index": "users",
        "_id": "CSXWN4YBvI_qqxSVoIvm",
        "_score": 1.0,
        "_source": {
            "user_name": "john",
            "age": 35,
            "counter": 10223372036854775807
        }
    }
    
  • support term aggregations

    • terms
    "buckets": [
        {
            "key": 0,
            "doc_count": 1
        },
        {
            "key": 10223372036854775807,
            "doc_count": 1
        },
        {
            "key": 18446744073709551615,
            "doc_count": 1
        }
    ]
    
    • multi terms
     "buckets": [
          {
              "key": [
                  0,
                  "tom"
              ],
              "key_as_string": "0|tom",
              "doc_count": 1
          },
          {
              "key": [
                  10223372036854776000,
                  "john"
              ],
              "key_as_string": "10223372036854775807|john",
              "doc_count": 1
          },
          {
              "key": [
                  18446744073709552000,
                  "bob"
              ],
              "key_as_string": "18446744073709551615|bob",
              "doc_count": 1
          }
      ]
    
  • support sorting

      "hits": [
        {
            "_id": "coq7VoYBBOvFaCjK1Yl6",
            "_source": {
                "counter": 184
            },
            "sort": [
                184
            ]
        },
        {
            "_id": "cYq7VoYBBOvFaCjK1Ylm",
            "_source": {
                "counter": 10223372036854775807
            },
            "sort": [
                10223372036854775807
            ]
        },
        {
            "_id": "cIq7VoYBBOvFaCjK1YlP",
            "_source": {
                "counter": 18446744073709551615
            },
            "sort": [
                18446744073709551615
            ]
        },
        {
            "_id": "b4q7VoYBBOvFaCjK1YkQ",
            "_source": {
            },
            "sort": [
                18446744073709551615
            ]
        }
    ]
    
  • range queries

    {
      "query": {
          "range": {
          "counter": {
            "gte": 11223372036854775807,
            "boost": 2.0
          }
        }
      }
    }
    
    {
        "hits": {
            "total": {
                "value": 1,
                "relation": "eq"
            },
            "max_score": 2.0,
            "hits": [
                {
                    "_index": "users",
                    "_id": "nWAM24YBFEH-9O0Ssk_r",
                    "_score": 2.0,
                    "_source": {
                        "user_name": "bob",
                        "age": 35,
                        "counter": 18446744073709551615
                    }
                }
            ]
        }
    }
    
  • support aggregations

    {
      "query": {
          "match_all": {}
      },
      "aggs": {        
        "counters": {
          "sum": 
              { "field": "counter"  }
         }
       }
     }
    
     {
      "aggregations": {
              "counters": {
                  "value": 2.8670116110564327E19
              }
          }
      }
    
  • access from scripts

    {
        "query": {
          "bool": {
            "filter": {
              "script": {
                "script": "BigInteger amount = doc['counter'].size() > 0 ? doc['counter'].value : BigInteger.ZERO; 
                                 return amount.compareTo(BigInteger.ZERO) > 0;"
              }
            }
          }
        }
      }
    
    
     {
     ... ,
     { 
     "hits": {
          "total": {
              "value": 3,
              "relation": "eq"
              },
              "max_score": 0.0,
              "hits": [
                  {
                      "_index": "users",
                      "_id": "nWAM24YBFEH-9O0Ssk_r",
                      "_score": 0.0,
                      "_source": {
                          "user_name": "bob",
                          "age": 35,
                          "counter": 18446744073709551615
                      }
                  },
                  {
                      "_index": "users",
                      "_id": "nGAM24YBFEH-9O0Ssk-2",
                      "_score": 0.0,
                      "_source": {
                          "user_name": "john",
                          "age": 35,
                          "counter": 10223372036854775807
                      }
                  },
                  {
                      "_index": "users",
                      "_id": "m2AM24YBFEH-9O0Ssk9Q",
                      "_score": 0.0,
                      "_source": {
                          "user_name": "dilan",
                          "age": 35,
                          "counter": 184
                      }
                  }
              ]
          }
      }
    
  • more tests

  • benchmarks

    The benchmarks have been conducted using opensearch-benchmarking tool against geo_names dataset which does search and aggregation over numeric field population (the baseline has long type, the contended has unsigned_long). As expected, the queries involving sort are slow (since right now unsigned_long is not wrapped into sorted numeric field).

|                                                        Metric |                           Task |    Baseline |   Contender |     Diff |    Unit |
|--------------------------------------------------------------:|-------------------------------:|------------:|------------:|---------:|--------:|

|                                                Min Throughput |           country_agg_uncached |     2.99758 |      2.9995 |  0.00192 |   ops/s |
|                                               Mean Throughput |           country_agg_uncached |     2.99804 |      2.9996 |  0.00156 |   ops/s |
|                                             Median Throughput |           country_agg_uncached |     2.99807 |      2.9996 |  0.00154 |   ops/s |
|                                                Max Throughput |           country_agg_uncached |     2.99837 |     2.99967 |   0.0013 |   ops/s |
|                                       50th percentile latency |           country_agg_uncached |     182.423 |     248.738 |  66.3158 |      ms |
|                                       90th percentile latency |           country_agg_uncached |     239.122 |     263.508 |  24.3866 |      ms |
|                                       99th percentile latency |           country_agg_uncached |     274.684 |     285.451 |  10.7667 |      ms |
|                                      100th percentile latency |           country_agg_uncached |     307.982 |     285.541 | -22.4406 |      ms |
|                                  50th percentile service time |           country_agg_uncached |     181.304 |     246.987 |   65.683 |      ms |
|                                  90th percentile service time |           country_agg_uncached |     238.151 |     262.073 |  23.9221 |      ms |
|                                  99th percentile service time |           country_agg_uncached |     273.206 |     284.051 |  10.8451 |      ms |
|                                 100th percentile service time |           country_agg_uncached |     306.247 |     284.392 | -21.8547 |      ms |
|                                                    error rate |           country_agg_uncached |           0 |           0 |        0 |       % |

|                                                Min Throughput |             country_agg_cached |     98.0588 |     98.5585 |  0.49968 |   ops/s |
|                                               Mean Throughput |             country_agg_cached |     98.5743 |     98.9414 |   0.3671 |   ops/s |
|                                             Median Throughput |             country_agg_cached |     98.6257 |     98.9794 |  0.35374 |   ops/s |
|                                                Max Throughput |             country_agg_cached |     98.9362 |     99.2075 |  0.27132 |   ops/s |
|                                       50th percentile latency |             country_agg_cached |     2.27887 |     4.96466 |  2.68579 |      ms |
|                                       90th percentile latency |             country_agg_cached |     3.73894 |     5.81729 |  2.07835 |      ms |
|                                       99th percentile latency |             country_agg_cached |     6.05356 |     6.23036 |   0.1768 |      ms |
|                                     99.9th percentile latency |             country_agg_cached |     6.40937 |     12.1867 |  5.77729 |      ms |
|                                      100th percentile latency |             country_agg_cached |     6.84441 |     17.0093 |  10.1649 |      ms |
|                                  50th percentile service time |             country_agg_cached |     1.35882 |     3.75803 |  2.39921 |      ms |
|                                  90th percentile service time |             country_agg_cached |      1.7465 |     4.20416 |  2.45766 |      ms |
|                                  99th percentile service time |             country_agg_cached |     4.45217 |     4.66835 |  0.21617 |      ms |
|                                99.9th percentile service time |             country_agg_cached |     4.93103 |     10.7866 |  5.85555 |      ms |
|                                 100th percentile service time |             country_agg_cached |     5.80315 |     15.2267 |  9.42354 |      ms |
|                                                    error rate |             country_agg_cached |           0 |           0 |        0 |       % |

|                                                Min Throughput |                     expression |     1.50107 |     1.50221 |  0.00113 |   ops/s |
|                                               Mean Throughput |                     expression |      1.5013 |     1.50266 |  0.00137 |   ops/s |
|                                             Median Throughput |                     expression |     1.50128 |     1.50264 |  0.00136 |   ops/s |
|                                                Max Throughput |                     expression |     1.50159 |     1.50325 |  0.00167 |   ops/s |
|                                       50th percentile latency |                     expression |     278.867 |     261.689 | -17.1779 |      ms |
|                                       90th percentile latency |                     expression |     357.976 |     297.028 | -60.9483 |      ms |
|                                       99th percentile latency |                     expression |     389.866 |      319.97 | -69.8962 |      ms |
|                                      100th percentile latency |                     expression |     410.049 |     322.453 | -87.5951 |      ms |
|                                  50th percentile service time |                     expression |     277.285 |     259.354 | -17.9312 |      ms |
|                                  90th percentile service time |                     expression |     356.321 |     295.189 | -61.1322 |      ms |
|                                  99th percentile service time |                     expression |      387.41 |     317.979 | -69.4306 |      ms |
|                                 100th percentile service time |                     expression |     408.686 |      321.32 | -87.3668 |      ms |
|                                                    error rate |                     expression |           0 |           0 |        0 |       % |

|                                                Min Throughput |                painless_static |     1.49818 |      1.4949 | -0.00328 |   ops/s |
|                                               Mean Throughput |                painless_static |     1.49853 |     1.49585 | -0.00268 |   ops/s |
|                                             Median Throughput |                painless_static |     1.49855 |     1.49589 | -0.00266 |   ops/s |
|                                                Max Throughput |                painless_static |     1.49879 |     1.49658 | -0.00221 |   ops/s |
|                                       50th percentile latency |                painless_static |     337.309 |     344.385 |  7.07632 |      ms |
|                                       90th percentile latency |                painless_static |      449.12 |     402.796 | -46.3243 |      ms |
|                                       99th percentile latency |                painless_static |     490.858 |     436.599 | -54.2592 |      ms |
|                                      100th percentile latency |                painless_static |     493.903 |     444.296 | -49.6067 |      ms |
|                                  50th percentile service time |                painless_static |     336.008 |     342.383 |  6.37574 |      ms |
|                                  90th percentile service time |                painless_static |     448.471 |      400.92 | -47.5505 |      ms |
|                                  99th percentile service time |                painless_static |     489.874 |     435.308 | -54.5664 |      ms |
|                                 100th percentile service time |                painless_static |      493.21 |     442.407 |  -50.803 |      ms |
|                                                    error rate |                painless_static |           0 |           0 |        0 |       % |

|                                       50th percentile latency |               painless_dynamic |     345.909 |    0.741502 | -345.168 |      ms |
|                                       90th percentile latency |               painless_dynamic |     448.162 |    0.845196 | -447.317 |      ms |
|                                       99th percentile latency |               painless_dynamic |     483.554 |      1.0135 | -482.541 |      ms |
|                                      100th percentile latency |               painless_dynamic |     486.248 |     1.03213 | -485.216 |      ms |
|                                  50th percentile service time |               painless_dynamic |      344.23 |    0.741502 | -343.488 |      ms |
|                                  90th percentile service time |               painless_dynamic |     447.149 |    0.845196 | -446.304 |      ms |
|                                  99th percentile service time |               painless_dynamic |     481.864 |      1.0135 |  -480.85 |      ms |
|                                 100th percentile service time |               painless_dynamic |     485.173 |     1.03213 | -484.141 |      ms |
|                                                    error rate |               painless_dynamic |           0 |         100 |      100 |       % |
....
|                                                Min Throughput |     field_value_function_score |     1.50373 |     1.50397 |  0.00024 |   ops/s |
|                                               Mean Throughput |     field_value_function_score |     1.50454 |     1.50483 |  0.00029 |   ops/s |
|                                             Median Throughput |     field_value_function_score |     1.50448 |     1.50476 |  0.00029 |   ops/s |
|                                                Max Throughput |     field_value_function_score |     1.50558 |     1.50593 |  0.00035 |   ops/s |
|                                       50th percentile latency |     field_value_function_score |     166.929 |     157.711 | -9.21808 |      ms |
|                                       90th percentile latency |     field_value_function_score |     189.263 |      184.71 | -4.55316 |      ms |
|                                       99th percentile latency |     field_value_function_score |     195.331 |     208.833 |  13.5022 |      ms |
|                                      100th percentile latency |     field_value_function_score |     200.676 |     221.516 |  20.8395 |      ms |
|                                  50th percentile service time |     field_value_function_score |     165.204 |     155.988 | -9.21626 |      ms |
|                                  90th percentile service time |     field_value_function_score |     188.038 |     182.797 | -5.24123 |      ms |
|                                  99th percentile service time |     field_value_function_score |     193.969 |     206.876 |  12.9071 |      ms |
|                                 100th percentile service time |     field_value_function_score |     199.424 |     219.649 |  20.2249 |      ms |
|                                                    error rate |     field_value_function_score |           0 |           0 |        0 |       % |

|                                                Min Throughput |       field_value_script_score |     1.50355 |     1.50375 |   0.0002 |   ops/s |
|                                               Mean Throughput |       field_value_script_score |     1.50432 |     1.50456 |  0.00023 |   ops/s |
|                                             Median Throughput |       field_value_script_score |     1.50426 |      1.5045 |  0.00024 |   ops/s |
|                                                Max Throughput |       field_value_script_score |     1.50531 |      1.5056 |  0.00029 |   ops/s |
|                                       50th percentile latency |       field_value_script_score |     210.476 |     153.393 | -57.0826 |      ms |
|                                       90th percentile latency |       field_value_script_score |     239.582 |     178.513 |  -61.069 |      ms |
|                                       99th percentile latency |       field_value_script_score |     258.723 |     230.565 | -28.1578 |      ms |
|                                      100th percentile latency |       field_value_script_score |     270.413 |     238.546 | -31.8676 |      ms |
|                                  50th percentile service time |       field_value_script_score |     209.328 |     151.599 | -57.7291 |      ms |
|                                  90th percentile service time |       field_value_script_score |     237.922 |     176.424 | -61.4978 |      ms |
|                                  99th percentile service time |       field_value_script_score |     257.421 |     228.758 |  -28.663 |      ms |
|                                 100th percentile service time |       field_value_script_score |     269.235 |     236.737 | -32.4975 |      ms |
|                                                    error rate |       field_value_script_score |           0 |           0 |        0 |       % |


|                                                Min Throughput |           desc_sort_population |     1.50481 |     1.50389 | -0.00092 |   ops/s |
|                                               Mean Throughput |           desc_sort_population |     1.50585 |     1.50474 | -0.00111 |   ops/s |
|                                             Median Throughput |           desc_sort_population |     1.50577 |     1.50468 | -0.00109 |   ops/s |
|                                                Max Throughput |           desc_sort_population |     1.50719 |     1.50582 | -0.00137 |   ops/s |
|                                       50th percentile latency |           desc_sort_population |      4.0631 |      100.25 |  96.1873 |      ms |
|                                       90th percentile latency |           desc_sort_population |     4.64736 |     125.915 |  121.268 |      ms |
|                                       99th percentile latency |           desc_sort_population |     10.4932 |      140.85 |  130.357 |      ms |
|                                      100th percentile latency |           desc_sort_population |     10.5316 |     161.286 |  150.754 |      ms |
|                                  50th percentile service time |           desc_sort_population |     2.67249 |     98.4228 |  95.7503 |      ms |
|                                  90th percentile service time |           desc_sort_population |     3.08992 |     123.962 |  120.872 |      ms |
|                                  99th percentile service time |           desc_sort_population |     8.55561 |     138.964 |  130.409 |      ms |
|                                 100th percentile service time |           desc_sort_population |     8.74436 |     159.249 |  150.505 |      ms |
|                                                    error rate |           desc_sort_population |           0 |           0 |        0 |       % |

|                                                Min Throughput |            asc_sort_population |     1.50496 |     1.50437 |  -0.0006 |   ops/s |
|                                               Mean Throughput |            asc_sort_population |     1.50604 |     1.50531 | -0.00073 |   ops/s |
|                                             Median Throughput |            asc_sort_population |     1.50596 |     1.50524 | -0.00072 |   ops/s |
|                                                Max Throughput |            asc_sort_population |     1.50742 |     1.50652 |  -0.0009 |   ops/s |
|                                       50th percentile latency |            asc_sort_population |     3.33881 |     95.5045 |  92.1657 |      ms |
|                                       90th percentile latency |            asc_sort_population |     4.79438 |     125.935 |   121.14 |      ms |
|                                       99th percentile latency |            asc_sort_population |     8.03781 |     138.533 |  130.496 |      ms |
|                                      100th percentile latency |            asc_sort_population |     8.12174 |     139.951 |  131.829 |      ms |
|                                  50th percentile service time |            asc_sort_population |     1.81037 |     93.5794 |   91.769 |      ms |
|                                  90th percentile service time |            asc_sort_population |     2.91967 |     124.166 |  121.247 |      ms |
|                                  99th percentile service time |            asc_sort_population |     6.05693 |     136.212 |  130.155 |      ms |
|                                 100th percentile service time |            asc_sort_population |     6.10385 |     136.803 |    130.7 |      ms |
|                                                    error rate |            asc_sort_population |           0 |           0 |        0 |       % |

|                                                Min Throughput | asc_sort_with_after_population |     1.50486 |     1.50395 | -0.00091 |   ops/s |
|                                               Mean Throughput | asc_sort_with_after_population |     1.50591 |      1.5048 |  -0.0011 |   ops/s |
|                                             Median Throughput | asc_sort_with_after_population |     1.50583 |     1.50474 | -0.00109 |   ops/s |
|                                                Max Throughput | asc_sort_with_after_population |     1.50726 |      1.5059 | -0.00136 |   ops/s |
|                                       50th percentile latency | asc_sort_with_after_population |     5.54227 |      144.29 |  138.748 |      ms |
|                                       90th percentile latency | asc_sort_with_after_population |     6.22942 |     171.732 |  165.503 |      ms |
|                                       99th percentile latency | asc_sort_with_after_population |     15.7206 |     205.437 |  189.716 |      ms |
|                                      100th percentile latency | asc_sort_with_after_population |      15.798 |     218.585 |  202.787 |      ms |
|                                  50th percentile service time | asc_sort_with_after_population |     4.16325 |     142.475 |  138.311 |      ms |
|                                  90th percentile service time | asc_sort_with_after_population |     4.69177 |      170.08 |  165.389 |      ms |
|                                  99th percentile service time | asc_sort_with_after_population |     13.4406 |      203.93 |  190.489 |      ms |
|                                 100th percentile service time | asc_sort_with_after_population |     14.0393 |     217.258 |  203.219 |      ms |
|                                                    error rate | asc_sort_with_after_population |           0 |           0 |        0 |       % |

Issues Resolved

Closes #2083, documentation opensearch-project/documentation-website#3585

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 8, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Feb 8, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Feb 8, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Feb 9, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testShardRoutingWithNetworkDisruption_FailOpenEnabled
      1 org.opensearch.indices.replication.SegmentReplicationStatsIT.testSegmentReplicationStatsResponse

@github-actions
Copy link
Contributor

github-actions bot commented Feb 9, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Feb 9, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Feb 9, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter
Copy link

codecov-commenter commented Feb 10, 2023

Codecov Report

Merging #6237 (e90e6ec) into main (1ab22e1) will decrease coverage by 0.09%.
The diff coverage is 41.89%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff              @@
##               main    #6237      +/-   ##
============================================
- Coverage     70.52%   70.43%   -0.09%     
- Complexity    59412    59489      +77     
============================================
  Files          4862     4872      +10     
  Lines        285530   286190     +660     
  Branches      41153    41274     +121     
============================================
+ Hits         201357   201579     +222     
- Misses        67588    68038     +450     
+ Partials      16585    16573      -12     
Impacted Files Coverage Δ
...rg/opensearch/core/xcontent/MapXContentParser.java 73.07% <0.00%> (-1.44%) ⬇️
...a/org/opensearch/core/xcontent/XContentParser.java 100.00% <ø> (ø)
...ggregations/bucket/geogrid/cells/CellIdSource.java 64.28% <0.00%> (-4.95%) ⬇️
...ons/bucket/geogrid/cells/GeoShapeCellIdSource.java 0.00% <0.00%> (ø)
...ch/common/xcontent/JsonToStringXContentParser.java 44.44% <0.00%> (-0.56%) ⬇️
...org/opensearch/index/fielddata/IndexFieldData.java 70.00% <ø> (ø)
...rg/opensearch/index/fielddata/ScriptDocValues.java 47.61% <0.00%> (-3.85%) ⬇️
...ddata/UnsignedLongToSortedNumericDoubleValues.java 0.00% <0.00%> (ø)
...dex/search/comparators/UnsignedLongComparator.java 0.00% <0.00%> (ø)
...n/java/org/opensearch/search/SearchSortValues.java 85.71% <0.00%> (-3.65%) ⬇️
... and 44 more

... and 481 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@reta reta force-pushed the issue-2083 branch 2 times, most recently from b630b3e to ae66f44 Compare February 15, 2023 14:24
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@@ -201,9 +201,15 @@ private static Object convertValueFromSortType(String fieldName, SortField.Type

case LONG:
// for unsigned_long field type we want to pass search_after value through formatting
if (value instanceof Number && format != DocValueFormat.UNSIGNED_LONG_SHIFTED) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious , what was the reason we didnt support search_after earlier for unsigned_long and now we can ?

Copy link
Collaborator Author

@reta reta Apr 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unsigned_long (in some form) was supported when DocValueFormat.UNSIGNED_LONG_SHIFTED was used, but to be fair do not know how this value was used later on (this is/was x-pack feature, I have not looked into the code).

Copy link
Contributor

@gashutos gashutos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed Indexing & Search->Sort area. That looks good to me, but left few comments and doubts.
Someone has to help here on Search->aggregation code changes area who is more familiar with that code.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented May 1, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationIT.testScrollCreatedOnReplica
      1 org.opensearch.indices.replication.SegmentReplicationIT.testReplicationPostDeleteAndForceMerge
      1 org.opensearch.cluster.ClusterHealthIT.testHealthOnClusterManagerFailover

@dblock
Copy link
Member

dblock commented May 1, 2023

@gashutos care to take a look at the updates for me? anyone else wants to CR @andrross ?

Copy link
Contributor

@gashutos gashutos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor Comments.
PR/changes looks pretty good to me. The biggest plus point here is, changes are completely different path then existing NumericTypes, so no possibility of regression as far as I see, but yeah, these are big changes, but integ tests are covered nicely.

return h;
}

@Override
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We dont need this to override right ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we need it since visit is abstract method in Query

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, my bad, I was comapring with SortedLongDocValuesRangeQueries.

visitor.visitLeaf(this);
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too we dont need to override ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly we need it, toString is abstract method in Query

@Override
public boolean matches() throws IOException {
final long value = singleton.longValue();
return Long.compareUnsigned(value, lowerValue) >= 0 && Long.compareUnsigned(value, upperValue) <= 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

search_after is also kind of range query, that did not require any change ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just saw the code, search_after is not supported for unsigned_long as of now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm ... it should be supported, I do have a test case for it [1]

I just saw the code, search_after is not supported for unsigned_long as of now.

Could you please point me the place that I missed, the search_after should be supported I think, thank you.

[1] https://github.com/opensearch-project/OpenSearch/pull/6237/files#diff-dd44653f1a2055a1e3cdedaac251645e4a9d2cf3170cd017e8fe96723cc416cd

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow I overlooked here we were throwing exception in searchafter builder in case of unsigned_log
server/src/main/java/org/opensearch/search/searchafter/SearchAfterBuilder.java

All good ! Thank you for clarifying.

final BigInteger v = Numbers.toUnsignedLongExact(value);

if (indexed) {
fields.add(new BigIntegerPoint(name, v));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is getting indexed with BigIntegerPoint values, can we use Numeric point based sort optimization here too ? I see in IndexNumenricFieldData, we have custom comparator and dont use sort optimization in this case. May be we can just check, but I think it might not be possilbe since there is no SortType which fits unsigned long.

Copy link
Collaborator Author

@reta reta May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, the SortType is the show stopper here sadly :( I kept this particular feature (and index sort as well) as a future improvement, all because of the SortType limitation

@@ -157,6 +157,17 @@ static void register(ValuesSourceRegistry.Builder builder) {
compositeValuesSourceConfig.reverseMul()
);

} else if (vs.isBigInteger()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WE might have to work little bit in future in case we add some more NumericType ither then unsigned_long which doesnt fit size of long/double, BigInteger is for all other types but lets see if at all we need to make this generic.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it at some but we are very much limited by Apache Lucene: the unsigned_long is as far as we could get right now (we could not use the BigInteger at full sadly). But sure, if you have suggestion to generalize it now - happy to take them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, it would be not be required at all to have any such more type.

@github-actions
Copy link
Contributor

github-actions bot commented May 3, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented May 3, 2023

Gradle Check (Jenkins) Run Completed with:

CHANGELOG.md Outdated
@@ -90,6 +90,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- [Extensions] Add IdentityPlugin into core to support Extension identities ([#7246](https://github.com/opensearch-project/OpenSearch/pull/7246))
- Add descending order search optimization through reverse segment read. ([#7244](https://github.com/opensearch-project/OpenSearch/pull/7244))
- [Extensions] Moving RestActions APIs to protobuf serialization. ([#7302](https://github.com/opensearch-project/OpenSearch/pull/7302))
- Introduce new 'unsigned_long' numeric field type support ([#6237](https://github.com/opensearch-project/OpenSearch/pull/6237))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should say Add unsigned_long numeric field type.

I'm ready to merge otherwise :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dblock! @nknize do you have time to glance through? thank you

reta added 3 commits May 4, 2023 08:32
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
@github-actions
Copy link
Contributor

github-actions bot commented May 4, 2023

Gradle Check (Jenkins) Run Completed with:

@dblock dblock merged commit e2f21d8 into opensearch-project:main May 4, 2023
@dblock
Copy link
Member

dblock commented May 4, 2023

I decided to merge, we have time to revert if @nknize has strong opinions ;)

@dblock dblock added the backport 2.x Backport to 2.x branch label May 4, 2023
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-6237-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e2f21d824ac0dc76d826b9bbdc32ec80df7c9bf6
# Push it to GitHub
git push --set-upstream origin backport/backport-6237-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-6237-to-2.x.

@kotwanikunal
Copy link
Member

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-6237-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e2f21d824ac0dc76d826b9bbdc32ec80df7c9bf6
# Push it to GitHub
git push --set-upstream origin backport/backport-6237-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-6237-to-2.x.

@reta Looks like it needs a manual backport, hopefully not for the changelog.

reta added a commit to reta/OpenSearch that referenced this pull request May 4, 2023
…project#6237)

* Introduce new 'unsigned_long' numeric field type support

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Address code review comments

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Addressed code review comments

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

---------

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
(cherry picked from commit e2f21d8)
reta added a commit to reta/OpenSearch that referenced this pull request May 4, 2023
…project#6237)

* Introduce new 'unsigned_long' numeric field type support

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Address code review comments

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Addressed code review comments

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

---------

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
(cherry picked from commit e2f21d8)
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
kotwanikunal pushed a commit that referenced this pull request May 4, 2023
* Introduce new 'unsigned_long' numeric field type support



* Address code review comments



* Addressed code review comments



---------


(cherry picked from commit e2f21d8)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…project#6237)

* Introduce new 'unsigned_long' numeric field type support

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Address code review comments

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Addressed code review comments

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

---------

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch v2.8.0 'Issues and PRs related to version v2.8.0' v3.0.0 Issues and PRs related to version 3.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] mapper_parsing_exception: No handler for type [unsigned_long] declared
6 participants