Support json contains feature #25384

Merged
merged 1 commit into from
Aug 11, 2023


xiaocai2333
Contributor

@xiaocai2333 xiaocai2333 commented Jul 6, 2023

issue: #25276

  1. Redefine the json_contains expression so that it no longer depends on the term expression.
  2. Support json_contains(json_array, element); the element can itself be an array.
  3. Support json_contains_all(json_array, element_array): every element of element_array must be in json_array.
  4. Support json_contains_any(json_array, element_array): at least one element of element_array must be in json_array.
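The semantics of the three expressions listed above can be modeled in plain Python. This is only a sketch of the behavior described in this PR, not the segcore implementation. The helper `_same` and its strict type matching (so that `1` does not match `"1"` or `True`) are assumptions made for illustration; the PR description does not say how mixed numeric types are compared.

```python
from typing import Any, List


def _same(a: Any, b: Any) -> bool:
    """Hypothetical strict equality: values must share a type to match."""
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(_same(x, y) for x, y in zip(a, b))
    return type(a) is type(b) and a == b


def json_contains(json_array: List[Any], element: Any) -> bool:
    """True if element (a scalar or an array) is a member of json_array."""
    return any(_same(item, element) for item in json_array)


def json_contains_all(json_array: List[Any], element_array: List[Any]) -> bool:
    """True if every element of element_array is a member of json_array."""
    return all(json_contains(json_array, e) for e in element_array)


def json_contains_any(json_array: List[Any], element_array: List[Any]) -> bool:
    """True if at least one element of element_array is a member of json_array."""
    return any(json_contains(json_array, e) for e in element_array)
```

Note that an array element is matched as a whole: `json_contains([[1, 2]], [1, 2])` is true in this model, while `json_contains([1, 2], [1, 2])` is not.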

@sre-ci-robot sre-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines. label Jul 6, 2023
@mergify mergify bot added the dco-passed DCO check passed. label Jul 6, 2023
@xiaocai2333 xiaocai2333 force-pushed the json_contains_feature-4 branch from e3f8640 to 16e210c Compare July 7, 2023 01:56
@codecov

codecov bot commented Jul 7, 2023

Codecov Report

Merging #25384 (6cbd644) into master (b6fcbb0) will increase coverage by 0.02%.
Report is 1 commits behind head on master.
The diff coverage is 87.00%.


@@            Coverage Diff             @@
##           master   #25384      +/-   ##
==========================================
+ Coverage   82.62%   82.65%   +0.02%     
==========================================
  Files         812      812              
  Lines      109153   109522     +369     
==========================================
+ Hits        90191    90528     +337     
- Misses      15828    15863      +35     
+ Partials     3134     3131       -3     
Files Changed Coverage Δ
internal/core/src/common/Types.h 3.12% <ø> (ø)
internal/core/src/query/PlanProto.h 7.14% <ø> (ø)
...nternal/core/src/query/generated/ExecExprVisitor.h 100.00% <ø> (ø)
.../core/src/query/generated/ExtractInfoExprVisitor.h 100.00% <ø> (ø)
...nternal/core/src/query/generated/ShowExprVisitor.h 0.00% <ø> (ø)
...ternal/core/src/query/visitors/ShowExprVisitor.cpp 0.00% <0.00%> (ø)
...rnal/core/src/query/visitors/VerifyExprVisitor.cpp 0.00% <0.00%> (ø)
...ternal/core/src/query/visitors/ExecExprVisitor.cpp 78.29% <81.57%> (+0.67%) ⬆️
internal/core/src/query/PlanProto.cpp 82.75% <85.10%> (+0.40%) ⬆️
internal/core/src/query/Expr.h 85.71% <100.00%> (+1.09%) ⬆️
... and 4 more

... and 10 files with indirect coverage changes

@xiaocai2333 xiaocai2333 force-pushed the json_contains_feature-4 branch from 16e210c to 9c1bace Compare July 28, 2023 08:59
@xiaocai2333
Contributor Author

xiaocai2333 commented Jul 28, 2023

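Test example:

from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections


def test():
    # Connect to a Milvus instance using the default connection settings.
    connections.connect()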
    int64_field = FieldSchema(name="int64", dtype=DataType.INT64, is_primary=True)
    json_field = FieldSchema(name="json", dtype=DataType.JSON)
    float_vector = FieldSchema(name="float_vector", dtype=DataType.FLOAT_VECTOR, dim=128)

    schema = CollectionSchema(fields=[int64_field, json_field, float_vector], enable_dynamic_field=True)
    hello_milvus = Collection("hello_milvus", schema=schema)

    import numpy as np
    rng = np.random.default_rng(seed=19530)
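    # "json" is a declared JSON field; "array" is not in the schema, so it is
    # stored as a dynamic field (enable_dynamic_field=True).
    rows = []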
    rows.append({"int64": 1, "float_vector": rng.random((1, 128))[0], "json": {"array": [1,2,3,4,5,6]}, "array": [[1,2,3,4,5,6]]})
    rows.append({"int64": 2, "float_vector": rng.random((1, 128))[0], "json": {"array": ["a","b","c"]}, "array": [["a","b","c"]]})
    rows.append({"int64": 3, "float_vector": rng.random((1, 128))[0], "json": {"array": [1,"a","d"]}, "array": [[1,"a","d"]]})
    rows.append({"int64": 4, "float_vector": rng.random((1, 128))[0], "json": {"array": [1,2,3,4,5,6]}, "array": [[1,2,3,4,5,6]]})

    hello_milvus.insert(rows)
    index_type = "IVF_FLAT"
    index_params = {"nlist": 128}
    hello_milvus.create_index("float_vector",
                              index_params={"index_type": index_type, "params": index_params, "metric_type": "L2"})
    hello_milvus.load()

    expr = r'json_contains(json["array"], 3)'
    res = hello_milvus.search([rng.random((1, 128))[0]], "float_vector", {"metric_type": "L2"}, limit=6, expr=expr, output_fields=["$meta"])
    print(expr)
    print(res)

    expr = r'json_contains(json["array"], "a")'
    res = hello_milvus.search([rng.random((1, 128))[0]], "float_vector", {"metric_type": "L2"}, limit=6, expr=expr, output_fields=["$meta"])
    print(expr)
    print(res)

    expr = r'json_contains(array, [1,2,3,4,5,6])'
    res = hello_milvus.search([rng.random((1, 128))[0]], "float_vector", {"metric_type": "L2"}, limit=6, expr=expr,
                              output_fields=["$meta"])
    print(expr)
    print(res)

    expr = r'json_contains_all(json["array"], [1,2,3])'
    res = hello_milvus.search([rng.random((1, 128))[0]], "float_vector", {"metric_type": "L2"}, limit=6, expr=expr, output_fields=["json"])
    print(expr)
    print(res)

    expr = r'json_contains_all(json["array"], [1,"a"])'
    res = hello_milvus.search([rng.random((1, 128))[0]], "float_vector", {"metric_type": "L2"}, limit=6, expr=expr, output_fields=["y"])
    print(expr)
    print(res)

    expr = r'json_contains_all(json["array"], [1,"a", 2])'
    res = hello_milvus.search([rng.random((1, 128))[0]], "float_vector", {"metric_type": "L2"}, limit=6, expr=expr, output_fields=["$meta"])
    print(expr)
    print(res)

    expr = r'json_contains_any(json["array"], [1,"a", 2])'
    res = hello_milvus.search([rng.random((1, 128))[0]], "float_vector", {"metric_type": "L2"}, limit=6, expr=expr, output_fields=["json"])
    print(expr)
    print(res)

    expr = r'json_contains_any(json["array"], [1,7])'
    res = hello_milvus.search([rng.random((1, 128))[0]], "float_vector", {"metric_type": "L2"}, limit=6, expr=expr, output_fields=["$meta"])
    print(expr)
    print(res)

    expr = r'json_contains_any(json["array"], [7,8])'
    res = hello_milvus.search([rng.random((1, 128))[0]], "float_vector", {"metric_type": "L2"}, limit=6, expr=expr,
                              output_fields=["$meta"])
    print(expr)
    print(res)

    expr = r'json_contains_any(json["array"], [1,"a"])'
    res = hello_milvus.search([rng.random((1, 128))[0]], "float_vector", {"metric_type": "L2"}, limit=6, expr=expr,
                              output_fields=["$meta"])
    print(expr)
    print(res)
    hello_milvus.drop()
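As a cross-check for the test above, the primary keys that each filter should let through can be worked out by hand from the four inserted rows. The sketch below does that in plain Python, assuming the semantics stated in the PR description. It is not captured output from Milvus, and the names `arrays`, `matching_ids`, `contains_all` and `contains_any` are made up for this illustration.

```python
# The json["array"] values inserted by the test, keyed by primary key.
arrays = {
    1: [1, 2, 3, 4, 5, 6],
    2: ["a", "b", "c"],
    3: [1, "a", "d"],
    4: [1, 2, 3, 4, 5, 6],
}


def matching_ids(predicate):
    """Return the primary keys whose array satisfies the predicate."""
    return [pk for pk, arr in arrays.items() if predicate(arr)]


def contains_all(elements):
    """Build a predicate that requires every element to be present."""
    return lambda arr: all(e in arr for e in elements)


def contains_any(elements):
    """Build a predicate that requires at least one element to be present."""
    return lambda arr: any(e in arr for e in elements)


expected = {
    'json_contains(json["array"], 3)': matching_ids(lambda arr: 3 in arr),
    'json_contains(json["array"], "a")': matching_ids(lambda arr: "a" in arr),
    'json_contains_all(json["array"], [1,2,3])': matching_ids(contains_all([1, 2, 3])),
    'json_contains_all(json["array"], [1,"a"])': matching_ids(contains_all([1, "a"])),
    'json_contains_all(json["array"], [1,"a", 2])': matching_ids(contains_all([1, "a", 2])),
    'json_contains_any(json["array"], [1,"a", 2])': matching_ids(contains_any([1, "a", 2])),
    'json_contains_any(json["array"], [1,7])': matching_ids(contains_any([1, 7])),
    'json_contains_any(json["array"], [7,8])': matching_ids(contains_any([7, 8])),
}

for expr, ids in expected.items():
    print(f"{expr} -> {ids}")
```

Under these assumptions, two of the filters match no rows at all: `json_contains_all` with `[1,"a", 2]`, because no single array holds all three values, and `json_contains_any` with `[7,8]`, because neither value was inserted.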

@mergify
Contributor

mergify bot commented Jul 28, 2023

@xiaocai2333 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@mergify
Contributor

mergify bot commented Jul 31, 2023

@xiaocai2333 ut workflow job failed, comment rerun ut can trigger the job again.

@xiaocai2333 xiaocai2333 force-pushed the json_contains_feature-4 branch from 31d2476 to bffeb8d Compare July 31, 2023 06:55
@mergify mergify bot added needs-dco DCO is missing in this pull request. and removed dco-passed DCO check passed. labels Jul 31, 2023
@mergify
Contributor

mergify bot commented Jul 31, 2023

@xiaocai2333 Thanks for your contribution. Please submit with DCO, see the contributing guide https://github.com/milvus-io/milvus/blob/master/CONTRIBUTING.md#developer-certificate-of-origin-dco.

@mergify mergify bot added dco-passed DCO check passed. and removed needs-dco DCO is missing in this pull request. labels Jul 31, 2023
@mergify
Contributor

mergify bot commented Jul 31, 2023

@xiaocai2333 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@xiaocai2333
Contributor Author

/run-cpu-e2e

@xiaocai2333 xiaocai2333 force-pushed the json_contains_feature-4 branch from bffeb8d to 1d41cd6 Compare August 2, 2023 07:56
@mergify
Contributor

mergify bot commented Aug 2, 2023

@xiaocai2333 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@xiaocai2333
Contributor Author

/run-cpu-e2e

Member

@yah01 yah01 left a comment


Some copying could be avoided, and the time complexity of the contains check could be optimized; otherwise LGTM.
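The review does not spell out which optimization is meant, and the C++ code under review is not shown in this conversation. One possible reading is to replace a nested scan, which costs O(n·m) for an array of length n and m query elements, with a hash-set lookup that costs O(n + m) on average. The sketch below illustrates that idea in Python. The function names and the use of `json.dumps` as a canonical hash key are assumptions for this example only.

```python
import json
from typing import Any, List


def _key(value: Any) -> str:
    """Canonical, hashable key for a JSON value (lists and dicts included)."""
    return json.dumps(value, sort_keys=True)


def contains_all_naive(json_array: List[Any], element_array: List[Any]) -> bool:
    # O(n * m): rescans json_array once for every queried element.
    return all(
        any(_key(item) == _key(e) for item in json_array) for e in element_array
    )


def contains_all_hashed(json_array: List[Any], element_array: List[Any]) -> bool:
    # O(n + m) on average: hash the queried elements once, then make a single
    # pass over json_array, stopping as soon as everything has been seen.
    remaining = {_key(e) for e in element_array}
    for item in json_array:
        remaining.discard(_key(item))
        if not remaining:
            return True
    return not remaining


def contains_any_hashed(json_array: List[Any], element_array: List[Any]) -> bool:
    # O(n + m) on average: one pass over json_array with set membership tests.
    wanted = {_key(e) for e in element_array}
    return any(_key(item) in wanted for item in json_array)
```

Both variants return the same result; the hashed version trades a small amount of memory for the set in exchange for avoiding the repeated scans.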

internal/core/src/query/visitors/ExecExprVisitor.cpp Outdated Show resolved Hide resolved
internal/core/src/query/visitors/ExecExprVisitor.cpp Outdated Show resolved Hide resolved
internal/core/src/query/visitors/ExecExprVisitor.cpp Outdated Show resolved Hide resolved
internal/core/src/query/visitors/ExecExprVisitor.cpp Outdated Show resolved Hide resolved
internal/core/src/query/visitors/ExecExprVisitor.cpp Outdated Show resolved Hide resolved
@czs007
Collaborator

czs007 commented Aug 10, 2023

/hold

Contributor

@longjiquan longjiquan left a comment


Please add unit tests for the segcore changes.

internal/core/src/query/visitors/ExecExprVisitor.cpp Outdated Show resolved Hide resolved
@xiaocai2333 xiaocai2333 force-pushed the json_contains_feature-4 branch from 1d41cd6 to 15a67a7 Compare August 10, 2023 10:33
@mergify
Contributor

mergify bot commented Aug 10, 2023

@xiaocai2333 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@xiaocai2333 xiaocai2333 force-pushed the json_contains_feature-4 branch from 15a67a7 to f9abc11 Compare August 11, 2023 00:35
@czs007
Collaborator

czs007 commented Aug 11, 2023

/approve

@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: czs007, xiaocai2333

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@czs007
Collaborator

czs007 commented Aug 11, 2023

/unhold

@xiaocai2333 xiaocai2333 force-pushed the json_contains_feature-4 branch from f9abc11 to be994bd Compare August 11, 2023 03:34
@czs007 czs007 added this to the 2.3 milestone Aug 11, 2023
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
@xiaocai2333 xiaocai2333 force-pushed the json_contains_feature-4 branch from be994bd to 6cbd644 Compare August 11, 2023 07:02
@mergify
Contributor

mergify bot commented Aug 11, 2023

@xiaocai2333 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@xiaocai2333
Contributor Author

/run-cpu-e2e

@mergify
Contributor

mergify bot commented Aug 11, 2023

@xiaocai2333 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@xiaocai2333
Contributor Author

/run-cpu-e2e

@mergify mergify bot added the ci-passed label Aug 11, 2023
@longjiquan
Contributor

/lgtm

@sre-ci-robot sre-ci-robot merged commit a0198ce into milvus-io:master Aug 11, 2023
@xiaocai2333 xiaocai2333 deleted the json_contains_feature-4 branch November 1, 2023 12:36
Labels
approved area/test ci-passed dco-passed DCO check passed. lgtm sig/testing size/XXL Denotes a PR that changes 1000+ lines. test/integration integration test
5 participants