feat: Geospatial Data Type and GIS Function Support for milvus #37417

Open
wants to merge 4 commits into
base: master
from

tasty-gumi
Contributor

issue:#27576
pr:#35990

Main Goals

  1. Create and describe collections with geospatial fields, enabling both client and server to recognize and process geo fields.
  2. Insert geospatial data as payload values in the insert binlog, and print the values for verification.
  3. Load segments containing geospatial data into memory.
  4. Ensure query outputs can display geospatial data.
  5. Support filtering on GIS functions for geospatial columns.
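A minimal sketch of what goal 5 amounts to, assuming the simplest case of point rows filtered against one polygon literal. It uses a plain ray-casting test as a stand-in; the PR itself relies on GDAL in the C++ layer, and the names `point_in_polygon` and `filter_within` are hypothetical, not part of Milvus.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Ray-casting test; points exactly on the boundary are not treated specially."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_within(rows: List[Tuple[int, Point]], ring: List[Point]) -> List[int]:
    """Return the ids of rows whose point lies inside the polygon ring."""
    return [row_id for row_id, pt in rows if point_in_polygon(pt, ring)]


# A 4x4 square as the polygon literal, and three (id, point) rows.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
rows = [(1, (1.0, 1.0)), (2, (5.0, 1.0)), (3, (2.0, 3.5))]
matching_ids = filter_within(rows, square)
print(matching_ids)
```

Rows 1 and 3 fall inside the square, so only their ids are returned. A real implementation would also need to define boundary behavior and support geometry types other than points.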

Solution

  1. Add Type: Modify the Milvus core by adding a Geospatial type in both the C++ and Go code layers, defining the Geospatial data structure and the corresponding interfaces.
  2. Dependency Libraries: Introduce necessary geospatial data processing libraries. In the C++ source code, use Conan package management to include the GDAL library. In the Go source code, add the go-geom library to the go.mod file.
  3. Protocol Interface: Revise the Milvus protocol to provide mechanisms for Geospatial message serialization and deserialization.
  4. Data Pipeline: Facilitate interaction between the client and proxy using the WKT format for geospatial data. The proxy will convert all data into WKB format for downstream processing, providing column data interfaces, segment encapsulation, segment loading, payload writing, and cache block management.
  5. Query Operators: Implement simple display and support for filter queries. Initially, focus on filtering based on spatial relationships for a single column of geospatial literal values, providing parsing and execution for query expressions.
  6. Index Construction: Consider building an H3 index, utilizing the C interface provided by the H3 system.
  7. Client Modification: Enable the client to handle user input for geospatial data and facilitate end-to-end testing. See the corresponding modification in pymilvus.
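The WKT-to-WKB step in item 4 can be sketched for the simplest geometry, a 2D point. This is only an illustration using the Python standard library; the PR performs the conversion in the proxy with go-geom, and the helper names `point_wkt_to_wkb` and `point_wkb_to_wkt` are assumptions made for this example.

```python
import re
import struct

WKB_LITTLE_ENDIAN = 1  # byte-order flag: 1 = little endian, 0 = big endian
WKB_POINT = 1          # geometry type code for a 2D Point

_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*([-+0-9.eE]+)\s+([-+0-9.eE]+)\s*\)\s*$", re.IGNORECASE
)


def point_wkt_to_wkb(wkt: str) -> bytes:
    """Encode a 2D POINT given as WKT into 21 bytes of little-endian WKB."""
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a 2D POINT WKT: {wkt!r}")
    x, y = float(match.group(1)), float(match.group(2))
    # Layout: 1-byte order flag, 4-byte type code, then two 8-byte doubles.
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


def point_wkb_to_wkt(wkb: bytes) -> str:
    """Decode point WKB back to WKT, as needed when returning query results."""
    prefix = "<" if wkb[0] == WKB_LITTLE_ENDIAN else ">"
    geom_type, x, y = struct.unpack(prefix + "Idd", wkb[1:21])
    if geom_type != WKB_POINT:
        raise ValueError(f"unsupported WKB geometry type: {geom_type}")
    return f"POINT ({x!r} {y!r})"


wkb = point_wkt_to_wkb("POINT (30 10)")
print(wkb.hex())
print(point_wkb_to_wkt(wkb))
```

The binary form is compact and fixed-layout, which is presumably why the description keeps WKT at the client boundary and WKB everywhere downstream.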

Delete the incomplete H3 index development and unneeded generated files.
Fix the conanfiles in the Milvus Conan repo so that local builds can fetch the packages needed to build the libraries.

@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tasty-gumi
To complete the pull request process, please assign czs007 after the PR has been reviewed.
You can assign the PR to them by writing /assign @czs007 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot added the size/XXL, area/dependency, area/test, sig/testing, and test/integration labels on Nov 4, 2024
@mergify bot added the dco-passed and kind/feature labels on Nov 4, 2024
Contributor

mergify bot commented Nov 4, 2024

@tasty-gumi cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Contributor

mergify bot commented Nov 4, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Nov 4, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Nov 4, 2024

@tasty-gumi cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

@czs007
Collaborator

czs007 commented Nov 4, 2024

rerun go-sdk

@czs007
Collaborator

czs007 commented Nov 4, 2024

rerun cpp-unit-test

Contributor

mergify bot commented Nov 4, 2024

@tasty-gumi cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Contributor

mergify bot commented Nov 4, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@czs007
Collaborator

czs007 commented Nov 4, 2024

/run-cpu-e2e

Contributor

mergify bot commented Nov 4, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Nov 4, 2024

@tasty-gumi cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Contributor

mergify bot commented Nov 4, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Nov 5, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Nov 5, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Nov 5, 2024

@tasty-gumi cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Contributor

mergify bot commented Nov 5, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Nov 8, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Nov 8, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Nov 8, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@czs007
Collaborator

czs007 commented Nov 9, 2024

@tasty-gumi
[pytest : test] pymilvus.exceptions.MilvusException: <MilvusException: (code=65535, message=create auto index on type:JSON is not supported)>

[pytest : test] (api_request.py:45)

[pytest : test] [2024-11-08 14:24:30 - ERROR - ci_test]: (api_response) : <MilvusException: (code=65535, message=create auto index on type:JSON is not supported)> (api_request.py:46)

[pytest : test] ---------- generated html file: file:///tmp/ci_logs/test/report.html -----------

[pytest : test] =========================== short test summary info ============================

[pytest : test] FAILED testcases/test_index.py::TestIndexInvalid::test_create_index_json

[pytest : test] !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!

Contributor

mergify bot commented Nov 11, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Nov 12, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Nov 12, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.

@czs007
Collaborator

czs007 commented Nov 13, 2024

@tasty-gumi The integration test failed.

2024-11-12T13:06:50.0930073Z [2024/11/12 13:02:43.107 +00:00] [INFO] [querynodev2/services.go:483] ["start to load segments in parallel"] [collectionID=453880325253365774] [segmentType=Sealed] [requestSegments="[453880325253771188]"] [preparedSegments="[453880325253771188]"] [segmentNum=1] [concurrencyLevel=1]
2024-11-12T13:06:50.0931913Z [2024/11/12 13:02:43.107 +00:00] [WARN] [querynodev2/services.go:461] ["worker failed to load segments"] [collectionID=453880325253365774] [channel=by-dev-rootcoord-dml_0_453880325253365774v0] [replicaID=453880325350359041] [workID=1] [segments="[453880325253769583]"] [error="At LoadSegment: => unsupported data type at /go/src/github.com/milvus-io/milvus/internal/core/src/segcore/ChunkedSegmentSealedImpl.cpp:442\n"]
2024-11-12T13:06:50.0933010Z [2024/11/12 13:02:43.108 +00:00] [INFO] [funcutil/parallel.go:86] ["load segment..."] [collectionID=453880325253365774] [segmentType=Sealed] [requestSegments="[453880325253771188]"] [preparedSegments="[453880325253771188]"] [partitionID=453880325253365775] [segmentID=453880325253771188] [segmentType=L1]
2024-11-12T13:06:50.0935035Z [2024/11/12 13:02:43.108 +00:00] [WARN] [querynode/service.go:301] ["delegator failed to load segments"] [collectionID=453880325253365774] [partitionID=453880325253365775] [shard=by-dev-rootcoord-dml_0_453880325253365774v0] [segmentID=453880325253769583] [level=L1] [currentNodeID=1] [dstNodeID=1] [error="At LoadSegment: => unsupported data type at /go/src/github.com/milvus-io/milvus/internal/core/src/segcore/ChunkedSegmentSealedImpl.cpp:442\n"]
2024-11-12T13:06:50.0936237Z [2024/11/12 13:02:43.108 +00:00] [INFO] [segments/segment_loader.go:334] ["start loading segment files"] [collectionID=453880325253365774] [partitionID=453880325253365775] [shard=by-dev-rootcoord-dml_0_453880325253365774v0] [segmentID=453880325253771188] [rowNum=3000] [segmentType=Sealed]
2024-11-12T13:06:50.0938012Z [2024/11/12 13:02:43.108 +00:00] [WARN] [task/executor.go:160] ["failed to load segment"] [taskID=1731416037011] [collectionID=453880325253365774] [replicaID=453880325350359041] [segmentID=453880325253769583] [node=1] [source=segment_checker] [shardLeader=1] [error="At LoadSegment: => unsupported data type at /go/src/github.com/milvus-io/milvus/internal/core/src/segcore/ChunkedSegmentSealedImpl.cpp:442\n"]
2024-11-12T13:06:50.0939242Z [2024/11/12 13:02:43.109 +00:00] [INFO] [task/executor.go:142] ["execute ac
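
One plausible reading of the "unsupported data type" error above, consistent with the later commit that adds the type to the chunked sealed segment implementation, is that a per-type dispatch had no branch for the new geometry type. The sketch below illustrates that failure mode only; `DataType`, `LOADERS`, and `load_field` are hypothetical names and do not mirror the actual Milvus code.

```python
from enum import Enum, auto
from typing import Callable, Dict, List


class DataType(Enum):
    INT64 = auto()
    VARCHAR = auto()
    GEOMETRY = auto()  # newly introduced type


def load_fixed_width(raw: List[bytes]) -> List[int]:
    """Decode fixed-width little-endian signed integers."""
    return [int.from_bytes(b, "little", signed=True) for b in raw]


def load_variable_length(raw: List[bytes]) -> List[bytes]:
    """Keep variable-length values (strings, WKB blobs) as raw bytes."""
    return list(raw)


# Dispatch table written before GEOMETRY existed, so it has no entry for it.
LOADERS: Dict[DataType, Callable] = {
    DataType.INT64: load_fixed_width,
    DataType.VARCHAR: load_variable_length,
}


def load_field(dtype: DataType, raw: List[bytes]):
    loader = LOADERS.get(dtype)
    if loader is None:
        raise TypeError(f"unsupported data type: {dtype.name}")
    return loader(raw)


wkb_rows = [bytes.fromhex("0101000000" + "00" * 16)]  # WKB for POINT (0 0)

try:
    load_field(DataType.GEOMETRY, wkb_rows)
    error_before = None
except TypeError as exc:
    error_before = str(exc)

# Registering the new type in the dispatch resolves the error.
LOADERS[DataType.GEOMETRY] = load_variable_length
loaded_after = load_field(DataType.GEOMETRY, wkb_rows)
print(error_before, len(loaded_after))
```

The general point is that a new column type has to be handled at every type-dispatch site on the load path, not just where it is declared.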

Contributor

mergify bot commented Nov 13, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@tasty-gumi
Contributor Author

/run-cpu-e2e

Contributor

mergify bot commented Nov 13, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Nov 13, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Nov 13, 2024

@tasty-gumi cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Contributor

mergify bot commented Nov 13, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Nov 13, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

add geospatial interface in src common

change type define and add segcore support

add storage & chunkdata support

feature: go package storage & proxy & typeutil support geospatial type in internal and typeutil in pkg

Signed-off-by: tasty-gumi <1021989072@qq.com>

add geospatial interface in src common

change type define and add segcore support

change: use wkb only in core

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix: geospatial uses only std::string as the FieldDataImpl template parameter && add geospatial data generation && pass chunk, growing, and sealed tests

fix: merge conflicts after rebase; the nullable test does not pass due to upstream

feat:basic GIS Function expr and visitor impl and GIS proto support && add:storage test of geo data

Signed-off-by: tasty-gumi <1021989072@qq.com>

feat:add proxy validate (pass httpserver test) && plan parser of geospatialfunction

fix:sealedseg && go tidy

fix:go mod

feat:can produce wkt result for pymilvus client

feat: add parser and query operator for geos field && print geos binlog as wkt

fix:fielddataimpl interface
Signed-off-by: tasty-gumi <1021989072@qq.com>

fix: some format of code && segmentfault debug for rebase

Signed-off-by: tasty-gumi <1021989072@qq.com>

add: import util test for parquet and mix compaction test

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix: delete useless file and fix error for rebase

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix: git rebase for custom function feat

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix:rename geospatial field && update proto && rewrite Geometry class with smart pointer

Signed-off-by: tasty-gumi <1021989072@qq.com>

add: files missed in the last commit

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix: replace geospatial name in test files && fix geometry and parser

fix:remove some file change for dev

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix: remove size in if && add destroy in ~Geometry()

Signed-off-by: tasty-gumi <1021989072@qq.com>

add:conan file gdal rep

Signed-off-by: tasty-gumi <1021989072@qq.com>

remove:gdal fPIC

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix: for rebase

Signed-off-by: tasty-gumi <1021989072@qq.com>

remove:log_warn

Signed-off-by: tasty-gumi <1021989072@qq.com>

remove:gdal shared

Signed-off-by: tasty-gumi <1021989072@qq.com>

remove:tbbproxy

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix:add gdal option && update go mod

Signed-off-by: tasty-gumi <1021989072@qq.com>

dev:change some scripts

Signed-off-by: tasty-gumi <1021989072@qq.com>

remove: dev scripts

Signed-off-by: tasty-gumi <1021989072@qq.com>

add:conan files dependency of gdal

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix:fmt cpp code

Signed-off-by: tasty-gumi <1021989072@qq.com>

add: delete geos-config in cmake_build/bin, which may cause a permission-denied error

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix: add go client geometry interface && fix group by test

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix: mod tidy for tests go client

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix:memory leak in test and go fmt

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix: datagen function remove pkoffset

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix: go-client test add entity.geometry

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix: fix test args and add some annotations

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix: name and remove wkt marshal MaxDecimalDigits limit

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix:misspell

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix:go client test

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix:listA size

Signed-off-by: tasty-gumi <1021989072@qq.com>

add:field data in schema_test

Signed-off-by: tasty-gumi <1021989072@qq.com>

test:add mergefield data

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix:test err code modify

Signed-off-by: tasty-gumi <1021989072@qq.com>

fmt code

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix:add geo  type in client

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix: add type in ChunkedSegmentSealedImpl

Signed-off-by: tasty-gumi <1021989072@qq.com>

fix:add chunk writer for geometry

Signed-off-by: tasty-gumi <1021989072@qq.com>
Signed-off-by: tasty-gumi <1021989072@qq.com>
Signed-off-by: tasty-gumi <1021989072@qq.com>
Contributor

mergify bot commented Dec 7, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.

Signed-off-by: tasty-gumi <1021989072@qq.com>
Contributor

mergify bot commented Dec 7, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.

Labels
area/dependency, area/test, dco-passed, kind/feature, sig/testing, size/XXL, test/integration

4 participants