[FEATURE] Support table partition #77

Closed
yhmo opened this issue Oct 22, 2019 · 2 comments
Assignees
Labels
kind/enhancement Issues or changes related to enhancement

Comments


yhmo commented Oct 22, 2019

Is your feature request related to a problem? Please describe.
I would like Milvus to support partitions within a table. Users should be able to insert vectors into a partition by specifying a partition tag, search vectors within specific partitions of a table, and drop a partition.

Describe the solution you'd like
To be discussed.

Additional context
Related issue:
#28 Add new api about vector deletion via generated date not insert date

@yhmo yhmo added the kind/enhancement Issues or changes related to enhancement label Oct 24, 2019
@yhmo yhmo self-assigned this Oct 24, 2019

yhmo commented Oct 26, 2019

Milvus partition proposal
SDK enhancement:

create_table({'table_name': "tag_tbl", 'dimension': 512, 'index_file_size': 1024, 'metric_type': MetricType.L2})  # existing API, unchanged

create_partition({'table_name': "tag_tbl", 'partition_name': "sub_tag_1", 'tag': "aaa"})  # new API

create_partition({'table_name': "tag_tbl", 'partition_name': "sub_tag_2", 'tag': "bbb"})

add_vectors(table_name="tag_tbl", records=vec_list, ids=vec_ids, partition_tag="aaa")  # existing API, new parameter

search_vectors(table_name="tag_tbl", query_records=query_vectors, top_k=k, nprobe=p, partition_tags=["aaa", "bbb"])  # existing API, new parameter

show_partitions(table_name="tag_tbl")  # new API

delete_partition(table_name="tag_tbl", partition_name="sub_tag_2")  # new API

Note:

A table can be partitioned even if it already contains data.
If no partition is specified, vectors are inserted into the parent table.
If the add_vectors API specifies a nonexistent tag, vectors are inserted into the parent table.
Sub-table index parameters are inherited from the parent table.
Deleting the parent table also deletes its sub-tables and all of their data.
create_index("tag_tbl") builds the index for the parent table and its sub-tables with the same index parameters.
The search_vectors parameter partition_tags must support regex matching.
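
The rules in the note can be illustrated with a small in-memory model. This is only a sketch, not the Milvus SDK: the `PartitionedTable` class and its methods are hypothetical names made up for illustration. It also assumes that the parent table is always part of a search (mirroring the FilesToSearch steps in the server design) and that each entry in `partition_tags` is a regular expression matched against the whole tag.

```python
import re
from collections import defaultdict


class PartitionedTable:
    """Toy in-memory model of a parent table with tagged partitions (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.partitions = {}           # tag -> partition (sub-table) name
        self.rows = defaultdict(list)  # table or partition name -> vector ids

    def create_partition(self, partition_name, tag):
        if not tag:
            raise ValueError("partition tag must be non-empty")
        self.partitions[tag] = partition_name

    def add_vectors(self, ids, partition_tag=""):
        # No tag, or a tag that does not exist: fall back to the parent table.
        target = self.partitions.get(partition_tag, self.name)
        self.rows[target].extend(ids)
        return target

    def search_targets(self, partition_tags=()):
        # Assumption: the parent table is always searched; with no tags given,
        # every partition is searched as well.
        patterns = list(partition_tags) or [".*"]
        matched = sorted(
            name
            for tag, name in self.partitions.items()
            if any(re.fullmatch(pattern, tag) for pattern in patterns)
        )
        return [self.name] + matched

    def drop_partition(self, partition_name):
        # Dropping a partition removes its tag mapping and its data.
        self.partitions = {
            tag: name for tag, name in self.partitions.items() if name != partition_name
        }
        self.rows.pop(partition_name, None)


table = PartitionedTable("tag_tbl")
table.create_partition("sub_tag_1", "aaa")
table.create_partition("sub_tag_2", "bbb")
```

With this model, `table.add_vectors([1, 2], "aaa")` targets `sub_tag_1`, while an unknown tag such as `"zzz"` lands in `tag_tbl`.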

Server enhancement

  1. Add new columns to meta Tables:

The 'version' column is for general-purpose use.

The 'owner_table' and 'partition_tag' columns default to empty.

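| id | table_id | state | dimension | created_on | flag | index_file_size | engine_type | nlist | metric_type | owner_table | partition_tag | version |
|----|----------|-------|-----------|------------|------|-----------------|-------------|-------|-------------|-------------|---------------|---------|
| 1 | tag_tbl | 1 | 512 | 1570851293981928 | 0 | 1073741824 | 2 | 16384 | 1 | | | 6.0 |
| 2 | sub_tag_1 | 1 | 512 | 1570851293436262 | 0 | 1073741824 | 2 | 16384 | 1 | tag_tbl | aaa | 6.0 |
| 3 | sub_tag_2 | 1 | 512 | 1570851293432383 | 0 | 1073741824 | 2 | 16384 | 1 | tag_tbl | bbb | 6.0 |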
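
To make the role of the two new columns concrete, here is a rough sketch using Python's built-in `sqlite3` module. The schema is a trimmed-down assumption that keeps only a few of the columns above, and the helper functions `get_partition_name` and `show_partitions` are hypothetical stand-ins, not the server's actual meta implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down version of the meta table: only the columns needed here.
conn.execute(
    """
    CREATE TABLE Tables (
        id INTEGER PRIMARY KEY,
        table_id TEXT UNIQUE NOT NULL,
        dimension INTEGER NOT NULL,
        owner_table TEXT NOT NULL DEFAULT '',
        partition_tag TEXT NOT NULL DEFAULT '',
        version TEXT NOT NULL DEFAULT ''
    )
    """
)
conn.executemany(
    "INSERT INTO Tables (table_id, dimension, owner_table, partition_tag, version) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("tag_tbl", 512, "", "", "6.0"),
        ("sub_tag_1", 512, "tag_tbl", "aaa", "6.0"),
        ("sub_tag_2", 512, "tag_tbl", "bbb", "6.0"),
    ],
)


def get_partition_name(conn, table_name, tag):
    """Return the sub-table name for (table_name, tag), or None if absent."""
    row = conn.execute(
        "SELECT table_id FROM Tables WHERE owner_table = ? AND partition_tag = ?",
        (table_name, tag),
    ).fetchone()
    return row[0] if row else None


def show_partitions(conn, table_name):
    """Return (partition_name, tag) pairs owned by table_name."""
    return conn.execute(
        "SELECT table_id, partition_tag FROM Tables WHERE owner_table = ? ORDER BY id",
        (table_name,),
    ).fetchall()
```

A parent table is simply a row whose `owner_table` is empty, so existing tables need no migration beyond the added columns.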
  2. gRPC proto update

    message TableName {
        string table_name = 1;
    }

    message PartitionName {
        string partition_name = 1;
    }

    message PartitionParam {
        string partition_name = 1;
        string tag = 2; // must be non-empty
    }

    // Note: a gRPC method takes exactly one request message, so the two
    // arguments shown here would need to be wrapped in a single message.
    rpc CreatePartition(TableName, PartitionParam) returns (Status) {}

    message InsertParam {
        string table_name = 1;
        repeated RowRecord row_record_array = 2;
        repeated int64 row_id_array = 3;
        string partition_tag = 4; // default empty
    }

    message SearchParam {
        string table_name = 1;
        repeated RowRecord query_record_array = 2;
        repeated Range query_range_array = 3;
        int64 topk = 4;
        int64 nprobe = 5;
        string partition_tag = 6; // default empty
    }

    message PartitionList {
        Status status = 1;
        repeated PartitionParam partitions = 2;
    }

    rpc ShowPartitions(TableName) returns (PartitionList) {}

    // Same single-request-message caveat as CreatePartition.
    rpc DropPartition(TableName, PartitionName) returns (Status) {}

  3. Source code design

  • Implement task classes and input validation

Add new tasks CreatePartitionTask, DropPartitionTask, and ShowPartitionsTask, and implement the OnExecute() method for each.

  • Implement the new interfaces in the DBImpl class, and handle partitions in the Insert/Query interfaces

Add new interfaces:

Status CreatePartition(const std::string& table_name, const std::string& partition_name, const std::string& tag);

Status DropPartition(const std::string& table_name, const std::string& partition_name);

Status ShowPartitions(const std::string& table_name, std::vector<meta::TableSchema>& partition_schema_array);

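Handle partition for Insert interface:

Status InsertVectors(const std::string& table_name, const std::string& partition_tag, uint64_t n, const float* data, IDNumbers& vector_ids) {
    if (partition_tag.empty()) {
        // normal table insert
        return mem_mgr_->InsertVectors(table_name, n, data, vector_ids);
    } else {
        // look up the sub-table for this tag; per the SDK notes, an unknown
        // tag falls back to the parent table
        std::string partition_name;
        auto status = meta_ptr_->GetPartitionName(table_name, partition_tag, partition_name);
        const std::string& target = status.ok() ? partition_name : table_name;
        return mem_mgr_->InsertVectors(target, n, data, vector_ids);
    }
}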

Handle partition for Query interface:

Status Query(const std::string& table_name, const std::string& partition_tag, uint64_t topk, uint64_t nq, uint64_t nprobe, const float* data, QueryResults& results) {
    meta::DatePartionedTableFilesSchema files;
    std::vector<size_t> ids;
    meta::DatesT dates;

    auto status = meta_ptr_->FilesToSearch(table_name, partition_tag, ids, dates, files);
    if (!status.ok()) {
        return status;
    }

    // do search

    return status;
}
  • Implement the new interfaces in the SqliteMetaImpl and MySQLMetaImpl classes, and handle partitions in the FilesToSearch interface

Add new interfaces:

Status CreatePartition(const std::string& table_name, const std::string& partition_name, const std::string& tag);

Status DropPartition(const std::string& table_name, const std::string& partition_name);

Status ShowPartitions(const std::string& table_name, std::vector<meta::TableSchema>& partition_schema_array);

Status GetPartitionName(const std::string& table_name, const std::string& tag, std::string& partition_name);

Handle partition for FilesToSearch:

Status FilesToSearch(const std::string& table_id, const std::string& partition_tag, const std::vector<size_t>& ids, const DatesT& dates, DatePartionedTableFilesSchema& files) {

    // step 1: get files from the parent table

    // step 2: select partitions from meta, then get files from those partitions

}
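
The two steps in FilesToSearch can be sketched in a few lines of Python. This is an illustration under assumptions, not the server code: the `PARTITIONS` and `TABLE_FILES` dictionaries are made-up stand-ins for the meta store, an empty tag is assumed to select every partition, and a non-empty tag is assumed to be a regular expression matched against the whole tag.

```python
import re

# Hypothetical meta contents: table -> {tag: sub-table}, and table -> file ids.
PARTITIONS = {"tag_tbl": {"aaa": "sub_tag_1", "bbb": "sub_tag_2"}}
TABLE_FILES = {
    "tag_tbl": [1, 2],
    "sub_tag_1": [3],
    "sub_tag_2": [4, 5],
}


def files_to_search(table_id, partition_tag=""):
    """Collect file ids from the parent table, then from selected partitions."""
    # Step 1: files that belong to the parent table itself.
    files = {table_id: list(TABLE_FILES.get(table_id, []))}

    # Step 2: select partitions from meta and collect their files.
    for tag, partition_name in PARTITIONS.get(table_id, {}).items():
        # Assumption: an empty tag selects every partition; otherwise the tag
        # is a regular expression that must match the whole partition tag.
        if partition_tag == "" or re.fullmatch(partition_tag, tag):
            files[partition_name] = list(TABLE_FILES.get(partition_name, []))
    return files
```

For example, `files_to_search("tag_tbl", "b.*")` returns the parent table's files plus those of `sub_tag_2`.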
  • Modify the Scheduler so that a partition can be safely deleted while a search or index build is in progress

    std::vector<TaskPtr> TaskCreator::Create(const SearchJobPtr& job) {

       // step 1: check the type of each file; if it is to_delete, return no tasks and set the job status

       // step 2: get the job status and return the error message

    }
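
The safe-delete check might look roughly like the following Python sketch. Everything here is hypothetical (`FileType`, `SearchJob`, `create_tasks`): it only illustrates failing a search job when one of its files is already marked for deletion, and it returns an empty task list where the C++ outline above returns no tasks.

```python
from dataclasses import dataclass
from enum import Enum


class FileType(Enum):
    RAW = "raw"
    INDEX = "index"
    TO_DELETE = "to_delete"


@dataclass
class SearchJob:
    files: dict        # file id -> FileType
    ok: bool = True
    message: str = ""


def create_tasks(job):
    """Create one search task per file, or fail the job if a file is being deleted."""
    # Step 1: check file types; a to_delete file means its partition was dropped.
    doomed = [fid for fid, ftype in job.files.items() if ftype is FileType.TO_DELETE]
    if doomed:
        # Step 2: record the failure on the job so the caller can report it.
        job.ok = False
        job.message = f"files marked to_delete: {sorted(doomed)}"
        return []
    return [("search", fid) for fid in sorted(job.files)]
```

The caller would then inspect `job.ok` and `job.message` to return an error instead of searching files that are about to disappear.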

JinHai-CN added a commit that referenced this issue Nov 8, 2019

yhmo commented Nov 8, 2019

Already implemented in 0.6.0

@yhmo yhmo closed this as completed Nov 8, 2019
jaime0815 pushed a commit to jaime0815/milvus that referenced this issue Nov 18, 2022
yah01 pushed a commit to yah01/milvus that referenced this issue Feb 13, 2023
…ilvus-io#77)

* the protobuf rpms on pbone disappeared. switching to https download

* fix sdist