[Feature]: Modify the collection schema once collection is created and is not empty #20405

Open
1 task done
jeet129 opened this issue Nov 8, 2022 · 11 comments

@jeet129

jeet129 commented Nov 8, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe.

Often, when we first define an ANN collection, we don't know the full list of fields the use case will need. We create the collection with the few fields we know about, and as the application evolves we need to add to or modify that schema to accommodate more attributes.

Without this, the only option is to recreate the collection and re-ingest all the data, which is not an easy choice given how long the ingestion pipeline takes for huge collections.

Describe the solution you'd like.

We need a way to add new fields (non-mandatory fields, or fields with default values) and to drop existing non-primary fields from a collection.
This way, the same collection can serve the different scenarios of a use case without creating a new collection and hydrating it with data.

There should also be an option to update the values of such attributes for existing entities.

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

@jeet129 jeet129 added the kind/feature Issues related to feature request from users label Nov 8, 2022
@zyy20191

The ability to add a field to an already created collection would be really convenient, and I hope you can consider this requirement.

@xiaofan-luan xiaofan-luan added this to the 2.3 milestone Dec 9, 2022
@xiaofan-luan
Collaborator

Let's keep it.
Agreed, this is a very useful feature.
But it requires a lot of effort, so if anyone has time, please take it. Otherwise we will wait until the performance/stability issues are solved and then start working on it.

@sskserk

sskserk commented Sep 4, 2024

Hey Milvus Developers and Community,

Would it be feasible to implement a feature that includes basic routines for renaming, adding, or dropping columns? Even a simple set of these functions could significantly enhance our capabilities.

The use case is straightforward but has a substantial impact:

  • We have a large collection containing many thousands of records.
  • A considerable amount of time has been invested in calculating the stored embeddings.
  • A new release of the functional logic now necessitates the inclusion of an additional field.

A major challenge we face is determining how to properly migrate the data.

Currently, we are compelled to recreate the entire collection from scratch whenever an additional or modified field is needed. Given the vast amounts of data involved, this process is exceedingly challenging.

Providing a command-line tool that could handle these modifications would offer significant relief and improve our efficiency.

I also suppose that physically modifying an existing collection may be practically impossible. It might require changing the vector data, which is computationally challenging.

P.S.: I would be happy to cooperate with somebody or assist with a corresponding MR.

@xiaofan-luan
Collaborator

This is for sure already on our roadmap.

@tedxu and @smellthemoon are actually working on it, so hopefully that will help.

@smellthemoon could you please follow up with @sskserk and see how it can work with our latest modify-schema feature?

@sskserk

sskserk commented Sep 5, 2024

@xiaofan-luan , @tedxu , @smellthemoon,

I am eager to test the new feature and am looking forward to receiving it. I'm ready to test a prerelease of this feature, just need to know when.

The implementation of this feature will undoubtedly mark a significant milestone. I anticipate that, as a result, a new Milvus-related product similar to "Flyway" might emerge in the future.

Your solution is already widely adopted by major companies, and this enhancement will further solidify its enterprise-grade capabilities.

Thank you for the positive update!

@xiaofan-luan
Collaborator

Cool, let's ship it.

@smellthemoon
Contributor

smellthemoon commented Sep 6, 2024

In fact, the add-field feature has been included in our development plan. Users can add a new column through the add-field operation. The values in this new column are all null for existing entities. After the add-field operation completes, the field data in insert/upsert requests needs to include data for the new column. I will keep you updated on any progress. @sskserk
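
To make the behavior described above concrete, here is a minimal pure-Python sketch of those semantics. It does not use the Milvus SDK, and `ToyCollection`, `add_field`, and `insert` are hypothetical names invented for illustration; the real feature may differ in naming and detail.

```python
from __future__ import annotations

from typing import Any


class ToyCollection:
    """In-memory stand-in for a collection. Hypothetical; NOT the Milvus API."""

    def __init__(self, fields: list[str]) -> None:
        self.fields = list(fields)
        self.rows: list[dict[str, Any]] = []

    def insert(self, row: dict[str, Any]) -> None:
        # Every field in the current schema must be present in the request.
        missing = [f for f in self.fields if f not in row]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        self.rows.append({f: row[f] for f in self.fields})

    def add_field(self, name: str) -> None:
        if name in self.fields:
            raise ValueError(f"field already exists: {name}")
        self.fields.append(name)
        # Existing entities get null (None) in the new column.
        for row in self.rows:
            row[name] = None


coll = ToyCollection(["id", "embedding"])
coll.insert({"id": 1, "embedding": [0.1, 0.2]})

coll.add_field("category")

# After the schema change, requests must carry the new column.
coll.insert({"id": 2, "embedding": [0.3, 0.4], "category": "news"})
```

In this model the entity inserted before the change reads back with `category` set to `None`, and an insert that omits `category` after the change is rejected.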

@smellthemoon
Contributor

/assign

@yanliang567 yanliang567 modified the milestones: 2.3, 2.5.0 Sep 29, 2024
@xiaofan-luan xiaofan-luan modified the milestones: 2.5.0, 3.0 Oct 28, 2024
@xiaofan-luan
Collaborator

On 2.0 we support null/default values.
The target for 3.0 is to support schema changes.

@iamkhalidbashir

Is there any null value for an embedding field?
With the JS lib, if we pass null for an embedding field, we get this error:

Error processing PDF: TypeError: Cannot read properties of null (reading 'length')
    at Function.concat (node:buffer:589:19)
    at /app/node_modules/@zilliz/milvus2-sdk-node/dist/milvus/grpc/Data.js:241:47
    at Array.map (<anonymous>)
    at MilvusClient.<anonymous> (/app/node_modules/@zilliz/milvus2-sdk-node/dist/milvus/grpc/Data.js:218:64)
    at Generator.next (<anonymous>)
    at fulfilled (/app/node_modules/@zilliz/milvus2-sdk-node/dist/milvus/grpc/Data.js:5:58)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

@xiaofan-luan
Collaborator

Embeddings cannot be null.

A data field can be null only if nullable is enabled, which is available in Milvus 2.5 and later.

@smellthemoon
Do we support altering a non-nullable field to nullable?
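
The rule stated in this reply (vector fields are never null, other fields only when nullable is enabled) can be checked on the client before a request is sent, so that a bad row produces a clear message instead of a `TypeError` from deep inside an SDK. The sketch below is plain Python with no SDK calls; `FieldSpec` and `validate_row` are hypothetical helpers, not part of any Milvus client.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical schema description; not an SDK type."""

    name: str
    is_vector: bool = False
    nullable: bool = False


def validate_row(row: dict[str, Any], schema: list[FieldSpec]) -> list[str]:
    """Return human-readable problems; an empty list means the row is OK."""
    problems: list[str] = []
    for spec in schema:
        # A missing key is treated the same as an explicit null.
        if row.get(spec.name) is not None:
            continue
        if spec.is_vector:
            problems.append(f"{spec.name}: vector fields cannot be null")
        elif not spec.nullable:
            problems.append(f"{spec.name}: field is not nullable")
    return problems


schema = [
    FieldSpec("id"),
    FieldSpec("embedding", is_vector=True),
    FieldSpec("title", nullable=True),
]

ok = validate_row({"id": 1, "embedding": [0.1, 0.2], "title": None}, schema)
bad = validate_row({"id": 2, "embedding": None, "title": "doc"}, schema)
```

Here `ok` is empty because only the nullable `title` is null, while `bad` reports the null `embedding`.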
