VectorDB hosted solution takes a lot of time to push vectors

I tried to use vectordb's hosted (served) mode from Jina AI, using the commands from the docs:

```python
from docarray import DocList, BaseDoc
from docarray.typing import NdArray
from vectordb import HNSWVectorDB
import time
import glob


class LogoDoc(BaseDoc):
    embedding: NdArray[768]  # 768-dimensional embedding
    id: str


# HNSW-backed database sized for the ~2.5M vectors to be indexed
db = HNSWVectorDB[LogoDoc](
    workspace="hnsw_vectordb",
    space="ip",  # inner-product similarity
    max_elements=2700000,
    ef_construction=256,
    M=16,
    num_threads=8,
)

if __name__ == "__main__":
    # Serve the database and block until the process is stopped
    with db.serve() as service:
        service.block()
```

I then tried to push my vectors through the client interface.

I have a collection of 2.5M 768-dimensional vectors to store in the db, so I decided to make batched calls to the `db.index` method with 64k vectors per call. The code did not respond, so I reduced the batch size to 2. With that, the code indexed at about 5 s/it, and the estimated total time was 27 hours. (I assume this happens because tree construction runs during each index call.)
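For reference, here is a minimal sketch of the batching arithmetic behind the numbers above. It does not call vectordb at all: `batched`, `payload_bytes`, and `index_in_batches` are hypothetical helpers, `index_fn` stands in for whatever index call is actually used, and the payload estimate assumes float32 embeddings (4 bytes per value), which may not match the real dtype.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def payload_bytes(num_vectors: int, dim: int = 768, bytes_per_value: int = 4) -> int:
    """Raw embedding size of one batch, ignoring ids and serialization overhead.

    Assumes float32 values (4 bytes each); this is an assumption, not a
    property confirmed for the setup in this issue.
    """
    return num_vectors * dim * bytes_per_value


def index_in_batches(
    items: Sequence[T],
    batch_size: int,
    index_fn: Callable[[Sequence[T]], None],
) -> int:
    """Call `index_fn` once per batch and return the number of calls made.

    `index_fn` is a placeholder for the real index call.
    """
    calls = 0
    for batch in batched(items, batch_size):
        index_fn(batch)
        calls += 1
    return calls
```

Under the float32 assumption, `payload_bytes(64_000)` is 196,608,000 bytes (about 187.5 MiB) of raw embeddings in a single call, while a batch size of 2 means 1,250,000 separate calls for 2.5M vectors. A batch size somewhere between those extremes may be worth trying, though I have not verified what the server handles well.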

It would be nice if we could speed up the process by letting the user push all the documents first and then trigger tree construction with a separate, explicit API call:

```
db.push_documents([doc1 , doc2, doc3, ...])
db.build_tree()
```

which could replace the current call:
```
db.index()
```

During the build we could block CRUD operations with an `is_building_tree` flag and raise an error such as `TreeCurrentlyBuildingError` when a CRUD operation is attempted.
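To make the proposal concrete, here is a rough sketch of how the flag and the error could fit together. Everything in it is hypothetical: `DeferredBuildIndex` is a toy stand-in and not part of vectordb, and the "build" step just moves documents between lists where a real implementation would run bulk index construction.

```python
import threading
from typing import List, Sequence


class TreeCurrentlyBuildingError(RuntimeError):
    """Raised when a CRUD call arrives while the index is being built."""


class DeferredBuildIndex:
    """Toy model of the proposed API: push documents first, build later."""

    def __init__(self) -> None:
        self._pending: List[Sequence[float]] = []  # pushed but not yet built
        self._indexed: List[Sequence[float]] = []  # included in the last build
        self._lock = threading.Lock()
        self.is_building_tree = False

    def _check_not_building(self) -> None:
        if self.is_building_tree:
            raise TreeCurrentlyBuildingError(
                "index build in progress; retry once it has finished"
            )

    def push_documents(self, docs: Sequence[Sequence[float]]) -> None:
        """Store documents without touching the index structure."""
        with self._lock:
            self._check_not_building()
            self._pending.extend(docs)

    def build_tree(self) -> int:
        """Build the index over all pending documents; return how many were added."""
        with self._lock:
            self._check_not_building()
            self.is_building_tree = True
            to_build, self._pending = self._pending, []
        try:
            # Placeholder for the actual (expensive) bulk construction.
            self._indexed.extend(to_build)
        finally:
            with self._lock:
                self.is_building_tree = False
        return len(to_build)
```

With this shape, `push_documents` stays cheap because it only appends, and the expensive work happens once in `build_tree`. Whether the underlying HNSW library can actually build faster in bulk than incrementally is something I have not checked.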

VectorDB hosted solution takes a lot of time to push vectors #51

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development