[FEA] Establish "streaming batched build" example for CAGRA #1146

@cjnolet

Description

Oracle has asked that we overlap batches as much as possible by streaming them into an API, so that building a large CAGRA graph can start on the batches already received instead of waiting for the last batch to arrive.

We need to find an API that enables this option without adding yet another completely new interface and code path.

One idea is to once again provide a separate optimize() function in cuvs::neighbors::cagra::helpers. We could then build the all-neighbors graph in a streaming fashion (using a variant of our new cuvs::neighbors::all_neighbors APIs) and run that graph through the optimization process.

This, of course, assumes the optimization process doesn't become the bottleneck. If it does, we may need to separate that out further.
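A rough latency model shows why the optimization step matters here. Assume per-batch graph updates keep pace with batch arrival, and let $t_{\text{last}}$ be the arrival time of the last batch, $T_{\text{build}}$ the time to build the all-neighbors graph over the full dataset in one shot, $T_{\text{tail}}$ the time to fold the last batch into the partial graph, and $T_{\text{opt}}$ the time spent in optimize(). These symbols are introduced for illustration only.

$$
\begin{aligned}
T_{\text{offline}} &\approx t_{\text{last}} + T_{\text{build}} + T_{\text{opt}} \\
T_{\text{stream}} &\approx t_{\text{last}} + T_{\text{tail}} + T_{\text{opt}} \\
T_{\text{offline}} - T_{\text{stream}} &\approx T_{\text{build}} - T_{\text{tail}}
\end{aligned}
$$

The saving does not depend on $T_{\text{opt}}$, but $T_{\text{opt}}$ stays on the critical path in both cases, since optimization can only start once the last batch has been folded into the graph. When $T_{\text{opt}}$ is large compared to $T_{\text{build}} - T_{\text{tail}}$, streaming buys relatively little.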

Metadata


Type: No type
Projects: Status: Todo
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
