Separate workspace/allocations from graph #57

### Feature description

In the current state of browsermt/marian-dev, the workspace, which manages tensor allocations, sits behind the graph exposed through the library API that bergamot-translator uses. This has led to an interim, inefficient implementation of multiple-model handling (https://github.com/browsermt/bergamot-translator/pull/210), in which workspace memory grows in proportion to the number of active models.

@XapaJIaMnu and @kpu have previously handled multiple models by swapping a model's tensors onto an active graph. This approach is "dynamic"; a reference implementation is available at https://github.com/kpu/marian-dev/blob/dynamic_swap_mvp/src/translator/swappable.cpp. While this works without much expense when the models share an architecture, a change in architecture requires reconstructing the graph (e.g., swapping out a tied-embedding model for a model with untied embeddings).
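
To make the shared-architecture constraint concrete, here is a minimal C++ sketch of tensor swapping under simplifying assumptions. `Param`, `ModelWeights`, `ActiveGraph`, and the parameter names `Wemb`/`Wout` are hypothetical and do not correspond to Marian's classes or parameter names; the linked swappable.cpp may work quite differently. The graph's architecture is reduced to a signature of parameter names and shapes: swapping succeeds only when the signatures match, and otherwise the caller has to rebuild the graph.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT Marian's actual classes.
// A parameter has a shape and a flat buffer of values.
struct Param {
  std::vector<std::size_t> shape;
  std::vector<float> values;
};

// A model's weights, keyed by (hypothetical) parameter name.
using ModelWeights = std::map<std::string, Param>;

// Architecture signature: parameter names and shapes, ignoring values.
using Signature = std::map<std::string, std::vector<std::size_t>>;

Signature signatureOf(const ModelWeights& weights) {
  Signature sig;
  for (const auto& [name, p] : weights) sig[name] = p.shape;
  return sig;
}

// A graph whose structure is fixed at construction; afterwards only the
// contents of its existing parameter tensors can change.
class ActiveGraph {
 public:
  explicit ActiveGraph(const ModelWeights& initial)
      : signature_(signatureOf(initial)), params_(initial) {}

  // Copy another model's tensor values into the existing parameter slots.
  // Returns false (and changes nothing) if the architecture differs,
  // in which case the caller must rebuild the graph instead.
  bool trySwapIn(const ModelWeights& other) {
    if (signatureOf(other) != signature_) return false;
    for (const auto& [name, p] : other) params_.at(name).values = p.values;
    return true;
  }

  const Param& param(const std::string& name) const { return params_.at(name); }

 private:
  Signature signature_;
  ModelWeights params_;
};
```

For example, swapping one tied-embedding model for another succeeds, while swapping in a model that also carries a separate output matrix (`Wout` in this sketch) is refused, because it changes the graph's structure rather than just its tensor contents.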

It would be preferable to bind workspaces to the active threads/workers instead, and to keep the graph and architecture separate from them, so that memory usage does not blow up beyond what is actually required.
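
A minimal C++ sketch of this separation, assuming hypothetical `Workspace`, `Model`, `Job`, and `WorkerPool` types (none of these are Marian or bergamot-translator classes). Workspaces are allocated once per worker, models hold only read-only parameters, and a request pairs one worker's workspace with one shared model; building the actual graph from that pair is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not Marian's API.

// Scratch memory for intermediate tensors, owned by a single worker.
class Workspace {
 public:
  explicit Workspace(std::size_t bytes) : buffer_(bytes) {}
  std::size_t size() const { return buffer_.size(); }

 private:
  std::vector<unsigned char> buffer_;
};

// Read-only model data; shared by all workers and owns no scratch memory.
struct Model {
  std::string name;
  std::vector<float> parameters;
};

// What one request needs: a worker's workspace plus the model to run.
// A real implementation would build (or reuse) a graph from this pair.
struct Job {
  const Workspace* workspace;
  const Model* model;
};

class WorkerPool {
 public:
  WorkerPool(std::size_t numWorkers, std::size_t workspaceBytes) {
    workspaces_.reserve(numWorkers);
    for (std::size_t i = 0; i < numWorkers; ++i) workspaces_.emplace_back(workspaceBytes);
  }

  // Registering a model allocates no workspace memory.
  void addModel(std::shared_ptr<const Model> model) {
    const std::string name = model->name;
    models_[name] = std::move(model);
  }

  // Total scratch memory depends only on the number of workers.
  std::size_t workspaceBytes() const {
    std::size_t total = 0;
    for (const auto& ws : workspaces_) total += ws.size();
    return total;
  }

  std::size_t modelCount() const { return models_.size(); }

  // Pair a worker's workspace with the named model.
  // Throws std::out_of_range if the worker index or model name is unknown.
  Job prepare(std::size_t worker, const std::string& modelName) const {
    return Job{&workspaces_.at(worker), models_.at(modelName).get()};
  }

 private:
  std::vector<Workspace> workspaces_;
  std::map<std::string, std::shared_ptr<const Model>> models_;
};
```

With this layout, workspace memory is `workers × workspaceBytes` regardless of how many models are registered, whereas giving every (model, worker) graph its own workspace would cost roughly `models × workers × workspaceBytes`.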

This issue is for investigating how best to modify this repository to solve the above problem.

/cc @graemenail 