Skip to content

Separate workspace/allocations from graph #57

Open
@jerinphilip

Description

@jerinphilip

Feature description

In the current state of browsermt/marian-dev, the concept of a workspace which manages allocation of tensors is placed behind a graph accessible to the library API bergamot-translator uses. This leads to a temporarily inefficient implementation of multiple-models handling (browsermt/bergamot-translator#210), where the workspaces grow proportional to the number of models active.

@XapaJIaMnu and @kpu have previously solved swapping multiple models by means of swapping tensors onto an active graph. This is "dynamic" and a reference implementation available at https://github.com/kpu/marian-dev/blob/dynamic_swap_mvp/src/translator/swappable.cpp. While this is doable in the case of shared-architectures without incurring much expense, a change in architecture involves reconstructing the graph (eg: tied embedding model swapped out for a non-tied embedding model).

It is optimal to keep the concept of a workspace bound to threads/workers active instead, separate the graph and architecture aside to avoid the blow-up in memory usage than what is originally required.

This issue is intended to investigate how best to make the modifications to solve the above problem in this repository.

/cc @graemenail

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions