
deploy/runtime: use a background thread to run GC when interpreters aren't executing the forward pass #58

Open
@d4l3k

Description


To optimize forward-pass latency, it would be good to schedule GC to run between model executions. This won't improve QPS, since the amortized GC cost is the same, but it would lower per-batch latency by keeping collection pauses out of the forward pass.
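To check whether collection pauses actually land inside the forward pass, one option is to time each cyclic collection with CPython's `gc.callbacks` hook, which calls every registered function with phase `"start"` before and `"stop"` after a collection. This is a minimal sketch. The `GCPauseRecorder` class is a hypothetical helper, not part of torch::deploy, and it measures only the interpreter it runs in.

```python
import gc
import time


class GCPauseRecorder:
    """Records the wall-clock duration of every cyclic GC run (hypothetical helper)."""

    def __init__(self):
        self.pauses_s = []
        self._start = None

    def _callback(self, phase, info):
        # CPython invokes callbacks with phase "start" before and "stop"
        # after each collection; info["generation"] says which one ran.
        if phase == "start":
            self._start = time.perf_counter()
        elif phase == "stop" and self._start is not None:
            self.pauses_s.append(time.perf_counter() - self._start)
            self._start = None

    def __enter__(self):
        gc.callbacks.append(self._callback)
        return self

    def __exit__(self, *exc):
        gc.callbacks.remove(self._callback)
        return False


if __name__ == "__main__":
    with GCPauseRecorder() as rec:
        gc.collect()
    print(f"{len(rec.pauses_s)} collection(s), longest {max(rec.pauses_s):.6f}s")
```

Wrapping a forward pass in the recorder shows how many collections fired during it and how long they took.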

```python
import gc

# Force a full collection (all generations) in the current interpreter.
gc.collect()
```
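Within a single interpreter, the idea can be sketched by turning off the automatic collector, running each batch, and calling `gc.collect()` between batches. This sketch assumes a hypothetical `forward` callable standing in for the model. Note that `gc.disable()` only stops the automatic *cyclic* collector. Reference-counted objects are still freed immediately during the forward pass.

```python
import gc


def run_batches(forward, batches):
    """Run each batch with automatic GC off, collecting between batches.

    `forward` is a hypothetical stand-in for the model's forward pass.
    """
    results = []
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cyclic collector from triggering mid-forward-pass
    try:
        for batch in batches:
            results.append(forward(batch))
            # Pay the collection cost here, between executions,
            # instead of at an arbitrary allocation inside forward().
            gc.collect()
    finally:
        if was_enabled:
            gc.enable()  # restore the caller's GC setting
    return results


if __name__ == "__main__":
    print(run_batches(lambda x: x * 2, [1, 2, 3]))
```

Running the collection synchronously like this still adds its cost to the end of each request. Moving it onto a separate thread is what keeps it off the request path entirely.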

We should spin up a background thread that periodically iterates over all of the interpreter threads, locks each one between executions, and runs the GC. It might also be worth explicitly disabling GC on the individual interpreter threads so that it never runs during the forward pass.
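The proposed scheduling can be modeled in plain Python as a minimal sketch. Each interpreter is represented by a hypothetical `InterpreterSlot` with a lock that workers hold for the whole forward pass. A hypothetical `BackgroundGC` thread sweeps the slots periodically and collects only those it can lock without blocking. This models the locking scheme only. In torch::deploy each interpreter has its own heap and the real version would live in the C++ runtime, whereas here `gc.collect()` runs on the single shared Python heap.

```python
import gc
import threading


class InterpreterSlot:
    """Hypothetical stand-in for one torch::deploy interpreter.

    Worker threads hold `lock` for the whole forward pass, so the GC
    thread can only collect while the interpreter is idle.
    """

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.collections = 0

    def collect(self):
        # Real runtime: run gc.collect() inside *this* interpreter.
        # This sketch has a single Python heap, so it collects that.
        gc.collect()
        self.collections += 1


class BackgroundGC(threading.Thread):
    """Periodically sweeps all slots, collecting each one that is idle."""

    def __init__(self, slots, interval_s=1.0):
        super().__init__(daemon=True)
        self.slots = slots
        self.interval_s = interval_s
        self._stop_event = threading.Event()

    def sweep(self):
        for slot in self.slots:
            # Non-blocking acquire: if a forward pass holds the lock,
            # skip this slot and retry on the next sweep.
            if slot.lock.acquire(blocking=False):
                try:
                    slot.collect()
                finally:
                    slot.lock.release()

    def run(self):
        while True:
            self.sweep()
            # wait() returns True as soon as stop() sets the event.
            if self._stop_event.wait(self.interval_s):
                break

    def stop(self):
        self._stop_event.set()
        self.join()


def forward_worker(slot, n_batches):
    for _ in range(n_batches):
        with slot.lock:  # GC thread cannot touch this slot right now
            sum(range(10_000))  # placeholder for the forward pass


if __name__ == "__main__":
    gc.disable()  # automatic collection off; only BackgroundGC collects
    slots = [InterpreterSlot(f"interp{i}") for i in range(2)]
    gc_thread = BackgroundGC(slots, interval_s=0.01)
    gc_thread.start()
    workers = [threading.Thread(target=forward_worker, args=(s, 100))
               for s in slots]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    gc_thread.stop()
    gc.enable()
    print({s.name: s.collections for s in slots})
```

The non-blocking acquire means the GC thread never waits on a busy interpreter. However, a request that arrives while a collection is already running will still wait for that collection to finish.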

Context:

https://fb.workplace.com/notes/538119557964077/

Metadata


Assignees

No one assigned

    Labels

    enhancement (New feature or request), runtime (C++ runtime / torch::deploy)
