
Ray Executor #488

Open
@alxmrs


In addition to accelerator support (e.g. via #304), Cubed could benefit ML users by providing a Ray executor: https://docs.ray.io/en/latest/ray-core/walkthrough.html

Since Cubed follows a serverless model, I bet it could get away with using only Ray Tasks (remote functions), without needing Actors.
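As a rough sketch of what "only Tasks" might look like: each chunk-level operation becomes a stateless remote function, and the executor just fans the chunks out and gathers the results. The names `process_chunk` and `map_chunks` are hypothetical and not part of Cubed; the thread-pool fallback is only there so the snippet runs where Ray isn't installed.

```python
from concurrent.futures import ThreadPoolExecutor

try:
    import ray
except ImportError:  # Ray is optional for this sketch
    ray = None


def process_chunk(chunk):
    """Hypothetical stateless per-chunk operation (sum of squares)."""
    return sum(x * x for x in chunk)


def map_chunks(func, chunks):
    """Apply func to every chunk, one Ray task per chunk.

    Hypothetical helper, not Cubed's executor API.
    """
    if ray is None:
        # Fallback so the sketch still runs without Ray installed.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(func, chunks))

    ray.init(ignore_reinit_error=True, include_dashboard=False)
    try:
        remote_func = ray.remote(func)  # wrap the plain function as a Ray task
        refs = [remote_func.remote(c) for c in chunks]  # launch tasks in parallel
        return ray.get(refs)  # block until all tasks finish; order is preserved
    finally:
        ray.shutdown()


chunks = [[1, 2, 3], [4, 5], [6]]
results = map_chunks(process_chunk, chunks)
print(results)
```

No actors or shared state are involved: each task receives its input and returns its output, which is the property that makes the serverless model a plausible fit.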

From talking with @cromwellian a bit, my hope is that Cubed could provide memory bounds when trying to saturate GPUs during model training. I'm not sure exactly what a training loop with Cubed would look like. Here's how Ray integrates with PyTorch, for example: https://docs.ray.io/en/latest/train/api/doc/ray.train.torch.TorchTrainer.html#ray.train.torch.TorchTrainer

@shoyer once pointed out to me that GPU OOM errors occur while taking the gradient of a function graph, not necessarily on the forward pass. I'm not sure yet whether Cubed is in fact a good fit for tackling this problem, only that the potential is exciting.
