
Add BackendRouter to handle multiple backends #2353

Merged · 48 commits merged into main on Oct 18, 2024
Conversation

@laggui (Member) commented Oct 9, 2024

Needs more tests :)

Checklist

  • Confirmed that run-checks all script has been executed.
  • Made sure the book is up to date with changes in this PR.

Related Issues/PRs

Closes #2276

Changes

Introduces a new BackendRouter responsible for forwarding tensor operations to the appropriate backend when multiple backends are available.

This is achieved with the help of the intermediate representation defined for ReprBackend and the tensor/ops descriptions.
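The routing idea can be sketched in standalone Rust. This is a simplified illustration of the mechanism described above, not the actual burn-router API: every type and function name here (Router, Runner, OpDescription) is hypothetical, and real op descriptions carry tensor handles rather than raw values.

```rust
// Hypothetical sketch: ops are recorded as descriptions and forwarded to
// the runner that owns the target device. Names are illustrative only.

#[derive(Debug, Clone, Copy)]
enum Device {
    Backend1(usize), // e.g. a CUDA device index
    Backend2,        // e.g. a wgpu CPU device
}

#[derive(Debug)]
enum OpDescription {
    Add { lhs: Vec<f32>, rhs: Vec<f32> },
}

trait Runner {
    fn execute(&self, op: OpDescription) -> Vec<f32>;
}

struct Backend1Runner;
struct Backend2Runner;

impl Runner for Backend1Runner {
    fn execute(&self, op: OpDescription) -> Vec<f32> {
        let OpDescription::Add { lhs, rhs } = op;
        lhs.iter().zip(rhs.iter()).map(|(a, b)| a + b).collect()
    }
}

impl Runner for Backend2Runner {
    fn execute(&self, op: OpDescription) -> Vec<f32> {
        let OpDescription::Add { lhs, rhs } = op;
        lhs.iter().zip(rhs.iter()).map(|(a, b)| a + b).collect()
    }
}

/// The "router": dispatches each op description to the runner
/// that owns the op's target device.
struct Router {
    backend1: Backend1Runner,
    backend2: Backend2Runner,
}

impl Router {
    fn execute(&self, device: Device, op: OpDescription) -> Vec<f32> {
        match device {
            Device::Backend1(_) => self.backend1.execute(op),
            Device::Backend2 => self.backend2.execute(op),
        }
    }
}

fn main() {
    let router = Router {
        backend1: Backend1Runner,
        backend2: Backend2Runner,
    };
    let out = router.execute(
        Device::Backend2,
        OpDescription::Add { lhs: vec![1.0, 2.0], rhs: vec![3.0, 4.0] },
    );
    println!("{out:?}"); // [4.0, 6.0]
}
```

The key property this models is that the caller only sees one backend type (the router), while the device variant decides which concrete runner actually executes the op.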

Testing

Modified the ag-news-train text classification example to run on cuda + wgpu. Also included a minimal working example (MWE):

use burn::{
    backend::{cuda_jit::CudaDevice, wgpu::WgpuDevice, CudaJit, Wgpu},
    tensor::Tensor,
};
use burn_router::{BackendRouter, ByteBridge, DirectChannel, MultiDevice2};

fn main() {
    type DirectByteChannel<Backends> = DirectChannel<Backends, ByteBridge<Backends>>;

    type DualBackend = BackendRouter<DirectByteChannel<(CudaJit, Wgpu)>>;

    let device2 = WgpuDevice::Cpu;
    let device1 = CudaDevice::new(0);

    // TODO: this is wack.. how to automatically implement From<B1::Device1> for MultiDevice2?
    let multi_device1 = MultiDevice2::Device1(device1);
    let multi_device2 = MultiDevice2::Device2(device2);
    let tensor1 = Tensor::<DualBackend, 1>::from_floats([1.0, 2.0, 3.0, 4.0], &multi_device1);
    let tensor2 = Tensor::<DualBackend, 1>::from_floats([5.0, 6.0, 7.0, 8.0], &multi_device2);

    println!("Tensor 1:\n{tensor1}");
    println!("Tensor 2:\n{tensor2}");

    let tensor1 = tensor1.to_device(&multi_device2);

    let output = tensor1.add(tensor2);

    println!("Result:\n{output}");
}
Tensor 1:
Tensor {
  data:
[1.0, 2.0, 3.0, 4.0],
  shape:  [4],
  device:  Device1(CudaDevice { index: 0 }),
  backend:  "router<direct<(fusion<jit<cuda>>, fusion<jit<wgpu>>)>>",
  kind:  "Float",
  dtype:  "f32",
}
Tensor 2:
Tensor {
  data:
[5.0, 6.0, 7.0, 8.0],
  shape:  [4],
  device:  Device2(Cpu),
  backend:  "router<direct<(fusion<jit<cuda>>, fusion<jit<wgpu>>)>>",
  kind:  "Float",
  dtype:  "f32",
}
Result:
Tensor {
  data:
[6.0, 8.0, 10.0, 12.0],
  shape:  [4],
  device:  Device2(Cpu),
  backend:  "router<direct<(fusion<jit<cuda>>, fusion<jit<wgpu>>)>>",
  kind:  "Float",
  dtype:  "f32",
}
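The to_device call in the MWE moves tensor1 from the CUDA backend to the wgpu backend before the add. The ByteBridge idea behind that transfer can be sketched as follows; this is a hedged, self-contained illustration (the trait, struct, and function names here are invented for the sketch and are not burn's actual API):

```rust
// Hypothetical sketch of the ByteBridge idea: to move a tensor between
// backends, read its data out as raw bytes on the source backend and
// rebuild it on the target backend. Names are illustrative only.

#[derive(Debug, PartialEq)]
struct TensorData {
    bytes: Vec<u8>,
    shape: Vec<usize>,
}

trait Backend {
    // Serialize f32 values to little-endian bytes.
    fn into_bytes(&self, values: &[f32]) -> TensorData {
        TensorData {
            bytes: values.iter().flat_map(|v| v.to_le_bytes()).collect(),
            shape: vec![values.len()],
        }
    }
    // Rebuild f32 values from the raw bytes.
    fn from_bytes(&self, data: &TensorData) -> Vec<f32> {
        data.bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect()
    }
}

struct SourceBackend;
struct TargetBackend;
impl Backend for SourceBackend {}
impl Backend for TargetBackend {}

/// Move tensor values from one backend to another through raw bytes.
fn byte_bridge(src: &impl Backend, dst: &impl Backend, values: &[f32]) -> Vec<f32> {
    let data = src.into_bytes(values);
    dst.from_bytes(&data)
}

fn main() {
    let moved = byte_bridge(&SourceBackend, &TargetBackend, &[1.0, 2.0, 3.0, 4.0]);
    println!("{moved:?}"); // [1.0, 2.0, 3.0, 4.0]
}
```

Going through raw bytes is the most general (if not the fastest) bridge, since it only requires each backend to read and write contiguous data.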

codecov bot commented Oct 9, 2024

Codecov Report

Attention: Patch coverage is 90.61910% with 497 lines in your changes missing coverage. Please review.

Project coverage is 85.23%. Comparing base (604dbae) to head (67c7479).
Report is 1 commit behind head on main.

Files with missing lines                   Patch %   Missing lines
crates/burn-router/src/ops/op_int.rs       92.17%    84
crates/burn-router/src/runner.rs           92.00%    84
crates/burn-router/src/bridge/byte.rs      29.91%    82
crates/burn-router/src/ops/op_qfloat.rs     0.00%    54
crates/burn-router/src/ops/op_bool.rs      89.09%    29
crates/burn-router/src/channel/direct.rs   76.10%    27
crates/burn-ndarray/src/backend.rs         50.00%    21
crates/burn-router/src/backend.rs          66.10%    20
crates/burn-fusion/src/backend.rs          24.00%    19
crates/burn-router/src/client/base.rs      76.66%    14
... and 10 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2353      +/-   ##
==========================================
+ Coverage   84.95%   85.23%   +0.28%     
==========================================
  Files         771      785      +14     
  Lines       98678   103756    +5078     
==========================================
+ Hits        83828    88437    +4609     
- Misses      14850    15319     +469     


@laggui (Member, Author) commented Oct 17, 2024

Stoopid Windows CI. Wgpu as a test backend doesn't seem to work on Windows (the auto device doesn't detect anything).

@laggui laggui marked this pull request as ready for review October 18, 2024 13:03
@laggui (Member, Author) commented Oct 18, 2024

I just disabled the ByteBridge tests for the Windows CI. This seems to be a known issue with wgpu, since we are already setting DISABLE_WGPU=1 for Windows.

The BackendRouter implementation is pretty much complete. Just a couple of things to do or improve:

  • Quantization ops (qtensor and q_* ops not implemented in this PR)
  • DirectChannel is only implemented with support for two backends at this time
    • Improvements: maybe a macro to implement it for up to 4 backends (with appropriate types generated)
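The macro idea in the last bullet can be sketched in standalone Rust. Everything below is hypothetical (the macro, the MultiDevice2 shape, and the stand-in device types are invented for illustration); it also shows how generated From impls could address the TODO in the MWE about converting a concrete device into the multi-device enum automatically:

```rust
// Hypothetical sketch: a macro that generates a MultiDeviceN enum with
// one variant per backend device type, plus From impls for ergonomics.
// Not part of this PR; names are illustrative only.

macro_rules! multi_device {
    ($name:ident, $($variant:ident => $dev:ty),+ $(,)?) => {
        #[derive(Debug, Clone, PartialEq)]
        enum $name {
            $($variant($dev),)+
        }

        // From impls so users can pass a concrete device directly
        // instead of wrapping it in the enum by hand.
        $(impl From<$dev> for $name {
            fn from(d: $dev) -> Self {
                $name::$variant(d)
            }
        })+
    };
}

// Stand-ins for real backend device types.
#[derive(Debug, Clone, PartialEq)]
struct CudaDevice(usize);
#[derive(Debug, Clone, PartialEq)]
struct WgpuDevice;

multi_device!(MultiDevice2, Device1 => CudaDevice, Device2 => WgpuDevice);

fn main() {
    // The From impl replaces the manual MultiDevice2::Device1(...) wrapping.
    let d: MultiDevice2 = CudaDevice(0).into();
    assert_eq!(d, MultiDevice2::Device1(CudaDevice(0)));
}
```

Generating the enum and the From impls together would let the same macro invocation scale from 2 to 4 backends without hand-written glue, at the cost of one distinct device type per backend (the From impls would conflict if two backends shared a device type).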

@laggui laggui merged commit eaf50e6 into main Oct 18, 2024
11 checks passed
@laggui laggui deleted the feat/backend-server branch October 18, 2024 18:23
Development

Successfully merging this pull request may close these issues.

Multi-backend decorator support