[torchbench][accuracy] demucs accuracy check failed #459

Open
alexbaden opened this issue Feb 6, 2024 · 4 comments

@alexbaden
Contributor

» benchmarks/dynamo/torchbench.py --float32 -dxpu -n10 --no-skip --dashboard --training --inductor --accuracy --output /tmp/torchbench.csv --filter demucs

loading model: 0it [00:05, ?it/s]
xpu  train demucs                             
WARNING:common:fp64 golden ref were not generated for demucs. Setting accuracy check to cosine
/localdisk/abaden/Projects/envs/triton-benchmark-env/lib/python3.10/site-packages/fbgemm_gpu/fbgemm_gpu_py.so: undefined symbol: _ZNK5torch8autograd4Node4nameEv
/localdisk/abaden/Projects/envs/triton-benchmark-env/lib/python3.10/site-packages/fbgemm_gpu/fbgemm_gpu_py.so: undefined symbol: _ZNK5torch8autograd4Node4nameEv
skipping cudagraphs for unknown reason
skipping cudagraphs for unknown reason
skipping cudagraphs for unknown reason
skipping cudagraphs for unknown reason
[2024-02-05 21:43:53,107] torch._dynamo.utils: [WARNING] Similarity score=0.00015774701023474336
fail_accuracy
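
For context on the warning above: when the fp64 golden reference cannot be generated for a model, the harness falls back to comparing the eager and compiled outputs by cosine similarity instead of element-wise closeness. A minimal sketch of that kind of check (the helper name and threshold here are illustrative, not the actual torchbench code):

import torch

def cosine_check(ref: torch.Tensor, res: torch.Tensor, threshold: float = 0.99) -> bool:
    # Compare the direction of the flattened outputs rather than exact values.
    ref, res = ref.flatten().double(), res.flatten().double()
    score = torch.nn.functional.cosine_similarity(ref, res, dim=0).item()
    return score >= threshold

# A similarity score around 0.00016, as in the warning above, means the eager
# and inductor outputs are essentially uncorrelated, not just slightly off.
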
@whitneywhtsang
Contributor

Also fails with v2.1.

@vlad-penkin vlad-penkin changed the title [torchbench] demucs accuracy check failed [torchbench][accuracy] demucs accuracy check failed Feb 9, 2024
@vlad-penkin vlad-penkin added this to the E2E pass rate milestone Feb 9, 2024
@ienkovich
Contributor

ienkovich commented Feb 13, 2024

demucs fails only in training, and the failure comes from the use of random numbers during training. Training passes on CPU but fails on XPU in both eager and inductor modes. It looks like this happens because the RNG state is not reset properly for XPU between model runs.

There are at least two places where torch.manual_seed is replaced with an implementation that is not XPU-enabled:
https://github.com/weishi-deng/benchmark/blob/main/torchbenchmark/util/env_check.py#L133
https://github.com/weishi-deng/benchmark/blob/main/userbenchmark/dynamo/dynamobench/common.py#L329
Fixing both places lets demucs pass on XPU in eager mode.

Here is the patch I use:

diff --git a/torchbenchmark/util/env_check.py b/torchbenchmark/util/env_check.py
index 956fdb4f..bef4924b 100644
--- a/torchbenchmark/util/env_check.py
+++ b/torchbenchmark/util/env_check.py
@@ -125,6 +125,8 @@ def set_random_seed():

         if not torch.cuda._is_in_bad_fork():
             torch.cuda.manual_seed_all(seed)
+        if hasattr(torch, 'xpu') and not torch.xpu._is_in_bad_fork():
+            torch.xpu.manual_seed_all(seed)
         return default_generator.manual_seed(seed)

     torch.manual_seed(MAIN_RANDOM_SEED)
diff --git a/userbenchmark/dynamo/dynamobench/common.py b/userbenchmark/dynamo/dynamobench/common.py
index 831dfe06..1bf14e96 100644
--- a/userbenchmark/dynamo/dynamobench/common.py
+++ b/userbenchmark/dynamo/dynamobench/common.py
@@ -320,10 +320,17 @@ def patch_torch_manual_seed():
         from torch._C import default_generator

         seed = 1337
-        import torch.cuda

-        if not torch.cuda._is_in_bad_fork():
-            torch.cuda.manual_seed_all(seed)
+        try:
+            import intel_extension_for_pytorch
+
+            if torch.xpu.is_available() and not torch.xpu._is_in_bad_fork():
+                torch.xpu.manual_seed_all(seed)
+        except:
+            import torch.cuda
+
+            if torch.cuda.is_available() and not torch.cuda._is_in_bad_fork():
+                torch.cuda.manual_seed_all(seed)
         return default_generator.manual_seed(seed)

     torch.manual_seed = deterministic_torch_manual_seed
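
A quick way to check the effect outside the benchmark harness (this assumes an XPU-enabled PyTorch build; the snippet is only an illustration, not part of the patch):

import torch

def reseed_and_draw(seed: int) -> torch.Tensor:
    # Mirror what the patched helpers do: seed the default generator and,
    # when XPU is present, the XPU device generators as well.
    torch.manual_seed(seed)
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        torch.xpu.manual_seed_all(seed)
    return torch.rand(4, device="xpu")

# With the XPU generators reseeded, two draws with the same seed match exactly.
# The unpatched helpers only reseed the CPU/CUDA generators, so the XPU
# generator keeps advancing and repeated model runs see different numbers.
assert torch.equal(reseed_and_draw(1337), reseed_and_draw(1337))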

@ienkovich
Contributor

When we run the benchmark with Inductor, random number generation goes through a Triton kernel, and the seed used by that kernel is generated with an aten.randint operation. So I believe eager mode and Inductor will always produce different random number sequences, because they use different seeds and different generation algorithms. It doesn't look like different backends have to be aligned in random number generation, so their results are simply incomparable.

Inductor-generated wrapper code where the seed is generated and stored in buf0, which is later passed to the Triton kernel:

def call(args):
    buf0 = empty_strided((1, ), (1, ), device='xpu', dtype=torch.int64)
    # Source Nodes: [], Original ATen: []
    aten.randint.low_out(-9223372036854775808, 9223372036854775807, [1], out=buf0)
    buf1 = empty_strided((4, 4, 1, 1), (4, 1, 1, 1), device='xpu', dtype=torch.int64)
    # Source Nodes: [], Original ATen: []
    stream0 = get_xpu_stream(0)
    triton_poi_fused_0.run(buf0, buf1, 0, 16, grid=grid(16), stream=stream0)
    return (buf1, )

Triton kernel where the seed (tmp0) is used for randint64:

def triton_(in_ptr0, out_ptr0, load_seed_offset, xnumel, XBLOCK : tl.constexpr):
    xnumel = 16
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + load_seed_offset)
    tmp1 = x0
    tmp2 = triton_helpers.randint64(tmp0, (tmp1).to(tl.uint32), 0, 44100)
    tl.store(out_ptr0 + (x0), tmp2, xmask)
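
The mismatch is easy to demonstrate outside demucs: compiling even a trivial random op yields a different sequence than eager under the same manual seed. A rough sketch (the device string assumes an XPU build; the same divergence shows up on other backends):

import torch

def sample():
    # The same kind of op the demucs training path hits: ints in [0, 44100).
    return torch.randint(0, 44100, (4, 4), device="xpu")

compiled_sample = torch.compile(sample)

torch.manual_seed(1337)
eager_out = sample()

torch.manual_seed(1337)
compiled_out = compiled_sample()

# Eager draws from the device generator directly, while Inductor draws a fresh
# seed via aten.randint and feeds it to the Triton RNG helper shown above, so
# the two sequences are generally unrelated even with the same manual seed.
print(torch.equal(eager_out, compiled_out))  # typically False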

@vlad-penkin
Contributor

This issue is still reproducible.

Env:

  • pytorch is built from source, top of the main trunk, commit_id - 9a8ab778d34bd24c5caceb340837483decc4c311
  • triton xpu is built from source, top of the main trunk, commit_id - fe93a00ffe438e9ba8c8392c0b051b1662c810de
  • benchmark is built from source, top of the main trunk, commit_id - d54ca9f80ead108c8797441681e219becaf963d8
  • torchaudio is built from source, top of the main trunk, commit_id - 1980f8af5bcd0bb2ce51965cf79d8d4c25dad8a0
  • torchvision is built from source, top of the main trunk, commit_id - 10239873229e527f8b7e7b3340c40ee38bb1cfc4
  • PyTorch Dependency Bundle 0.5.0
  • Latest Rolling Driver
