Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TODO for the next release with Stockfish 14 features #8

Closed
24 tasks done
hi-ogawa opened this issue Jul 27, 2021 · 5 comments
Closed
24 tasks done

TODO for the next release with Stockfish 14 features #8

hi-ogawa opened this issue Jul 27, 2021 · 5 comments

Comments

@hi-ogawa
Copy link
Owner

hi-ogawa commented Jul 27, 2021

@hi-ogawa
Copy link
Owner Author

hi-ogawa commented Jul 27, 2021

I upgraded my local version of emscripten and fixed small errors/warnings in this commit 728b862.

EDIT:
Stockfish commit official-stockfish@f90274d8 is the one just before the new net architecture commit official-stockfish@e8d64af1.

@hi-ogawa
Copy link
Owner Author

hi-ogawa commented Jul 27, 2021

Having some difficulty adopting changes from new net architecture commit official-stockfish@e8d64af1. Build succeeds but evaluation doesn't match with official build. Specifically, ARCH=wasm wasm_simd_post_mvp=yes and ARCH=wasm wasm_simd=yes are incorrect, but ARCH=wasm is working fine.

I will list some debugging plans here:

  • verify weights are read correctly by writing out and compare to the original
  • compare each layer's output values
  • remove #if defined(USE_WASM_SIMD) from certain layer file and identify which wasm simd kernel has issue.

EDIT:
It seems affine_transform.h with InputDimensions != PaddedInputDimensions has some issue as it's verified that the code below works

    const OutputType* propagate(
        const TransformedFeatureType* transformedFeatures, char* buffer) const {
      const auto input = previousLayer.propagate(
          transformedFeatures, buffer + SelfBufferSize);

#if defined(USE_WASM_SIMD)
      if constexpr (InputDimensions == PaddedInputDimensions) {  // <<== Use wasm simd kernel only on this condition
        // Simplify variable names (y = Ax + b)
        static_assert(InputDimensions % 16 == 0);
        constexpr int n = InputDimensions;
        constexpr int m = OutputDimensions;
        constexpr int n_stride = PaddedInputDimensions;
        auto A = *reinterpret_cast<const int8_t(*)[m][n_stride]>(weights);
        auto x = *reinterpret_cast<const uint8_t(*)[n]>(input);
        auto b = *reinterpret_cast<const int32_t(*)[m]>(biases);
        auto y = *reinterpret_cast<int32_t(*)[m]>(buffer);
        emscripten_wasm_simd::affine<n, m, n_stride>(A, x, b, y);
        return y;
      }
#endif
    ....

Maybe overflow? Maybe SIMD alignment issue?


EDIT2:

I found a bug in dot4 in "wasm_simd.cpp" and resolved the issue.

// BEFORE
        z0 = wasm_i32x4_add(z0, dot_i8x16(wasm_v128_load(&a[j + 0 * n]), xv));
        z1 = wasm_i32x4_add(z1, dot_i8x16(wasm_v128_load(&a[j + 1 * n]), xv));
        z2 = wasm_i32x4_add(z2, dot_i8x16(wasm_v128_load(&a[j + 2 * n]), xv));
        z3 = wasm_i32x4_add(z3, dot_i8x16(wasm_v128_load(&a[j + 3 * n]), xv));

// AFTER
        z0 = wasm_i32x4_add(z0, dot_i8x16(wasm_v128_load(&a[j + 0 * n_stride]), xv));
        z1 = wasm_i32x4_add(z1, dot_i8x16(wasm_v128_load(&a[j + 1 * n_stride]), xv));
        z2 = wasm_i32x4_add(z2, dot_i8x16(wasm_v128_load(&a[j + 2 * n_stride]), xv));
        z3 = wasm_i32x4_add(z3, dot_i8x16(wasm_v128_load(&a[j + 3 * n_stride]), xv));

@hi-ogawa
Copy link
Owner Author

hi-ogawa commented Jul 28, 2021

I finished resolving conflicts and rebased single commit on the latest upstream.

Thanks to Sopel for making strided affine implementation https://github.com/hi-ogawa/Stockfish/pull/7/files#diff-599f7ecb14d235a2360be139f960404f3849d9f1a320749969a3a8a71e58f3b6.

Currently test is done only by comparing "Node searched" statistics from bench 16 1 25 current depth NNUE.

# build ARCH=x86-64-avx2
$ ./src/stockfish bench 16 1 25 current depth NNUE
Total time (ms) : 13303
Nodes searched  : 11562893
Nodes/second    : 869194

# emscripten_build ARCH=wasm wasm_simd_post_mvp=yes
$ node -e 'console.log(`node: ${process.version}\nv8: ${process.versions.v8}`)'
node: v16.4.2
v8: 9.1.269.36-node.14
$ node --experimental-wasm-{simd,threads} src/emscripten/public/uci.js bench 16 1 25 current depth NNUE
Total time (ms) : 28829
Nodes searched  : 11562893
Nodes/second    : 401085

EDIT:

One thing a bit too concerning is that initial loading time got worse

$ time node --experimental-wasm-{simd,threads} src/emscripten/public/uci.js quit 
Stockfish 280721 by the Stockfish developers (see AUTHORS file)

real    0m2,546s
user    0m4,477s
sys     0m0,385s

comparing to the one from the issue #1

$ time /usr/bin/node --experimental-wasm-{simd,threads} --wasm-simd-post-mvp src/emscripten/public/uci.js uci > /dev/null

real    0m1.289s
user    0m3.346s
sys     0m0.295s

Possibly, emscripten upgrade might be the reason for this and not the doubling of embedded weights.

@hi-ogawa
Copy link
Owner Author

hi-ogawa commented Jul 28, 2021

Regarding initial loading time, it turns out it depends on v8's wasm compiler strategy.

  • Time command

EDIT: --wasm-lazy-compilation's start up looks slow from below "time" result, but actually it seems it's taking time when exiting node process. So, actual responsiveness are more-or-less same as --no-wasm-tier-up mode.

# TurboFan only (default on node?)
$ time node --liftoff --wasm-tier-up --experimental-wasm-{simd,threads} src/emscripten/public/uci.js quit > /dev/null

real    0m2,497s
user    0m4,479s
sys     0m0,386s

# Liftoff only
$ time node --liftoff --no-wasm-tier-up --experimental-wasm-{simd,threads} src/emscripten/public/uci.js quit > /dev/null

real    0m0,562s
user    0m0,517s
sys     0m0,133s

# Lazy tier up (default on browser?)
$ time node --wasm-lazy-compilation --experimental-wasm-{simd,threads} src/emscripten/public/uci.js quit > /dev/null

real    0m2,210s
user    0m2,781s
sys     0m0,304s
  • Bench statistics
$ node --liftoff --wasm-tier-up --experimental-wasm-{simd,threads} src/emscripten/public/uci.js bench 16 1 25 current depth NNUE
Total time (ms) : 29388
Nodes searched  : 11562893
Nodes/second    : 393456

$ node --liftoff --no-wasm-tier-up --experimental-wasm-{simd,threads} src/emscripten/public/uci.js bench 16 1 25 current depth NNUE
Total time (ms) : 50598
Nodes searched  : 11562893
Nodes/second    : 228524

$ node --wasm-lazy-compilation --experimental-wasm-{simd,threads} src/emscripten/public/uci.js bench 16 1 25 current depth NNUE
Total time (ms) : 28169
Nodes searched  : 11562893
Nodes/second    : 410482

References

@hi-ogawa hi-ogawa changed the title TODO TODO for the next release with Stockfish 14 features Jul 30, 2021
@hi-ogawa
Copy link
Owner Author

Except the problem about the initial loading speed (which is still tracked by the issue #1), applying the changes from Stockfish 14 seems went well.
I switched the default branch to emscripten-237ed1ef-2.0.26 and published a new package as 1.0.0-alpha.2, so I'll close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant