Description
Go version
go version go1.21.6 linux/arm64
Output of go env in your module/workspace:
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='linux'
GOINSECURE=''
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPRIVATE=''
GOSUMDB='off'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOVCS=''
GOVERSION='go1.21.6'
GCCGO='gccgo'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3484327849=/tmp/go-build -gno-record-gcc-switches'
What did you do?
As shown in the following example, test.go is compiled to wasm.
test.go:
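package main

import (
	"fmt"
	"time"
)

// testFor runs a nested counting loop. The noinline directive keeps it
// as a separate function so it is easy to find in the compiled output.
//
//go:noinline
func testFor() int {
	sum := 0
	for i := 0; i < 200; i++ {
		for j := 0; j < 10000000; j++ {
			sum += j
		}
	}
	return sum
}

func main() {
	// Time a single call to testFor and print the elapsed time and result.
	startTime := time.Now()
	res := testFor()
	elapsed := time.Since(startTime)
	fmt.Println("Done. Cost: ", elapsed, res)
}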
Command used to compile Go to wasm:
GOOS=wasip1 GOARCH=wasm go build -o test.wasm test.go
What did you see happen?
As the wat code shows, the for loop is expressed using the br_table operation.
wat code:
(func $main.testFor (type 0) (param i32) (result i32)
(local i32 i64 i64 i64 i64)
global.get 0
local.set 1
loop ;; label = @1
block ;; label = @2
block ;; label = @3
block ;; label = @4
block ;; label = @5
block ;; label = @6
block ;; label = @7
block ;; label = @8
block ;; label = @9
local.get 0
br_table 0 (;@9;) 1 (;@8;) 2 (;@7;) 3 (;@6;) 4 (;@5;) 5 (;@4;) 6 (;@3;) 7 (;@2;)
end
i64.const 0
local.set 2
i64.const 0
local.set 3
i32.const 2
local.set 0
br 7 (;@1;)
end
local.get 2
i64.const 1
i64.add
local.set 2
end
local.get 2
i64.const 200
i64.lt_s
i32.eqz
if ;; label = @7
i32.const 4
local.set 0
br 6 (;@1;)
end
end
i64.const 0
local.set 4
i32.const 6
local.set 0
br 4 (;@1;)
end
local.get 1
i64.extend_i32_u
i64.const 8
i64.add
i32.wrap_i64
local.get 3
i64.store
local.get 1
i32.const 8
i32.add
local.tee 1
global.set 0
i32.const 0
return
end
local.get 4
i64.const 1
i64.add
local.set 5
local.get 3
local.get 4
i64.add
local.set 3
local.get 5
local.set 4
end
local.get 4
i64.const 10000000
i64.lt_s
if ;; label = @3
i32.const 5
local.set 0
br 2 (;@1;)
end
i32.const 1
local.set 0
br 1 (;@1;)
end
end
unreachable)
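To make the shape of the listing above easier to follow, here is a minimal Go sketch of the same idea. It is an illustration only, not the compiler's actual output: structuredSum keeps the nested loops as loops, while dispatchSum expresses the same computation as a single outer loop that switches on a block index, which is roughly the role br_table and local 0 appear to play in the wat code. The function names, the state numbering, and the small loop bounds are all hypothetical.

```go
package main

import "fmt"

// structuredSum is the source-level shape: two nested counted loops.
func structuredSum(outer, inner int) int {
	sum := 0
	for i := 0; i < outer; i++ {
		for j := 0; j < inner; j++ {
			sum += j
		}
	}
	return sum
}

// dispatchSum computes the same value as a state machine: one outer loop
// and a switch on a block index. The state numbers are hypothetical and do
// not correspond one-to-one to the labels in the wat listing.
func dispatchSum(outer, inner int) int {
	var i, j, sum int
	state := 0 // stands in for the block index held in local 0
	for {
		switch state { // stands in for br_table
		case 0: // entry block: initialise counters
			i, sum = 0, 0
			state = 1
		case 1: // outer loop condition
			if i < outer {
				state = 2
			} else {
				state = 5
			}
		case 2: // inner loop initialisation
			j = 0
			state = 3
		case 3: // inner loop condition
			if j < inner {
				state = 4
			} else {
				i++
				state = 1
			}
		case 4: // inner loop body
			sum += j
			j++
			state = 3
		case 5: // exit block
			return sum
		}
	}
}

func main() {
	// Both forms compute the same sum for small, hypothetical bounds.
	fmt.Println(structuredSum(3, 4), dispatchSum(3, 4))
}
```

In dispatchSum every iteration of the inner loop passes back through the switch, so a backend that does not reconstruct the loop structure sees only an indirect jump rather than a counted loop.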
What did you expect to see?
When a wasm runtime's AOT compiler performs backend optimization, it has difficulty recognizing the br_table as a for loop, so the loop is not optimized.
I tested several of the most popular wasm runtimes (wasmtime, WAMR, and wasmer) and found that Go wasm performs very poorly after AOT compilation: in the best case, it reached only 20% of native Go performance.
Why are so many br_table operations used instead of loop operations? Will the performance of Go wasm be optimized in the future?
Also, I found that the wat code of the Go runtime functions uses br_table heavily; the most extreme function has 417 hops in a single br_table.