Commit 18f7707

sophie-zhao authored and gopherbot committed
salsa20: add loong64 SIMD implementation
The performance gains on Loongson 3A6000 and 3A5000 are as follows:

goos: linux
goarch: loong64
pkg: golang.org/x/crypto/salsa20
cpu: Loongson-3A6000-HV @ 2500.00MHz
      |  bench.old   |              bench.new               |
      |    sec/op    |    sec/op      vs base               |
XOR1K   3175.0n ± 0%    435.4n ± 0%   -86.29% (p=0.000 n=20)

      |  bench.old   |               bench.new                |
      |     B/s      |      B/s        vs base                |
XOR1K   307.6Mi ± 0%   2242.7Mi ± 0%   +629.13% (p=0.000 n=20)

goos: linux
goarch: loong64
pkg: golang.org/x/crypto/salsa20
cpu: Loongson-3A5000 @ 2500.00MHz
      |  bench.old   |              bench.new               |
      |    sec/op    |    sec/op      vs base               |
XOR1K   4125.0n ± 0%    864.0n ± 0%   -79.05% (p=0.000 n=20)

      |  bench.old   |               bench.new                |
      |     B/s      |      B/s        vs base                |
XOR1K   236.7Mi ± 0%   1130.3Mi ± 0%   +377.41% (p=0.000 n=20)

Change-Id: Ib37f603e6654f1e3837985fad4b6dee10b5af993
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/663375
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
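As a rough cross-check of the figures in the commit message, the sketch below recomputes the sec/op change and the throughput from the reported medians. It assumes that XOR1K processes 1024 bytes per operation (inferred from the benchmark name, not stated in the commit); `pctChange` and `mibPerSec` are hypothetical helpers written for this illustration. Because the reported values come from per-run medians, the recomputed numbers may differ in the last digit.

```go
package main

import "fmt"

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

// mibPerSec converts a per-operation time in nanoseconds into MiB/s,
// given the number of bytes processed per operation.
func mibPerSec(bytesPerOp, nsPerOp float64) float64 {
	return bytesPerOp / (nsPerOp * 1e-9) / (1 << 20)
}

func main() {
	// Assumption: one XOR1K operation handles 1024 bytes.
	const bytesPerOp = 1024

	// Loongson 3A6000 medians from the commit message.
	fmt.Printf("3A6000 sec/op change: %.2f%%\n", pctChange(3175.0, 435.4))
	fmt.Printf("3A6000 old throughput: %.1f MiB/s\n", mibPerSec(bytesPerOp, 3175.0))

	// Loongson 3A5000 medians from the commit message.
	fmt.Printf("3A5000 sec/op change: %.2f%%\n", pctChange(4125.0, 864.0))
	fmt.Printf("3A5000 old throughput: %.1f MiB/s\n", mibPerSec(bytesPerOp, 4125.0))
}
```

The sec/op and B/s rows describe the same measurement from two directions, which is why an 86% drop in time per operation corresponds to a roughly sevenfold increase in throughput.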
1 parent 2ebaafc commit 18f7707

4 files changed: +513 -2 lines changed

salsa20/salsa/salsa20_loong64.go

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+// Copyright 2025 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build loong64 && !purego && gc
+
+package salsa
+
+import "golang.org/x/sys/cpu"
+
+// XORKeyStreamVX is implemented in salsa20_loong64.s.
+//
+//go:noescape
+func XORKeyStreamVX(out, in *byte, n uint64, nonce, key *byte)
+
+// XORKeyStream crypts bytes from in to out using the given key and counters.
+// In and out must overlap entirely or not at all. Counter
+// contains the raw salsa20 counter bytes (both nonce and block counter).
+func XORKeyStream(out, in []byte, counter *[16]byte, key *[32]byte) {
+	if len(in) == 0 {
+		return
+	}
+	_ = out[len(in)-1]
+	if cpu.Loong64.HasLSX {
+		XORKeyStreamVX(&out[0], &in[0], uint64(len(in)), &counter[0], &key[0])
+	} else {
+		genericXORKeyStream(out, in, counter, key)
+	}
+}
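
The wrapper above follows a common shape: return early on empty input, touch the last needed output index so a too-short `out` panics before any work is done, then pick a fast or generic path from a CPU feature flag. The following is a minimal, self-contained sketch of that shape only. It does not implement Salsa20: `hasFastPath`, `xorFast`, `xorGeneric`, and `xorKeyStream` are hypothetical names for this illustration, with `hasFastPath` standing in for `cpu.Loong64.HasLSX` and a single-byte XOR standing in for the cipher.

```go
package main

import "fmt"

// hasFastPath is a hypothetical stand-in for a CPU feature flag
// such as cpu.Loong64.HasLSX.
var hasFastPath = false

// xorGeneric is the portable fallback: one byte at a time.
func xorGeneric(out, in []byte, k byte) {
	for i := range in {
		out[i] = in[i] ^ k
	}
}

// xorFast is a hypothetical stand-in for an accelerated routine.
// It handles four bytes per iteration, then the remaining tail.
func xorFast(out, in []byte, k byte) {
	i := 0
	for ; i+4 <= len(in); i += 4 {
		out[i] = in[i] ^ k
		out[i+1] = in[i+1] ^ k
		out[i+2] = in[i+2] ^ k
		out[i+3] = in[i+3] ^ k
	}
	for ; i < len(in); i++ {
		out[i] = in[i] ^ k
	}
}

// xorKeyStream mirrors the structure of the wrapper in the diff:
// empty-input check, early bounds check, then feature dispatch.
func xorKeyStream(out, in []byte, k byte) {
	if len(in) == 0 {
		return
	}
	_ = out[len(in)-1] // panics here if out is shorter than in
	if hasFastPath {
		xorFast(out, in, k)
	} else {
		xorGeneric(out, in, k)
	}
}

func main() {
	in := []byte("hello")
	for _, fast := range []bool{false, true} {
		hasFastPath = fast
		out := make([]byte, len(in))
		xorKeyStream(out, in, 0x20)
		fmt.Printf("fast=%v out=%q\n", fast, out)
	}
}
```

Both paths must produce identical output for every input length, including lengths that are not a multiple of the fast path's block size, which is the property worth testing when adding an accelerated implementation behind a runtime check.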
