diff --git a/docs/KataGoMethods.md b/docs/KataGoMethods.md
index f55c518f7..07a872dd5 100644
--- a/docs/KataGoMethods.md
+++ b/docs/KataGoMethods.md
@@ -328,7 +328,7 @@ The variance accumulates to be larger than 1 due to the summations with skip con
 For any series of blocks in a stack, such as the main trunk, since each block adds an output of variance 1, the variance of the trunk increments by 1 with each block. So each successive block that reads from that trunk needs to set K for its first normalization layer to the inverse sqrt of that incrementing variance:
-