diff --git a/docs/KataGoMethods.md b/docs/KataGoMethods.md
index f55c518f7..07a872dd5 100644
--- a/docs/KataGoMethods.md
+++ b/docs/KataGoMethods.md
@@ -328,7 +328,7 @@ The variance accumulates to be larger than 1 due to the summations with skip con
 
 For any series of blocks in a stack, such as the main trunk, since each block adds an output of variance 1, the variance of the trunk increments by 1 with each block. So each successive block that reads from that trunk needs to set K for its first normalization layer to the inverse sqrt of that incrementing variance:
 
-
+
 
 These are all consequences of the rule that every K is set so that it normalizes the idealized variance back to 1. By itself, this appears to work at least as well in KataGo as Fixup, but is a more general rule, so can be applied to more complex architectures that Fixup doesn't describe how to handle, such as the above nested residual block.
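
The rule in the context lines of this hunk (each block adds variance 1 to the trunk, and each block's first normalization layer uses K equal to the inverse square root of the variance it reads) can be illustrated with a minimal sketch. This is not KataGo code: the function name `first_norm_scales`, its parameters, and the assumption that the trunk starts at an idealized variance of 1 before the first block are all hypothetical and chosen only for illustration.

```python
import math


def first_norm_scales(num_blocks, initial_variance=1.0, per_block_variance=1.0):
    """Return K for the first normalization layer of each block in a stack.

    Hypothetical helper, not part of KataGo. Assumes the trunk enters the
    first block with `initial_variance` and that every block adds an output
    of variance `per_block_variance` to the trunk.
    """
    scales = []
    variance = initial_variance
    for _ in range(num_blocks):
        # K normalizes the idealized variance seen by this block back to 1.
        scales.append(1.0 / math.sqrt(variance))
        # The block's output is summed into the trunk via the skip connection.
        variance += per_block_variance
    return scales


scales = first_norm_scales(4)
for i, k in enumerate(scales):
    print(f"block {i}: trunk variance {1 + i}, K = {k:.4f}")
```

Under these assumptions the fourth block reads a trunk of variance 4, so its K is 1/sqrt(4) = 0.5. A different starting variance only shifts the sequence; the inverse-sqrt rule itself is unchanged.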