-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathlearned_equi
34 lines (21 loc) · 3.51 KB
/
learned_equi
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
# Making a learned optimizer follow a rule without understanding its inner goal
Last time we (maybe?) summoned a mesaoptimizer and found weird behaviour: it wants to warp its environment to look like its training set. This time, we are seeking good behaviour.
We're going to summon a mesa-optimizer that does continuing computation with a fairly specific constraint: the state it preserves between iterations of its optimization algorithm is in one to one correspondence to
its final output. Under this constraint, we have conditioning data x, a neural network y such that y_n+1 = f(x, y_n), and an outer goal L(x, y). This is a very restrictive constraint: it's not true at all of human brains or chess engines. It's also clearly relevant since (non-o1) LLMs and diffusion models have this constraint and do great work.
Under this constraint (and possibly more generally), a trick about mesaoptimizers is that whatever their inner goal, they have the property that at convergence Expected Value(f(x)) = Expected Value(x), i.e., if you run their algorithm for enough iterations that
the score stops improving with more iterations of the algorithm, A) the score won't improve with further iterations of the algorithm, B) a strategy where score goes up and reaches a maximum then goes back down is dominated by a strategy
that reaches the plateu and then stays there. (LLMs under reinforcement learning achieve this by outputting stop tokens, for example)
We'll then call the limit of iterating f many times F.
This property gives access to a way to neatly restrict the final behaviour.
IMHO, this is easiest to do if the constraint you want to put on the network is equivariance to a group.
We're going to be registering, or warping into correspondence, MNIST 9 digits. The network is the same, an MLP that takes in digits and outputs a displacement at each pixel. The condition x will be This time, we've got a slightly harder task: the digits are randomly rotated. It is
really hard to do this unsupervised: go ahead and try and write a loss that requests this, but I promise the network is just going to squish the stem into the loop and stretch out a new stem instead of rotating the 9.
Larger structure: W
Instead of trying to specify a loss to achieve this, we're going to specify a pretty simple outer loss: we want the images to correspond, and we'll grow a mesa-optimizer with a smarter loss than that. To do this, we
In order to make our algorithm correctly align rotated digits _restrict_ the mesaoptimizer by making it equivariant as follows: For rotations R, Q: F((a, b)) = R^{-1} \circ F((a \circ R, b \circ Q)) \circ Q. This is a sense of "fairness" over rotations or blindness to rottions:
our
Now, a restricted version of this is actually pretty easy: if we wanted For rotations R: F((a, b)) = R^{-1} \circ F((a \circ R, b \circ R)) \circ R (i.e. both input images are rotated the same, then network output doesn't change)
then we can achieve this via a local property of the weights: trick involving spherical harmonics. An ongoing lesson in neural networks is that if an architecture _can_ achieve a given property via a local weight property (such as rotational equviariance
then it will achieve that property, so you don't actually have to go to the effort of implementing spherical harmonics, you can just rotationally augment the dataset.) However, the full equivariance is much harder (as you likely
found when trying to design a loss earlier) because it requires global properties.
Our mesaoptimizer trick turns a local property into a global property