Commit
Switch to 'Reduced-Information Program Learning'
Showing 1 changed file with 7 additions and 12 deletions.
_requests_for_research/infinite-symbolic-generalization.html (19 changes: 7 additions & 12 deletions)
@@ -1,24 +1,19 @@
---
-title: 'Infinite Symbolic Generalization'
+title: 'Reduced-Information Program Learning'
summary: ''
difficulty: 2 # out of 3
---
-<p>Many sequence prediction tasks involve uncertainty. Even the sequence <code>abababab...</code> may not repeat the same pattern forever. However, some sequences - algorithmic tasks - should generalize for an unbounded amount of time. Examples of simple algorithmic tasks are <a href="https://gym.openai.com/envs/RepeatCopy-v0">RepeatCopy</a> and <a href="https://gym.openai.com/envs/ReversedAddition-v0">ReversedAddition</a>. The goal is to develop models that can learn algorithms, such as the <a href="http://www-personal.umich.edu/~reedscot/iclr_project.html">Neural Programmer-Interpreter</a> (NPI), and successfully apply the learned algorithms over sequences much longer than those used to train the model. For example, if the model were trained on pairs of numbers of length up to 10 on the ReversedAddition task, it should be able to add pairs of numbers that are 100 digits long, or even 1000 digits long.</p>
-<p>For this challenge the NPI can be taken as a baseline, including program embeddings, primitive actions and execution traces. The three tasks that are to be performed by the model with a single set of weights (but interchangeable <em>encoders</em>) are addition, sorting and canonicalizing 3D models. The former two tasks make use of a "scratch pad" and pointers, whilst the latter uses a set of <a href="http://ttic.uchicago.edu/~fidler/projects/CAD.html">CAD models</a>. The goal is to reach the same performance (per sequence % accuracy) as the multi-task NPI, with 64 sequence samples per task. Is it then possible to get 100% accuracy using the same training and evaluation criteria as in the paper?</p>
-<p>With the above as a baseline, the following challenges can be set:</p>
-<ul>
-<li>Achieve the same results with fewer training samples: 32, 16, 8. The NPI appears to almost converge to a perfect solution for sorting after only 8 samples.</li>
-<li>Achieve the same results with shorter training data: 8-digit numbers for addition, length-4 arrays for sorting, and up to 3-step trajectories for canonicalization. The length of the evaluation sequences should remain the same (10-digit, length-5 and 4-step).</li>
-<li>Achieve the same results with longer evaluation data (by an order of magnitude): 100-digit numbers, length-50 arrays, 30-step trajectories (note that 3-step trajectories should be used for training). In order to cope with larger trajectories the azimuth range should be in (-90°...90°) and the elevation range should be in (-30°...90°), with increments of 5°.</li>
-</ul>
+<p>A difficult machine learning task is that of program, or algorithm, learning. An algorithm is a set of rules that maps a set of inputs to a set of outputs. Examples of simple algorithmic tasks are <a href="https://gym.openai.com/envs/RepeatCopy-v0">RepeatCopy</a> and <a href="https://gym.openai.com/envs/ReversedAddition-v0">ReversedAddition</a>. Theoretically, recurrent neural networks (RNNs) are Turing-complete, which means that they can model any computable function. In practice, however, it has been difficult to <em>learn</em> algorithms.</p>
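As a concrete reference point, here is a minimal Python sketch (not part of the diffed file) of the ground-truth procedure behind a ReversedAddition-style task: the two addends are given as digit lists, least-significant digit first, and the sum is emitted in the same order. It illustrates only the target algorithm, not the gym environment's exact observation/action interface.

```python
def reversed_addition(a_digits, b_digits, base=10):
    """Add two numbers given as least-significant-first digit lists."""
    result, carry = [], 0
    for i in range(max(len(a_digits), len(b_digits))):
        a = a_digits[i] if i < len(a_digits) else 0
        b = b_digits[i] if i < len(b_digits) else 0
        carry, digit = divmod(a + b + carry, base)
        result.append(digit)
    if carry:
        result.append(carry)
    return result

# 479 + 58 = 537; digits written in reverse: [9, 7, 4] + [8, 5] -> [7, 3, 5]
assert reversed_addition([9, 7, 4], [8, 5]) == [7, 3, 5]
```

A model that has genuinely learned this algorithm should keep producing correct sums as the digit lists grow far beyond the training lengths.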
+<p>A recent success is the <a href="http://www-personal.umich.edu/~reedscot/iclr_project.html">Neural Programmer-Interpreter</a> (NPI), which uses a strong supervision signal in the form of execution traces. This contrasts with <em>program induction</em>, where programs must be inferred from input-output pairs alone. Using strong supervision, the NPI is able to exhibit <em>strong generalisation</em> on long inputs, unlike the sequence-to-sequence RNN baseline. Another key to the success of the NPI is its task-independent RNN core, which is able to take advantage of the compositionality (or hierarchy) of programs (in programming parlance, subroutines).</p>
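To make the distinction concrete, the snippet below contrasts the two supervision regimes for a single addition example. The trace format and program names (ADD1, CARRY, LSHIFT) are illustrative stand-ins, not the NPI paper's exact encoding.

```python
# Program induction: the learner sees only input-output pairs and must infer
# the whole algorithm from them.
induction_example = {"input": ([9, 7, 4], [8, 5]), "output": [7, 3, 5]}

# Execution-trace supervision (NPI-style): each example also carries the
# ordered list of subprogram calls that produced the output.
trace_example = {
    "input": ([9, 7, 4], [8, 5]),
    "trace": ["ADD1", "CARRY", "LSHIFT",    # first column
              "ADD1", "CARRY", "LSHIFT"],   # second column, and so on
    "output": [7, 3, 5],
}
```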
+<p>The challenge is to achieve similar results with <strong>partial</strong> execution traces. For many problems we may not be able to produce detailed, full execution traces, but we can specify higher-level details. In the algorithmic learning context this could be likened to pseudocode, where supervision is provided in the form of high-level routines, but at least one level of hierarchy lies between the primitive actions/operations and the routines provided as supervision. The NPI can be taken as a baseline, with possible constructs such as program embeddings - the only restriction is the lack of full execution traces. However, other weak supervision may be provided - such as the input-output pairs used in program induction. The three tasks that are to be performed by the model with a single set of weights (but interchangeable <em>encoders</em>) are addition, sorting and canonicalizing 3D models. The former two tasks make use of a "scratch pad" and pointers, whilst the latter uses a set of <a href="http://ttic.uchicago.edu/~fidler/projects/CAD.html">CAD models</a>. The goal is to reach the same performance (per sequence % accuracy) as the multi-task NPI, with 64 sequence samples per task.</p>
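A minimal sketch of what that supervision gap could look like for one column of the addition task, using a hypothetical trace format of (program, arguments) tuples; the program names and encoding are illustrative, not the NPI paper's exact scheme.

```python
# Full execution trace: every call down to the primitive scratch-pad operations
# is observed (hypothetical names and arguments).
full_trace = [
    ("ADD1", []),                    # add the digits under the pointers
    ("WRITE", ["OUT", 7]),           #   primitive: write a digit to the scratch pad
    ("CARRY", []),                   # propagate the carry
    ("WRITE", ["CARRY", 1]),         #   primitive
    ("LSHIFT", []),                  # move to the next column
    ("MOVE_PTR", ["IN1", "LEFT"]),   #   primitive: shift each pointer left
    ("MOVE_PTR", ["IN2", "LEFT"]),
    ("MOVE_PTR", ["CARRY", "LEFT"]),
    ("MOVE_PTR", ["OUT", "LEFT"]),
]

# Partial ("pseudocode-level") trace: only the routines one level above the
# primitives are given; the WRITE/MOVE_PTR calls must be discovered by the model.
partial_trace = [("ADD1", []), ("CARRY", []), ("LSHIFT", [])]
```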
+<p>Progressing from bubble sort (as learned by the NPI), more difficult targets would be more complex sorting algorithms, such as quicksort. Whilst bubble sort mainly consists of local comparisons, quicksort involves comparisons across the length of the sequence as well as multiple levels of recursion.</p>
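The contrast is easy to see in code: a sketch of both algorithms, with comments marking the local versus non-local comparisons and the recursion that makes quicksort a harder target to learn.

```python
def bubble_sort(xs):
    """Bubble sort: only ever compares adjacent elements."""
    xs = list(xs)
    for end in range(len(xs) - 1, 0, -1):
        for i in range(end):
            if xs[i] > xs[i + 1]:              # strictly local comparison
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return xs

def quicksort(xs):
    """Quicksort: pivot comparisons span the whole sequence and recurse."""
    if len(xs) <= 1:
        return list(xs)
    pivot, rest = xs[0], xs[1:]
    left = [x for x in rest if x < pivot]       # non-local comparisons
    right = [x for x in rest if x >= pivot]
    return quicksort(left) + [pivot] + quicksort(right)   # nested recursion

assert bubble_sort([5, 2, 4, 1, 3]) == quicksort([5, 2, 4, 1, 3]) == [1, 2, 3, 4, 5]
```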
<hr />
<h3>Notes</h3>
-<p>The compositionality or "subroutines" exhibited in algorithms has links to hierarchical reinforcement learning, where the reinforcement learning task can be subdivided into a series of smaller problems. Both identifying subtasks and learning when to execute subtask policies is an ongoing area of research. Without execution traces it seems likely that reinforcement learning would be needed to learn these algorithms, although the sample complexity will necessarily be higher.</p>
+<p>The compositionality or "subroutines" exhibited in algorithms has links to hierarchical reinforcement learning, where the reinforcement learning task can be subdivided into a series of smaller problems. Both identifying subtasks and learning when to execute subtask policies is an ongoing area of research. Without full execution traces it seems likely that reinforcement learning could be used to learn these algorithms, although the sample complexity will necessarily be higher.</p>