Add 2021Q3 roadmap.

This also restructures the docs to a "roadmap" directory, to preserve previous roadmaps / allow retrospective "grading" of how we did.
mlevesquedion · Jun 15, 2021 · 31c15ca · 31c15ca
1 parent 92ee0fa
commit 31c15ca
Show file tree

Hide file tree

Showing 2 changed files with 174 additions and 0 deletions.
diff --git a/docs/roadmap.md → docs/roadmaps/2021Q2.md b/docs/roadmap.md → docs/roadmaps/2021Q2.md
@@ -159,6 +159,46 @@ and satisfy our objectives, without getting bogged down in theoretical details.
   - ATen --> Linalg lowerings.
   - Canonicalizations and other compiler optimizations
 
+
+### 2021Q2 Retrospective (added afterwards)
+
+- Model curriculum
+  - [✅] Formalize / publish curriculum to ease collaboration
+    - Note: See `frontends/pytorch/e2e_testing/torchscript`.
+  - [~] Incorporate end-to-end ASR (speech recognition) model into curriculum,
+    or program of similar complexity.
+    - Note: TorchScript'able machine translation model identified, but not
+      formally added.
+  - [✅] Incorporate representative quantized models into curriculum.
+    - Note: See `frontends/pytorch/e2e_testing/torchscript/quantized_models.py`
+- End-to-end execution of at least the simplest models in the curriculum.
+  - [✅] User annotation infrastructure for users to provide the compiler
+    information, such as shapes to seed shape inference.
+    - Note: See `frontends/pytorch/csrc/builder/class_annotator.cpp` and
+      `frontends/pytorch/python/torch_mlir/torchscript/annotations.py`.
+  - [✅] Fill out ATen dialect and `aten-recognize-kernels` pass.
+    - Note: Accomplished with significant design shift. See
+      [PR](https://github.com/llvm/mlir-npcomp/pull/214).
+  - [❌] ATen lowering to Linalg-on-tensors
+    - Implement a minimal amount of linalg-inspired abstractions in the "TCF"
+      dialect.
+    - Extend the linalg
+      [OpDSL tooling](https://llvm.discourse.group/t/rfc-linalg-opdsl/2966/6) to
+      enable us to programmatically emit shape validity checks.
+    - Note: Not enough programs / ops brought up yet to generalize.
+  - [✅] Shape/dtype inference
+    - As needed for other incremental objectives.
+    - Build a clear picture of the right place(s) in the longer-term compiler
+      pipeline for shape inference.
+    - Note: Need one major pass of this in the frontend at the torch level to
+      get ranks and dtypes, and then one later pass at the linalg level to
+      propagate specific sizes as much as possible.
+  - [✅] Canonicalizations and general compiler optimizations
+    - As needed for other incremental objectives.
+  - [✅] Backend choice: RefBackend or IREE candidates.
+    - Note: Used RefBackend this quarter due to simplicity of current programs +
+      some blocking IREE issues.
+
 ## RefBackend
 
 The npcomp reference backend (or "RefBackend") is perhaps the most confusing

diff --git a/docs/roadmaps/2021Q3.md b/docs/roadmaps/2021Q3.md
@@ -0,0 +1,134 @@
+# Roadmap as of beginning of 2021Q3
+
+## Project status overview
+
+- TorchScript compilation: Significant work has gone into the TorchScript
+  compilation workstream. Basic multi-layer perceptrons execute end-to-end, and
+  significant strides have been taken towards ResNet and quantized programs.
+  Additionally, a full TorchScript'able machine translation model (IDs to IDs;
+  including beam search) has been identified as representative of the kind of
+  challenging programs that the TorchScript ahead-of-time compilation flow will
+  enable.
+
+- `acap_dispatch`: Discussions with stakeholders in the npcomp and PyTorch
+  community have shifted the `acap_dispatch` workstream to upstream discussions
+  (see [bug](https://github.com/pytorch/xla/issues/2854)). Work within npcomp on
+  `acap_dispatch` is temporarily on hold.
+
+- RefBackend: The RefBackend workstream is temporarily on hold as well. The
+  needs of the TorchScript compilation path are too complex (lists, dicts, error
+  handling, runtime ABI) and the engineering resources too limited to
+  meaningfully bring up an alternative backend. The decision going forward is to
+  single-source on IREE as our needs become more complex. This is somewhat
+  unfortunate, as the goal of the RefBackend was to somewhat defray the backend
+  story and prevent single-sourcing on what at the time (~2020Q1) was perceived
+  as a large external dependency. Somewhat mitigating this situation though is
+  that in the intervening year, IREE has become significantly "leaner and
+  meaner", and while still nontrivial, it has found a much more tightly scoped
+  role that leans much more heavily on upstream infrastructure. In fact,
+  inclusion of IREE in the LLVM project in some form now seems possible, which
+  will make this dependency very natural.
+
+## Non-technical project status overview
+
+Community contributions have somewhat petered out due to the shifting focus of
+the project. This was somewhat expected as the early aspirations of the project
+met with the reality of available resourcing, ecosystem constraints, and more
+fine-grained understanding of stakeholder needs. We have brought on 1 new full
+time engineer to work on the project though.
+
+## Roadmap overview
+
+The project has converged on the TorchScript workstream as the primary effort:
+
+- TorchScript compilation: The goal of this project is to build the frontend of
+  a truly next-generation ahead-of-time machine learning compiler.
+  - Why this project is cool: This system is designed from day 1 to support
+    features such as dynamic shapes, control flow, mutable variables,
+    program-internal state, and non-Tensor types (scalars, lists, dicts) in a
+    principled fashion. These features are essential for empowering an
+    industry-level shift in the set of machine learning programs that are
+    feasible to deploy with minimal effort across many devices (when combined
+    with a backend using the advanced compilation techniques being developed
+    elsewhere in the MLIR ecosystem).
+
+## TorchScript compilation
+
+The TorchScript compiler represents the bulk of core compiler effort in the
+npcomp project.
+[TorchScript](https://pytorch.org/docs/stable/jit_language_reference.html) is a
+restricted (more static) subset of Python, but even TorchScript is quite dynamic
+when compared to the needs of lower-levels of the compilation stack, especially
+systems like Linalg. The overarching theme of this project is building out
+compiler components that bridge that gap. As we do so, the recurring tradeoffs
+are:
+
+- user experience: we want a fairly unrestricted programming model -- that's
+  what users like about PyTorch, and what enables users to deploy without
+  significant modifications of their code.
+- feasibility of the compiler: we want a smart compiler that is feasible to
+  implement (for our own sanity :) )
+- excellent generated code quality: this is of course dependent on the backend
+  which is paired with the frontend we are building, but there are a number of
+  transformations that make sense before we reach the backend which strongly
+  affect the quality of code generated from a backend.
+
+To give a concrete example, consider the problem of inferring the shapes of
+tensors at various points in the program. The more precision we have on the
+shapes, the better code can be emitted by a backend. But in general, users need
+to provide at least some information about their program to help the compiler
+understand what shapes are at different points in the program. The smarter our
+compiler algorithms are, the less information the user needs to provide. Thus,
+all 3 facets are interlinked and there is no single right answer -- we need to
+balance them for a workable system.
+
+To accomplish this goal, we are guided by a *model curriculum*, which consists
+of programs of escalating complexity, from a simple elementwise operation all
+the way to a full-blown end-to-end speech recognition program. Our development
+process consists of setting incremental objectives to build out new layers of
+the compiler to a satisfactory level on easier programs in the curriculum, and
+backfilling complexity as needed to extend to the harder programs. Ideally, this
+backfilling does not require deep conceptual changes to components, but is
+simply an application of extension points anticipated in the original design.
+The trick to making that happen is evaluating designs on enough programs from
+the curriculum to ensure that a solution is likely to generalize and satisfy our
+objectives, without getting bogged down in theoretical details.
+
+### 2021Q3
+
+- Theme: Scale up the programs we can run end-to-end.
+  - End-to-end execution of ResNet.
+  - Significant strides towards end-to-end execution of the identified
+    end-to-end machine translation model.
+  - End-to-end execution of simple programs with lists.
+  - End-to-end execution of simple stateful programs.
+  - Significant strides towards end-to-end execution of two major "classes of
+    models". Tentatively: transformer, LSTM.
+- Theme: Start feeling production-ey
+  - For the simplest programs at least, get them running on IREE with
+    performance competitive with other frontends
+      - Stretch: Extend this result to ResNet.
+  - Write initial "user manual" (and any supporting tools, packaging) for how to
+    use the new frontend (+ backend integration points) to deploy something on
+    IREE.
+    - Redesign frontend API's as needed to be palatable to document.
+
+
+### 2021Q4
+
+- Theme: Compiler becomes generally functional for a large class of programs
+  - End-to-end execution of end-to-end MT (machine translation) program.
+  - End-to-end execution of the two major "classes of models" added to the
+    curriculum in Q3.
+  - End-to-end execution of quantized model.
+  - Identify/build TorchScript'able ASR (automatic speech recognition) program.
+  - Significant strides towards end-to-end execution of ASR.
+  - Bringing up new programs should be fairly quick and mechanical.
+- Theme: Pathfind next phase after initial compiler bringup.
+  - Begin talks with potential users / applications to identify a useful
+    "real" capstone project.
+      - Goal: Demonstrate viability of the tools.
+      - Goal: Start rallying support / interest more broadly.
+  - Begin looking at training use cases.
+  - Begin looking at building "anti-framework" numerical Python compiler layered
+    on our TorchScript compiler.