More Mooncake Design Docs #459
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Performance Ratio:
Signed-off-by: Hong Ge <3279477+yebai@users.noreply.github.com>
Mooncake.jl documentation for PR #459 is available at:
Pull Request Overview
Adds a new “Understanding Mooncake” design doc that develops a mathematical model for differentiating Julia programs and shows how to implement reverse-mode rules, plus updates the docs navigation to include it.
- New doc covers compositions, multi-arg pure functions, and mutating functions with forward/reverse passes and rule composition.
- Adds the page to the makedocs navigation.
Reviewed Changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| docs/src/understanding_mooncake/what_programme_are_you_differentiating.md | New design doc: models, adjoints, and rule composition for AD; includes illustrative Julia and math derivations. |
| docs/make.jl | Adds the new page to the docs sidebar/navigation. |
```julia
function r(f, x, y)
    a, adj_g = r(g, x)
    b, adj_h = r(h, a, x, y)
    function adj_f(db)
        _, da, dx, dy = adj_h(db)
        _, dx2 = adj_g(da)
        dx = Mooncake.increment!!(dx, dx2)
        return NoRData(), dx, dy
    end
    return b, adj_f
end
```
Copilot AI commented on Oct 19, 2025:
The motivating example incorrectly passes x into r(h, a, x, y), but h takes only (a, y). This leads to an extra dx contribution from adj_h that should not exist. Update the call and destructuring so adj_h only returns gradients for (a, y), remove dx accumulation from adj_h, and return only dx from adj_g.
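For concreteness, a self-contained version of the corrected snippet might look as follows. The particular `g`, `h`, and the `::typeof` dispatch are illustrative additions so the example runs on its own; only the shape of `r(f, x, y)` and `adj_f` is taken from the suggestion above.

```julia
using Mooncake: NoRData

# Placeholder functions and hand-written rules, chosen only so the snippet runs.
g(x) = 2x
h(a, y) = a + y
r(::typeof(g), x) = (g(x), da -> (NoRData(), 2da))
r(::typeof(h), a, y) = (h(a, y), db -> (NoRData(), db, db))

f(x, y) = h(g(x), y)

# Corrected rule: h takes only (a, y), so there is a single contribution to dx.
function r(::typeof(f), x, y)
    a, adj_g = r(g, x)
    b, adj_h = r(h, a, y)
    function adj_f(db)
        _, da, dy = adj_h(db)   # adjoints for h's arguments (a, y)
        _, dx = adj_g(da)       # the only contribution to dx
        return NoRData(), dx, dy
    end
    return b, adj_f
end

b, adj_f = r(f, 3.0, 4.0)   # b == 10.0
adj_f(1.0)                  # (NoRData(), 2.0, 1.0), i.e. df/dx == 2, df/dy == 1
```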
````markdown
end
```
Observe that the above rule essentially does the following:
1. fowards-pass: replace calls to rules.
````
Copilot AI commented on Oct 19, 2025:
Correct spelling.
```diff
-1. fowards-pass: replace calls to rules.
+1. forwards-pass: replace calls to rules.
```
````markdown
```
Observe that the above rule essentially does the following:
1. fowards-pass: replace calls to rules.
2. reverse-pass: run adjoints in reverse order, adding together rdata when a variable is used multiple times.
````
Copilot AI commented on Oct 19, 2025:
[nitpick] The term 'rdata' is undefined here and appears to be a misnomer. Replace with 'adjoints' (or 'cotangents') to match AD terminology and the rest of the doc.
```diff
-2. reverse-pass: run adjoints in reverse order, adding together rdata when a variable is used multiple times.
+2. reverse-pass: run adjoints in reverse order, adding together adjoints when a variable is used multiple times.
```
````markdown
### `function` Class

To start with, let us consider only `function`s which are pure (free of externally-visible side effects, such as the modification of their arguments of global variables), unary (single-argument), and don't contain any data themselves (e.g. no closures or callable `struct`s).
````
Copilot AI commented on Oct 19, 2025:
Grammar fix.
```diff
-To start with, let us consider only `function`s which are pure (free of externally-visible side effects, such as the modification of their arguments of global variables), unary (single-argument), and don't contain any data themselves (e.g. no closures or callable `struct`s).
+To start with, let us consider only `function`s which are pure (free of externally-visible side effects, such as the modification of their arguments or global variables), unary (single-argument), and don't contain any data themselves (e.g. no closures or callable `struct`s).
```
````markdown
r(W, X, Y, \hat{Y}, \varepsilon, l) :=&\, l \nonumber
\end{align}
```
In words, our mathematical model for `linear_regression_loss` is the composition of four differentiable functions. The first three map from a tuple containing all variables seen so far, to a tuple containing the same variables _and_ the value returned by the operation being modeled, and the fourth simple reads off the elements of the final tuple which were passed in as arguments, and the return value.
````
Copilot AI commented on Oct 19, 2025:
Grammar fix.
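As an aside, the tuple-extending composition described in the quoted paragraph can be illustrated with a small runnable sketch. The functions below are generic placeholders, not the doc's `linear_regression_loss` (whose intermediate operations are not shown in this hunk), and the read-off step here keeps only the return value for simplicity.

```julia
# Each step maps the tuple of all variables seen so far to the same tuple
# plus the newly computed value; the final step reads off the return value.
step1((x,)) = (x, 2 .* x)               # a = 2x
step2((x, a)) = (x, a, sum(abs2, a))    # l = sum(a .^ 2)
readoff((x, a, l)) = l

composed(x) = readoff(step2(step1((x,))))

composed([1.0, 2.0])   # == 20.0
```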
```julia
x_copy = copy(primal(x))

# Run primal operation.
square!(x)
```
Copilot AI commented on Oct 19, 2025:
x is a CoDual; the primal mutation should target primal(x). Use square!(primal(x)) to avoid implying a required square!(::CoDual, ...) method.
```diff
-square!(x)
+square!(primal(x))
```
```julia
primal(x) .= x_copy

# Modify gradient to correspond to result of adjoint of transition function.
tangent(x) .= 2 .* primal(x)
```
Copilot AI commented on Oct 19, 2025:
This overwrites the incoming cotangent. For f(x)=x⊙x, the reverse update is x̄ .= x̄ .* (2 .* x). Multiply the existing tangent by 2 .* primal(x) instead of replacing it.
```diff
-tangent(x) .= 2 .* primal(x)
+tangent(x) .= tangent(x) .* (2 .* primal(x))
```
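Putting the two suggestions together, the forwards- and reverse-pass fragments for `square!` might read as follows. This is a sketch only: `square!_fragments` is a hypothetical helper, not Mooncake's actual rule, and the `CoDual`/`primal`/`tangent` helpers are used as in the quoted doc.

```julia
using Mooncake: CoDual, primal, tangent

square!(x) = (x .= x .* x; nothing)

# Hypothetical helper combining the two corrections above for square!.
function square!_fragments(x::CoDual)
    # Forwards-pass: stash the primal, then mutate the primal data, not the CoDual.
    x_copy = copy(primal(x))
    square!(primal(x))

    function square!_reverse()
        # Reverse-pass: restore the primal to its pre-call state ...
        primal(x) .= x_copy
        # ... then scale the existing cotangent, since d(x .* x)/dx = 2x.
        tangent(x) .= tangent(x) .* (2 .* primal(x))
        return nothing
    end
    return x, square!_reverse
end

x = CoDual([1.0, 2.0], [1.0, 1.0])
_, rev = square!_fragments(x)   # primal(x) is now [1.0, 4.0]
rev()                           # primal(x) restored; tangent(x) == [2.0, 4.0]
```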
````markdown
return nothing, f!_reverse
end
```
This rule satisfies our requirement that all modifications to `x` are un-done by `f!_reverse` inductively -- we assume that each `f_n!_reverse` satisfies this require.
````
Copilot AI commented on Oct 19, 2025:
Grammar fix.
```diff
-This rule satisfies our requirement that all modifications to `x` are un-done by `f!_reverse` inductively -- we assume that each `f_n!_reverse` satisfies this require.
+This rule satisfies our requirement that all modifications to `x` are undone by `f!_reverse` inductively -- we assume that each `f_n!_reverse` satisfies this requirement.
```
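To make the inductive claim concrete, here is a small self-contained sketch of the composition pattern the quoted rule follows. The names (`rule_for_double!`, etc.) are illustrative rather than Mooncake API, and gradient propagation is omitted so the focus stays on each reverse function un-doing its own modifications to `x`.

```julia
double!(x) = (x .*= 2; nothing)
addone!(x) = (x .+= 1; nothing)

# Per-operation rules: each reverse function un-does its own modification.
function rule_for_double!(x)
    double!(x)
    double!_reverse() = (x ./= 2; nothing)
    return nothing, double!_reverse
end

function rule_for_addone!(x)
    addone!(x)
    addone!_reverse() = (x .-= 1; nothing)
    return nothing, addone!_reverse
end

# Rule for f!(x) = (double!(x); addone!(x)): run the per-operation reverses in
# reverse order, so x ends up restored to its original state.
function rule_for_f!(x)
    _, double!_reverse = rule_for_double!(x)
    _, addone!_reverse = rule_for_addone!(x)
    function f!_reverse()
        addone!_reverse()
        double!_reverse()
        return nothing
    end
    return nothing, f!_reverse
end

x = [1.0, 2.0]
_, rev = rule_for_f!(x)   # x is now [3.0, 5.0]
rev()                     # x restored to [1.0, 2.0]
```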
Performance Ratio:
Moving forwards, I'm going to allocate 1h every morning to writing / reviewing docs for Mooncake. I'll do this until I can't find anything to spend the time on.
This isn't ready for review yet. I've mainly opened it so that I don't just have a local copy.