
Conversation

Collaborator

@willtebbutt willtebbutt commented Feb 5, 2025

Moving forwards, I'm going to allocate 1h every morning to writing / reviewing docs for Mooncake. I'll do this until I can't find anything to spend the time on.

This isn't ready for review yet. I've mainly opened it so that I don't just have a local copy.


codecov bot commented Feb 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


Contributor

github-actions bot commented Feb 5, 2025

Performance Ratio:
Ratio of the time to compute the gradient to the time to compute the function.
Warning: results are very approximate! See here for more context.

┌────────────────────────────┬──────────┬──────────┬─────────┬─────────────┬─────────┐
│                      Label │   Primal │ Mooncake │  Zygote │ ReverseDiff │  Enzyme │
│                     String │   String │   String │  String │      String │  String │
├────────────────────────────┼──────────┼──────────┼─────────┼─────────────┼─────────┤
│                   sum_1000 │ 100.0 ns │      1.8 │     1.1 │        5.61 │    8.21 │
│                  _sum_1000 │ 941.0 ns │     6.63 │  1600.0 │        34.0 │    1.07 │
│               sum_sin_1000 │  6.54 μs │     2.17 │     1.7 │        10.6 │    2.22 │
│              _sum_sin_1000 │  5.03 μs │     2.63 │   327.0 │        14.0 │    2.58 │
│                   kron_sum │ 357.0 μs │     38.8 │     5.0 │       185.0 │    11.5 │
│              kron_view_sum │ 363.0 μs │     38.4 │    10.5 │       222.0 │    9.04 │
│      naive_map_sin_cos_exp │  2.14 μs │     2.19 │ missing │        7.13 │    2.33 │
│            map_sin_cos_exp │  2.23 μs │     2.32 │    1.44 │        5.79 │    2.73 │
│      broadcast_sin_cos_exp │  2.31 μs │     2.18 │    2.26 │        1.44 │    2.22 │
│                 simple_mlp │ 393.0 μs │     4.73 │     1.6 │        7.68 │    3.38 │
│                     gp_lml │ 543.0 μs │     4.61 │    2.35 │     missing │    4.52 │
│ turing_broadcast_benchmark │  1.98 ms │     3.49 │ missing │        27.1 │ missing │
│         large_single_block │ 390.0 ns │     4.39 │  4220.0 │        30.5 │    2.18 │
└────────────────────────────┴──────────┴──────────┴─────────┴─────────────┴─────────┘

yebai added 3 commits October 19, 2025 19:49
Signed-off-by: Hong Ge <3279477+yebai@users.noreply.github.com>
Signed-off-by: Hong Ge <3279477+yebai@users.noreply.github.com>
@yebai yebai requested a review from Copilot October 19, 2025 18:50
@github-actions
Contributor

Mooncake.jl documentation for PR #459 is available at:
https://chalk-lab.github.io/Mooncake.jl/previews/PR459/

@yebai yebai marked this pull request as ready for review October 19, 2025 18:53

Copilot AI left a comment


Pull Request Overview

Adds a new “Understanding Mooncake” design doc that develops a mathematical model for differentiating Julia programs and shows how to implement reverse-mode rules, plus updates the docs navigation to include it.

  • New doc covers compositions, multi-arg pure functions, and mutating functions with forward/reverse passes and rule composition.
  • Adds the page to the makedocs navigation.

Reviewed Changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 13 comments.

Files reviewed:
  • docs/src/understanding_mooncake/what_programme_are_you_differentiating.md: New design doc: models, adjoints, and rule composition for AD; includes illustrative Julia and math derivations.
  • docs/make.jl: Adds the new page to the docs sidebar/navigation.


Comment on lines +21 to +31
function r(f, x, y)
    a, adj_g = r(g, x)
    b, adj_h = r(h, a, x, y)
    function adj_f(db)
        _, da, dx, dy = adj_h(db)
        _, dx2 = adj_g(da)
        dx = Mooncake.increment!!(dx, dx2)
        return NoRData(), dx, dy
    end
    return b, adj_f
end

Copilot AI Oct 19, 2025


The motivating example incorrectly passes x into r(h, a, x, y), but h takes only (a, y). This leads to an extra dx contribution from adj_h that should not exist. Update the call and destructuring so adj_h only returns gradients for (a, y), remove dx accumulation from adj_h, and return only dx from adj_g.

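For reference, a self-contained sketch of the composition the comment above suggests, using toy `g` and `h` and plain closures in place of Mooncake's rule machinery (the names `rule_g`, `rule_h`, and `rule_f` are illustrative, not Mooncake's actual API):

```julia
# Toy primitives standing in for g and h in the quoted example.
g(x) = x^2
h(a, y) = a + y

# Hand-written reverse rules: each returns (value, adjoint closure).
rule_g(x) = (g(x), da -> (2x * da,))       # dg/dx = 2x
rule_h(a, y) = (h(a, y), db -> (db, db))   # dh/da = dh/dy = 1

# Corrected composite rule: h takes only (a, y), so adj_h returns
# cotangents for (a, y), and the only dx contribution comes from adj_g.
function rule_f(x, y)
    a, adj_g = rule_g(x)
    b, adj_h = rule_h(a, y)
    function adj_f(db)
        da, dy = adj_h(db)
        (dx,) = adj_g(da)
        return dx, dy
    end
    return b, adj_f
end

b, adj_f = rule_f(3.0, 1.0)   # b = 10.0
dx, dy = adj_f(1.0)           # dx = 6.0, dy = 1.0
```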
end
```
Observe that the above rule essentially does the following:
1. fowards-pass: replace calls to rules.

Copilot AI Oct 19, 2025


Correct spelling.

Suggested change
- 1. fowards-pass: replace calls to rules.
+ 1. forwards-pass: replace calls to rules.

```
Observe that the above rule essentially does the following:
1. fowards-pass: replace calls to rules.
2. reverse-pass: run adjoints in reverse order, adding together rdata when a variable is used multiple times.
Copy link

Copilot AI Oct 19, 2025


[nitpick] The term 'rdata' is undefined here and appears to be a misnomer. Replace with 'adjoints' (or 'cotangents') to match AD terminology and the rest of the doc.

Suggested change
- 2. reverse-pass: run adjoints in reverse order, adding together rdata when a variable is used multiple times.
+ 2. reverse-pass: run adjoints in reverse order, adding together adjoints when a variable is used multiple times.

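The two-step recipe in the quoted text can be sketched with plain closures (no Mooncake machinery); a hypothetical rule for f(x) = x * sin(x), where x is used twice:

```julia
# Reverse rules for the primitives: each returns (value, adjoint closure).
rule_sin(x) = (sin(x), ds -> cos(x) * ds)
rule_mul(a, b) = (a * b, dc -> (b * dc, a * dc))

function rule_xsin(x)
    # forwards-pass: replace calls with rule calls, keeping the adjoints.
    s, adj_sin = rule_sin(x)
    y, adj_mul = rule_mul(x, s)
    function adj_xsin(dy)
        # reverse-pass: run adjoints in reverse order of the forwards-pass.
        dx1, ds = adj_mul(dy)
        dx2 = adj_sin(ds)
        return dx1 + dx2   # x was used twice: sum its cotangents.
    end
    return y, adj_xsin
end

y, adj = rule_xsin(2.0)
dx = adj(1.0)   # sin(2) + 2cos(2), matching d/dx [x sin(x)]
```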

### `function` Class

To start with, let us consider only `function`s which are pure (free of externally-visible side effects, such as the modification of their arguments of global variables), unary (single-argument), and don't contain any data themselves (e.g. no closures or callable `struct`s).

Copilot AI Oct 19, 2025


Grammar fix.

Suggested change
- To start with, let us consider only `function`s which are pure (free of externally-visible side effects, such as the modification of their arguments of global variables), unary (single-argument), and don't contain any data themselves (e.g. no closures or callable `struct`s).
+ To start with, let us consider only `function`s which are pure (free of externally-visible side effects, such as the modification of their arguments or global variables), unary (single-argument), and don't contain any data themselves (e.g. no closures or callable `struct`s).

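For the restricted class the quoted text describes, a rule reduces to a pair of primal value and adjoint closure; a hypothetical example for `exp` (plain closures, not Mooncake's actual rule interface):

```julia
# A rule for a pure, unary, data-free function: return the primal value
# together with an adjoint closure mapping output cotangents to input
# cotangents.
rule_exp(x) = (exp(x), dy -> exp(x) * dy)

y, adj = rule_exp(0.0)   # y == 1.0
dx = adj(3.0)            # 3.0, since d/dx exp(x) = exp(x) and exp(0) = 1
```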
r(W, X, Y, \hat{Y}, \varepsilon, l) :=&\, l \nonumber
\end{align}
```
In words, our mathematical model for `linear_regression_loss` is the composition of four differentiable functions. The first three map from a tuple containing all variables seen so far, to a tuple containing the same variables _and_ the value returned by the operation being modeled, and the fourth simple reads off the elements of the final tuple which were passed in as arguments, and the return value.

Copilot AI Oct 19, 2025


Grammar fix.

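The tuple-extending construction the quoted text describes can be illustrated with a toy two-step program (the names `t1`, `t2`, and `read_off` are hypothetical, not from the doc):

```julia
# Each transition maps the tuple of variables seen so far to the same
# tuple extended with one newly computed value; the final map reads off
# the return value.
t1((x, y)) = (x, y, x * y)         # introduce a = x * y
t2((x, y, a)) = (x, y, a, a + y)   # introduce b = a + y
read_off((x, y, a, b)) = b

f(x, y) = read_off(t2(t1((x, y))))
z = f(3.0, 1.0)   # a = 3.0, so z = b = 4.0
```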
x_copy = copy(primal(x))

# Run primal operation.
square!(x)

Copilot AI Oct 19, 2025


x is a CoDual; the primal mutation should target primal(x). Use square!(primal(x)) to avoid implying a required square!(::CoDual, ...) method.

Suggested change
- square!(x)
+ square!(primal(x))

primal(x) .= x_copy

# Modify gradient to correspond to result of adjoint of transition function.
tangent(x) .= 2 .* primal(x)

Copilot AI Oct 19, 2025


This overwrites the incoming cotangent. For f(x)=x⊙x, the reverse update is x̄ .= x̄ .* (2 .* x). Multiply the existing tangent by 2 .* primal(x) instead of replacing it.

Suggested change
- tangent(x) .= 2 .* primal(x)
+ tangent(x) .= tangent(x) .* (2 .* primal(x))

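Putting the two suggestions above together, a self-contained sketch of the corrected mutating rule, with a plain struct standing in for Mooncake's `CoDual` (the names `Dual` and `rule_square!` are illustrative, not Mooncake's API):

```julia
# Minimal stand-in for a CoDual: a primal array plus its tangent.
struct Dual
    primal::Vector{Float64}
    tangent::Vector{Float64}
end

square!(x::Vector{Float64}) = (x .= x .* x)

function rule_square!(x::Dual)
    x_copy = copy(x.primal)

    # Run the primal operation on the primal data only.
    square!(x.primal)

    function square!_reverse(::Any)
        # Undo the mutation so earlier rules see the original primal.
        x.primal .= x_copy

        # Chain rule for f(x) = x .* x: scale the incoming cotangent by
        # 2x rather than overwriting it.
        x.tangent .= x.tangent .* (2 .* x.primal)
        return nothing
    end
    return nothing, square!_reverse
end

x = Dual([3.0], [1.0])
_, rev = rule_square!(x)   # x.primal is now [9.0]
rev(nothing)
x.primal    # restored to [3.0]
x.tangent   # [6.0]: incoming cotangent [1.0] scaled by 2 .* [3.0]
```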
return nothing, f!_reverse
end
```
This rule satisfies our requirement that all modifications to `x` are un-done by `f!_reverse` inductively -- we assume that each `f_n!_reverse` satisfies this require.

Copilot AI Oct 19, 2025


Grammar fix.

Suggested change
- This rule satisfies our requirement that all modifications to `x` are un-done by `f!_reverse` inductively -- we assume that each `f_n!_reverse` satisfies this require.
+ This rule satisfies our requirement that all modifications to `x` are undone by `f!_reverse` inductively -- we assume that each `f_n!_reverse` satisfies this requirement.

@github-actions
Contributor

Performance Ratio:
Ratio of the time to compute the gradient to the time to compute the function.
Warning: results are very approximate! See here for more context.

┌────────────────────────────┬──────────┬──────────┬─────────────┬─────────┬─────────────┬────────┐
│                      Label │   Primal │ Mooncake │ MooncakeFwd │  Zygote │ ReverseDiff │ Enzyme │
│                     String │   String │   String │      String │  String │      String │ String │
├────────────────────────────┼──────────┼──────────┼─────────────┼─────────┼─────────────┼────────┤
│                   sum_1000 │ 100.0 ns │      1.9 │         1.9 │     1.1 │        5.61 │   8.21 │
│                  _sum_1000 │ 941.0 ns │     6.96 │        1.01 │  1350.0 │        34.1 │   1.09 │
│               sum_sin_1000 │  6.62 μs │     2.47 │        1.35 │    1.66 │        10.5 │   2.16 │
│              _sum_sin_1000 │   5.3 μs │     2.98 │        2.17 │   246.0 │        13.3 │   2.46 │
│                   kron_sum │ 298.0 μs │     51.6 │         2.8 │     5.2 │       217.0 │   9.42 │
│              kron_view_sum │ 318.0 μs │     42.5 │        3.39 │    11.5 │       227.0 │   6.81 │
│      naive_map_sin_cos_exp │  2.14 μs │      2.5 │        1.41 │ missing │        7.31 │   2.36 │
│            map_sin_cos_exp │  2.13 μs │     2.76 │        1.45 │    1.58 │        6.18 │   2.37 │
│      broadcast_sin_cos_exp │  2.24 μs │      2.4 │        1.38 │    2.38 │        1.47 │   2.22 │
│                 simple_mlp │ 198.0 μs │     6.19 │        2.95 │    1.78 │        10.5 │    3.4 │
│                     gp_lml │ 241.0 μs │      8.5 │        2.11 │    3.84 │     missing │   4.73 │
│ turing_broadcast_benchmark │   1.8 ms │     4.08 │        3.42 │ missing │        26.8 │   2.64 │
│         large_single_block │ 380.0 ns │     4.53 │        2.03 │  4440.0 │        31.8 │   2.24 │
└────────────────────────────┴──────────┴──────────┴─────────────┴─────────┴─────────────┴────────┘
