
Remove AD backend loops in test suite #2564


Merged
penelopeysm merged 21 commits into main from py/no-repeated-adtype on May 29, 2025

Conversation

penelopeysm (Member) commented May 23, 2025

Closes #2307

Summary

Historically, Turing's test suite has run HMC (and other sampler) tests three times: once with ForwardDiff, once with ReverseDiff, and once with Mooncake. This PR removes that loop and runs sampler-based tests with ForwardDiff only.
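For illustration, the removed pattern looked roughly like this (a hypothetical sketch, not the exact test code; DEMO_MODELS is from DynamicPPL's test utilities):

    # Hypothetical sketch of the old pattern: every sampler test repeated per backend.
    using Turing, DynamicPPL
    using ADTypes: AutoForwardDiff, AutoReverseDiff, AutoMooncake
    import ReverseDiff, Mooncake  # backends must be loaded for their ADTypes to work

    model = DynamicPPL.TestUtils.DEMO_MODELS[1]
    for adbackend in (AutoForwardDiff(), AutoReverseDiff(), AutoMooncake(; config=nothing))
        sample(model, HMC(0.01, 4; adtype=adbackend), 100)
    end

    # After this PR, sampler-based tests run once, with ForwardDiff only:
    sample(model, HMC(0.01, 4; adtype=AutoForwardDiff()), 100)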

In its place, I've moved all the AD-based tests to test/ad.jl. These comprise:

ADTypeCheckContext tests

These are directly copied from the old test/test_utils/ad_utils.jl file, and basically exist to make sure that samplers and optimisers don't suddenly default to using ForwardDiff.
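The idea, in miniature, is something like the following sketch (hypothetical; this shows the mechanism being tested, not the actual ADTypeCheckContext implementation, and the names probe and SEEN_ELTYPES are made up here):

    # Sketch: record the element types the model sees at runtime, then check
    # that the requested backend's tracer type actually showed up. If the
    # sampler had silently fallen back to ForwardDiff, we would see
    # ForwardDiff.Dual here instead of ReverseDiff.TrackedReal.
    using Turing
    using ADTypes: AutoReverseDiff
    import ReverseDiff

    const SEEN_ELTYPES = Set{Type}()

    @model function probe()
        m ~ Normal()
        push!(SEEN_ELTYPES, typeof(m))  # runs on every model evaluation
        return m
    end

    sample(probe(), HMC(0.01, 4; adtype=AutoReverseDiff()), 10)
    @assert any(T -> T <: ReverseDiff.TrackedReal, SEEN_ELTYPES)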

SamplingContext tests with DynamicPPL demo models

When running sample(model, spl, N), the LogDensityFunction gets constructed with a context of SamplingContext(rng, spl), which means that model execution can in general go down a completely different path when compared to just using DefaultContext. Thus, there is still a need to test the correctness of AD on this.
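Concretely, such a test might look like this sketch (assuming the DynamicPPL LogDensityFunction and SamplingContext APIs described above; the seed and model choice are arbitrary):

    # Sketch: build a LogDensityFunction under SamplingContext, as sample()
    # does, and differentiate through it with a non-default backend.
    using Turing, DynamicPPL, LogDensityProblems
    using ADTypes: AutoReverseDiff
    using Random: Xoshiro
    import ReverseDiff

    model = DynamicPPL.TestUtils.DEMO_MODELS[1]
    spl = DynamicPPL.Sampler(NUTS())
    ctx = DynamicPPL.SamplingContext(Xoshiro(468), spl)
    vi = DynamicPPL.VarInfo(model)
    ldf = DynamicPPL.LogDensityFunction(model, vi, ctx; adtype=AutoReverseDiff())
    lp, grad = LogDensityProblems.logdensity_and_gradient(ldf, vi[:])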

GibbsContext tests with DynamicPPL demo models

The same rationale as above, but for Gibbs sampling, which contextualises the model with GibbsContext.

Are the demo models enough?

Given that we previously tested AD on a range of models in the inference tests, it is fair to ask whether we lose AD testing coverage by no longer testing all of them.

I've (rather painstakingly) gone through all of the models in the inference tests, and it turns out that most of them are not very interesting. For example, gdemo_default is used a lot, and that's already one of the DynamicPPL demo models.

Other models are specifically chosen to highlight issues with DynamicPPL internals rather than AD, so I don't really consider losing the AD testing on these to be a huge problem.

Most of the non-trivial Gibbs models have discrete parameters and so obviously aren't suitable for general AD testing.

There were three models that (in my best judgment) were worth adding some tests for, so I've already transferred these to ADTests: TuringLang/ADTests@4c34c19. (I am aware that ADTests is not really a 'test' in the truest sense, as it doesn't get run in CI.)

More generally, @yebai has made the point that for AD testing in CI, we can stick to just the DynamicPPL demo models. Although I still have slight concerns that they don't offer complete coverage, I do think that they cover a pretty wide range of DynamicPPL features, so I am happy to sign off on this PR, in the interests of not letting the perfect be the enemy of the good.

What next?

This PR roughly halves the time taken to run Gibbs tests, which are the slowest tests.

I think it would be a very quick win to try to rebalance the test suite in CI, especially if we can split the Gibbs tests up. Right now the AD tests are being run together with 'everything else', which is not optimal.

I also note that the Gibbs demo model tests take an hour to run on 1.10, but only around 15 minutes on 1.11. This accounts for much of the 90-minute runtime of the slowest test job (Gibbs on 1.10). I don't know why this is the case. In an ideal world, I'd like somebody to look into it, although we don't have much time.

penelopeysm marked this pull request as draft on May 23, 2025 11:37

Turing.jl documentation for PR #2564 is available at:
https://TuringLang.github.io/Turing.jl/previews/PR2564/

codecov bot commented May 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.86%. Comparing base (dea5d19) to head (f1263bf).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2564   +/-   ##
=======================================
  Coverage   85.86%   85.86%           
=======================================
  Files          21       21           
  Lines        1429     1429           
=======================================
  Hits         1227     1227           
  Misses        202      202           


coveralls commented May 23, 2025

Pull Request Test Coverage Report for Build 15325551548

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 85.924%

Totals Coverage Status
Change from base Build 15324242745: 0.0%
Covered Lines: 1227
Relevant Lines: 1428

💛 - Coveralls

penelopeysm changed the title from "[WIP] Remove AD backend loops" to "[WIP] Remove AD backend loops in test suite" on May 23, 2025
penelopeysm force-pushed the py/no-repeated-adtype branch from cc047e3 to 62d414f on May 23, 2025 15:50
penelopeysm force-pushed the py/no-repeated-adtype branch from ad11da9 to d581efc on May 24, 2025 15:49
penelopeysm force-pushed the py/no-repeated-adtype branch from f3e61d5 to 0975a91 on May 24, 2025 20:04
penelopeysm changed the title from "[WIP] Remove AD backend loops in test suite" to "Remove AD backend loops in test suite" on May 28, 2025
penelopeysm marked this pull request as ready for review on May 28, 2025 00:16
penelopeysm requested review from mhauru and sunxd3 on May 28, 2025 00:19
penelopeysm force-pushed the py/no-repeated-adtype branch from 739eb97 to f59a89c on May 28, 2025 00:41
mhauru (Member) left a comment

I, too, am a bit nervous about how much this reduces our test coverage (and I don't mean lines of code but functionality), but it definitely seems like the right way to do things long term.

One concern I have is that some bits of code that aren't within logdensity_and_gradient calls might still somehow fail, due to e.g. not handling tracer types. For instance, Gibbs does a bunch of resetting of sampler states, and nasty things could happen if some vector element types get messed up. run_ad wouldn't catch that, since it evaluates the LDF without calling step. Maybe in the loop over algorithms in ad.jl we could also do a call to sample with each AD backend and model, just to ensure that it doesn't crash?
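For illustration, such a smoke test might look like the following sketch (hypothetical; it only asserts that sampling runs end-to-end without erroring):

    # Hypothetical smoke test: a short sample() call per backend and model, to
    # catch failures outside logdensity_and_gradient (e.g. in state resetting).
    using Turing, DynamicPPL, MCMCChains, Test
    using ADTypes: AutoForwardDiff, AutoReverseDiff
    import ReverseDiff

    @testset "sample() smoke test" begin
        for adtype in (AutoForwardDiff(), AutoReverseDiff())
            for model in DynamicPPL.TestUtils.DEMO_MODELS
                # A handful of iterations suffices to exercise step() and state handling.
                chn = sample(model, HMC(0.01, 4; adtype=adtype), 5)
                @test chn isa MCMCChains.Chains
            end
        end
    end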

Comment on lines +605 to -634

    +        Turing.Gibbs(@varname(s) => HMC(0.01, 4), @varname(m) => MH()),
    +        Turing.Gibbs(@varname(s) => MH(), @varname(m) => HMC(0.01, 4)),
         ]
    -
    -    if !has_dot_assume(model)
    -        # Add in some MH samplers, which are not compatible with `.~`.
    -        append!(
    -            samplers,
    -            [
    -                Turing.Gibbs(@varname(s) => HMC(0.01, 4), @varname(m) => MH()),
    -                Turing.Gibbs(@varname(s) => MH(), @varname(m) => HMC(0.01, 4)),
    -            ],
    -        )
    -    end

penelopeysm (Member, Author) replied

I also removed this (outdated) check for dot assume, so now it checks with every sampler combination.

penelopeysm requested a review from mhauru on May 28, 2025 15:47
sunxd3 (Member) commented May 29, 2025

I am not very concerned about the potential coverage loss - we can add tests back when things fail.
The changes make a lot of sense to me. All looks good.

(Is this missed?)

sunxd3 (Member) commented May 29, 2025

Also, we can probably get rid of the adbackend variable and just use Turing.DEFAULT_ADTYPE? Not essential.

    adbackend = Turing.DEFAULT_ADTYPE

(the same line, linked in three places)

penelopeysm (Member, Author) commented

(Is this missed?)

To be completely honest, I have no idea what that test is doing or why it's using a different adtype.

I didn't want to touch it in this PR, but let's put it this way: if someone else were to put in a PR removing this test, I would probably approve...

Also, we can probably get rid of the adbackend and just use Turing.DEFAULT_ADTYPE?

We could, or we could just drop the adtype keyword argument entirely and let it fall back to the default (i.e. just use NUTS() rather than NUTS(; adtype=Turing.DEFAULT_ADTYPE)). I guess it's a question of whether to be explicit or not -- in this case I erred on being explicit. But honestly, it'd be a lot cleaner not to have it in there, so I'll remove them and then merge.

penelopeysm merged commit 1a70627 into main on May 29, 2025
40 of 46 checks passed
penelopeysm deleted the py/no-repeated-adtype branch on May 29, 2025 16:25
Successfully merging this pull request may close these issues: Refactoring AD Tests (#2307)

4 participants