Conversation

@mhauru (Member) commented Oct 28, 2025

This PR is an assortment of somewhat unrelated improvements to VarNamedVector. I've broken the changes up into sensible, roughly atomic commits. I recommend reviewing this PR commit by commit; it'll make more sense that way.

The net outcome is that VarNamedVector is now on par with Metadata in performance, at least for the Smorgasbord model:

┌─────────────┬─────┬─────────────┬────────────────┬────────┬────────────────┬─────────────────┐
│       Model │ Dim │  AD Backend │        VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├─────────────┼─────┼─────────────┼────────────────┼────────┼────────────────┼─────────────────┤
│ Smorgasbord │ 201 │ forwarddiff │          typed │   true │          933.0 │            50.3 │
│ Smorgasbord │ 201 │ forwarddiff │        untyped │  false │        11803.0 │             3.6 │
│ Smorgasbord │ 201 │ forwarddiff │        untyped │   true │          840.4 │            51.5 │
│ Smorgasbord │ 201 │ forwarddiff │   typed_vector │   true │         1025.4 │            48.4 │
│ Smorgasbord │ 201 │ forwarddiff │ untyped_vector │  false │         1014.9 │            40.7 │
│ Smorgasbord │ 201 │ forwarddiff │ untyped_vector │   true │          880.1 │            50.1 │
└─────────────┴─────┴─────────────┴────────────────┴────────┴────────────────┴─────────────────┘

I'll do more benchmarking tomorrow.

github-actions bot (Contributor) commented Oct 28, 2025

Benchmark Report for Commit d30eca8

Computer Information

Julia Version 1.11.7
Commit f2b3dbda30a (2025-09-08 12:10 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

┌───────────────────────┬───────┬─────────────┬───────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │           VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │ forwarddiff │             typed │  false │            6.5 │             1.8 │
│           Smorgasbord │   201 │ forwarddiff │             typed │  false │          729.0 │            43.0 │
│           Smorgasbord │   201 │ forwarddiff │ simple_namedtuple │   true │          409.6 │            55.2 │
│           Smorgasbord │   201 │ forwarddiff │           untyped │   true │          779.0 │            36.3 │
│           Smorgasbord │   201 │ forwarddiff │       simple_dict │   true │         6712.0 │            24.9 │
│           Smorgasbord │   201 │ forwarddiff │      typed_vector │   true │          799.3 │            39.5 │
│           Smorgasbord │   201 │ forwarddiff │    untyped_vector │   true │          750.1 │            38.8 │
│           Smorgasbord │   201 │ reversediff │             typed │   true │          901.3 │            45.6 │
│           Smorgasbord │   201 │    mooncake │             typed │   true │          722.1 │             5.7 │
│           Smorgasbord │   201 │      enzyme │             typed │   true │          889.1 │             3.8 │
│    Loop univariate 1k │  1000 │    mooncake │             typed │   true │         3810.8 │             5.8 │
│       Multivariate 1k │  1000 │    mooncake │             typed │   true │          982.2 │             9.0 │
│   Loop univariate 10k │ 10000 │    mooncake │             typed │   true │        41923.3 │             5.3 │
│      Multivariate 10k │ 10000 │    mooncake │             typed │   true │         8659.7 │             9.9 │
│               Dynamic │    10 │    mooncake │             typed │   true │          125.6 │            10.6 │
│              Submodel │     1 │    mooncake │             typed │   true │            8.6 │             6.5 │
│                   LDA │    12 │ reversediff │             typed │   true │          974.9 │             2.0 │
└───────────────────────┴───────┴─────────────┴───────────────────┴────────┴────────────────┴─────────────────┘

codecov bot commented Oct 28, 2025

Codecov Report

❌ Patch coverage is 92.30769% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.17%. Comparing base (1fa3109) to head (d30eca8).
⚠️ Report is 1 commit behind head on mhauru/varnamedvector-speed.

Files with missing lines   Patch %   Lines
src/varnamedvector.jl      92.68%    3 Missing ⚠️
src/varinfo.jl             93.33%    2 Missing ⚠️
src/debug_utils.jl          0.00%    1 Missing ⚠️
Additional details and impacted files
@@                       Coverage Diff                       @@
##           mhauru/varnamedvector-speed    #1098      +/-   ##
===============================================================
- Coverage                        81.39%   81.17%   -0.23%     
===============================================================
  Files                               40       40              
  Lines                             3763     3793      +30     
===============================================================
+ Hits                              3063     3079      +16     
- Misses                             700      714      +14     


github-actions bot (Contributor) commented:
DynamicPPL.jl documentation for PR #1098 is available at:
https://TuringLang.github.io/DynamicPPL.jl/previews/PR1098/

@mhauru (Member, Author) commented Oct 29, 2025

More benchmarks. These are structured so that each VNV version comes right after the corresponding Metadata version. Two of the Metadata runs crashed and are marked as such. Note that for Metadata, calling link tightens some element types, which is why the third row, an untyped, unlinked Metadata run, is exceptionally slow.

┌───────────────────────┬───────┬─────────────┬────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │        VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │    mooncake │          typed │  false │           13.2 │             6.2 │
│ Simple assume observe │     1 │    mooncake │   typed_vector │  false │           26.5 │             5.5 │
│ Simple assume observe │     1 │    mooncake │        untyped │  false │          100.7 │             1.8 │
│ Simple assume observe │     1 │    mooncake │ untyped_vector │  false │           21.2 │             6.0 │
│           Smorgasbord │   201 │ reversediff │          typed │  false │         1144.8 │            47.1 │
│           Smorgasbord │   201 │ reversediff │   typed_vector │  false │         1237.6 │            43.6 │
│           Smorgasbord │   201 │ reversediff │        untyped │   true │          834.7 │            74.5 │
│           Smorgasbord │   201 │ reversediff │ untyped_vector │   true │          903.6 │            68.9 │
│    Loop univariate 1k │  1000 │    mooncake │          typed │   true │         5172.6 │             4.7 │
│    Loop univariate 1k │  1000 │    mooncake │   typed_vector │   true │         5193.8 │             4.8 │
│    Loop univariate 1k │  1000 │    mooncake │        untyped │   true │         5220.3 │            17.3 │
│    Loop univariate 1k │  1000 │    mooncake │ untyped_vector │   true │         5138.2 │             4.7 │
│       Multivariate 1k │  1000 │    mooncake │          typed │   true │         1256.1 │             8.3 │
│       Multivariate 1k │  1000 │    mooncake │   typed_vector │   true │         1274.6 │             8.3 │
│       Multivariate 1k │  1000 │    mooncake │        untyped │   true │         4669.2 │             2.3 │
│       Multivariate 1k │  1000 │    mooncake │ untyped_vector │   true │         1211.0 │             8.4 │
│   Loop univariate 10k │ 10000 │    mooncake │          typed │   true │        51657.6 │             4.9 │
│   Loop univariate 10k │ 10000 │    mooncake │   typed_vector │   true │        56827.6 │             5.0 │
│      Multivariate 10k │ 10000 │    mooncake │          typed │   true │        11460.9 │             9.3 │
│      Multivariate 10k │ 10000 │    mooncake │   typed_vector │   true │        11437.1 │             9.3 │
│               Dynamic │    10 │    mooncake │          typed │   true │          243.8 │             6.3 │
│               Dynamic │    10 │    mooncake │   typed_vector │   true │          296.8 │             5.8 │
│               Dynamic │    10 │    mooncake │        untyped │   true │        crashes │         crashes │
│               Dynamic │    10 │    mooncake │ untyped_vector │   true │          235.3 │             6.9 │
│              Submodel │     1 │    mooncake │          typed │   true │           16.2 │             5.2 │
│              Submodel │     1 │    mooncake │   typed_vector │   true │           29.8 │             5.6 │
│              Submodel │     1 │    mooncake │        untyped │   true │           16.2 │             9.8 │
│              Submodel │     1 │    mooncake │ untyped_vector │   true │           27.1 │             5.6 │
│                   LDA │    12 │ reversediff │          typed │   true │         1401.0 │             1.9 │
│                   LDA │    12 │ reversediff │   typed_vector │   true │         1419.9 │             1.9 │
│                   LDA │    12 │ reversediff │        untyped │   true │        crashes │         crashes │
│                   LDA │    12 │ reversediff │ untyped_vector │   true │         1300.9 │             2.0 │
└───────────────────────┴───────┴─────────────┴────────────────┴────────┴────────────────┴─────────────────┘

Cases where VNV is notably slower are the very simplest model (probably not worth our attention), submodels (same thing: the model is so tiny that overheads dominate), and loop univariate 10k, where there's a 10% performance regression between Metadata and VNV. I'll look into that a bit more. However, I just checked and the loop univariate 10k case is 570% slower on main, so this is still worth merging.

mhauru marked this pull request as ready for review October 29, 2025 10:16
mhauru requested a review from penelopeysm October 29, 2025 10:16
Comment on lines 416 to -417
DynamicPPL.loosen_types!!
DynamicPPL.tighten_types
mhauru (Member, Author):

These two don't actually ever mutate their inputs. Hence the !! is questionable. However, they sometimes return the original object, sometimes an object that shares memory with the original object, so you should use them like you use !! functions: Always catch the return value and never assume that the return value is independent of the input.
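
As an illustration of that convention (a toy sketch, not DynamicPPL's actual API):

# Toy !!-style function: it may return its input unchanged, or a new object,
# so callers must always rebind the result and never assume it is independent
# of the input.
maybe_widen!!(v::Vector{T}, x) where {T} =
    x isa T ? v : convert(Vector{promote_type(T, typeof(x))}, v)

v = [1, 2]
v = maybe_widen!!(v, 3.0)  # always catch the return value
push!(v, 3.0)              # safe now: eltype(v) === Float64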

if isnothing(params)
params = varinfo[:]
end
params = map(identity, params) # Concretise
mhauru (Member, Author):

I did a little benchmark of which of these two ways is faster for turning e.g. an Any[1, 2] (a Vector{Any}) into [1, 2] (a Vector{Int}). The answer is that the comprehension is a tiny bit faster. Since I was changing this in VNV, I also changed it everywhere else it's used.
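
For reference, the two concretisation idioms side by side (a quick sketch; the timing difference is small and machine-dependent):

julia> params = Any[1, 2, 3, 4];

julia> typeof(map(identity, params))  # concretises the eltype
Vector{Int64} (alias for Array{Int64, 1})

julia> typeof([x for x in params])    # also concretises; reportedly a tiny bit faster
Vector{Int64} (alias for Array{Int64, 1})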

Create a new sub-metadata for an NTVarInfo. The type is chosen by the types of existing
SubMetas.
"""
@generated function _new_submetadata(
mhauru (Member, Author):

The generated function is needed because we have to check the types in the NamedTuple to see whether any of the other sub-metadatas are VNVs, and hence whether a new one should be a VNV too. This should always have been done; you can see how Metadata used to be hardcoded as the sub-metadata type in push!! before this PR. I hope to get rid of this later, either by having only one possible sub-metadata type, or by giving the VarInfo a field that specifies this (like what make_leaf does in #1074).
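
A standalone sketch of the technique (toy types, not the real _new_submetadata):

struct MetaA end
struct MetaB end

@generated function new_like(nt::NamedTuple)
    # Inside a generated function, `nt` is bound to the NamedTuple *type*, so
    # its field types can be inspected at compile time and the branch below
    # costs nothing at runtime.
    has_b = false
    for T in fieldtypes(nt)
        has_b |= T <: MetaB
    end
    return has_b ? :(MetaB()) : :(MetaA())
end

new_like((x = MetaA(), y = MetaB()))  # MetaB()
new_like((x = MetaA(),))              # MetaA()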

# making that change here opens some other cans of worms related to how VarInfo uses
# BangBang, that I don't want to deal with right now.
VarNamedVector() = VarNamedVector{VarName,Real}()
VarNamedVector() = VarNamedVector{Union{},Union{},Union{}}()
mhauru (Member, Author):

This is probably the most important and interesting change. By default, VNVs are now created with element types (Union{}) that can hold no values at all; push!! and setindex!! then widen the element types as needed. You should be able to push!! a value of any type into any VNV, and the container types will just flex. This paves the way towards not having "untyped" VarInfos at all, but rather having them all be as typed as possible.
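
A standalone sketch of the idea (toy code, not VNV's implementation; it uses typejoin for the widening, matching the choice settled on later in this thread):

v = Union{}[]  # Vector{Union{}}: an element type that can hold no values at all

# BangBang-style push that widens the container's element type on demand:
push_widen!!(v::Vector{T}, x) where {T} =
    x isa T ? push!(v, x) : push!(Vector{typejoin(T, typeof(x))}(v), x)

v = push_widen!!(v, 1)    # now a Vector{Int64}
v = push_widen!!(v, 2.5)  # widens to Vector{Real}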

2.0 4.0
```
"""
function loosen_types!!(
mhauru (Member, Author):

I've checked and both this and tighten_types!! are compiled to no-ops if the types are already loose enough/as tight as possible.

I should probably write a test to check this. Will need to think about how to do that.
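
For illustration, one shape such a test could take (a toy stand-in; the real loosen_types!!/tighten_types signatures may differ):

using Test

# Toy widening function that is a no-op when the element type is already
# wide enough:
widen_eltype(v::Vector{T}, ::Type{S}) where {T,S} =
    T >: S ? v : convert(Vector{typejoin(T, S)}, v)

const w = Real[1.0]
@test widen_eltype(w, Float64) === w             # no-op: the same object back
@test @allocated(widen_eltype(w, Float64)) == 0  # expected: compiles to a no-op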

@penelopeysm (Member) left a comment:

Some questions, not really holdups.

Out of curiosity, do you know if there's one change that was responsible for most of the performance improvement, or is it just a bit of everything?

See the extended help with `??VarNamedVector` for more details.
"""
num_inactive::OrderedDict{Int,Int}
num_inactive::Dict{Int,Int}
penelopeysm (Member):

Maybe premature optimisation, but a Dict{Int,Int} is isomorphic to a sparse Vector{Int}. I wonder if a SparseVector (or, actually, a dense Vector{Int}, if we don't expect the vector to be too long) might be faster than a Dict.

mhauru (Member, Author):

Hmm, we should maybe try that out; this is performance-sensitive code. The length of the Vector is not a problem, we already store multiple Vectors of that length. I think this will come down to which operations we do and how often. If there are often inactive entries, then a Vector is probably fastest. However, I think most often there are none, and then the isempty(vnv.num_inactive) check may be faster with a Dict.

I think the fastest solution, and I wonder if it would be worth it, would be to have num_inactive::VI where {VI<:Union{Nothing,Vector{Int}}}, and then we could compile away all the checks when we know that there are no inactive entries (num_inactive === nothing). That adds some code complexity though.
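
A sketch of that last idea (toy struct, not the real VNV):

struct VNVSketch{VI<:Union{Nothing,Vector{Int}}}
    num_inactive::VI
end

# With num_inactive === nothing the answer is decided by dispatch at compile
# time, so the check costs nothing at runtime:
has_inactive(::VNVSketch{Nothing}) = false
has_inactive(v::VNVSketch{Vector{Int}}) = !isempty(v.num_inactive)

has_inactive(VNVSketch(nothing))  # false, compiled away
has_inactive(VNVSketch([3, 7]))   # true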

mhauru (Member, Author):

Opened an issue: #1101

Comment on lines 123 to 132
julia> getindex_internal(vnv, :)
8-element Vector{Real}:
8-element Vector{Float64}:
46.0
48.0
1
2
3
4
5
6
1.0
2.0
3.0
4.0
5.0
6.0
penelopeysm (Member):

This probably makes things faster, but is this really 'correct'? The values that were inserted with setindex!! were Ints.

mhauru (Member, Author):

That's a very good question, and I was wondering as well. The question comes down to whether to use this

julia> promote_type(Int, Float64)
Float64

like we do now, or this

julia> Base.promote_typejoin(Int, Float64)
Real

For an untyped VNV, converting Ints to Float64s probably makes sense, because it makes any model that mixes the two types a lot faster. However, if we move to always using typed VarInfos, maybe we wouldn't need that, assuming that people don't mix Ints and Floats in a single vector. From a correctness point of view I agree that Int would be better, although you could argue that we should respect Julia's existing choice that promote_type(Int, Float64) === Float64.

penelopeysm (Member):

So this is the 'classic' situation where over-tightening eltype makes a difference:

using DynamicPPL, Distributions, Random

struct RandDie <: Distributions.DiscreteUnivariateDistribution end

Base.rand(rng::Random.AbstractRNG, ::RandDie) = rand(rng, 1:6)
Distributions.logpdf(::RandDie, x::Int) = (1 <= x <= 6) ? log(1/6) : -Inf

@model function roll_die()
    die_roll ~ RandDie()
    x ~ Normal()
end
model = roll_die()

# Works
md = last(DynamicPPL.init!!(model, VarInfo(DynamicPPL.Metadata()), InitFromPrior()))
DynamicPPL.evaluate!!(model, md)

# Fails
vnv = last(DynamicPPL.init!!(model, VarInfo(DynamicPPL.VarNamedVector()), InitFromPrior()))
DynamicPPL.evaluate!!(model, vnv)

#=
ERROR: MethodError: no method matching logpdf(::RandDie, ::Float64)
The function `logpdf` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  logpdf(::Chernoff, ::Real)
   @ Distributions ~/.julia/packages/Distributions/psM3H/src/univariate/continuous/chernoff.jl:155
  logpdf(::Hypergeometric, ::Real)
   @ Distributions ~/.julia/packages/Distributions/psM3H/src/univariates.jl:645
  logpdf(::RandDie, ::Int64)
   @ Main REPL[14]:1
  ...
=#

@penelopeysm (Member) commented Oct 29, 2025:

(See also https://pysm.dev/FlexiChains.jl/stable/why/#Heterogeneous-parameter-types which describes the same problem when MCMCChains over-concretises types -- I checked all existing Distributions subtypes of DiscreteUnivariateDistribution and all of them define a method for logpdf(dist, ::Real), so this wouldn't happen e.g. with Poisson, but in principle it can cause issues)

mhauru (Member, Author):

Okay, that looks like a bug. Good spot. I'll change to promote_typejoin.

mhauru (Member, Author):

A note for the future if this comes up again: The difference between promote_typejoin and typejoin is that the former treats Nothing and Missing as exceptions, and returns things like

julia> Base.promote_typejoin(Missing, Float64)
Union{Missing, Float64}

rather than

julia> typejoin(Missing, Float64)
Any

I think Unions might be desirable, but it probably doesn't matter much, and promote_typejoin isn't exported, so typejoin it is.

mhauru (Member, Author):

Done

Comment on lines 807 to 810
function set_transformed!!(vi::VarInfo, val::Bool)
for vn in keys(vi)
set_transformed!!(vi, val, vn)
vi = set_transformed!!(vi, val, vn)
end
penelopeysm (Member):

I don't know how Julia works on this front, but is it ever possible that changing what vi is bound to might invalidate the iteration over keys(vi)? I think probably not, but ...??

penelopeysm (Member):

I guess it's not so much rebinding vi that's the issue, but if set_transformed!! were to mutate vi, then that might be problematic. I suppose it can't mutate it in a way such that keys(vi) would result in something different, so this probably can't error. But I'm a bit paranoid about this kind of code structure. Obviously, this predates this PR.

mhauru (Member, Author):

I think this is fine: the keys(vi) call is evaluated once when the loop starts, and it keeps its reference to the original vi even if the name vi is then bound to something else. Like in

julia> d = Dict(:a => 1)
Dict{Symbol, Int64} with 1 entry:
  :a => 1

julia> ks = keys(d)
KeySet for a Dict{Symbol, Int64} with 1 entry. Keys:
  :a

julia> d = Dict(:b => 1)
Dict{Symbol, Int64} with 1 entry:
  :b => 1

julia> collect(ks)
1-element Vector{Symbol}:
 :a

keys(vi) is a lazy iterator though, so what I would feel dicey about is changing the set of keys in vi mid-iteration. This is a thing:

julia> d = Dict(:a => 1)
Dict{Symbol, Int64} with 1 entry:
  :a => 1

julia> ks = keys(d)
KeySet for a Dict{Symbol, Int64} with 1 entry. Keys:
  :a

julia> d[:b] = 2
2

julia> collect(ks)
2-element Vector{Symbol}:
 :a
 :b

which can result in weirdness:

julia> d = Dict{Any,Int}(:a => 1, :b => 2)
Dict{Any, Int64} with 2 entries:
  :a => 1
  :b => 2

julia> for k in keys(d)
           d[repr(k)] = d[k]
       end

julia> d
Dict{Any, Int64} with 7 entries:
  :a               => 1
  :b               => 2
  "\":b\""         => 2
  ":b"             => 2
  "\"\\\":a\\\"\"" => 1
  ":a"             => 1
  "\":a\""         => 1

penelopeysm (Member):

I think from a programming perspective, the safest way would be to use an eagerly evaluated version of keys(vi), i.e. collect(keys(vi))?

But also from a human perspective it's quite easy to see that keys(vi) can't be changed, so also happy to let it slide.

mhauru (Member, Author):

I would keep it as is, because I'm pretty convinced it's okay (and not due to an implementation detail, but because set_transformed!! has no business touching the keys), and lazy iterators are nice. Could add the call to collect if it bothers you.

penelopeysm (Member):

It doesn't bother me enough to want to do it.

@mhauru (Member, Author) commented Oct 29, 2025

Out of curiosity, do you know if there's one change that was responsible for most of the performance improvement, or is it just a bit of everything?

The OrderedDict-to-Dict change made a surprisingly big difference for small models. It seems creating them is a lot slower for abstract key types, and for small models this is actually a big part of the evaluation cost:

julia> d = Dict((i % 2 == 0 ? repr(i) : Symbol(i)) => i for i in 1:100);

julia> typeof(d)
Dict{Any, Int64}

julia> od = OrderedDict(d...);

julia> @btime Dict(d...);
  18.666 μs (210 allocations: 14.59 KiB)

julia> @btime OrderedDict(od...);
  78.458 μs (534 allocations: 182.11 KiB)

Keeping element types tighter was also significant in some cases for untyped VarInfo. For some other models I'm not sure which change mattered most.

mhauru requested a review from penelopeysm October 29, 2025 14:34
@penelopeysm (Member) commented:
Happy to merge, just curious as to what effects promote_type->typejoin has on benchmarks. (The only reason why I found this 'edge case' was because of https://github.com/TuringLang/Turing.jl/blob/bfa61d8a005801dd75576bfa9e530f033177c9ec/src/stdlib/RandomMeasures.jl#L71-L78)

@mhauru (Member, Author) commented Oct 29, 2025

Same benchmarks as before, except I made Smorgasbord consistently unlinked and added a couple of runs that were previously missing:

┌───────────────────────┬───────┬─────────────┬────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │        VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │    mooncake │          typed │  false │           10.5 │             7.5 │
│ Simple assume observe │     1 │    mooncake │   typed_vector │  false │           26.1 │             5.6 │
│ Simple assume observe │     1 │    mooncake │        untyped │  false │          112.4 │             1.5 │
│ Simple assume observe │     1 │    mooncake │ untyped_vector │  false │           20.9 │             6.0 │
│           Smorgasbord │   201 │ reversediff │          typed │  false │         1097.3 │            50.6 │
│           Smorgasbord │   201 │ reversediff │   typed_vector │  false │         1175.7 │            47.4 │
│           Smorgasbord │   201 │ reversediff │        untyped │  false │        11934.7 │             4.7 │
│           Smorgasbord │   201 │ reversediff │ untyped_vector │  false │         1021.6 │            55.5 │
│    Loop univariate 1k │  1000 │    mooncake │          typed │   true │         5160.1 │             5.2 │
│    Loop univariate 1k │  1000 │    mooncake │   typed_vector │   true │         5452.6 │             4.9 │
│    Loop univariate 1k │  1000 │    mooncake │        untyped │   true │         5285.5 │            18.1 │
│    Loop univariate 1k │  1000 │    mooncake │ untyped_vector │   true │         5186.1 │             4.8 │
│       Multivariate 1k │  1000 │    mooncake │          typed │   true │         1246.2 │             8.5 │
│       Multivariate 1k │  1000 │    mooncake │   typed_vector │   true │         1264.5 │             8.5 │
│       Multivariate 1k │  1000 │    mooncake │        untyped │   true │         4804.7 │             2.3 │
│       Multivariate 1k │  1000 │    mooncake │ untyped_vector │   true │         1238.4 │             8.3 │
│   Loop univariate 10k │ 10000 │    mooncake │          typed │   true │        52945.9 │             5.4 │
│   Loop univariate 10k │ 10000 │    mooncake │   typed_vector │   true │        58683.3 │             5.1 │
│   Loop univariate 10k │ 10000 │    mooncake │        untyped │   true │        59518.0 │            16.9 │
│   Loop univariate 10k │ 10000 │    mooncake │ untyped_vector │   true │        55665.7 │             4.9 │
│      Multivariate 10k │ 10000 │    mooncake │          typed │   true │        10894.9 │             9.5 │
│      Multivariate 10k │ 10000 │    mooncake │   typed_vector │   true │        11234.5 │             9.3 │
│      Multivariate 10k │ 10000 │    mooncake │        untyped │   true │        44567.0 │             2.4 │
│      Multivariate 10k │ 10000 │    mooncake │ untyped_vector │   true │        10513.4 │             9.6 │
│               Dynamic │    10 │    mooncake │          typed │   true │          216.8 │             6.3 │
│               Dynamic │    10 │    mooncake │   typed_vector │   true │          287.4 │             5.8 │
│               Dynamic │    10 │    mooncake │        untyped │   true │        crashes │         crashes │
│               Dynamic │    10 │    mooncake │ untyped_vector │   true │          235.1 │             7.0 │
│              Submodel │     1 │    mooncake │          typed │   true │           15.7 │             5.2 │
│              Submodel │     1 │    mooncake │   typed_vector │   true │           28.7 │             5.9 │
│              Submodel │     1 │    mooncake │        untyped │   true │           15.7 │             9.7 │
│              Submodel │     1 │    mooncake │ untyped_vector │   true │           26.1 │             5.7 │
│                   LDA │    12 │ reversediff │          typed │   true │         1329.8 │             2.1 │
│                   LDA │    12 │ reversediff │   typed_vector │   true │         1389.9 │             2.0 │
│                   LDA │    12 │ reversediff │        untyped │   true │        crashes │         crashes │
│                   LDA │    12 │ reversediff │ untyped_vector │   true │        crashes │         crashes │
└───────────────────────┴───────┴─────────────┴────────────────┴────────┴────────────────┴─────────────────┘

I can't see any significant changes compared to before, except that the LDA benchmark now also crashes with untyped_vector, with the following method ambiguity. It doesn't look like our fault, and I haven't investigated it yet:

ERROR: LoadError: MethodError: (::ReverseDiff.ForwardOptimize{typeof(+)})(::ReverseDiff.TrackedReal{Real, Real, ReverseDiff.TrackedArray{…}}, ::ReverseDiff.TrackedReal{Float64, Float64, Nothing}) is ambiguous.

Candidates:
  (self::ReverseDiff.ForwardOptimize{F})(x::Real, t::ReverseDiff.TrackedReal{V, D}) where {F, V, D}
    @ ReverseDiff ~/.julia/packages/ReverseDiff/rKZaG/src/macros.jl:109
  (self::ReverseDiff.ForwardOptimize{F})(t::ReverseDiff.TrackedReal{V, D}, x::Real) where {F, V, D}
    @ ReverseDiff ~/.julia/packages/ReverseDiff/rKZaG/src/macros.jl:121

Possible fix, define
  (::ReverseDiff.ForwardOptimize{F})(::ReverseDiff.TrackedReal{V, D}, ::ReverseDiff.TrackedReal{V, D}) where {F, V, D, V, D}

For future reference, mostly for myself, here's the selection of benchmarks:

chosen_combinations = [
    (
        "Simple assume observe",
        Models.simple_assume_observe(randn(rng)),
        :typed,
        :mooncake,
        false,
    ),
    (
        "Simple assume observe",
        Models.simple_assume_observe(randn(rng)),
        :typed_vector,
        :mooncake,
        false,
    ),
    (
        "Simple assume observe",
        Models.simple_assume_observe(randn(rng)),
        :untyped,
        :mooncake,
        false,
    ),
    (
        "Simple assume observe",
        Models.simple_assume_observe(randn(rng)),
        :untyped_vector,
        :mooncake,
        false,
    ),
    ("Smorgasbord", smorgasbord_instance, :typed, :reversediff, false),
    ("Smorgasbord", smorgasbord_instance, :typed_vector, :reversediff, false),
    ("Smorgasbord", smorgasbord_instance, :untyped, :reversediff, false),
    ("Smorgasbord", smorgasbord_instance, :untyped_vector, :reversediff, false),
    ("Loop univariate 1k", loop_univariate1k, :typed, :mooncake, true),
    ("Loop univariate 1k", loop_univariate1k, :typed_vector, :mooncake, true),
    ("Loop univariate 1k", loop_univariate1k, :untyped, :mooncake, true),
    ("Loop univariate 1k", loop_univariate1k, :untyped_vector, :mooncake, true),
    ("Multivariate 1k", multivariate1k, :typed, :mooncake, true),
    ("Multivariate 1k", multivariate1k, :typed_vector, :mooncake, true),
    ("Multivariate 1k", multivariate1k, :untyped, :mooncake, true),
    ("Multivariate 1k", multivariate1k, :untyped_vector, :mooncake, true),
    ("Loop univariate 10k", loop_univariate10k, :typed, :mooncake, true),
    ("Loop univariate 10k", loop_univariate10k, :typed_vector, :mooncake, true),
    ("Loop univariate 10k", loop_univariate10k, :untyped, :mooncake, true),
    ("Loop univariate 10k", loop_univariate10k, :untyped_vector, :mooncake, true),
    ("Multivariate 10k", multivariate10k, :typed, :mooncake, true),
    ("Multivariate 10k", multivariate10k, :typed_vector, :mooncake, true),
    ("Multivariate 10k", multivariate10k, :untyped, :mooncake, true),
    ("Multivariate 10k", multivariate10k, :untyped_vector, :mooncake, true),
    ("Dynamic", Models.dynamic(), :typed, :mooncake, true),
    ("Dynamic", Models.dynamic(), :typed_vector, :mooncake, true),
    # ("Dynamic", Models.dynamic(), :untyped, :mooncake, true),
    ("Dynamic", Models.dynamic(), :untyped_vector, :mooncake, true),
    ("Submodel", Models.parent(randn(rng)), :typed, :mooncake, true),
    ("Submodel", Models.parent(randn(rng)), :typed_vector, :mooncake, true),
    ("Submodel", Models.parent(randn(rng)), :untyped, :mooncake, true),
    ("Submodel", Models.parent(randn(rng)), :untyped_vector, :mooncake, true),
    ("LDA", lda_instance, :typed, :reversediff, true),
    ("LDA", lda_instance, :typed_vector, :reversediff, true),
    ("LDA", lda_instance, :untyped, :reversediff, true),
    ("LDA", lda_instance, :untyped_vector, :reversediff, true),
]

@penelopeysm (Member) left a comment:

Happy to merge, although it looks to me like Dynamic also has maybe 20% worse performance on typed VNV compared to typed Metadata.

@mhauru (Member, Author) commented Oct 29, 2025

I think our benchmark models just don't hit a case where this would make a difference. Here's one that does:

julia> @model function f()
           x ~ Normal()
           y ~ Poisson()
           0.0 ~ Normal(x, float(y))
       end

       m = f()

       vi_md = DynamicPPL.untyped_varinfo(m)
       vi_vnv = DynamicPPL.untyped_vector_varinfo(m)

       @btime DynamicPPL.evaluate!!(m, vi_md)
       @btime DynamicPPL.evaluate!!(m, vi_vnv);
  2.625 μs (49 allocations: 2.03 KiB)
  2.500 μs (49 allocations: 2.19 KiB)

Whereas using promote_type and converting Ints to Float64s:

  2.606 μs (49 allocations: 2.03 KiB)
  369.362 ns (7 allocations: 416 bytes)

@mhauru (Member, Author) commented Oct 29, 2025

If it makes you feel better, on main the untyped_vector benchmark of Dynamic fails with a StackOverflowError, and typed gives:

┌─────────┬─────┬────────────┬──────────────┬────────┬────────────────┬─────────────────┐
│   Model │ Dim │ AD Backend │      VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├─────────┼─────┼────────────┼──────────────┼────────┼────────────────┼─────────────────┤
│ Dynamic │  10 │   mooncake │        typed │   true │          250.8 │             5.0 │
│ Dynamic │  10 │   mooncake │ typed_vector │   true │          386.2 │             5.6 │
└─────────┴─────┴────────────┴──────────────┴────────┴────────────────┴─────────────────┘

mhauru merged commit 3b8b4a8 into mhauru/varnamedvector-speed Oct 29, 2025
18 of 19 checks passed
mhauru deleted the mhauru/vnv-speed2 branch October 29, 2025 16:47
github-merge-queue bot pushed a commit that referenced this pull request Oct 29, 2025
* Remove unnecessary consistency checks for VarNamedVector

* Fix benchmark setting

Co-authored-by: Penelope Yong <penelopeysm@gmail.com>

* Fix typo

* Add two benchmarks

* Improvements to VarNamedVector (#1098)

* Change VNV to use Dict rather than OrderedDict

* Change concretisation from map(identity, x) to a comprehension

* Improve tighten_types!! and loosen_types!!

* Fix use of set_transformed!!

* Fix push!! for VarInfos

* Change the default element types in VNV to be Union{}

* In untyped_vector_varinfo, don't rely on Metadata

* Code style

* Run formatter

* In VNV, use typejoin rather than promote_type

* Bump patch version to 0.38.4

---------

Co-authored-by: Penelope Yong <penelopeysm@gmail.com>