Conversation

@mhauru (Member) commented Oct 28, 2025

This PR is an assortment of somewhat unrelated improvements to VarNamedVector. I've broken the changes up into sensible, roughly atomic commits. I recommend reviewing this PR commit by commit; it'll make more sense that way.

The net outcome is that VarNamedVector is now on par with Metadata in performance, at least for the Smorgasbord model:

┌─────────────┬─────┬─────────────┬────────────────┬────────┬────────────────┬─────────────────┐
│       Model │ Dim │  AD Backend │        VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├─────────────┼─────┼─────────────┼────────────────┼────────┼────────────────┼─────────────────┤
│ Smorgasbord │ 201 │ forwarddiff │          typed │   true │          933.0 │            50.3 │
│ Smorgasbord │ 201 │ forwarddiff │        untyped │  false │        11803.0 │             3.6 │
│ Smorgasbord │ 201 │ forwarddiff │        untyped │   true │          840.4 │            51.5 │
│ Smorgasbord │ 201 │ forwarddiff │   typed_vector │   true │         1025.4 │            48.4 │
│ Smorgasbord │ 201 │ forwarddiff │ untyped_vector │  false │         1014.9 │            40.7 │
│ Smorgasbord │ 201 │ forwarddiff │ untyped_vector │   true │          880.1 │            50.1 │
└─────────────┴─────┴─────────────┴────────────────┴────────┴────────────────┴─────────────────┘

I'll do more benchmarking tomorrow.

github-actions bot (Contributor) commented Oct 28, 2025

Benchmark Report for Commit d30eca8

Computer Information

Julia Version 1.11.7
Commit f2b3dbda30a (2025-09-08 12:10 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

┌───────────────────────┬───────┬─────────────┬───────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │           VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │ forwarddiff │             typed │  false │            6.5 │             1.8 │
│           Smorgasbord │   201 │ forwarddiff │             typed │  false │          729.0 │            43.0 │
│           Smorgasbord │   201 │ forwarddiff │ simple_namedtuple │   true │          409.6 │            55.2 │
│           Smorgasbord │   201 │ forwarddiff │           untyped │   true │          779.0 │            36.3 │
│           Smorgasbord │   201 │ forwarddiff │       simple_dict │   true │         6712.0 │            24.9 │
│           Smorgasbord │   201 │ forwarddiff │      typed_vector │   true │          799.3 │            39.5 │
│           Smorgasbord │   201 │ forwarddiff │    untyped_vector │   true │          750.1 │            38.8 │
│           Smorgasbord │   201 │ reversediff │             typed │   true │          901.3 │            45.6 │
│           Smorgasbord │   201 │    mooncake │             typed │   true │          722.1 │             5.7 │
│           Smorgasbord │   201 │      enzyme │             typed │   true │          889.1 │             3.8 │
│    Loop univariate 1k │  1000 │    mooncake │             typed │   true │         3810.8 │             5.8 │
│       Multivariate 1k │  1000 │    mooncake │             typed │   true │          982.2 │             9.0 │
│   Loop univariate 10k │ 10000 │    mooncake │             typed │   true │        41923.3 │             5.3 │
│      Multivariate 10k │ 10000 │    mooncake │             typed │   true │         8659.7 │             9.9 │
│               Dynamic │    10 │    mooncake │             typed │   true │          125.6 │            10.6 │
│              Submodel │     1 │    mooncake │             typed │   true │            8.6 │             6.5 │
│                   LDA │    12 │ reversediff │             typed │   true │          974.9 │             2.0 │
└───────────────────────┴───────┴─────────────┴───────────────────┴────────┴────────────────┴─────────────────┘

codecov bot commented Oct 28, 2025

Codecov Report

❌ Patch coverage is 92.30769% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.17%. Comparing base (1fa3109) to head (d30eca8).
⚠️ Report is 1 commit behind head on mhauru/varnamedvector-speed.

Files with missing lines   Patch %   Lines
src/varnamedvector.jl      92.68%    3 Missing ⚠️
src/varinfo.jl             93.33%    2 Missing ⚠️
src/debug_utils.jl          0.00%    1 Missing ⚠️
Additional details and impacted files
@@                       Coverage Diff                       @@
##           mhauru/varnamedvector-speed    #1098      +/-   ##
===============================================================
- Coverage                        81.39%   81.17%   -0.23%     
===============================================================
  Files                               40       40              
  Lines                             3763     3793      +30     
===============================================================
+ Hits                              3063     3079      +16     
- Misses                             700      714      +14     


github-actions bot (Contributor) commented:
DynamicPPL.jl documentation for PR #1098 is available at:
https://TuringLang.github.io/DynamicPPL.jl/previews/PR1098/

@mhauru (Member, Author) commented Oct 29, 2025

More benchmarks. These are structured so that each VNV version comes right after the corresponding Metadata version. Two of the Metadata runs crashed and are marked as such. Note that for Metadata, calling link tightens some element types, which is why the third row, an untyped, unlinked Metadata run, is exceptionally slow.

┌───────────────────────┬───────┬─────────────┬────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │        VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │    mooncake │          typed │  false │           13.2 │             6.2 │
│ Simple assume observe │     1 │    mooncake │   typed_vector │  false │           26.5 │             5.5 │
│ Simple assume observe │     1 │    mooncake │        untyped │  false │          100.7 │             1.8 │
│ Simple assume observe │     1 │    mooncake │ untyped_vector │  false │           21.2 │             6.0 │
│           Smorgasbord │   201 │ reversediff │          typed │  false │         1144.8 │            47.1 │
│           Smorgasbord │   201 │ reversediff │   typed_vector │  false │         1237.6 │            43.6 │
│           Smorgasbord │   201 │ reversediff │        untyped │   true │          834.7 │            74.5 │
│           Smorgasbord │   201 │ reversediff │ untyped_vector │   true │          903.6 │            68.9 │
│    Loop univariate 1k │  1000 │    mooncake │          typed │   true │         5172.6 │             4.7 │
│    Loop univariate 1k │  1000 │    mooncake │   typed_vector │   true │         5193.8 │             4.8 │
│    Loop univariate 1k │  1000 │    mooncake │        untyped │   true │         5220.3 │            17.3 │
│    Loop univariate 1k │  1000 │    mooncake │ untyped_vector │   true │         5138.2 │             4.7 │
│       Multivariate 1k │  1000 │    mooncake │          typed │   true │         1256.1 │             8.3 │
│       Multivariate 1k │  1000 │    mooncake │   typed_vector │   true │         1274.6 │             8.3 │
│       Multivariate 1k │  1000 │    mooncake │        untyped │   true │         4669.2 │             2.3 │
│       Multivariate 1k │  1000 │    mooncake │ untyped_vector │   true │         1211.0 │             8.4 │
│   Loop univariate 10k │ 10000 │    mooncake │          typed │   true │        51657.6 │             4.9 │
│   Loop univariate 10k │ 10000 │    mooncake │   typed_vector │   true │        56827.6 │             5.0 │
│      Multivariate 10k │ 10000 │    mooncake │          typed │   true │        11460.9 │             9.3 │
│      Multivariate 10k │ 10000 │    mooncake │   typed_vector │   true │        11437.1 │             9.3 │
│               Dynamic │    10 │    mooncake │          typed │   true │          243.8 │             6.3 │
│               Dynamic │    10 │    mooncake │   typed_vector │   true │          296.8 │             5.8 │
│               Dynamic │    10 │    mooncake │        untyped │   true │        crashes │         crashes │
│               Dynamic │    10 │    mooncake │ untyped_vector │   true │          235.3 │             6.9 │
│              Submodel │     1 │    mooncake │          typed │   true │           16.2 │             5.2 │
│              Submodel │     1 │    mooncake │   typed_vector │   true │           29.8 │             5.6 │
│              Submodel │     1 │    mooncake │        untyped │   true │           16.2 │             9.8 │
│              Submodel │     1 │    mooncake │ untyped_vector │   true │           27.1 │             5.6 │
│                   LDA │    12 │ reversediff │          typed │   true │         1401.0 │             1.9 │
│                   LDA │    12 │ reversediff │   typed_vector │   true │         1419.9 │             1.9 │
│                   LDA │    12 │ reversediff │        untyped │   true │        crashes │         crashes │
│                   LDA │    12 │ reversediff │ untyped_vector │   true │         1300.9 │             2.0 │
└───────────────────────┴───────┴─────────────┴────────────────┴────────┴────────────────┴─────────────────┘

Cases where VNV is notably slower are the very simplest model (probably not worth our attention), submodels (same thing: the model is so tiny that overheads dominate), and loop univariate 10k, where there's a 10% performance regression between Metadata and VNV. I'll look into that a bit more. However, I just checked and the loop univariate 10k case is 570% slower on main, so this is still worth merging.

mhauru marked this pull request as ready for review October 29, 2025 10:16
mhauru requested a review from penelopeysm October 29, 2025 10:16
Comment on lines 416 to -417
DynamicPPL.loosen_types!!
DynamicPPL.tighten_types
mhauru (Member, Author):

These two don't actually ever mutate their inputs. Hence the !! is questionable. However, they sometimes return the original object, sometimes an object that shares memory with the original object, so you should use them like you use !! functions: Always catch the return value and never assume that the return value is independent of the input.
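
As an illustration of that convention (a toy sketch, not DynamicPPL's actual API):

# Toy !!-style function: it may return its input unchanged, or a new object,
# so callers must always rebind the result and never assume it is independent
# of the input.
maybe_widen!!(v::Vector{T}, x) where {T} =
    x isa T ? v : convert(Vector{promote_type(T, typeof(x))}, v)

v = [1, 2]
v = maybe_widen!!(v, 3.0)  # always catch the return value
push!(v, 3.0)              # safe now: eltype(v) === Float64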

if isnothing(params)
params = varinfo[:]
end
params = map(identity, params) # Concretise
mhauru (Member, Author):

I did a little benchmark of which of these two ways is faster for turning e.g. an Any[1, 2] (a Vector{Any}) into [1, 2] (a Vector{Int}). The answer is that the comprehension is a tiny bit faster. Since I was changing this in VNV, I also changed it everywhere else it's used.
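
For reference, the two concretisation idioms side by side (a quick sketch; the timing difference is small and machine-dependent):

julia> params = Any[1, 2, 3, 4];

julia> typeof(map(identity, params))  # concretises the eltype
Vector{Int64} (alias for Array{Int64, 1})

julia> typeof([x for x in params])    # also concretises; reportedly a tiny bit faster
Vector{Int64} (alias for Array{Int64, 1})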

Create a new sub-metadata for an NTVarInfo. The type is chosen by the types of existing
SubMetas.
"""
@generated function _new_submetadata(
mhauru (Member, Author):

The generated function is needed because we have to check the types in the NamedTuple to see whether any of the other sub-metadatas are VNVs, and hence whether a new one should be a VNV too. This should always have been done; you can see how Metadata used to be hardcoded as the sub-metadata type in push!! before this PR. I hope to get rid of this later, either by having only one possible sub-metadata type, or by giving the VarInfo a field that specifies this (like what make_leaf does in #1074).
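
A standalone sketch of the technique (toy types, not the real _new_submetadata):

struct MetaA end
struct MetaB end

@generated function new_like(nt::NamedTuple)
    # Inside a generated function, `nt` is bound to the NamedTuple *type*, so
    # its field types can be inspected at compile time and the branch below
    # costs nothing at runtime.
    has_b = false
    for T in fieldtypes(nt)
        has_b |= T <: MetaB
    end
    return has_b ? :(MetaB()) : :(MetaA())
end

new_like((x = MetaA(), y = MetaB()))  # MetaB()
new_like((x = MetaA(),))              # MetaA()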

# making that change here opens some other cans of worms related to how VarInfo uses
# BangBang, that I don't want to deal with right now.
VarNamedVector() = VarNamedVector{VarName,Real}()
VarNamedVector() = VarNamedVector{Union{},Union{},Union{}}()
mhauru (Member, Author):

This is probably the most important and interesting change. By default, VNVs are now created with element types (Union{}) that can hold no values at all; push!! and setindex!! then widen the element types as needed. You should be able to push!! a value of any type into any VNV, and the container types will just flex. This paves the way towards not having "untyped" VarInfos at all, but rather having them all be as typed as possible.
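
A standalone sketch of the idea (toy code, not VNV's implementation; it uses typejoin for the widening, matching the choice settled on later in this thread):

v = Union{}[]  # Vector{Union{}}: an element type that can hold no values at all

# BangBang-style push that widens the container's element type on demand:
push_widen!!(v::Vector{T}, x) where {T} =
    x isa T ? push!(v, x) : push!(Vector{typejoin(T, typeof(x))}(v), x)

v = push_widen!!(v, 1)    # now a Vector{Int64}
v = push_widen!!(v, 2.5)  # widens to Vector{Real}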

2.0 4.0
```
"""
function loosen_types!!(
mhauru (Member, Author):

I've checked and both this and tighten_types!! are compiled to no-ops if the types are already loose enough/as tight as possible.

I should probably write a test to check this. Will need to think about how to do that.
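
For illustration, one shape such a test could take (a toy stand-in; the real loosen_types!!/tighten_types signatures may differ):

using Test

# Toy widening function that is a no-op when the element type is already
# wide enough:
widen_eltype(v::Vector{T}, ::Type{S}) where {T,S} =
    T >: S ? v : convert(Vector{typejoin(T, S)}, v)

const w = Real[1.0]
@test widen_eltype(w, Float64) === w             # no-op: the same object back
@test @allocated(widen_eltype(w, Float64)) == 0  # expected: compiles to a no-op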

@penelopeysm (Member) left a comment:

Some questions, not really holdups.

Out of curiosity, do you know if there's one change that was responsible for most of the performance improvement, or is it just a bit of everything?

See the extended help with `??VarNamedVector` for more details.
"""
num_inactive::OrderedDict{Int,Int}
num_inactive::Dict{Int,Int}
penelopeysm (Member):

Maybe premature optimisation, but a Dict{Int,Int} is isomorphic to a sparse Vector{Int}. I wonder if a SparseVector (or, actually, a dense Vector{Int}, if we don't expect the vector to be too long) might be faster than a Dict.

mhauru (Member, Author):

Hmm, we should maybe try that out; this is performance-sensitive code. The length of the Vector is not a problem, we already store multiple Vectors of that length. I think this will come down to which operations we do and how often. If there are often inactive entries, then a Vector is probably fastest. However, I think most often there are none, and then the isempty(vnv.num_inactive) check may be faster with a Dict.

I think the fastest solution, and I wonder if it would be worth it, would be to have num_inactive::VI where {VI<:Union{Nothing,Vector{Int}}}, and then we could compile away all the checks when we know that there are no inactive entries (num_inactive === nothing). That adds some code complexity though.
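
A sketch of that last idea (toy struct, not the real VNV):

struct VNVSketch{VI<:Union{Nothing,Vector{Int}}}
    num_inactive::VI
end

# With num_inactive === nothing the answer is decided by dispatch at compile
# time, so the check costs nothing at runtime:
has_inactive(::VNVSketch{Nothing}) = false
has_inactive(v::VNVSketch{Vector{Int}}) = !isempty(v.num_inactive)

has_inactive(VNVSketch(nothing))  # false, compiled away
has_inactive(VNVSketch([3, 7]))   # true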

mhauru (Member, Author):

Opened an issue: #1101

Comment on lines 123 to 132
julia> getindex_internal(vnv, :)
8-element Vector{Real}:
8-element Vector{Float64}:
46.0
48.0
1
2
3
4
5
6
1.0
2.0
3.0
4.0
5.0
6.0
penelopeysm (Member):

This probably makes things faster, but is this really 'correct'? The values that were inserted with setindex!! were Ints.

mhauru (Member, Author):

That's a very good question, and I was wondering as well. The question comes down to whether to use this

julia> promote_type(Int, Float64)
Float64

like we do now, or this

julia> Base.promote_typejoin(Int, Float64)
Real

For an untyped VNV, converting Ints to Float64s probably makes sense, because it makes any model that mixes the two types a lot faster. However, if we move to always using typed VarInfos, maybe we wouldn't need that, assuming that people don't mix Ints and Floats in a single vector. From a correctness point of view I agree that Int would be better, although you could argue that we should respect Julia's existing choice that promote_type(Int, Float64) === Float64.

penelopeysm (Member):

So this is the 'classic' situation where over-tightening eltype makes a difference:

using DynamicPPL, Distributions, Random

struct RandDie <: Distributions.DiscreteUnivariateDistribution end

Base.rand(rng::Random.AbstractRNG, ::RandDie) = rand(rng, 1:6)
Distributions.logpdf(::RandDie, x::Int) = (1 <= x <= 6) ? log(1/6) : -Inf

@model function roll_die()
    die_roll ~ RandDie()
    x ~ Normal()
end
model = roll_die()

# Works
md = last(DynamicPPL.init!!(model, VarInfo(DynamicPPL.Metadata()), InitFromPrior()))
DynamicPPL.evaluate!!(model, md)

# Fails
vnv = last(DynamicPPL.init!!(model, VarInfo(DynamicPPL.VarNamedVector()), InitFromPrior()))
DynamicPPL.evaluate!!(model, vnv)

#=
ERROR: MethodError: no method matching logpdf(::RandDie, ::Float64)
The function `logpdf` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  logpdf(::Chernoff, ::Real)
   @ Distributions ~/.julia/packages/Distributions/psM3H/src/univariate/continuous/chernoff.jl:155
  logpdf(::Hypergeometric, ::Real)
   @ Distributions ~/.julia/packages/Distributions/psM3H/src/univariates.jl:645
  logpdf(::RandDie, ::Int64)
   @ Main REPL[14]:1
  ...
=#

@penelopeysm (Member) commented Oct 29, 2025:

(See also https://pysm.dev/FlexiChains.jl/stable/why/#Heterogeneous-parameter-types which describes the same problem when MCMCChains over-concretises types -- I checked all existing Distributions subtypes of DiscreteUnivariateDistribution and all of them define a method for logpdf(dist, ::Real), so this wouldn't happen e.g. with Poisson, but in principle it can cause issues)

mhauru (Member, Author):

Okay, that looks like a bug. Good spot. I'll change to promote_typejoin.

mhauru (Member, Author):

A note for the future if this comes up again: The difference between promote_typejoin and typejoin is that the former treats Nothing and Missing as exceptions, and returns things like

julia> Base.promote_typejoin(Missing, Float64)
Union{Missing, Float64}

rather than

julia> typejoin(Missing, Float64)
Any

I think Unions might be desirable, but it probably doesn't matter much, and promote_typejoin isn't exported, so typejoin it is.

mhauru (Member, Author):

Done

Comment on lines 807 to 810
function set_transformed!!(vi::VarInfo, val::Bool)
for vn in keys(vi)
set_transformed!!(vi, val, vn)
vi = set_transformed!!(vi, val, vn)
end
penelopeysm (Member):

I don't know how Julia works on this front, but is it ever possible that changing what vi is bound to might invalidate the iteration over keys(vi)? I think probably not, but ...??

penelopeysm (Member):

I guess it's not so much rebinding vi that's the issue, but if set_transformed!! were to mutate vi, then that might be problematic. I suppose it can't mutate it in a way such that keys(vi) would result in something different, so this probably can't error. But I'm a bit paranoid about this kind of code structure. Obviously, this predates this PR.

mhauru (Member, Author):

I think this is fine: the keys(vi) call is evaluated once when the loop starts, and it keeps its reference to the original vi even if the name vi is then bound to something else. Like in

julia> d = Dict(:a => 1)
Dict{Symbol, Int64} with 1 entry:
  :a => 1

julia> ks = keys(d)
KeySet for a Dict{Symbol, Int64} with 1 entry. Keys:
  :a

julia> d = Dict(:b => 1)
Dict{Symbol, Int64} with 1 entry:
  :b => 1

julia> collect(ks)
1-element Vector{Symbol}:
 :a

keys(vi) is a lazy iterator though, so what I would feel dicey about is changing the set of keys in vi mid-iteration. This is a thing:

julia> d = Dict(:a => 1)
Dict{Symbol, Int64} with 1 entry:
  :a => 1

julia> ks = keys(d)
KeySet for a Dict{Symbol, Int64} with 1 entry. Keys:
  :a

julia> d[:b] = 2
2

julia> collect(ks)
2-element Vector{Symbol}:
 :a
 :b

which can result in weirdness:

julia> d = Dict{Any,Int}(:a => 1, :b => 2)
Dict{Any, Int64} with 2 entries:
  :a => 1
  :b => 2

julia> for k in keys(d)
           d[repr(k)] = d[k]
       end

julia> d
Dict{Any, Int64} with 7 entries:
  :a               => 1
  :b               => 2
  "\":b\""         => 2
  ":b"             => 2
  "\"\\\":a\\\"\"" => 1
  ":a"             => 1
  "\":a\""         => 1

penelopeysm (Member):

I think from a programming perspective, the safest way would be to use an eagerly evaluated version of keys(vi), i.e. collect(keys(vi))?

But also from a human perspective it's quite easy to see that keys(vi) can't be changed, so also happy to let it slide.

mhauru (Member, Author):

I would keep it as is, because I'm pretty convinced it's okay (and not due to an implementation detail, but because set_transformed!! has no business touching the keys), and lazy iterators are nice. Could add the call to collect if it bothers you.

penelopeysm (Member):

It doesn't bother me enough to want to do it.

@mhauru (Member, Author) commented Oct 29, 2025

Out of curiosity, do you know if there's one change that was responsible for most of the performance improvement, or is it just a bit of everything?

The OrderedDict-to-Dict change made a surprisingly big difference for small models. It seems creating them is a lot slower for abstract key types, and for small models this is actually a big part of the evaluation cost:

julia> d = Dict((i % 2 == 0 ? repr(i) : Symbol(i)) => i for i in 1:100);

julia> typeof(d)
Dict{Any, Int64}

julia> od = OrderedDict(d...);

julia> @btime Dict(d...);
  18.666 μs (210 allocations: 14.59 KiB)

julia> @btime OrderedDict(od...);
  78.458 μs (534 allocations: 182.11 KiB)

Keeping element types tighter was also significant in some cases for untyped VarInfo. For some other models I'm not sure which change mattered most.

mhauru requested a review from penelopeysm October 29, 2025 14:34
@penelopeysm (Member) commented:
Happy to merge, just curious as to what effects promote_type->typejoin has on benchmarks. (The only reason why I found this 'edge case' was because of https://github.com/TuringLang/Turing.jl/blob/bfa61d8a005801dd75576bfa9e530f033177c9ec/src/stdlib/RandomMeasures.jl#L71-L78)

@mhauru (Member, Author) commented Oct 29, 2025

Same benchmarks as before, except I made Smorgasbord consistently unlinked and added a couple of runs that were previously missing:

┌───────────────────────┬───────┬─────────────┬────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │        VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │    mooncake │          typed │  false │           10.5 │             7.5 │
│ Simple assume observe │     1 │    mooncake │   typed_vector │  false │           26.1 │             5.6 │
│ Simple assume observe │     1 │    mooncake │        untyped │  false │          112.4 │             1.5 │
│ Simple assume observe │     1 │    mooncake │ untyped_vector │  false │           20.9 │             6.0 │
│           Smorgasbord │   201 │ reversediff │          typed │  false │         1097.3 │            50.6 │
│           Smorgasbord │   201 │ reversediff │   typed_vector │  false │         1175.7 │            47.4 │
│           Smorgasbord │   201 │ reversediff │        untyped │  false │        11934.7 │             4.7 │
│           Smorgasbord │   201 │ reversediff │ untyped_vector │  false │         1021.6 │            55.5 │
│    Loop univariate 1k │  1000 │    mooncake │          typed │   true │         5160.1 │             5.2 │
│    Loop univariate 1k │  1000 │    mooncake │   typed_vector │   true │         5452.6 │             4.9 │
│    Loop univariate 1k │  1000 │    mooncake │        untyped │   true │         5285.5 │            18.1 │
│    Loop univariate 1k │  1000 │    mooncake │ untyped_vector │   true │         5186.1 │             4.8 │
│       Multivariate 1k │  1000 │    mooncake │          typed │   true │         1246.2 │             8.5 │
│       Multivariate 1k │  1000 │    mooncake │   typed_vector │   true │         1264.5 │             8.5 │
│       Multivariate 1k │  1000 │    mooncake │        untyped │   true │         4804.7 │             2.3 │
│       Multivariate 1k │  1000 │    mooncake │ untyped_vector │   true │         1238.4 │             8.3 │
│   Loop univariate 10k │ 10000 │    mooncake │          typed │   true │        52945.9 │             5.4 │
│   Loop univariate 10k │ 10000 │    mooncake │   typed_vector │   true │        58683.3 │             5.1 │
│   Loop univariate 10k │ 10000 │    mooncake │        untyped │   true │        59518.0 │            16.9 │
│   Loop univariate 10k │ 10000 │    mooncake │ untyped_vector │   true │        55665.7 │             4.9 │
│      Multivariate 10k │ 10000 │    mooncake │          typed │   true │        10894.9 │             9.5 │
│      Multivariate 10k │ 10000 │    mooncake │   typed_vector │   true │        11234.5 │             9.3 │
│      Multivariate 10k │ 10000 │    mooncake │        untyped │   true │        44567.0 │             2.4 │
│      Multivariate 10k │ 10000 │    mooncake │ untyped_vector │   true │        10513.4 │             9.6 │
│               Dynamic │    10 │    mooncake │          typed │   true │          216.8 │             6.3 │
│               Dynamic │    10 │    mooncake │   typed_vector │   true │          287.4 │             5.8 │
│               Dynamic │    10 │    mooncake │        untyped │   true │        crashes │         crashes │
│               Dynamic │    10 │    mooncake │ untyped_vector │   true │          235.1 │             7.0 │
│              Submodel │     1 │    mooncake │          typed │   true │           15.7 │             5.2 │
│              Submodel │     1 │    mooncake │   typed_vector │   true │           28.7 │             5.9 │
│              Submodel │     1 │    mooncake │        untyped │   true │           15.7 │             9.7 │
│              Submodel │     1 │    mooncake │ untyped_vector │   true │           26.1 │             5.7 │
│                   LDA │    12 │ reversediff │          typed │   true │         1329.8 │             2.1 │
│                   LDA │    12 │ reversediff │   typed_vector │   true │         1389.9 │             2.0 │
│                   LDA │    12 │ reversediff │        untyped │   true │        crashes │         crashes │
│                   LDA │    12 │ reversediff │ untyped_vector │   true │        crashes │         crashes │
└───────────────────────┴───────┴─────────────┴────────────────┴────────┴────────────────┴─────────────────┘

I can't see any significant changes compared to before, except that the LDA benchmark now also crashes with untyped_vector, with the following method ambiguity. It doesn't look like our fault, and I haven't investigated it yet:

ERROR: LoadError: MethodError: (::ReverseDiff.ForwardOptimize{typeof(+)})(::ReverseDiff.TrackedReal{Real, Real, ReverseDiff.TrackedArray{…}}, ::ReverseDiff.TrackedReal{Float64, Float64, Nothing}) is ambiguous.

Candidates:
  (self::ReverseDiff.ForwardOptimize{F})(x::Real, t::ReverseDiff.TrackedReal{V, D}) where {F, V, D}
    @ ReverseDiff ~/.julia/packages/ReverseDiff/rKZaG/src/macros.jl:109
  (self::ReverseDiff.ForwardOptimize{F})(t::ReverseDiff.TrackedReal{V, D}, x::Real) where {F, V, D}
    @ ReverseDiff ~/.julia/packages/ReverseDiff/rKZaG/src/macros.jl:121

Possible fix, define
  (::ReverseDiff.ForwardOptimize{F})(::ReverseDiff.TrackedReal{V, D}, ::ReverseDiff.TrackedReal{V, D}) where {F, V, D, V, D}

For future reference, mostly for myself, here's the selection of benchmarks:

chosen_combinations = [
    (
        "Simple assume observe",
        Models.simple_assume_observe(randn(rng)),
        :typed,
        :mooncake,
        false,
    ),
    (
        "Simple assume observe",
        Models.simple_assume_observe(randn(rng)),
        :typed_vector,
        :mooncake,
        false,
    ),
    (
        "Simple assume observe",
        Models.simple_assume_observe(randn(rng)),
        :untyped,
        :mooncake,
        false,
    ),
    (
        "Simple assume observe",
        Models.simple_assume_observe(randn(rng)),
        :untyped_vector,
        :mooncake,
        false,
    ),
    ("Smorgasbord", smorgasbord_instance, :typed, :reversediff, false),
    ("Smorgasbord", smorgasbord_instance, :typed_vector, :reversediff, false),
    ("Smorgasbord", smorgasbord_instance, :untyped, :reversediff, false),
    ("Smorgasbord", smorgasbord_instance, :untyped_vector, :reversediff, false),
    ("Loop univariate 1k", loop_univariate1k, :typed, :mooncake, true),
    ("Loop univariate 1k", loop_univariate1k, :typed_vector, :mooncake, true),
    ("Loop univariate 1k", loop_univariate1k, :untyped, :mooncake, true),
    ("Loop univariate 1k", loop_univariate1k, :untyped_vector, :mooncake, true),
    ("Multivariate 1k", multivariate1k, :typed, :mooncake, true),
    ("Multivariate 1k", multivariate1k, :typed_vector, :mooncake, true),
    ("Multivariate 1k", multivariate1k, :untyped, :mooncake, true),
    ("Multivariate 1k", multivariate1k, :untyped_vector, :mooncake, true),
    ("Loop univariate 10k", loop_univariate10k, :typed, :mooncake, true),
    ("Loop univariate 10k", loop_univariate10k, :typed_vector, :mooncake, true),
    ("Loop univariate 10k", loop_univariate10k, :untyped, :mooncake, true),
    ("Loop univariate 10k", loop_univariate10k, :untyped_vector, :mooncake, true),
    ("Multivariate 10k", multivariate10k, :typed, :mooncake, true),
    ("Multivariate 10k", multivariate10k, :typed_vector, :mooncake, true),
    ("Multivariate 10k", multivariate10k, :untyped, :mooncake, true),
    ("Multivariate 10k", multivariate10k, :untyped_vector, :mooncake, true),
    ("Dynamic", Models.dynamic(), :typed, :mooncake, true),
    ("Dynamic", Models.dynamic(), :typed_vector, :mooncake, true),
    # ("Dynamic", Models.dynamic(), :untyped, :mooncake, true),
    ("Dynamic", Models.dynamic(), :untyped_vector, :mooncake, true),
    ("Submodel", Models.parent(randn(rng)), :typed, :mooncake, true),
    ("Submodel", Models.parent(randn(rng)), :typed_vector, :mooncake, true),
    ("Submodel", Models.parent(randn(rng)), :untyped, :mooncake, true),
    ("Submodel", Models.parent(randn(rng)), :untyped_vector, :mooncake, true),
    ("LDA", lda_instance, :typed, :reversediff, true),
    ("LDA", lda_instance, :typed_vector, :reversediff, true),
    ("LDA", lda_instance, :untyped, :reversediff, true),
    ("LDA", lda_instance, :untyped_vector, :reversediff, true),
]

@penelopeysm (Member) left a comment:

Happy to merge, although it looks to me like Dynamic also has maybe 20% worse performance on typed VNV compared to typed Metadata.

@mhauru (Member, Author) commented Oct 29, 2025

I think our benchmark models just don't hit a case where this would make a difference. Here's one that does:

julia> @model function f()
           x ~ Normal()
           y ~ Poisson()
           0.0 ~ Normal(x, float(y))
       end

       m = f()

       vi_md = DynamicPPL.untyped_varinfo(m)
       vi_vnv = DynamicPPL.untyped_vector_varinfo(m)

       @btime DynamicPPL.evaluate!!(m, vi_md)
       @btime DynamicPPL.evaluate!!(m, vi_vnv);
  2.625 μs (49 allocations: 2.03 KiB)
  2.500 μs (49 allocations: 2.19 KiB)

Whereas using promote_type and converting Ints to Float64s:

  2.606 μs (49 allocations: 2.03 KiB)
  369.362 ns (7 allocations: 416 bytes)

@mhauru (Member, Author) commented Oct 29, 2025

If it makes you feel better, on main the untyped_vector benchmark of Dynamic fails with a StackOverflowError, and typed gives:

┌─────────┬─────┬────────────┬──────────────┬────────┬────────────────┬─────────────────┐
│   Model │ Dim │ AD Backend │      VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├─────────┼─────┼────────────┼──────────────┼────────┼────────────────┼─────────────────┤
│ Dynamic │  10 │   mooncake │        typed │   true │          250.8 │             5.0 │
│ Dynamic │  10 │   mooncake │ typed_vector │   true │          386.2 │             5.6 │
└─────────┴─────┴────────────┴──────────────┴────────┴────────────────┴─────────────────┘

mhauru merged commit 3b8b4a8 into mhauru/varnamedvector-speed Oct 29, 2025
18 of 19 checks passed
mhauru deleted the mhauru/vnv-speed2 branch October 29, 2025 16:47
github-merge-queue bot pushed a commit that referenced this pull request Oct 29, 2025
* Remove unnecessary consistency checks for VarNamedVector

* Fix benchmark setting

Co-authored-by: Penelope Yong <penelopeysm@gmail.com>

* Fix typo

* Add two benchmarks

* Improvements to VarNamedVector (#1098)

* Change VNV to use Dict rather than OrderedDict

* Change concretisation from map(identity, x) to a comprehension

* Improve tighten_types!! and loosen_types!!

* Fix use of set_transformed!!

* Fix push!! for VarInfos

* Change the default element types in VNV to be Union{}

* In untyped_vector_varinfo, don't rely on Metadata

* Code style

* Run formatter

* In VNV, use typejoin rather than promote_type

* Bump patch version to 0.38.4

---------

Co-authored-by: Penelope Yong <penelopeysm@gmail.com>