Skip to content

Do we need to store linked variables in VarInfo? #836

Open
@mhauru

Description

@mhauru

As far as I can see, the only reason to do linking is because samplers need it. Samplers moreover interface with a vector of floats, not a VarInfo. So really the only place when we need linked variables is when we call getindex_internal or something like vi[:]. If I'm mistaken about this, and there's a different need, all of the below may be moot, so do say so.

This lead me to wonder why we bother storing linked variables at all, rather than converting at getindex/setindex time. Currently our philosophy seems to be that samplers give and take vectors of linked floats, and when interfacing with a sampler we typically call vi[:] and unflatten(vi, vec_of_floats) to read/write those vectors of floats from/to the internal storage of the VarInfo, where the variables are linked if link has been called on vi at some point in its history. Then whenever we need the unlinked versions of variables, such as when calling logpdf, we convert the internal, linked storage to the unlinked one.

An alternative approach would be to say that VarInfo only ever stores unlinked variables, and rather than calling link on vi once at the beginning of sampling and using getindex/setindex, if you would want linked variables you would need to call getindex_linked and setindex_linked or unflatten_linked, and the linking would happen on the fly in that call.

Needs linked                                Needs unlinked 
                                                           
┌──────────┐         ┌───────────┐          ┌─────────────┐
│          │         │           │          │             │
│ Sampler  │ ◄─────► │  VarInfo  │  ◄─────► │ logpdf call │
│          │         │           │          │             │
└──────────┘   ▲     └───────────┘     ▲    └─────────────┘
               │                       │                   
               │                       │                   
                                                           
          Alternative:             Current:                
          (Un)link here            (Un)link here           

I'm not sure if the alternative is better, but I do see some benefits to it:

  • VarInfos would have less state that they carry. Currently when you call getindex_internal or unflatten you need to know/remember whether this VarInfo has been linked at some point in its history (or check for it) to know what you'll get.
  • Sampler interface code would be more explicit, at the call-site, as to whether it wants linked variables or not. This would make the interface code a bit easier to understand.
  • VarInfo would be simpler. We could drop the istrans flags, and never worry about things like having some variables linked and others not. Note that we would still need to carry the linking transformations in the VarInfo, to know how to transform when someone calls e.g. getindex_linked.

This could also help us move towards something like SimpleVarInfo or the @parameters struct of TuringLang/Turing.jl#2492, where the storage in VarInfo would more closely match how the model sees the variables. I haven't thought about this carefully, but I could imagine this leading to some performance gains and maybe code simplification.

This is currently more an idea to be discussed than a proposal to implement. Any thoughts? Downsides to linking at getindex/setindex time?

Metadata

Metadata

Assignees

No one assigned

    Labels

    questionFurther information is requested

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions