Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve indexing to support table behavior #44

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

bgroenks96
Copy link
Collaborator

@bgroenks96 bgroenks96 commented Feb 4, 2022

This is a major overhaul of the model interface to better support Table-like indexing behavior, most importantly through setindex.

The basic idea is that AbstractModels should allow Cartesian indexing, i.e. m[i,j] for both setindex! as well as getindex in order to really act like a table.

This PR refactors the existing implementations of getindex and setindex! to accomplish this, and in the process, expands the functionality of setindex! (and Base.setindex for the immutable case) to permit row indices.

The implementations of update/update! are simplified to be simple wrappers around setindex. In addition, update now provides an additional dispatch which permits the application of filtering rules to generate row indices via do syntax:

updated_m = update(m, p -> p.somefield == somevalue) do p
    # generate values here;
    # returning a scalar will apply the new values to the :val field;
    # returning a row vector, NamedTuple, or any Tables.jl compatible type will update all columns in the table.
    p.val * 2.0
end

This allows the user to apply filters to only update subsets of parameters. This is also possible via setindex! using row indices:

mask = map(p -> p.somefield == somevalue, params(m))
m[mask, :val] = x # x must match in size

This is a draft PR at the moment. More clean up needs to be done and docstrings added at the minimum. There are also some behavioral issues that need to be addressed, like how to handle broadcasting.

Current limitations:

  • Multi-column indexing for getindex is not supported, since it's not clear what the return value should be (named tuples?). It is supported by setindex, however.
  • getindex/setindex on subsets of rows are not in general type stable or allocation free. I think this is acceptable for now given the likely use cases.

Resolves #41
Resolves #21

@bgroenks96 bgroenks96 marked this pull request as draft February 4, 2022 17:28
@bgroenks96 bgroenks96 changed the title Imprvoe indexing to support table behavior Improve indexing to support table behavior Feb 4, 2022
@codecov-commenter
Copy link

codecov-commenter commented Feb 4, 2022

Codecov Report

Merging #44 (561654d) into master (9af3c3a) will increase coverage by 3.02%.
The diff coverage is 86.27%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #44      +/-   ##
==========================================
+ Coverage   79.88%   82.91%   +3.02%     
==========================================
  Files           4        4              
  Lines         174      199      +25     
==========================================
+ Hits          139      165      +26     
+ Misses         35       34       -1     
Impacted Files Coverage Δ
src/tables.jl 77.77% <0.00%> (ø)
src/param.jl 85.00% <50.00%> (-1.85%) ⬇️
src/model.jl 82.55% <87.87%> (+4.77%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9af3c3a...561654d. Read the comment docs.

@bgroenks96 bgroenks96 marked this pull request as ready for review February 6, 2022 21:17
@bgroenks96
Copy link
Collaborator Author

@rafaqz This is ready for feedback.

@bgroenks96
Copy link
Collaborator Author

The allocation test that is failing on CI passes on my system (Julia 1.6), so it must be a compiler version issue.

@bgroenks96
Copy link
Collaborator Author

@rafaqz Do you have any thoughts on what to do about this version-dependent test failure?

@rafaqz
Copy link
Owner

rafaqz commented Feb 8, 2022

So its allocating on 1.3 but not 1.6?

You can put that test in a version specific conditional, or we can stop supporting 1.3

@bgroenks96
Copy link
Collaborator Author

I like the version-specific conditional idea. Let's see if it works now.

@bgroenks96
Copy link
Collaborator Author

Also, since this changes indexing behavior (although should be non-breaking for most things...?), we should probably bump the minor version.

@bgroenks96
Copy link
Collaborator Author

Finally all green! Let me know when you've looked at it, @rafaqz . I'm not 100% happy with the way it turned out... I think the dispatch graph for setindex/getindex could probably be a bit cleaner. But it's probably OK for now so long as it works.

@rafaqz
Copy link
Owner

rafaqz commented Feb 8, 2022

Ok I'll try to have a look tonight.

Copy link
Owner

@rafaqz rafaqz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a huge improvement, thanks for putting the time in.

src/model.jl Outdated Show resolved Hide resolved
src/model.jl Outdated
@inline Base.setindex(m::AbstractModel, xs, i::Integer, ::Colon) = Base.setindex(m, xs, i, filter(!_isreserved, keys(m)))
@inline Base.setindex(m::AbstractModel, xs, i::Integer, ::Type{Val{col}}) where col = _setindex(m, xs, i, Val{col})
@inline function Base.setindex(m::AbstractModel, xs, i, cols::Union{Tuple,AbstractVector})
for col in cols
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the performance of this like? Could cols be a mixed-type tuple?

src/model.jl Outdated Show resolved Hide resolved
src/param.jl Outdated Show resolved Hide resolved
src/model.jl Outdated
end
end
@inline Base.setindex!(m::AbstractModel, xs, col::Union{Symbol,Type{Val}}) = setindex!(m, xs, :, col)
@inline Base.setindex!(m::AbstractModel, xs, i) = setindex!(m, xs, i, :)
Copy link
Owner

@rafaqz rafaqz Feb 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes me a little uneasy guessing row/col like this based on type. Can you do this with any other tables?

And should we maybe make i::Union{Int,Colon,AbstractVector} ?

@bgroenks96
Copy link
Collaborator Author

@rafaqz Do you have any other blocking concerns or can we go ahead and merge this? We could make an issue about potential performance concerns.

@rafaqz
Copy link
Owner

rafaqz commented Feb 10, 2022

Normally we would resolve those two open questions first? One is performance and the other is syntax, which would mean a second breaking change if we change that later.

@bgroenks96
Copy link
Collaborator Author

At the moment, there are no breaking changes. The syntax question is really whether or not it's worth it to make such a breaking change in order to be consistent with other table interfaces. The performance issue is low priority, at least for my use case.

@rafaqz
Copy link
Owner

rafaqz commented Feb 10, 2022

The performance issue will take 5 minutes to fix...

But the general idea is I wouldn't ask that question in a review if I didn't want to discuss it before merging.

@bgroenks96
Copy link
Collaborator Author

It doesn't look that easy to me, but maybe I am missing something. In any case, if you already know how to improve it, please feel free to suggest a solution.

@rafaqz
Copy link
Owner

rafaqz commented Feb 10, 2022

Just use map instead of a loop and the mixed type tuple should be type stable

@bgroenks96
Copy link
Collaborator Author

No. There is a loop there for a reason. As I pointed out in the other comment, it iteratively reconstructs m for each column. It's not a map-like operation. There is almost certainly a better way to do it, but I am quite sure it's not that simple. To use map, you would need to come up with a way to merge all of the reconstructed Models (for each column).

@rafaqz
Copy link
Owner

rafaqz commented Feb 10, 2022

That's a fold.... foldl should also be type stable - if you use StaticModel

@bgroenks96
Copy link
Collaborator Author

Ok, I'll try foldl.

@bgroenks96
Copy link
Collaborator Author

bgroenks96 commented Feb 10, 2022

It makes it type stable but does not have any substantial impact on performance, at least on this test case that I tried:

m = Model(s1)
xs = Tables.columns([(val=5.0,bounds=nothing),(val=4.0,bounds=nothing)])
idxs = [1,2]
cols = (:val,:bounds)
@btime Base.setindex($m, $xs, $idxs, $cols)
# before: 26.140 μs (404 allocations: 23.39 KiB)
@btime Base.setindex($m, $xs, $idxs, $cols)
# after: 25.525 μs (404 allocations: 23.39 KiB)

Using StaticModel shaves off a few allocations but still doesn't make a huge difference:

24.250 μs (395 allocations: 22.61 KiB)

@bgroenks96
Copy link
Collaborator Author

It's also worth noting that indexing tables with Tuple is not standard practice. I don't think Matrix or DataFrames support it.

@bgroenks96 bgroenks96 force-pushed the feature/index-interface branch 2 times, most recently from 3893a37 to a19832c Compare February 11, 2022 16:12
@rafaqz
Copy link
Owner

rafaqz commented Feb 11, 2022

Ah right, maybe we should remove it instead.

@rafaqz
Copy link
Owner

rafaqz commented Feb 11, 2022

Also how is the ttfx for this stuff? Seems that type stability and allocations from it might also take a while to compile.

@bgroenks96
Copy link
Collaborator Author

I just started using this branch in my "real world" use case and it's.... not good. Like almost a full minute to compile on a deeply nested model with about 50 parameters. This is with several calls to setindex/update.

@rafaqz
Copy link
Owner

rafaqz commented Feb 11, 2022

Yeah... the type stability is probably more important for that than for runtime speed. Have a look with @snoopi_deep and Cthulhu.jl, there might be some easy wins.

@bgroenks96
Copy link
Collaborator Author

It's probably going to be reconstruct :(

@bgroenks96
Copy link
Collaborator Author

I guess we should come back to this at some point.... I am no longer convinced that there is a strong enough use case for type-stable table updates that is worth the insanely high compilation cost. Maybe we should just accept type instability here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generalize Array interface to behave like a table Facilities to update Table
3 participants