-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A better default dataids(::AbstractArray) #26237
base: master
Are you sure you want to change the base?
Conversation
How about: @generated function dataids(A::AbstractArray)
exprs = Expr[]
if A.mutable
push!(exprs, :(UInt(pointer(A))))
end
append!(exprs, Expr(:..., :(dataids(A.$f))) for f in fieldnames(A))
return :(tuple($(exprs...)))
end I think this subsumes the custom definitions I've added to StaticArrays (https://github.com/JuliaArrays/StaticArrays.jl/pull/376/files) and also would make many of the methods in Base obsolete. Would that make sense? |
That has no reason to be a generated function, it's significantly more complicated, less inferrable, and wrong on v0.7 (due to getproperty overloading): function dataids(A::AbstractArray)
ids = ntuple(i -> dataids(getfield(A, f)), Val(nfields(A)))
if !isimmutable(A)
ids = (UInt(pointer(A)), ids...)
end
return ids
end But I think |
Yes, good points. Should probably be function dataids(A::AbstractArray)
ids = _splatmap(dataids, ntuple(i -> getfield(A, i), Val(nfields(A))))
if !isimmutable(A)
ids = (UInt(pointer_from_objref(A)), ids...)
end
return ids
end BTW, |
That's awesome Martin… but unfortunately it's running into order-dependent type inference. That may also be why you're seeing a difference between These things are rather frustrating, and I always burn more time on it than I'd like when I run into it. I've pushed the version with broken tests… but if I re-evaluate the exact same definition at the REPL and incrementally test it then it works. My understanding from prior issues is that this means that we're sometimes getting a "better" result than expected and that we cannot count on it. |
2359f32
to
eee1b77
Compare
Yes, this abstract method is probably guaranteed never to be inferred reliably. We should probably try to avoid forming a Tuple of the field elements though, since this can be very slow and costly, and instead form a tuple of the |
Is your thinking that we should be forming a homogenous tuple immediately? I'm not sure that's possible, either, as each element of that tuple will be a result from I had really been trying to avoid a generated function here, but I think that's the only path forward. |
Some observations:
|
Looking at the inferabilty issue of dataids(::M1{M2{Array{Int64,1},Int64}})
_splatmap(dataids, ::Tuple{M2{Array{Int64,1},Int64}})
dataids(::M2{Array{Int64,1},Int64})
_splatmap(dataids, ::Tuple{Array{Int64,1},Int64}) Here, Note that if some datatype |
Let's check the fallout here: I expect regressions for three possible reasons:
Will be fun working through the regressions to classify them. Let's hope we don't see any 😄 |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Hooray. That looks great. Now the only question is that access of |
Fine but we'll have to run PkgEval first and make sure that it doesn't actually break anything. |
I'm pretty amazed the benchmarks went so smoothly. How does one run PkgEval? @Keno |
1c153b2
to
c9b5705
Compare
c9b5705
to
67348e0
Compare
Coming back to this, it appears we still had two outstanding classes of error in the ecosystem:
Finally, @sostock had objected to this because it performs fancy automagic aliasing detection but doesn't provide the same sort of automatics in the I've rebased; let's see what has happened in the ecosystem since 2020: @nanosoldier |
The package evaluation job you requested has completed - possible new issues were detected. |
only use pointer_from_objref for Dict itself; otherwise use objectid as some AbstractDicts are immutable
There does still seem to be an issue in MeshArray's @nanosoldier |
Failed to compile (bootstrap); |
Yeah, bootstrap problems — serves me right for making changes blind. 4832890 fixes it, so let's try again: @nanosoldier |
The package evaluation job you requested has completed - possible new issues were detected. |
OK, here's the ecosystem roundup:
|
Ah, I see, these are the inverse case of DefaultArrays — they create So that means that we have four choices:
I was initially leaning towards 2c, but when I went to implement it I realized that it means that we're recursively adding that So I think 2a is probably the only realistic path forward? |
Let's try Pkgeval again, this time also adding back some of the ones that failed when Martin had run it before: @nanosoldier |
The package evaluation job you requested has completed - possible new issues were detected. |
@@ -591,6 +591,8 @@ similar(M::Tridiagonal, ::Type{T}, dims::Union{Dims{1},Dims{2}}) where {T} = sim | |||
# Operations on Tridiagonal matrices | |||
copyto!(dest::Tridiagonal, src::Tridiagonal) = (copyto!(dest.dl, src.dl); copyto!(dest.d, src.d); copyto!(dest.du, src.du); dest) | |||
|
|||
Base.dataids(A::Tridiagonal) = (Base.dataids(A.dl), Base.dataids(A.d), Base.dataids(A.du)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea here that any non-base package also needs to define a similar dataids
method to avoid an error with undef
fields? (when they have those)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they would need to (or otherwise not have the undef fields).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should mention that in the docs along with the other reasons to define custom dataids
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean its obscure and no one should do it, but its technically a breaking change
if @generated | ||
:(ids = tuple($([:(dataids(getfield(A, $i))...) for i in 1:fieldcount(A)]...))) | ||
else | ||
ids = _splatmap(dataids, ntuple(i -> getfield(A, i), Val(nfields(A)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check all registered packages for undef fields in AbstractArray
before merging this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PkgEval runs didn't come up with any such arrays to my reading, which is indeed a bit surprising (but welcome)!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha maybe its literally only Tridiagonal
.
How certain are we than package tests actually call dataids
somewhere on all AbstractArray
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question — they'll hit it if they touch any of the builtin implementations of broadcasting, view
, or reshape
... which I'd rate as fairly likely (even if not explicitly as a targeted test, they'd probably do at least one of those in some implementation). I'd think, anyhow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it sounds unlikely none of those are called.
One comment: it's possible the recursive method will lead to false positives in the ecosystem that will have performance consequences. I would guess in AxisArrays.jl as it doesn't define |
So to summarize the status quo of the PR as is implemented:
That's pretty much how this got stuck. Are the above tradeoffs worthwhile? After five years I don't have anything better. |
I was previously callingobjectid
-- I should have been usingpointer_from_objref
. The former is a hash, the latter is much quicker and easier to compute.This PR now implements a more robust default
dataids
definition that attempts to dynamically dig into custom array struct fields. This started as the simpler optimization struck above; that was split off and merged separately at #29854.