Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement and document getfields #54

Merged
merged 21 commits into from
Jul 4, 2022
Merged

implement and document getfields #54

merged 21 commits into from
Jul 4, 2022

Conversation

jw3126
Copy link
Member

@jw3126 jw3126 commented Mar 26, 2022

fix #53

@jw3126 jw3126 requested a review from rafaqz March 26, 2022 19:45
@codecov-commenter
Copy link

codecov-commenter commented Mar 26, 2022

Codecov Report

Merging #54 (cd06f8c) into master (518a6b8) will decrease coverage by 65.81%.
The diff coverage is 59.25%.

@@             Coverage Diff              @@
##            master      #54       +/-   ##
============================================
- Coverage   100.00%   34.18%   -65.82%     
============================================
  Files            3        3               
  Lines          113      117        +4     
============================================
- Hits           113       40       -73     
- Misses           0       77       +77     
Impacted Files Coverage Δ
src/nonstandard.jl 0.00% <0.00%> (-100.00%) ⬇️
src/ConstructionBase.jl 45.97% <61.53%> (-54.03%) ⬇️
src/functions.jl 0.00% <0.00%> (-100.00%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 518a6b8...cd06f8c. Read the comment docs.

@jw3126
Copy link
Member Author

jw3126 commented Mar 29, 2022

@rafaqz if you don't have time for a full review, can you maybe just give a quick feedback on whether you think we should have a fieldvalues function or not?

@rafaqz
Copy link
Member

rafaqz commented Mar 29, 2022

I don't totally understand how this interacts with getproperties ?

@jw3126
Copy link
Member Author

jw3126 commented Mar 30, 2022

The relation between fieldvalues and getproperties is analogous to the relation between getfield and getproperty.

  • getproperties and getproperty can be overloaded, for instance, to hide private details.
  • fieldvalues and getfield should not be overloaded. They expose the raw object including any private fields.

Is this the kind of answer you were looking for?

@rafaqz
Copy link
Member

rafaqz commented Mar 30, 2022

Should it maybe be getfields then?

@rafaqz rafaqz changed the title implement and document fieldvalues implement and document getfields Mar 30, 2022
@@ -32,17 +32,17 @@ julia> constructorof(S)(1,2,4)
ERROR: AssertionError: a + b == checksum
```
Instead `constructor` can be any object that satisfies the following properties:
* It must be possible to reconstruct an object from the `NamedTuple` returned by
`getproperties`:
* It must be possible to reconstruct an object from the `Tuple` returned by
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this still be getproperties? I'm a little confused about how this is going to work. We have getproperties methods that let us ignore undefined fields and provide multiple constructorof implementations depending on the number of fields.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was, that constructorof should accept all private + public fields. Additional methods are optional. But maybe that is a bad idea?

Copy link
Member

@rafaqz rafaqz Mar 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is accessing undefined fields is an error, and they exist in a few Base objects. But we can special case getfields instead of getoroperties to check if fields are defined?

Copy link
Member Author

@jw3126 jw3126 Mar 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can not have undefined fields in user defined types right? So we could take care of all builtin types that have these?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we could special case in getfields for undefined fields for Base objects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some objects like that here that currently use getproperties

@@ -35,6 +35,7 @@ end
function tridiagonal_constructor(dl::V, d::V, du::V, du2::V) where {V<:AbstractVector{T}} where T
Tridiagonal{T,V}(dl, d, du, du2)
end
getfields(o::Tridiagonal) = Tuple(getproperties(o))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we swap these? getfields depending on getproperties is a bit strange. I'm also not clear why we lose the field names in getfields?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you think getfields should return a NamedTuple (with the exception of getfields(::Tuple)::Tuple)? I like that idea, we could then have getproperties(obj) = getfields(obj) as a default implementation.

@oschulz
Copy link

oschulz commented Apr 4, 2022

This currently says "The semantics of getfields should not be changed for user defined types." Maybe we could be a little bit less strict and allow leaving out fields in special cases? There are types that don't allow passing all fields in the ctor (e.g. because a field is always calculated to ensure consistency, and other cases). And types that need a custom rrule for their ctor may need a matching getfields as well. I'm using ConstructionBase in ForwardDiffPullbacks now, AD and ConstructionBase do go together very nicely in principle (it also has a _fieldvals function which would be replaced by the new ConstructionBase.getfields).

@rafaqz
Copy link
Member

rafaqz commented Apr 4, 2022

Yes we need that flexibility for undefined fields as well.

@oschulz
Copy link

oschulz commented Apr 4, 2022

@rafaqz , will Flatten.flatten use ConstructionBase.getfields for it's default implementation in the future?

@rafaqz
Copy link
Member

rafaqz commented Apr 4, 2022

Yes.

@oschulz
Copy link

oschulz commented Apr 4, 2022

Yes.

Awesome, thanks!

@jw3126
Copy link
Member Author

jw3126 commented Apr 4, 2022

@oschulz I see your case. I find it a tricky design problem. We would like to

  • Support as many use cases as possible
  • ConstructionBase API should be small and simple

Does getproperties work for you?

@oschulz
Copy link

oschulz commented Apr 4, 2022

Does getproperties work for you?

Well, it should be the one that is, by contract, the official "inverse" of constructorof, I think.

@jw3126
Copy link
Member Author

jw3126 commented Apr 4, 2022

Yes we need that flexibility for undefined fields as well.

I view undefined fields as a code smell, but maybe there are good reasons for doing this sometimes? We need to support it in Base, because Base is important but otherwise I don't think we should make an effort to support this case?

@oschulz
Copy link

oschulz commented Apr 4, 2022

@piever, can I pull you in here? I think this might be interesting for StructArrays.jl.

@jw3126
Copy link
Member Author

jw3126 commented Apr 4, 2022

Well, it should be the one that is, by contract, the official "inverse" of constructorof, I think.

Yes, but I think there are situations where you would like to pass all the private details to constructorof and then there are cases where you don't want to calculate the computed fields yourself.
We could suggest to use overloading to also support constructorof(getproperties(obj)...)?

@oschulz
Copy link

oschulz commented Apr 4, 2022

@piever, can I pull you in here? I think this might be interesting for StructArrays.jl.

@piever, maybe I should elaborate: If I'm not mistaken, ConstructionBase.constructorof(T)(args...) plays the same role as StructArrays.createinstance(T, args...). If StructArrays would support ConstructionBase, it would be easier to make tricky types compatible with StructArrays, needing only ConstructionBase as a very lightweight dependency. But what contract would StructArrays expect from ConstructionBase.getfields?

@oschulz
Copy link

oschulz commented Apr 4, 2022

We could suggest to use overloading to also support constructorof(getproperties(obj)...)

I like it - if constructorof(T) would be required to handle the output of both getfields(::T) and getproperties(::T) we'd have a clean solution, and applications could use the type of flattening (raw/low-level vs. semantic) they require.

@jw3126
Copy link
Member Author

jw3126 commented Jun 30, 2022

I would like to merge this. If we want bigger API changes like the ones suggested by @oschulz that can be a separate PR.
On master the docs are pretty confusing about what functions are about raw or semantic part of objects. This PR solves that issue.

@aplavin
Copy link
Member

aplavin commented Jun 30, 2022

@jw3126 I'm curious of the motivation to change constructorof correspondence: it was related to getproperties before, now changing to getfields.

Feels like constructing an object from its "semantic" properties is more widely useful, compared to "raw" internal fields. With this change, there would be no way to create an object of a given type from its semantic properties: setpropeties requires an already existing instance of the type.

What are common types where this fields/properties difference matters? What motivates this constructorof change?

@aplavin
Copy link
Member

aplavin commented Jun 30, 2022

Actually, the constructorof is still interpreted related to properties, not fields, even after this PR:

function setproperties_object(obj, patch)
nt = getproperties(obj)
nt_new = merge(nt, patch)
validate_setproperties_result(nt_new, nt, obj, patch)
constructorof(typeof(obj))(Tuple(nt_new)...)
end
.

@jw3126
Copy link
Member Author

jw3126 commented Jun 30, 2022

Thanks @aplavin these are very good questions + observations.

I'm curious of the motivation to change constructorof correspondence: it was related to getproperties before, now changing to getfields.

It was related to getproperies mostly because we did not have getfields and tried to dodge the issue. This was very awkward,
in some places it was assumed that semantic properties work in others fields.

Feels like constructing an object from its "semantic" properties is more widely useful, compared to "raw" internal fields.

Hard to judge what is more common. Fortunately in most cases there is no difference. Examples where you want raw fields are something like Flatten or convert a nested struct to GPU.

With this change, there would be no way to create an object of a given type from its semantic properties: setpropeties requires an already existing instance of the type.

I don't know what is more common. However right now one can neither rely on constructorof working correctly with fields nor with properties I would say.
Also the information stored in an object is 1:1 with its fields + constructorof. This does not hold for properties, there can be additional computed properties or private information not exposed via properties etc.
So constructorof(typeof(obj))(getproperties(obj)...) is a much more "ill posed" than
constructorof(typeof(obj))(getfields(obj)...)

Do you have use cases for constructing from properties without existing object? If there are sufficient use cases for this, we can think about adding it.

What are common types where this fields/properties difference matters? What motivates this constructorof change?

Personally I think overloading propertynames and friends should be done very rarely and getter/setter functions or lenses should be prefered. propertynames overloading can be useful, when interfacing with some "foreign" objects. PyCall is a good example, where it is awesome to do object.method etc.

Actually, the constructorof is still interpreted related to properties, not fields, even after this PR:

function setproperties_object(obj, patch)
nt = getproperties(obj)
nt_new = merge(nt, patch)
validate_setproperties_result(nt_new, nt, obj, patch)
constructorof(typeof(obj))(Tuple(nt_new)...)
end

.

This is just the fallback definition. How else would you implement a fallback setproperties?

@aplavin
Copy link
Member

aplavin commented Jun 30, 2022

Thanks for the detailed response! I agree that it was quite inconsistent before, what the relation between fields/properties/constructorof is.

Examples where you want raw fields are something like Flatten or convert a nested struct to GPU.

I've not used either, could you please clarify why properties don't work in this case?

@ToucheSir
Copy link

Examples where you want raw fields are something like Flatten or convert a nested struct to GPU.

I would argue these cases are exactly where you'd want to use "semantic" fields (with a fallback to raw fields) because it permits additional flexibility on the library author's end. e.g. if I have a type

struct MyModel
  non_trainable_coefficients::Vector{Int}
  # other fields ...
end

I would not want non_trainable_coefficients to be considered for flattening or GPU conversion, but excluding it from get_raw_contents doesn't make sense either since you can't faithfully reconstruct a MyModel without it. This is not to say that get_semantic_contents should be the universal default (you can see me arguing the opposite in JuliaGaussianProcesses/ParameterHandling.jl#43 (comment), after all).

@jw3126
Copy link
Member Author

jw3126 commented Jun 30, 2022

I would argue these cases are exactly where you'd want to use "semantic" fields (with a fallback to raw fields) because it permits additional flexibility on the library author's end. e.g. if I have a type

For trainable parameters I think the right abstraction is an optic. This way you have fine grained runtime control on which params to optimize. You should neither getproperties not getfields let dictate what to optimize. But one of these can be used as a fallback.
I would use getfields, but don't feel very strongly about it.

In contrast to params for move to GPU you probably want to move always the same substructure. Again this should be customizable and neither be dictated by getproperties or getfields. One of these should be used as a default and here I think in most cases you want to move every Array to GPU so I would choose getfields.

I've not used either, could you please clarify why properties don't work in this case?

I think for flatten you want just to "reshape" information so getfields is the right thing as it maps 1:1 with the struct contents. But I don't do much flattening @rafaqz is the expert.

@oschulz
Copy link

oschulz commented Jun 30, 2022

If we want bigger API changes like the ones suggested by @oschulz that can be a separate PR.

I agree, we should get this merged and think about API extensions after.

@oschulz
Copy link

oschulz commented Jun 30, 2022

@ToucheSir: I would argue these cases are exactly where you'd want to use "semantic" fields

Yes, I also think that in the end, we'll need to support both cases (like discussed above): raw fields and semantic fields, and (re-)construction from them.

@ToucheSir
Copy link

I missed that this PR doesn't include get/reconstruct_from_*_contents! Can another thread be spun off to talk about all the extensions then so this one isn't being held up?

@jw3126
Copy link
Member Author

jw3126 commented Jul 1, 2022

@rafaqz any objections to merging this?

@rafaqz
Copy link
Member

rafaqz commented Jul 1, 2022

Sorry I dont have enough bandwidth to object to anything 😅

@aplavin
Copy link
Member

aplavin commented Jul 1, 2022

This is just the fallback definition. How else would you implement a fallback setproperties?

When asking, I was thinking of how to avoid the worst kind of errors - the code executes, but is silently wrong. I think that the only way is to restrict this fallback definition

function setproperties_object(obj, patch)
nt = getproperties(obj)
nt_new = merge(nt, patch)
validate_setproperties_result(nt_new, nt, obj, patch)
constructorof(typeof(obj))(Tuple(nt_new)...)
end
to objects with fieldnames == propertynames, by calling
@generated function check_properties_are_fields(obj)

. Otherwise, whenever the number of properties and fields is the same, setproperties would "work" but give wrong results.

For example:

julia> using StructArrays, ConstructionBase

julia> a = StructArray(x=1:3)
3-element StructArray(::UnitRange{Int64}) with eltype NamedTuple{(:x,), Tuple{Int64}}:
 (x = 1,)
 (x = 2,)
 (x = 3,)

# OK:
julia> getproperties(a)
(x = 1:3,)

# not OK:
julia> setproperties(a, getproperties(a))
0-element StructArray() with eltype Int64 with indices 1:0

@jw3126
Copy link
Member Author

jw3126 commented Jul 1, 2022

I think that the only way is to restrict this fallback definition

That is a great idea!

function setproperties_object(obj, patch)
check_properties_are_fields(obj)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aplavin I added a check as you suggested.

@jw3126 jw3126 merged commit a268e00 into master Jul 4, 2022
@aplavin aplavin mentioned this pull request Feb 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clarification of fieldvalues, API and related packages
7 participants