Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random: introduce gentype, instead of punning on eltype #27756

Merged
merged 2 commits into from
Jul 10, 2018

Conversation

rfourquet
Copy link
Member

In some cases it makes sense to define what type of value rand(rng, x)
will produce, via the newly introduced gentype(x), without having
eltype(x) be meaningful.

@rfourquet rfourquet added the randomness Random number generation and the Random stdlib label Jun 24, 2018
@andreasnoack
Copy link
Member

In some cases

Could you provide an example with such a case?

@rfourquet
Copy link
Member Author

I initially used eltype because it just worked and didn't think too much about it, but with #26852 (to which I reacted with #26940) I expected eltype to not be defined by default anymore, until I discovered that that PR was finally closed. That and me starting to write documentation about the Sampler business prompted me to introduce gentype here. eltype is meant to be defined for collections, and eltype(x) == eltype(typeof(x)) should hold for a non-type object x, while gentype(x) is meant to indicate what is the type of the object returned by rand(x).

Two main use-cases for random generation:

  1. rand(Int), which produces an Int: this was working because an Int happens to be iterable, so eltype(1) == eltype(Int) == Int. But for a custom type NonNumber, typically rand will be defined such that rand(NonNumber) returns a NonNumber object, for which eltype is non meaningfull if it's not a collection
  2. rand(collection), which returns an element from the collection; but it should be possible to define rand(noncollection) where noncollection is neither a collection nor a type.

As by default eltype(x) == Any, keeping eltype instead of gentype would not have dramatic consequences, but this would give suboptimal typing, e.g. rand(NonNumber, 10) would produce a Vector{Any}; the user would need to define eltype(::Type{NonNumber}) = NonNumber in order to get tighter typing: but it's cleaner to ask the user to define instead Random.gentype(::Type{NonNumber}) = NonNumber. Similarly for noncollection.

Actually, going away from eltype allows to give (what appear to me) sane defaults:

  • gentype(::Type{T}) where {T} = T, which allows the creator of NonNumber to not have to define herself gentype(::Type{NonNumber})
  • gentype(x) = eltype(x), which takes care of the cases where x is a collection.

So gentype needs to be defined explicitly only for noncollection and for MyType where rand(MyType) doesn't return a MyType object.

Finally, there is an example where one would like to define both rand(x) and rand(typeof(x)), generating different things, for example:

struct Die
    sides::Vector{Int}
end

rand(::Type{Die}) = Die(rand(1:12, 6))
rand(d::Die) = rand(d.sides)
gentype(::Die) = Int

I.e. we don't necessarily want gentype(x) == gentype(typeof(x)), as is the case for eltype (note: the "real" way to define rand is slightly more verbose than in this example).

@rfourquet
Copy link
Member Author

I will merge by tomorrow if no objection.

@Keno
Copy link
Member

Keno commented Jun 27, 2018

Can you give an example where it would be useful to have gentype be user extendible? In the example above, it seems like eltype(::Type{Die}) = Int would be a sane definition.

@JeffBezanson
Copy link
Member

I guess we would also have gentype(::Type{Die}) = Die. But that API is a bit questionable, since the point of a function like this is to find out something without having an instance; i.e. find out what rand(::Die) would return without having a Die.

@rfourquet
Copy link
Member Author

In the example above, it seems like eltype(::Type{Die}) = Int would be a sane definition.

Indeed, in this example. That said, here the gentype default fallback allows you to not have to define yourself gentype(::Type{Die}) = Die.
An example where it's useful to have user-defined gentype for non-types is when an object represents a set of value, like Random.CloseOpen01() which specifies the set [0, 1), without being per-se a collection (I also had made a proposition to have Random.CloseOpen(a, b) specify the set [a, b), which would lead to over-compilation if this was lifted to the type-level, e.g. CloseOpen{A,B}).

Again, we could keep eltype instead of gentype, but this would lead people to have to define eltype(::Type{MyType}) = MyType anytime they want to define rand(MyType), at least if they want good typing for array calls; which is confusing as an instance of MyType isn't generally a collection of MyType objects (except for numbers).

@Keno
Copy link
Member

Keno commented Jun 27, 2018

Again, we could keep eltype instead of gentype, but this would lead people to have to define eltype(::Type{MyType}) = MyType anytime they want to define rand(MyType), at least if they want good typing for array calls; which is confusing as an instance of MyType isn't generally a collection of MyType objects (except for numbers).

Couldn't we just hardcode that types generate elements of themselves and only use eltype for non-types. I realize that matches the gentype behavior in this PR, but I think introducing an extra generic function only makes sense if we want it as a customization point.

@rfourquet
Copy link
Member Author

rfourquet commented Jun 27, 2018

But that API is a bit questionable, since the point of a function like this is to find out something without having an instance

I agree with this point, and I'm not clear to which extent it turns out to be problematic in practice.

One workaround, which is somewhat ugly, would be to use instead:

# users define those, which have defaults:
gentype(::Type{T}) where T = eltype(T) # for objects, useful default for e.g. rand(1:9)
gentype(::Type{Type{T}}) where T = T # for types, useful default for e.g. rand(Die)

# in the Random library, get the gentype for an object when only an instance is available, not its type:
gentypefromvalue(::X) where X = gentype(X) 
gentypefromvalue(::Type{X}) where X = gentype(Type{X})

@rfourquet
Copy link
Member Author

I think introducing an extra generic function only makes sense if we want it as a customization point.

In that respect, I indeed don't imagine yet a case where one would want eltype(x) != gentype(x). So if we keep eltype:

  • "we just hardcode that types generate elements of themselves"
  • collections with an eltype must return an element of that type when rand is called on them
  • non-collection objects (e.g. CloseOpen01) must also define eltype

I don't really like the first and 3rd bullets, but I guess it's not a big deal. On the other hand, with gentype, most users would never have to deal with it, and it allows a bit or two of extra-flexibility and avoids punning on eltype, so I don't see a problem either with introducing it.

@StefanKarpinski
Copy link
Member

Very high-level comment, I get very suspicious of an API when it needs to do this kind of fiddly disambiguation between operating in the value domain and the type domain.

@rfourquet
Copy link
Member Author

I get very suspicious of an API when it needs to do this kind of fiddly disambiguation between operating in the value domain and the type domain.

I don't disagree, and it's a bit tricky to find the right design here for this reason, but the fact is that for rand, types and non-type objects are used as values... like in the Die example, rand(Die) and rand(Die(...) don't produce the same type of objects.

Given the feedback, I see two solutions forward:

  1. status quo, document that rand(::Type{T}) must produce an object of type T, and rand(object, 3) will produce an Vector{eltype(object)}, so eltype should be defined for tight typing.
  2. introduce gentype, but only for non-type objects: rand(::Type{T}) is still hard-coded to produce an object of type T. This allows to not eliminate this unsatisfying possibility of gentype(x) != gentype(typeof(x)). It's still possible to keep gentype as an internal API (and to "officially" recommend to use eltype, which is the default fall-back of gentype), to give more time to think before commiting to it.

I prefer option 2), as again I don't like punning on eltype; e.g. CloseOpen01() is not a collection, and Sampler(rng, [1, 2, 3]) even less; but with 1), they would need to have eltype defined. But it's not such big a deal either, I don't see yet a use-case where 1) is limiting in terms of functionality.

@rfourquet
Copy link
Member Author

Maybe a small vote: 👍 here for gentype (option 2 above), and 👎 for status quo (option 1).

@rfourquet
Copy link
Member Author

I updated according to option 2 above, and will merge by tomorrow if no decision is taken otherwise (for option 1). In the documentation I wrote, there is no mention of gentype so this will for now remain an implementation detail.

In some cases it makes sense to define what type of value `rand(rng, x)`
will produce, via the newly introduced `gentype(x)`, without having
`eltype(x)` be meaningful.
@ararslan ararslan mentioned this pull request Jul 10, 2018
7 tasks
@rfourquet rfourquet merged commit 8a22d90 into master Jul 10, 2018
@rfourquet rfourquet deleted the rf/rand/gentype branch July 10, 2018 21:05
@tpapp tpapp mentioned this pull request May 8, 2019
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
randomness Random number generation and the Random stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants