Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PoC] Add the Container module for stable and finite collections #12092

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

HertzDevil
Copy link
Contributor

This is a proof of concept for #3790 (comment). It relaxes Indexable.cartesian_product's requirement from an Indexable(Indexable) to a Container(Container). Since Set has a finite size and provides stable iteration, it satisfies Container and one can now do:

x = Container.cartesian_product([Set{1, 2, 3}, Set{'a', 'b', 'c'}])
x # => [[1, 'a'], [1, 'b'], [1, 'c'], [2, 'a'], [2, 'b'], [2, 'c'], [3, 'a'], [3, 'b'], [3, 'c']]

x = Container.cartesian_product(Set{Set{1, 2, 3}, Set{'a', 'b', 'c'}})
x # => [[1, 'a'], [1, 'b'], [1, 'c'], [2, 'a'], [2, 'b'], [2, 'c'], [3, 'a'], [3, 'b'], [3, 'c']]

# `Hash` is also a `Container`
x = Container.cartesian_product([{1 => 2, 3 => 4}, {'a' => "b", 'c' => "d"}])
x # => [[{1, 2}, {'a', "b"}], [{1, 2}, {'c', "d"}], [{3, 4}, {'a', "b"}], [{3, 4}, {'c', "d"}]]

# every `Hash(K, V)` is also a `Container(Tuple(K, V))`, and
# `Tuple(K, V) <= Container(K | V)`, so restriction matches in a non-strict way
# each key-value pair is interpreted as an inner set
x = Container.cartesian_product({1 => 'a', 2 => 'b', 3 => 'c'})
x # => [[1, 2, 3], [1, 2, 'c'], [1, 'b', 3], [1, 'b', 'c'], ['a', 2, 3], ['a', 2, 'c'], ['a', 'b', 3], ['a', 'b', 'c']]

This can be easily generalized to the instance methods and the Iterator-returning variants.

Compared to the existing implementation, Container.each_cartesian simply converts all the inner Sets into Arrays (containers = containers.map { |c| dup_as_array(c) }), because we can now assert that Container#to_a is always safe to call.

Some afterthoughts:

  • If every Container can be safely converted to an Array, that means all methods in Indexable can be moved to Container if every method does this conversion. For example, it becomes possible to call Set#bsearch, and the result is "meaningful" provided that the Set's members are sorted with respect to the iteration order. Do we want this?
  • An idea is that a method can be moved to Container if and only if it makes sense in an unordered container. #sample is one such method, and so is .cartesian_product if we conceptually view its return value as a set. Perhaps the requirement on stable iteration order can be dropped, i.e. multiple calls to #each(&) are allowed to permute the elements, but not add or remove any.
  • Assuming the iteration order guarantee stays, even #index can be moved into Container if we interpret the return value as one that is consistent with #each_with_index:
    Set{'a', 'b', 'c', 'd'}.index('c') # => 2
    Set{'d', 'c', 'b', 'a'}.index('c') # => 2
    That in turn means Set satisfies Indexable, which is definitely unexpected. How do we tell that a Container is not Indexable?
  • One approach might be requiring O(1) complexity in Indexable#unsafe_fetch, which is most certainly not the case for Set. This is more apparent with XML::Attributes, a container backed by a singly linked list. Its #size is exactly the one from Enumerable, and is O(n); same for #[](Int).

IMHO this Container is too powerful and the way cartesian_product works on Hashes is rather confusing. Maybe it is fine that we don't support Set arguments after all?

@jhass
Copy link
Member

jhass commented Jun 7, 2022

Yes, I feel like this might be going a bit too far. I'd rather have a bit of code duplication in standard library than imposing too many generic container'ish interfaces onto users.

I also think pushing some necessary conversion onto the caller sides is fine. It may actually make code easier to reason about, albeit a bit more verbose, compared to hiding them behind stdlib interfaces.

@asterite
Copy link
Member

asterite commented Jun 7, 2022

Maybe the name Container is a bit vague? I think something like EnumerableWithKnowSize or similar, even if a bit verbose, is better. Or maybe FiniteEnumerable.

@asterite
Copy link
Member

asterite commented Jun 7, 2022

So it would be something like this:

  • Enumerable: you can traverse the elements, but you might easily not know how many are there. Maybe there's an infinite amount!
  • FiniteEnumerable: like Enumerable, but you can easily (in an efficient way) know how many elements it has. You still can't easily access an element in a random position.
  • Indexable: like FiniteEnumerable, but now you can also index in any position you want

@HertzDevil
Copy link
Contributor Author

HertzDevil commented Jun 7, 2022

What benefits could we gain from finiteness alone if all the stability guarantees are gone? #10075?

@asterite
Copy link
Member

asterite commented Jun 7, 2022

Just in case, I'm just proposing renaming Container to FiniteEnumerable. What are the "stability guarantees" about?

@straight-shoota
Copy link
Member

The most relevant value such a type provides is the semantic specification that a collection is stable (not necessarily immutable). This implies that essential operations such as iterating or querying the size (ref #10014, #10075) are safe in terms that they do not invalidate the enumerable.

@straight-shoota
Copy link
Member

Regarding names, Container is very unspecific. FiniteEnumerable is better, but not very concise. Finiteness is just an inferred property, I think the focus should be on stability. I don't have a good name for that, though. @ftarulla Suggested Collection. That might work. I'm not sure if that's accurate. It could be deproven by an Enumerable type that is not stable but would be called a collection.

@asterite
Copy link
Member

asterite commented Jun 8, 2022

This implies that essential operations such as iterating or querying the size (ref #10014, #10075) are safe in terms that they do not invalidate the enumerable.

How can you guarantee stability from just a name? There's no way to guarantee that. I'd rather not introduce a type that says "this guarantees X" without having a way to check that.

@asterite
Copy link
Member

asterite commented Jun 8, 2022

That's why FiniteEnumerable makes more sense, because there's an abstract def size there, meaning that the size must be something finite. But there's no method you could ask to guarantee stability.

I also don't understand why stability matters, an which collections aren't stable (which ones are these?)

@straight-shoota
Copy link
Member

There is no way to validate or enforce intrinsic properties such as stability. The type represents a semantic convention. But this is practically very valuable.

I think the same argument can be applied to FiniteEnumerable. #size could easily be implemented to return anything, such as Random.rand(Int32). That makes the size finite. But it doesn't have any reasonable meaning.

@asterite
Copy link
Member

asterite commented Jun 8, 2022

I'm taking a look at the code... so for cartesian product to work, I'm seeing that it converts each container to an Array if it's not already an Array. I'm failing to understand why not, instead, just force the caller to convert things to an Array before calling that. Also, if things are converted to Array, I don't understand where this "stability" concept comes into play. After all the collection is traversed just once. Well, I see some &.size stuff, but that could be avoided if things are converted to arrays, and the size is computed from there.

I just think that the usefulness of this is very tiny compared to the complexity of introducing an unclear abstraction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants