[PoC] Add the `Container` module for stable and finite collections #12092

HertzDevil · 2022-06-05T00:35:00Z

This is a proof of concept for #3790 (comment). It relaxes Indexable.cartesian_product's requirement from an Indexable(Indexable) to a Container(Container). Since Set has a finite size and provides stable iteration, it satisfies Container and one can now do:

x = Container.cartesian_product([Set{1, 2, 3}, Set{'a', 'b', 'c'}])
x # => [[1, 'a'], [1, 'b'], [1, 'c'], [2, 'a'], [2, 'b'], [2, 'c'], [3, 'a'], [3, 'b'], [3, 'c']]

x = Container.cartesian_product(Set{Set{1, 2, 3}, Set{'a', 'b', 'c'}})
x # => [[1, 'a'], [1, 'b'], [1, 'c'], [2, 'a'], [2, 'b'], [2, 'c'], [3, 'a'], [3, 'b'], [3, 'c']]

# `Hash` is also a `Container`
x = Container.cartesian_product([{1 => 2, 3 => 4}, {'a' => "b", 'c' => "d"}])
x # => [[{1, 2}, {'a', "b"}], [{1, 2}, {'c', "d"}], [{3, 4}, {'a', "b"}], [{3, 4}, {'c', "d"}]]

# every `Hash(K, V)` is also a `Container(Tuple(K, V))`, and
# `Tuple(K, V) <= Container(K | V)`, so restriction matches in a non-strict way
# each key-value pair is interpreted as an inner set
x = Container.cartesian_product({1 => 'a', 2 => 'b', 3 => 'c'})
x # => [[1, 2, 3], [1, 2, 'c'], [1, 'b', 3], [1, 'b', 'c'], ['a', 2, 3], ['a', 2, 'c'], ['a', 'b', 3], ['a', 'b', 'c']]

This can be easily generalized to the instance methods and the Iterator-returning variants.

Compared to the existing implementation, Container.each_cartesian simply converts all the inner Sets into Arrays (containers = containers.map { |c| dup_as_array(c) }), because we can now assert that Container#to_a is always safe to call.

Some afterthoughts:

If every Container can be safely converted to an Array, that means all methods in Indexable can be moved to Container if every method does this conversion. For example, it becomes possible to call Set#bsearch, and the result is "meaningful" provided that the Set's members are sorted with respect to the iteration order. Do we want this?
An idea is that a method can be moved to Container if and only if it makes sense in an unordered container. #sample is one such method, and so is .cartesian_product if we conceptually view its return value as a set. Perhaps the requirement on stable iteration order can be dropped, i.e. multiple calls to #each(&) are allowed to permute the elements, but not add or remove any.
Assuming the iteration order guarantee stays, even #index can be moved into Container if we interpret the return value as one that is consistent with #each_with_index:
```
Set{'a', 'b', 'c', 'd'}.index('c') # => 2
Set{'d', 'c', 'b', 'a'}.index('c') # => 2
```
That in turn means Set satisfies Indexable, which is definitely unexpected. How do we tell that a Container is not Indexable?
One approach might be requiring O(1) complexity in Indexable#unsafe_fetch, which is most certainly not the case for Set. This is more apparent with XML::Attributes, a container backed by a singly linked list. Its #size is exactly the one from Enumerable, and is O(n); same for #[](Int).

IMHO this Container is too powerful and the way cartesian_product works on Hashes is rather confusing. Maybe it is fine that we don't support Set arguments after all?

jhass · 2022-06-07T12:37:34Z

Yes, I feel like this might be going a bit too far. I'd rather have a bit of code duplication in standard library than imposing too many generic container'ish interfaces onto users.

I also think pushing some necessary conversion onto the caller sides is fine. It may actually make code easier to reason about, albeit a bit more verbose, compared to hiding them behind stdlib interfaces.

asterite · 2022-06-07T12:41:52Z

Maybe the name Container is a bit vague? I think something like EnumerableWithKnowSize or similar, even if a bit verbose, is better. Or maybe FiniteEnumerable.

asterite · 2022-06-07T12:44:49Z

So it would be something like this:

Enumerable: you can traverse the elements, but you might easily not know how many are there. Maybe there's an infinite amount!
FiniteEnumerable: like Enumerable, but you can easily (in an efficient way) know how many elements it has. You still can't easily access an element in a random position.
Indexable: like FiniteEnumerable, but now you can also index in any position you want

HertzDevil · 2022-06-07T13:52:23Z

What benefits could we gain from finiteness alone if all the stability guarantees are gone? #10075?

asterite · 2022-06-07T13:54:12Z

Just in case, I'm just proposing renaming Container to FiniteEnumerable. What are the "stability guarantees" about?

straight-shoota · 2022-06-07T16:23:15Z

The most relevant value such a type provides is the semantic specification that a collection is stable (not necessarily immutable). This implies that essential operations such as iterating or querying the size (ref #10014, #10075) are safe in terms that they do not invalidate the enumerable.

straight-shoota · 2022-06-07T16:24:52Z

Regarding names, Container is very unspecific. FiniteEnumerable is better, but not very concise. Finiteness is just an inferred property, I think the focus should be on stability. I don't have a good name for that, though. @ftarulla Suggested Collection. That might work. I'm not sure if that's accurate. It could be deproven by an Enumerable type that is not stable but would be called a collection.

asterite · 2022-06-08T10:53:44Z

This implies that essential operations such as iterating or querying the size (ref #10014, #10075) are safe in terms that they do not invalidate the enumerable.

How can you guarantee stability from just a name? There's no way to guarantee that. I'd rather not introduce a type that says "this guarantees X" without having a way to check that.

asterite · 2022-06-08T10:54:55Z

That's why FiniteEnumerable makes more sense, because there's an abstract def size there, meaning that the size must be something finite. But there's no method you could ask to guarantee stability.

I also don't understand why stability matters, an which collections aren't stable (which ones are these?)

straight-shoota · 2022-06-08T11:08:24Z

There is no way to validate or enforce intrinsic properties such as stability. The type represents a semantic convention. But this is practically very valuable.

I think the same argument can be applied to FiniteEnumerable. #size could easily be implemented to return anything, such as Random.rand(Int32). That makes the size finite. But it doesn't have any reasonable meaning.

asterite · 2022-06-08T11:19:30Z

I'm taking a look at the code... so for cartesian product to work, I'm seeing that it converts each container to an Array if it's not already an Array. I'm failing to understand why not, instead, just force the caller to convert things to an Array before calling that. Also, if things are converted to Array, I don't understand where this "stability" concept comes into play. After all the collection is traversed just once. Well, I see some &.size stuff, but that could be avoided if things are converted to arrays, and the size is computed from there.

I just think that the usefulness of this is very tiny compared to the complexity of introducing an unclear abstraction.

HertzDevil added 2 commits June 5, 2022 07:30

Add the Container module for stable and finite collections

9abb052

Move Indexable.cartesian_product to Container.class

91760bb

HertzDevil added status:discussion topic:stdlib:collection labels Jun 5, 2022

HertzDevil marked this pull request as draft June 5, 2022 00:35

caspiano mentioned this pull request Nov 14, 2022

Hash#select!(Enumerable) can skip over elements in the argument #12736

Closed

straight-shoota assigned HertzDevil Apr 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PoC] Add the `Container` module for stable and finite collections #12092

[PoC] Add the `Container` module for stable and finite collections #12092

HertzDevil commented Jun 5, 2022

jhass commented Jun 7, 2022 •

edited

Loading

asterite commented Jun 7, 2022

asterite commented Jun 7, 2022

HertzDevil commented Jun 7, 2022 •

edited

Loading

asterite commented Jun 7, 2022

straight-shoota commented Jun 7, 2022

straight-shoota commented Jun 7, 2022

asterite commented Jun 8, 2022

asterite commented Jun 8, 2022

straight-shoota commented Jun 8, 2022

asterite commented Jun 8, 2022

[PoC] Add the Container module for stable and finite collections #12092

Are you sure you want to change the base?

[PoC] Add the Container module for stable and finite collections #12092

Conversation

HertzDevil commented Jun 5, 2022

jhass commented Jun 7, 2022 • edited Loading

asterite commented Jun 7, 2022

asterite commented Jun 7, 2022

HertzDevil commented Jun 7, 2022 • edited Loading

asterite commented Jun 7, 2022

straight-shoota commented Jun 7, 2022

straight-shoota commented Jun 7, 2022

asterite commented Jun 8, 2022

asterite commented Jun 8, 2022

straight-shoota commented Jun 8, 2022

asterite commented Jun 8, 2022

[PoC] Add the `Container` module for stable and finite collections #12092

[PoC] Add the `Container` module for stable and finite collections #12092

jhass commented Jun 7, 2022 •

edited

Loading

HertzDevil commented Jun 7, 2022 •

edited

Loading