Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Naive implementation of cartesian_power. #486

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

iago-lito
Copy link

@iago-lito iago-lito commented Oct 8, 2020

Hello, this is my first contribution to a rust library, can somebody guide me a little out here?

The idea is to complete the cartesian_product logic with a cartesian_power adaptor. It scrolls the cartesian product of an iterator with itself pow times. For instance with the set abc and pow=4, it yields:

aaaa (0)
aaab (1)
aaac (2)
aaba (3)
aabb (4)
aabc (5)
aaca (6)
aacb (7)
aacc (8)
abaa (9)
...
cccc (80 = 3^4 - 1)

Put it another way, it's supposedly equivalent to iproduct!("abc", "abc", "abc", "abc") except for the technical types quirks.

I have a few unresolved questions, which I need help with:

  • Dummy stuff first: the diff is big because I thought it was okay to hit the rustfmt button, was I wrong? Should I reverse formatting?

  • I cannot get the size_hint test to pass because it just.. times out, although I'm unsure why. How am I supposed to correctly test it?

  • What am I supposed to add in the bench files?

  • Deepest questions last:
    How could the Self::Item type be more generic in this case? I have chosen Vec to start with, but the problem is that the expected returned length actually depends on the pow argument.

    • Should I leave it as-is and wait for generic consts to be available?
    • Can I improve it by making the returned collection type (here Vec) a type parameter? If yes, how?

This is it, thank you for your patience, I'd be happy to read your feedback :)

@jswrenn
Copy link
Member

jswrenn commented Oct 8, 2020

Some quick answers:

Dummy stuff first: the diff is big because I thought it was okay to hit the rustfmt button, was I wrong? Should I reverse formatting?

Yes, please! The smaller the PR, the better, since I need to review every changed line before merging. We're really not that particular about formatting. Just indent with four spaces, and I'm sure it'll be fine.

What am I supposed to add in the bench files?

You don't need to add anything!

I'll need a little more time before I can answer your other questions.

@phimuemue
Copy link
Member

phimuemue commented Oct 9, 2020

Hi there. Foremost, thank you for your upfront effort to clarify the modalities.

I hope I do not miss anything, but what is the difference to combinations_with_replacement?

If it is supposed to do the same, you still might look into the existing implementation, and e.g. see if it has a decent size_hint.

@iago-lito
Copy link
Author

@jswrenn Okay, I'll reverse that so it'll be easier for you to read. Don't get tired scrolling meanwhile :)
BTW would you accept a separate PR with only a thorough rustfmt pass to standardize whitespace in the whole project?

@phimuemue The difference is order, combination_with_replacement is insensitive to order, so it only outputs aab, aba OR baa, correct? On the other hand cartesian_power will output aab, aba AND baa.

@phimuemue
Copy link
Member

@phimuemue The difference is order, combination_with_replacement is insensitive to order, so it only outputs aab, aba OR baa, correct? On the other hand cartesian_power will output aab, aba AND baa.

Thanks - Too early to read code...

@iago-lito
Copy link
Author

iago-lito commented Oct 9, 2020

@jswrenn I have reversed the rustfmt pass so the diff with master will be easier for you to read :)

Copy link
Member

@phimuemue phimuemue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi there, thanks for the ideas in this PR.

When we do this, we should possibly ask ourselves inhowfar Power is similar to CombinationsWithReplacement. (To be honest, it seems quite similar to me, so I would expect similar inner workings, too. As always, other thought-out opinions are welcome.)

  • Should we go with Vec or with LazyBuffer?
  • Should we have a Vec<I>, iterating on demand to get the next element, or should we go with the index-based approach like in CombinationsWithReplacement?

Some(res)
}
// Check trivial case last as it should be less frequent.
Power::Empty => None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For pow==0/Power::Empty, shouldn't the iterator should yield a single element (namely the empty vector)? (Instead of yielding no elements at all.)

This would ensure that it.cartesian_power(n) yields it.count().pow(n) elements. (We had a similar case back then where (I think it was) combinations had a special case for n==0, which lead to inconsitencies.)

It would possibly also help in getting rid of some special casings (i.e. possible eliminate Power).

Copy link
Author

@iago-lito iago-lito Oct 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phimuemue Well, I had this vague feeling I was neglicting something sneaky here. Thank you for pointing it out.
The sneaky stuff is that cartesian_product("abc", 0) should not yield the same as cartesian_product("", 3). And I bet that your criterion it.count().pow(n) is the right thing to consider. If we agree that:

2 1 0
2 2^2=4 2^1=2 2^0=1
1 1^2=1 1^1=1 1^0=1
0 0^2=0 0^1=0 0^0=1?

Then the adaptor should yield:

2 1 0
"ab" ["aa", "ab", "ba", "bb"] ["a", "b"] [""]
"a" ["aa"] ["a"] [""]
"" [] [] [""]?

So there is not 1 degenerated case, but 2 or 3, depending on the convention we pick for the 0^0 situation. I propose we stick to the "combinatorics" convention that 0^0=1, so there are only 2 corner cases. The only other solution I can think of is erroring or panicking instead when encountering cartesian_product("", 0).

I'll update the PR to better accomodate these degenerated situations :)

[from the future] see 9027ef0

src/adaptors/mod.rs Outdated Show resolved Hide resolved
@jswrenn
Copy link
Member

jswrenn commented Oct 9, 2020

How could the Self::Item type be more generic in this case? I have chosen Vec to start with, but the problem is that the expected returned length actually depends on the pow argument.

Here's another way to look at it: the pow argument equals the returned length, and is thus redundant. You could nix the argument and determine the appropriate Item with a type parameter instead.

@iago-lito
Copy link
Author

inhowfar Power is similar to CombinationsWithReplacement. (To be honest, it seems quite similar to me, so I would expect similar inner workings, too.

@phimuemue I think it is similar in its signature, because they both yield iterables whose length is eventually decided by the user. I've tried to inspire from the combination_with_replacement signature.

This said, I doubt it implies that the inner workings should resemble each other. I could think of a very inefficient implementation of cartesian_product chaining combination_with_replacement, permutations then sort, but I'm not sure this helps much, except maybe for explaining what it does?

@iago-lito
Copy link
Author

@phimuemue

Should we go with Vec or with LazyBuffer?
Should we have a Vec<I>, iterating on demand to get the next element, or should we go with the index-based approach like in CombinationsWithReplacement?

I'm not exactly sure what you mean, but this is sure something I would like to improve. My most flexible idea so far would be to make the user provide any FromIterator<I::Item> type instead of Vec, used as a type argument <C> in next so the procedure would build the return value with let res = state.iter().cloned().collect::<C>(); instead of let res: Vec<_> = state.clone();. But I couldn't get this right.

The other option I like is what @jswrenn suggests:

You could nix the argument and determine the appropriate Item with a type parameter instead.

If you mean that the user would do something like it.cartesian_product<[char; 4]>(), then I would also find it awesome, but I'm afraid it requires generic consts to be featured first?

@phimuemue
Copy link
Member

phimuemue commented Oct 10, 2020

The other option I like is what @jswrenn suggests:

You could nix the argument and determine the appropriate Item with a type parameter instead.

If you mean that the user would do something like it.cartesian_product<[char; 4]>(), then I would also find it awesome, but I'm afraid it requires generic consts to be featured first?

If we want the type-based approach, I could imagine it is possible to do it right now - with an upper bound on the array length.

However, the type-based approach would prevent us from specifying pow at runtime. So, I suggest we first make clear if we want to have a dynamic/run-time pow, or if we want to go with a type-based approach. My guess is that if you know the array length at runtime, you could easily go with iproduct or a nested cartesian_product.

The following probably only applies if we go with the dynamic/runtime approach:

@phimuemue

Should we go with Vec or with LazyBuffer?
Should we have a Vec<I>, iterating on demand to get the next element, or should we go with the index-based approach like in CombinationsWithReplacement?

I'm not exactly sure what you mean, but this is sure something I would like to improve. My most flexible idea so far would be to make the user provide any FromIterator<I::Item> type instead of Vec, used as a type argument <C> in next so the procedure would build the return value with let res = state.iter().cloned().collect::<C>(); instead of let res: Vec<_> = state.clone();. But I couldn't get this right.

CombinationsWithReplacement/Combinations does not require that the iterator is Clone, as it only clones the elements - this is in contrast with your proposal, and I would assume that users expect requirements similar to those of CombinationsWithReplacement/Combinations.

Moreover, CombinationsWithReplacement/Combinations does not unconditionally exhaust the iterator (it uses LazyBuffer, only advancing the iterator on demant) - in contrast to your proposal. Again, I would expect cartesian_power to behave similarily.

Please note that the existing implementations may not be better than your suggestion. All I advocate for is uniform behavior. Still, the existing implementation at least avoids cloneing the iterator, and goes with only cloning the elements instead.

@iago-lito
Copy link
Author

@phimuemue Okay, I'll have a deeper look into CombinationWithReplacement internals so I can answer you better.


I agree about the runtime pow vs. compile time pow valuation. These may actually require different implementations, maybe a (procedural?) macro like ipower!(my_it, 8) would do the trick for the compile time use case?

My guess is that if you know the array length at runtime, you could easily go with iproduct or a nested cartesian_product.

I disagree, because iproduct!(my_it, my_it, ..., my_it) yield tuples, so you cannot iterate on them. As a user, since I know that the tuple will be uniform in types, I would rather have an array or an iterable instead, which makes ipower!(my_it, 8) different in signature. This said, I suspect the same argument holds for combination_with_replacement, have you already have it?

Regarding runtime pow valuation, I'm okay to consider fixed-sized arrays up to 32 (or maybe more?). I also agree that uniform behaviour is a desirable thing.


Before I understand CombinationWithReplacement deeply, I would like to ask whether there exists clear-cut implementation recommendations valid throughout the library. Put it simply, what's the policy regarding adaptors implementation tradeoffs:

  • maximize flexibility with respect to the type system? (user is free to pick Vec, [I::Item; n], HashSet, String as yielded items)
  • maximize execution speed?
  • minimize RAM usage?
  • minimize heap allocations?
  • minimize calls to I::next()?
  • minimize calls to I::Item::clone()?
  • minimize calls to I::clone()?

For instance, I can think of an alternate implementation of cartesian_power that does not use I::clone(), but it requires buffering the whole iterator sequence, which may be not desirable if I turns out to be really long.

@phimuemue
Copy link
Member

Before I understand CombinationWithReplacement deeply, I would like to ask whether there exists clear-cut implementation recommendations valid throughout the library. Put it simply, what's the policy regarding adaptors implementation tradeoffs:

* maximize flexibility with respect to the type system? (user is free to pick `Vec`, `[I::Item; n]`, `HashSet`, `String` as yielded items)
* maximize execution speed?
* minimize RAM usage?
* minimize heap allocations?
* minimize calls to `I::next()`?
* minimize calls to `I::Item::clone()`?
* minimize calls to `I::clone()`?

These are good questions indeed. I do not know if we have specific recommendations. I guess we are usually open to discuss the various tradeoffs - if we do not have any precedence.

If we, however, have precedence, I strongly tend to follow existing patterns. It enforces uniformity, and builds upon possibly good decisions already made by others. If an existing design turns out to be bad, I tend to first fix that, and then build the new thing. In various past situations this technique simplified the code base quite a bit.

In this case, and if we go with the runtime approach, this might even mean looking at CombinationsWithReplacement and Combinations as a first step (to see if they should actually be more similar (see e.g. #487) and if they can be reused for cartesian_power).

But maybe wait for @jswrenn's opinion on all this.

@iago-lito
Copy link
Author

@jswrenn Friendly ping. I suspect you're maybe busy with safe transmute or other stuff. Feel free to ignore me if it's the case, there's no rush here :)

@DusterTheFirst
Copy link

How would this implementation differ/be better than something like this:

use std::fmt::Debug;

use itertools::Itertools;

fn cartesian_power<const N: usize, T, I>(iterator: I) -> impl Iterator<Item = [T; N]>
where
    T: Debug + Clone,
    I: Iterator<Item = T> + Clone,
{
    (0..N)
        .map(|_i| iterator.clone())
        .multi_cartesian_product()
        .map(|vec| TryInto::<[T; N]>::try_into(vec).unwrap())
}

macro_rules! cartesian_power {
    ($iterator:expr, $count:expr) => {
        $crate::cartesian_power::<$count, _, _>($iterator)
    };
}

fn main() {
    for [x, y, z] in cartesian_power::<3, _, _>(0..4) {
        println!("({x},{y},{z})");
    }

    for [a, b, c, d, e] in cartesian_power!(0..4, 5) {
        println!("({a},{b},{c},{d},{e})");
    }
}

@iago-lito
Copy link
Author

@DusterTheFirst I guess it would differ in one or several of the tradeoffs listed here, but IIRC we're still waiting on @jswrenn input here, are we? There was questionning about the (existence of a) general implementation policy in itertools.

@Philippe-Cholet
Copy link
Member

Philippe-Cholet commented May 15, 2024

Hi @iago-lito , I like the idea of having a cartesian power.
We usually try to minimize heap allocations as it usually is the execution bottleneck. Generate Vec items is fine enough for now. (And I see a time where we could additionally generate arrays not too far in the future).
I tend to think we should not care too much on minimizing cloning iterators and items.

But for this to be included, I think it should be somehow more performant than itertools::repeat_n(iter, pow).multi_cartesian_product() (or the documentation of multi_cartesian_product could just mention this).

@iago-lito iago-lito force-pushed the cartesian_power branch 3 times, most recently from 17271f3 to 8957f70 Compare July 15, 2024 22:38
@iago-lito iago-lito force-pushed the cartesian_power branch 2 times, most recently from 5b51535 to 47f44a6 Compare September 23, 2024 20:50
@iago-lito iago-lito force-pushed the cartesian_power branch 2 times, most recently from 6273611 to 8631037 Compare October 3, 2024 17:37
@iago-lito iago-lito force-pushed the cartesian_power branch 2 times, most recently from 30cbc0d to b68611d Compare October 9, 2024 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants