Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add permutations and permutations_sized adaptors #230

Draft
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

nwad123
Copy link
Contributor

@nwad123 nwad123 commented Jan 26, 2025

This PR adds the permutations and permutations_sized adaptors, as described in #138.

The permutations adaptor takes in any input sequence and produces an ouput
of all possible permutations of the input. If the input sequence has a length of $l$,
the output sequence is length $l!$. The output permutations are in lexographical
order according to the original order of the input. If the original input is in order,
the output permutations will also be in order.

auto x = flux::ints()         // 0, 1, 2, 3, ...
              .take(3)        // 0, 1, 2
              .permutations() // [0, 1, 2], [0, 2, 1],
                              // [1, 0, 2], [1, 2, 0],
                              // [2, 0, 1], [2, 1, 0]
              // ...

The permutations_sized adaptor differs in operation in that it is parameterized
with a additional input -- SubsequenceSize -- that defines how long each output
permutation should be. If the length of the input sequence is $l$ and the value of
SubsequenceSize is represented by $r$ then the length of the output is calculated
by:

$$ \frac{l!}{(l-r)!} $$

auto x = flux::ints()                  // 0, 1, 2, 3, ...
              .take(3)                 // 0, 1, 2
              .permutations_sized<2>() // [0, 1], [0, 2],
                                       // [1, 0], [1, 2],
                                       // [2, 0], [2, 1]
              // ...

The implementation of both adaptors is spread across three files:

The implementations are based off the implementation of both
Rust and
Python's itertools
permutations adaptors.

The operations of both adaptors is similar. First, they cache the input sequence so that the values can
be repeatedly copied into each output element. Then, each cursor contains a vector, indices containing
the permuted indices of the input vector. These indices are used along with the cached input sequence
to generate the permutation at each element (see example below). When read_at is called, the cached
input sequence is copied into the output vector using the indices stored in the cursor for each value.
When a cursor is advanced, the next permutations of the indices is generated, and then when read_at
is called again, the next permutation of the input is returned.

Here's an example demonstrating the steps. Note that this example is slight pseudocode in order to
better show the steps that are happening.

// Sequence 
auto permuted_sequence = flux::ref({2, 1, 3}).permutations();

// Cursor 
auto c = flux::first(permuted_sequence);

assert(c.indices == [0, 1, 2]);
assert(permuted_sequence.cache == [2, 1, 3]);

// Read
auto x = flux::read_at(permuted_sequence, c);
         // index into the cached values using the cursor indices:
         // x[0] == permuted_sequence.cache[c.indices[0]];
         // x[1] == permuted_sequence.cache[c.indices[1]];
         // x[2] == permuted_sequence.cache[c.indices[2]];

assert(x == [2, 1, 3]);

// Advance
flux::inc(permuted_sequence, c);
      // c.indices = std::next_permutation(c.indices)

assert(c.indices = [0, 2, 1]);

// Read again 
auto y = flux::read_at(permuted_sequence, c);
         // y[0] == permuted_sequence.cache[c.indices[0]];
         // y[1] == permuted_sequence.cache[c.indices[1]];
         // y[2] == permuted_sequence.cache[c.indices[2]];

assert(y == [2, 3, 1]);

I'm marking this as a draft because I'm looking for feedback on several design decisions I made. Once
these are answered I can finish up the documentation, testing, and implementation of these adaptors and
hopefully have them merged in. Also, there are optimizations that could be implemented for different
types of input sequences that I plan on adding over time. Here are the specific areas of feedback I'm
looking for:

  1. Does the decision to split the functionality of the permutations and permutations_sized into
    two adaptors make sense?

  2. Are vectors a good element type for the output? I've thought about using an array if we
    know the size of the input at compile time, but I started with a vector because it's more flexible to
    use. Also, if we have long input sequences, I wasn't sure if it would be good to use a stack-based
    element type for the output.

  3. Are the cursors too expensive to create to justify satisfying flux::regular_cursor for the cursors?
    The documentation says that "cursors may be copied within algorithm implementations, so copying should
    be 'cheap'" (flux::regular_cursor docs).

I'm also happy to hear any other feedback that you have! I'm still pretty new to contributing to C++
projects, but I'm enjoying what I'm learning. Also, December and January were busy with other tasks, but
I should be able to contribute more time to these adaptors for the next several weeks.

  • laid out basics of permutation adaptor, still researching impl strategies
  • Renamed branch to follow other branch names
  • Removed unnecessary comments
  • Added permutations adaptor and started tests
  • Added string-based permutations comparison test
  • Fixed string based comparison test
  • Removed unused imports
  • Added sized permutations adaptor
  • Fixed sized template parameters on sized permutations
  • Added sized permutations test
  • Renamed 'sized_permutations' to 'permutations_sized'
  • Added outline for documentation
  • Renamed and refactored for first draft

Copy link

codecov bot commented Jan 27, 2025

Codecov Report

Attention: Patch coverage is 96.19048% with 4 lines in your changes missing coverage. Please review.

Project coverage is 98.20%. Comparing base (f539c11) to head (0b65e24).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
include/flux/adaptor/permutations_sized.hpp 95.91% 2 Missing ⚠️
include/flux/adaptor/permutations.hpp 97.67% 1 Missing ⚠️
include/flux/adaptor/permutations_base.hpp 92.30% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #230      +/-   ##
==========================================
- Coverage   98.60%   98.20%   -0.40%     
==========================================
  Files          69       72       +3     
  Lines        2578     2737     +159     
==========================================
+ Hits         2542     2688     +146     
- Misses         36       49      +13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tcbrindle
Copy link
Owner

tcbrindle commented Jan 27, 2025

Hi @nwad123, thanks so much for working on this!

I haven't looked too much at the code yet as you've said it's still a draft, but please let me know if you have any questions about how things work in Flux :)

Regarding the design of the adaptors, I think it would be better to focus only on fixed-size pemutations<N> for now. The "full length" version would need to perform allocations in various places (both in the cursor ops and on each call to read_at) which makes me a bit worried about hidden performance traps -- we don't do any allocations in any other adaptors in the library. People have also noted performance problems with the Rust version.

For the fixed-size version, because we know a priori how many terms we're dealing with we can use a stack array for storage, as we do with the cartesian_power<N> adaptor (in fact, permutations<N> is a subset of this). I appreciate the concerns about stack size, but in general N isn't likely to be all that large, and cursors should be small objects anyway, so storing them in an array should be fine.

To answer your design questions specifically:

  1. Does the decision to split the functionality of the permutations and permutations_sized into
    two adaptors make sense?

Yes, but I'd recommend focusing on the sized variant only for now.

  1. Are vectors a good element type for the output?

No :)

I've thought about using an array if we know the size of the input at compile time, but I started with a vector because it's more flexible to use. Also, if we have long input sequences, I wasn't sure if it would be good to use a stack-based
element type for the output.

Vector has two problems: firstly the allocations we've discussed, and secondly it cannot hold references, so you'd need to copy each input element into the vector for each permutation which is a lot of copying! So I'd rather not do that. Unfortunately arrays can't hold references either, so they're out too.

I think it would be best if permutations<N>::read_at() returned an N-tuple std::tuple<element_t<S>, element_t<S>, ...>, which is what the cartesian_power<N> adaptor yields. (There's some code in cartesian_base.cpp which will give you an alias to such a tuple type). This will correctly handle references and object types, at the cost of being a bit unwieldy to program with in C++.

  1. Are the cursors too expensive to create to justify satisfying flux::regular_cursor for the cursors?
    The documentation says that "cursors may be copied within algorithm implementations, so copying should
    be 'cheap'" (flux::regular_cursor docs).

I would say that we definitely don't want a cursor which holds a vector, but a cursor that holds a fixed-size array of cursors is fine (we do it in several other places in the library).

Regarding caching the input data: if we require that the input is a multipass sequence (which is reasonable, as we're going to be performing multiple passes over the elements!) then I don't think we'd need to take an initial copy of the input?

I hope this helps! I have to say you've chosen a bit of a monster as your first Flux adaptor :) But I'm happy to help as much as I can.

@nwad123
Copy link
Contributor Author

nwad123 commented Jan 28, 2025

Thanks! This helps greatly. I'll let you know how working on this goes. Haha, in regards to this being a tricky adaptor, I am realizing that more and more as I work on it, but it's great fun.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants