Skip to content

Conversation

@michalsustr
Copy link
Collaborator

@michalsustr michalsustr commented Oct 1, 2020

This PR is a work-in-progress for implementing efficient CFR solvers for infostate trees.
@elkhrt @gabrfarina I would appreciate your feedback on the work.

  • Implement infostate tree representation.
  • Make sure the construction works also for a sequence of states where the same player chooses actions repeatedly. (correct alternation of decision/observation nodes, so that the number of decision nodes' children corresponds to the number of actions in those nodes). Also, the child index in the vector should correspond to the legal action index.
  • Add more infostate tree tests. I am not sure how to make them so they are not brittle. Testing tensor observations is not a good idea at the moment, as they change quite a bit right now.
  • Implement CFR that will build infostate trees for both players, and then run iterations only on these lightweight trees.
  • Run this CFR on the same suite of tests as the current implementation.

I expect that this CFR implementation will be much faster, as making rollouts in the game is slow and locating CFRInfoStateValues in a map indexed by strings is nice for understanding the algorithm, but not for efficiency.

Some things I decided not to do (in this PR):

  • Allow to prune the observation nodes. Because we cannot be sure about the construction of the infostate tree until it is finished, some observation nodes can be redundant. We can optionally prune them to get a smaller tree.
  • Identify infoste nodes by a sequence (sequence-form). Not sure whether just having a vector if ints is good enough, but if more info is needed you can always use these to traverse the infostate trees.
  • Compute sequence tuples for any State. I don't need that for my purposes, but I can add it if @gabrfarina would like to have it.

I outlined a number of welcome contributions in the code.

@michalsustr michalsustr changed the title Implement base infostate tree. Implement an efficient CFR solver. Oct 1, 2020
@michalsustr michalsustr mentioned this pull request Oct 2, 2020
14 tasks
Copy link
Member

@elkhrt elkhrt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you revert the changes to the other algorithms and just add the new algorithm in this PR?

// A helper struct that allows to propagate reach probs / cf values
// up and down the tree.
struct InfostateTreeValuePropagator {
// Tree and the tree structure information. These must not change!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems ugly. Shouldn't these values be parameters to the propagation function rather than members here?

// Collects legal actions at the specified state.
ActionView(const State& state);

// Provides an iterator over flattened actions. This is equivalent to calling
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like unnecessary duplication.

if __name__ == "__main__":
absltest.main()

# TODO checks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make these TODOs clearer. You can link to github issues if you want, or just write a longer description of each in-line here.

}

private:
std::array<InfostateTreeValuePropagator, 2> propagators_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the array for?


private:
std::array<InfostateTreeValuePropagator, 2> propagators_;
// Map from player 1 index (key) to player 0 (value).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear.

@michalsustr
Copy link
Collaborator Author

Thank you for the comments! I will go over them.

Since I have a lot of other code piled up locally, I was thinking about how to incorporate it efficiently. I came up with the idea that I split them up into smaller PRs that are better structured. The code has mostly a linear chain of dependence, and I will post new PRs as we move forward the chain. This will allow us to incorporate the review feedback and have a cleaner git history. I appreciate the quality code review, therefore please do not feel under a rush for reviewing the PRs. This change will only mean that the contributions will come slower from our side.

I will close the PR for now, but I will address the comments in the new smaller PRs. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants