
WIP: add nfsp algorithm and relative experiment #375


Conversation

peterchen96
Member

Add the NFSP algorithm and test it on the Julia version of KuhnPokerEnv; however, the result is not good for now. #342

Contributor

@johnnychen94 johnnychen94 left a comment

random comments from random person :)

Comment on lines +86 to +88
savefig("assets/JuliaRL_NFSP_KuhnPoker.png")#hide

# ![](assets/JuliaRL_NFSP_KuhnPoker.png)
Contributor

These are used to generate the progress plot. Whether it is needed depends on how you write the cover field in the front matter (see my previous comment).
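For reference, a rough sketch of what the front matter could look like if the cover is taken from this plot (the exact field names here are an assumption based on how other experiments in the docs are set up; only the author line is taken from this file):

```julia
# ---
# title: JuliaRL_NFSP_KuhnPoker
# cover: assets/JuliaRL_NFSP_KuhnPoker.png  # assumed field; if set, the savefig line above is needed
# author: "[Peter Chen](https://github.com/peterchen96)"
# ---
```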


See the paper https://arxiv.org/abs/1603.01121 for more details.
"""
export NFSPAgent, NFSPAgents
Contributor

I don't think users need to directly use NFSPAgent, so no need to export this one.

Comment on lines 10 to 12
mutable struct NFSPAgents <: AbstractPolicy
    agents::Dict{Any, AbstractPolicy}
end
Contributor

We don't use the plural NFSPAgents just to differentiate it from NFSPAgent.

Since we don't really need to export NFSPAgent, maybe we can rename it to NFSPAgentInstance?

Comment on lines 1 to 5
"""
Neural Fictitious Self-Play (NFSP) agent implemented in Julia.

See the paper https://arxiv.org/abs/1603.01121 for more details.
"""
Contributor

I believe this docstring should go to NFSPAgents. Please follow the Julia guideline https://docs.julialang.org/en/v1/manual/documentation/
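For illustration, a minimal sketch of what that would look like with the docstring attached directly to the struct (field list taken from the excerpt above):

```julia
"""
    NFSPAgents(agents::Dict{Any, AbstractPolicy})

Neural Fictitious Self-Play (NFSP) agents implemented in Julia.

See the paper https://arxiv.org/abs/1603.01121 for more details.
"""
mutable struct NFSPAgents <: AbstractPolicy
    agents::Dict{Any, AbstractPolicy}
end
```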

Comment on lines +14 to +19
mutable struct NFSPAgent <: AbstractPolicy
    η
    rng
    rl_agent::Agent
    sl_agent::Agent
end
Contributor

Is this a usable policy? Otherwise there's no need to <: AbstractPolicy.
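(For context, in ReinforcementLearning.jl a usable policy is one that can be called on the environment to produce an action. A hedged sketch of what that might look like for this struct, assuming η is the usual NFSP anticipatory parameter:)

```julia
# Hypothetical sketch: with probability η act from the best-response (RL)
# policy, otherwise act from the average-strategy (SL) policy.
function (agent::NFSPAgent)(env::AbstractEnv)
    if rand(agent.rng) < agent.η
        agent.rl_agent(env)
    else
        agent.sl_agent(env)
    end
end
```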

using Distributions: TruncatedNormal

mutable struct NFSPAgents <: AbstractPolicy
    agents::Dict{Any, AbstractPolicy}
Contributor

Just a question: does it really accept any type of AbstractPolicy?

    sl_agent::Agent
end

function initW(out_size, in_size)
Contributor

initW is such a common name that it might cause conflicts with other algorithms.

Suggested change:
- function initW(out_size, in_size)
+ function _NFSP_initW(out_size, in_size)

Member Author

Thanks for your reviews and sorry for the late response.

> I don't think users need to directly use NFSPAgent, so no need to export this one.

> We don't use the plural NFSPAgents just to differentiate it from NFSPAgent.
>
> Since we don't really need to export NFSPAgent, maybe we can rename it to NFSPAgentInstance?

Here, NFSPAgent is the core policy used to play the game. In my view, the agents in one multi-agent experiment may use different policies (much like players can adopt different strategies in the same game).

And NFSPAgents is more like a special MultiAgentManager in which every agent uses the NFSP policy to learn the best response. For clarity, I'll extract NFSPAgents and its related methods into a new file and rename it NFSPAgentManager.
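Roughly along these lines (a tentative sketch only; current_player is the RLBase accessor and the training logic is omitted):

```julia
# Tentative sketch of the renamed manager: a thin wrapper, analogous to
# MultiAgentManager, that dispatches to the NFSP agent of the current player.
mutable struct NFSPAgentManager <: AbstractPolicy
    agents::Dict{Any, NFSPAgent}
end

(manager::NFSPAgentManager)(env::AbstractEnv) =
    manager.agents[current_player(env)](env)
```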

You can follow this format here:

> I believe this docstring should go to NFSPAgents. Please follow the Julia guideline https://docs.julialang.org/en/v1/manual/documentation/

Thanks for the reminder. I'll add a description of NFSPAgent and its experiment later.

> initW is such a common name that it might cause conflicts with other algorithms.

Thank you for pointing it out. Maybe I'll rename it to _TruncatedNormal, after the distribution it uses.
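Something like the following, for example (a sketch under the assumption that the initializer just draws weights from a truncated normal; the scale used here is only an assumption):

```julia
using Distributions: TruncatedNormal

# Hypothetical renamed initializer: draw an (out_size, in_size) weight matrix
# from a normal distribution truncated to ±2 standard deviations.
function _TruncatedNormal(out_size, in_size)
    σ = sqrt(2 / (out_size + in_size))  # assumed Glorot-style scale
    rand(TruncatedNormal(0.0, σ, -2σ, 2σ), out_size, in_size)
end
```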

Member

@findmyway findmyway left a comment

The implementation seems to be totally incorrect. I'll sync with you offline later.

# author: "[Peter Chen](https://github.com/peterchen96)"
# ---

#+ tangle=false
Member

Why is it set to false?

using Flux
using Flux.Losses

mutable struct ResultNEpisode <: AbstractHook
Member

Why is it set to a subtype of AbstractHook here?

    episode::Vector{Int}
    results
end
recorder = ResultNEpisode([], [])
Member

Please note that it will become a global variable in the ReinforcementLearningExperiments.jl package once tangle=true, and obviously this is not what you want. Better to move it inside the experiment.
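For illustration, a hedged sketch of both points (the callback signature follows the usual ReinforcementLearningCore hook convention; the recording logic is only indicated):

```julia
# Hooks are callables on (stage, policy, env); subtyping AbstractHook only
# makes sense together with a method like this (evaluation logic omitted):
function (hook::ResultNEpisode)(::PostEpisodeStage, policy, env)
    # push!(hook.episode, ...); push!(hook.results, ...)
end

# and the instance is better constructed inside the experiment body, e.g.
# hook = ResultNEpisode([], []), rather than as a package-level global.
```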

Comment on lines +30 to +40
env = KuhnPokerEnv()
states = [
(), (:J,), (:Q,), (:K,),
(:J, :Q), (:J, :K), (:Q, :J), (:Q, :K), (:K, :J), (:K, :Q),
(:J, :bet), (:J, :pass), (:Q, :bet), (:Q, :pass), (:K, :bet), (:K, :pass),
(:J, :pass, :bet), (:J, :bet, :bet), (:J, :bet, :pass), (:J, :pass, :pass),
(:Q, :pass, :bet), (:Q, :bet, :bet), (:Q, :bet, :pass), (:Q, :pass, :pass),
(:K, :pass, :bet), (:K, :bet, :bet), (:K, :bet, :pass), (:K, :pass, :pass),
(:J, :pass, :bet, :pass), (:J, :pass, :bet, :bet), (:Q, :pass, :bet, :pass),
(:Q, :pass, :bet, :bet), (:K, :pass, :bet, :pass), (:K, :pass, :bet, :bet),
] # collect all states
Member

How about adding another state representation to this environment?
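One possible direction (only a sketch, not necessarily what is meant here): index into this enumeration so the networks can consume one-hot encoded states, e.g.:

```julia
# Hypothetical mapping from the enumerated information states to integers,
# usable as an alternative state representation.
state_index = Dict(s => i for (i, s) in enumerate(states))

# e.g. Flux.onehot(state_index[state(env)], 1:length(states)) could then be
# fed into the RL / SL networks.
```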

)
@assert NumAgentStyle(env) isa MultiAgent
@assert DynamicStyle(env) === SEQUENTIAL
@assert RewardStyle(env) === TERMINAL_REWARD
Member

Is this required?

@peterchen96
Member Author

Sorry for the late response. Since this PR's implementation was totally incorrect, I will close it and create a new PR to record my implementation and the related experiment.
