
WIP: add nfsp algorithm and relative experiment #375


Conversation

peterchen96
Member

Add the NFSP algorithm and test it on the Julia version of KuhnPokerEnv; however, the result is not good for now. #342

Contributor

@johnnychen94 johnnychen94 left a comment

random comments from random person :)

Comment on lines +86 to +88
savefig("assets/JuliaRL_NFSP_KuhnPoker.png")#hide

# ![](assets/JuliaRL_NFSP_KuhnPoker.png)
Contributor

These are used to generate the progress plot. Whether it is needed depends on how you write the cover field in the front matter (see my previous comment).
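For reference, a rough sketch of what the front matter could look like if the cover is taken from this plot (the exact field names here are an assumption based on how other experiments in the docs are set up; only the author line is taken from this file):

```julia
# ---
# title: JuliaRL_NFSP_KuhnPoker
# cover: assets/JuliaRL_NFSP_KuhnPoker.png  # assumed field; if set, the savefig line above is needed
# author: "[Peter Chen](https://github.com/peterchen96)"
# ---
```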


See the paper https://arxiv.org/abs/1603.01121 for more details.
"""
export NFSPAgent, NFSPAgents
Contributor

I don't think users need to directly use NFSPAgent, so no need to export this one.

Comment on lines 10 to 12
mutable struct NFSPAgents <: AbstractPolicy
    agents::Dict{Any, AbstractPolicy}
end
Contributor

We don't use the plural NFSPAgents just to differentiate it from NFSPAgent.

Since we don't really need to export NFSPAgent, maybe we can rename it to NFSPAgentInstance?

Comment on lines 1 to 5
"""
Neural Fictitious Self-Play (NFSP) agent implemented in Julia.

See the paper https://arxiv.org/abs/1603.01121 for more details.
"""
Contributor

I believe this docstring should go to NFSPAgents. Please follow the Julia guideline https://docs.julialang.org/en/v1/manual/documentation/
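For illustration, a minimal sketch of what that would look like with the docstring attached directly to the struct (field list taken from the excerpt above):

```julia
"""
    NFSPAgents(agents::Dict{Any, AbstractPolicy})

Neural Fictitious Self-Play (NFSP) agents implemented in Julia.

See the paper https://arxiv.org/abs/1603.01121 for more details.
"""
mutable struct NFSPAgents <: AbstractPolicy
    agents::Dict{Any, AbstractPolicy}
end
```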

Comment on lines +14 to +19
mutable struct NFSPAgent <: AbstractPolicy
    η
    rng
    rl_agent::Agent
    sl_agent::Agent
end
Contributor

Is this a usable policy? Otherwise there's no need to <: AbstractPolicy.
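(For context, in ReinforcementLearning.jl a usable policy is one that can be called on the environment to produce an action. A hedged sketch of what that might look like for this struct, assuming η is the usual NFSP anticipatory parameter:)

```julia
# Hypothetical sketch: with probability η act from the best-response (RL)
# policy, otherwise act from the average-strategy (SL) policy.
function (agent::NFSPAgent)(env::AbstractEnv)
    if rand(agent.rng) < agent.η
        agent.rl_agent(env)
    else
        agent.sl_agent(env)
    end
end
```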

using Distributions: TruncatedNormal

mutable struct NFSPAgents <: AbstractPolicy
    agents::Dict{Any, AbstractPolicy}
Contributor

Just a question: does it really accept any type of AbstractPolicy?

    sl_agent::Agent
end

function initW(out_size, in_size)
Contributor

initW is such a common name that it might cause conflicts with other algorithms.

Suggested change:
- function initW(out_size, in_size)
+ function _NFSP_initW(out_size, in_size)

Member Author

Thanks for your reviews and sorry for the late response.

> I don't think users need to directly use NFSPAgent, so no need to export this one.

> We don't use the plural NFSPAgents just to differentiate it from NFSPAgent.
>
> Since we don't really need to export NFSPAgent, maybe we can rename it to NFSPAgentInstance?

Here, NFSPAgent is the core policy used to play the game. In my view, the agents in one multi-agent experiment may use different policies (much like players can adopt different strategies in the same game).

And NFSPAgents is more like a special MultiAgentManager in which every agent uses the NFSP policy to learn the best response. For clarity, I'll extract NFSPAgents and its related methods into a new file and rename it NFSPAgentManager.
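Roughly along these lines (a tentative sketch only; current_player is the RLBase accessor and the training logic is omitted):

```julia
# Tentative sketch of the renamed manager: a thin wrapper, analogous to
# MultiAgentManager, that dispatches to the NFSP agent of the current player.
mutable struct NFSPAgentManager <: AbstractPolicy
    agents::Dict{Any, NFSPAgent}
end

(manager::NFSPAgentManager)(env::AbstractEnv) =
    manager.agents[current_player(env)](env)
```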

You can follow this format here:

> I believe this docstring should go to NFSPAgents. Please follow the Julia guideline https://docs.julialang.org/en/v1/manual/documentation/

Thanks for the reminder. I'll add a description of NFSPAgent and its experiment later.

> initW is such a common name that it might cause conflicts with other algorithms.

Thank you for pointing it out. Maybe I'll rename it to _TruncatedNormal, after the distribution it uses.
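Something like the following, for example (a sketch under the assumption that the initializer just draws weights from a truncated normal; the scale used here is only an assumption):

```julia
using Distributions: TruncatedNormal

# Hypothetical renamed initializer: draw an (out_size, in_size) weight matrix
# from a normal distribution truncated to ±2 standard deviations.
function _TruncatedNormal(out_size, in_size)
    σ = sqrt(2 / (out_size + in_size))  # assumed Glorot-style scale
    rand(TruncatedNormal(0.0, σ, -2σ, 2σ), out_size, in_size)
end
```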

Member

@findmyway findmyway left a comment

The implementation seems to be totally incorrect. I'll sync with you offline later.

# author: "[Peter Chen](https://github.com/peterchen96)"
# ---

#+ tangle=false
Member

Why is it set to false?

using Flux
using Flux.Losses

mutable struct ResultNEpisode <: AbstractHook
Member

Why is it set to a subtype of AbstractHook here?

    episode::Vector{Int}
    results
end
recorder = ResultNEpisode([], [])
Member

Please note that it will become a global variable in the ReinforcementLearningExperiments.jl package once tangle=true, and obviously this is not what you want. Better to move it inside the experiment.
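For illustration, a hedged sketch of both points (the callback signature follows the usual ReinforcementLearningCore hook convention; the recording logic is only indicated):

```julia
# Hooks are callables on (stage, policy, env); subtyping AbstractHook only
# makes sense together with a method like this (evaluation logic omitted):
function (hook::ResultNEpisode)(::PostEpisodeStage, policy, env)
    # push!(hook.episode, ...); push!(hook.results, ...)
end

# and the instance is better constructed inside the experiment body, e.g.
# hook = ResultNEpisode([], []), rather than as a package-level global.
```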

Comment on lines +30 to +40
env = KuhnPokerEnv()
states = [
(), (:J,), (:Q,), (:K,),
(:J, :Q), (:J, :K), (:Q, :J), (:Q, :K), (:K, :J), (:K, :Q),
(:J, :bet), (:J, :pass), (:Q, :bet), (:Q, :pass), (:K, :bet), (:K, :pass),
(:J, :pass, :bet), (:J, :bet, :bet), (:J, :bet, :pass), (:J, :pass, :pass),
(:Q, :pass, :bet), (:Q, :bet, :bet), (:Q, :bet, :pass), (:Q, :pass, :pass),
(:K, :pass, :bet), (:K, :bet, :bet), (:K, :bet, :pass), (:K, :pass, :pass),
(:J, :pass, :bet, :pass), (:J, :pass, :bet, :bet), (:Q, :pass, :bet, :pass),
(:Q, :pass, :bet, :bet), (:K, :pass, :bet, :pass), (:K, :pass, :bet, :bet),
] # collect all states
Member

How about adding another state representation to this environment?
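One possible direction (only a sketch, not necessarily what is meant here): index into this enumeration so the networks can consume one-hot encoded states, e.g.:

```julia
# Hypothetical mapping from the enumerated information states to integers,
# usable as an alternative state representation.
state_index = Dict(s => i for (i, s) in enumerate(states))

# e.g. Flux.onehot(state_index[state(env)], 1:length(states)) could then be
# fed into the RL / SL networks.
```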

)
@assert NumAgentStyle(env) isa MultiAgent
@assert DynamicStyle(env) === SEQUENTIAL
@assert RewardStyle(env) === TERMINAL_REWARD
Member

Is this required?

@peterchen96
Member Author

Sorry for the late response. Since this PR's implementation was totally incorrect, I will close it and create a new PR to record my implementation and the related experiment.
