WIP: add nfsp algorithm and related experiment #375
Conversation
Force-pushed from b3befda to 5503d31
random comments from random person :)
```julia
savefig("assets/JuliaRL_NFSP_KuhnPoker.png") #hide
```
These are used to generate the progress plot. Whether they are needed depends on how you write the `cover` field in the front matter (see my previous comment).
```julia
See the paper https://arxiv.org/abs/1603.01121 for more details.
"""
export NFSPAgent, NFSPAgents
```
I don't think users need to use `NFSPAgent` directly, so there is no need to export it.
```julia
mutable struct NFSPAgents <: AbstractPolicy
    agents::Dict{Any, AbstractPolicy}
end
```
We shouldn't rely on the plural `NFSPAgents` just to differentiate it from `NFSPAgent`. Since we don't really need to export `NFSPAgent`, maybe we can rename `NFSPAgent` to `NFSPAgentInstance`?
```julia
"""
Neural Fictitious Self-Play (NFSP) agent implemented in Julia.

See the paper https://arxiv.org/abs/1603.01121 for more details.
"""
```
I believe this docstring should go on `NFSPAgents`. Please follow the Julia documentation guidelines: https://docs.julialang.org/en/v1/manual/documentation/
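Following that guideline, the docstring would sit immediately above the definition it documents. A minimal sketch (the `<: AbstractPolicy` supertype and the concrete field type are dropped here so the snippet is self-contained; the signature line in the docstring is an illustrative assumption):

```julia
"""
    NFSPAgents(agents)

A set of Neural Fictitious Self-Play (NFSP) agents, one per player.

See the paper https://arxiv.org/abs/1603.01121 for more details.
"""
mutable struct NFSPAgents
    agents::Dict{Any,Any}
end
```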
```julia
mutable struct NFSPAgent <: AbstractPolicy
    η
    rng
    rl_agent::Agent
    sl_agent::Agent
end
```
Is this a usable policy on its own? If not, there's no need for it to subtype `AbstractPolicy`.
```julia
using Distributions: TruncatedNormal

mutable struct NFSPAgents <: AbstractPolicy
    agents::Dict{Any, AbstractPolicy}
```
Just a question: does it really accept any type of `AbstractPolicy`?
```julia
    sl_agent::Agent
end

function initW(out_size, in_size)
```
`initW` is such a common name that it might conflict with other algorithms.
```diff
-function initW(out_size, in_size)
+function _NFSP_initW(out_size, in_size)
```
Thanks for your reviews, and sorry for the late response.

> I don't think users need to use `NFSPAgent` directly, so there is no need to export it.

> We shouldn't rely on the plural `NFSPAgents` just to differentiate it from `NFSPAgent`. Since we don't really need to export `NFSPAgent`, maybe we can rename `NFSPAgent` to `NFSPAgentInstance`?

Here, `NFSPAgent` is the core policy used to play the game. In my view, the agents in one multi-agent experiment may each choose a different policy to play the game (much as players can use different strategies in the same game). And `NFSPAgents` is more like a special `MultiAgentManager` in which every agent uses the NFSP policy to learn the best response. For clarity, I'll extract `NFSPAgents` and its related methods into a new file and rename it `NFSPAgentManager`.
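That split could be sketched as below. This is only an illustration of the design being discussed, not the PR's actual code: the names `NFSPAgentInstance` and `NFSPAgentManager`, the `Function`-typed policy fields, and the η-mixing rule (act with the RL best response with probability η, otherwise with the SL average policy, as in the NFSP paper) are all assumptions.

```julia
# Sketch (hypothetical): one NFSP agent per player, coordinated by a manager.
mutable struct NFSPAgentInstance
    η::Float64            # anticipatory parameter from the NFSP paper
    rl_policy::Function   # best-response policy (e.g. DQN-based)
    sl_policy::Function   # average policy learned by supervised learning
end

# With probability η act greedily (RL), otherwise follow the average policy (SL).
act(a::NFSPAgentInstance, state) =
    rand() < a.η ? a.rl_policy(state) : a.sl_policy(state)

# The manager only dispatches to the agent of the current player.
struct NFSPAgentManager
    agents::Dict{Any,NFSPAgentInstance}
end

act(m::NFSPAgentManager, player, state) = act(m.agents[player], state)
```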
You can follow this format here:
> I believe this docstring should go on `NFSPAgents`. Please follow the Julia documentation guidelines: https://docs.julialang.org/en/v1/manual/documentation/
Thanks for the reminder. I'll add the description of `NFSPAgent` and its experiment later.
> `initW` is such a common name that it might conflict with other algorithms.
Thank you for pointing it out. Maybe I'll rename it to `_TruncatedNormal`, after the distribution it uses.
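A renamed initializer along those lines might look like the following sketch. The ±2σ cutoff, the default `σ = 1/√in_size` scale, and the rejection-sampling approach (used to avoid a hard dependency on Distributions.jl) are assumptions for illustration, not the PR's actual implementation:

```julia
# Sketch: weight initializer drawing from a normal distribution truncated at ±2σ.
function _TruncatedNormal(out_size::Int, in_size::Int; σ = 1 / sqrt(in_size))
    # Rejection sampling: redraw until the sample falls within the truncation bounds.
    function sample()
        x = σ * randn()
        while abs(x) > 2σ
            x = σ * randn()
        end
        return x
    end
    return [sample() for _ in 1:out_size, _ in 1:in_size]
end
```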
The implementation seems to be totally incorrect. I'll sync with you offline later.
```julia
# author: "[Peter Chen](https://github.com/peterchen96)"
# ---

#+ tangle=false
```
Why is it set to `false`?
```julia
using Flux
using Flux.Losses

mutable struct ResultNEpisode <: AbstractHook
```
Why is it a subtype of `AbstractHook` here?
```julia
    episode::Vector{Int}
    results
end
recorder = ResultNEpisode([], [])
```
Please note that `recorder` will become a global variable in the ReinforcementLearningExperiments.jl package once `tangle=true`, and that is obviously not what you want. It would be better to move it inside the experiment.
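One way to do that is to construct the recorder inside the experiment function so it never escapes into package scope. This is only a sketch: `run_experiment` and the zero-payoff placeholder stand in for the real hook/experiment machinery.

```julia
# Sketch: keep the recorder local to the experiment instead of a package global.
mutable struct ResultNEpisode
    episode::Vector{Int}
    results::Vector{Float64}
end

function run_experiment(n_episodes)
    recorder = ResultNEpisode(Int[], Float64[])  # local, not global
    for ep in 1:n_episodes
        push!(recorder.episode, ep)
        push!(recorder.results, 0.0)  # placeholder for the evaluated payoff
    end
    return recorder
end
```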
```julia
env = KuhnPokerEnv()
states = [
    (), (:J,), (:Q,), (:K,),
    (:J, :Q), (:J, :K), (:Q, :J), (:Q, :K), (:K, :J), (:K, :Q),
    (:J, :bet), (:J, :pass), (:Q, :bet), (:Q, :pass), (:K, :bet), (:K, :pass),
    (:J, :pass, :bet), (:J, :bet, :bet), (:J, :bet, :pass), (:J, :pass, :pass),
    (:Q, :pass, :bet), (:Q, :bet, :bet), (:Q, :bet, :pass), (:Q, :pass, :pass),
    (:K, :pass, :bet), (:K, :bet, :bet), (:K, :bet, :pass), (:K, :pass, :pass),
    (:J, :pass, :bet, :pass), (:J, :pass, :bet, :bet), (:Q, :pass, :bet, :pass),
    (:Q, :pass, :bet, :bet), (:K, :pass, :bet, :pass), (:K, :pass, :bet, :bet),
] # collect all states
```
How about adding another state representation to this environment?
```julia
)
@assert NumAgentStyle(env) isa MultiAgent
@assert DynamicStyle(env) === SEQUENTIAL
@assert RewardStyle(env) === TERMINAL_REWARD
```
Is this required?
Sorry for the late response. Since this PR's implementation was totally incorrect, I'll close it and create a new PR to record my implementation and the related experiment.
add nfsp algorithm and test it on the Julia version of `KuhnPokerEnv`; however, the result is not good yet. #342