Tags: rail-berkeley/rlkit
Change sampling method from randint to choice in Replay and robustify policy networks in SAC (#111)
* Introduced possibility to change alpha parameter
* Fix sum operation which causes trouble for more than two batch dimensions
* Replace randint with choice to avoid duplicates
* Added replace as an option to the replay buffer and a warning if desired behaviour is not possible
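As a rough illustration of the randint-to-choice change, the sketch below draws batch indices with NumPy's `Generator.choice`, which can sample without replacement and so avoids duplicate indices within a batch. The function name `sample_indices`, its parameters, and the fallback to sampling with replacement are assumptions made for this example; they are not taken from the rlkit source, which may handle the impossible case differently.

```python
import warnings

import numpy as np


def sample_indices(buffer_size, batch_size, replace=True, rng=None):
    """Draw batch indices from a buffer holding `buffer_size` items.

    Hypothetical helper, not rlkit's actual API.
    """
    rng = np.random.default_rng() if rng is None else rng
    if not replace and batch_size > buffer_size:
        # Sampling without replacement is impossible here. Falling back to
        # sampling with replacement is an assumption of this sketch.
        warnings.warn(
            "Batch size exceeds buffer size; sampling with replacement instead."
        )
        replace = True
    # With replace=False every returned index is unique within the batch.
    return rng.choice(buffer_size, size=batch_size, replace=replace)


# Usage: ten unique indices out of a buffer of ten stored transitions.
unique_batch = sample_indices(10, 10, replace=False, rng=np.random.default_rng(0))
```

By contrast, drawing indices with `randint` samples each index independently, so the same transition can appear more than once in a batch.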