[] RND network
[] beta -> UVFA
[] retrace loss

#r2d2 : value-based summary

Topics to study :
  R2D2
  Embedding Model
  G_function

Key concepts :
  intrinsic reward
  alpha-beta
  UVFA

Questions :

To implement :
  Agent57
  Meta Controller (Beta, gamma)
  Long trace
  Separate network
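
Rough sketch for the "Meta Controller (Beta, gamma)" and "intrinsic reward" items, to have something to start from. Assumptions, none of which come from these notes: the controller is a plain UCB1 bandit that picks one (beta, gamma) pair per episode (the Agent57 paper describes a sliding-window variant, which is not implemented here), the reward mix is r = r_ext + beta * r_int, and the names `MetaController`, `mixed_reward`, and the arm values are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MetaController:
    """Plain UCB1 bandit over hypothetical (beta, gamma) arms (simplified)."""
    arms: list                      # list of (beta, gamma) tuples
    c: float = 1.0                  # exploration bonus weight (assumed value)
    counts: list = field(default_factory=list)
    means: list = field(default_factory=list)

    def __post_init__(self):
        # One pull counter and one running mean return per arm.
        self.counts = [0] * len(self.arms)
        self.means = [0.0] * len(self.arms)

    def select(self) -> int:
        """Return the index of the arm to use for the next episode."""
        # Play every arm once before trusting the UCB score.
        for j, n in enumerate(self.counts):
            if n == 0:
                return j
        total = sum(self.counts)
        scores = [
            m + self.c * math.sqrt(math.log(total) / n)
            for m, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, j: int, episode_return: float) -> None:
        """Fold one episode's return into the running mean of arm j."""
        self.counts[j] += 1
        self.means[j] += (episode_return - self.means[j]) / self.counts[j]


def mixed_reward(r_ext: float, r_int: float, beta: float) -> float:
    """Combine extrinsic and intrinsic reward, weighting the latter by beta."""
    return r_ext + beta * r_int


if __name__ == "__main__":
    # Arm 0 is purely exploitative (beta = 0); arm 1 explores.
    mc = MetaController(arms=[(0.0, 0.997), (0.3, 0.99)])
    j = mc.select()
    beta, gamma = mc.arms[j]
    print(j, beta, gamma, mixed_reward(1.0, 2.0, beta))
```

The chosen beta would also be fed to the UVFA-style Q-network as an input, so one network covers all arms. That part is left out here.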