Reward Function
Kartikay Garg edited this page Mar 4, 2018 · 2 revisions
Let,
- l(ti) be the amount of long currency,
- s(ti) be the amount of short currency and
- p(ti) be the price of the currency at time instant ti.
At any timestamp, the reward given to the agent is the actual value of its portfolio. It is defined by,
- non-zero intermediate rewards allow the agent to converge to a trading strategy in fewer iterations
- however, frequent intermediate rewards are often noisy and tend to destabilize the trading process
where ri is the unrealized PnL reward at time instant ti, ω is a suitable parameter and k is the number of lag terms in the exponentially weighted average
- balances out the two extremes as described above
- serves dual objectives in guiding policy learning:
  - it provides the agent with intermediate rewards to facilitate fast learning of the trading strategy
  - the weighted average over past rewards tends to reduce the noise in the rather frequent rewards
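
The smoothing described above can be sketched in a few lines of Python. The exact formula is not reproduced on this page, so the sketch assumes one common form: a normalized sum of the last k rewards, with the reward from j steps back weighted by ω^j. The function name `smoothed_reward`, the normalization, and the restriction 0 < ω ≤ 1 are assumptions made for illustration, not the project's implementation.

```python
from typing import Sequence


def smoothed_reward(rewards: Sequence[float], omega: float, k: int) -> float:
    """Exponentially weighted average of the most recent k rewards.

    Assumed form (not taken from the page):
        sum_j omega**j * r[i-j] / sum_j omega**j,  j = 0 .. k-1
    where r[i] is the latest unrealized PnL reward.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0.0 < omega <= 1.0:
        raise ValueError("omega is assumed to lie in (0, 1]")
    if len(rewards) == 0:
        raise ValueError("at least one reward is required")

    # Take up to k of the latest rewards, most recent first.
    recent = list(rewards)[-k:][::-1]
    # Weight 1 for the latest reward, omega for the previous one, and so on.
    weights = [omega ** j for j in range(len(recent))]
    return sum(w * r for w, r in zip(weights, recent)) / sum(weights)


if __name__ == "__main__":
    pnl_history = [1.0, 2.0, 4.0]
    print(smoothed_reward(pnl_history, omega=0.5, k=2))
```

Under these assumptions, a smaller ω keeps the signal close to the latest unrealized PnL, while ω near 1 with a larger k averages more heavily over the past and damps noise at the cost of responsiveness.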