maxent seems to be using max instead of softmax for V_soft? #4
In the backward pass of MaxEnt (Algorithm 9.1 in Brian Ziebart's thesis), MaxEnt uses a softmax to update V, the soft value function. However, maxent.py seems to call value_iteration.optimal_value, which computes the hard value function; that is, it uses max instead of softmax. This looks like a bug.
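To make the max-vs-softmax difference concrete, here is a minimal NumPy sketch of one hard Bellman backup next to one soft backup. The array layout (`transition[s, a, s']`, one reward per state) and the function names are assumptions for illustration, not the repository's code.

```python
import numpy as np

def q_values(transition, reward, discount, v):
    # Assumed layout: transition[s, a, s'] = P(s' | s, a), reward[s] = r(s).
    # Q(s, a) = r(s) + discount * sum_{s'} P(s' | s, a) * V(s')
    return reward[:, None] + discount * (transition @ v)

def hard_backup(transition, reward, discount, v):
    # Ordinary value iteration: V(s) = max_a Q(s, a)
    return q_values(transition, reward, discount, v).max(axis=1)

def soft_backup(transition, reward, discount, v):
    # Soft value iteration: V(s) = log sum_a exp Q(s, a),
    # computed with logaddexp so exp() cannot overflow
    return np.logaddexp.reduce(q_values(transition, reward, discount, v), axis=1)

# Toy 2-state, 2-action MDP (hypothetical): from state 0, action 0 stays and
# action 1 moves to state 1; state 1 is absorbing under both actions.
transition = np.zeros((2, 2, 2))
transition[0, 0, 0] = 1.0
transition[0, 1, 1] = 1.0
transition[1, :, 1] = 1.0
reward = np.array([0.0, 1.0])
v0 = np.zeros(2)

hard = hard_backup(transition, reward, 0.9, v0)
soft = soft_backup(transition, reward, 0.9, v0)
```

Here both actions tie in each state, so `hard` is `[0, 1]` while `soft` is `[log 2, 1 + log 2]`. Log-sum-exp is always at least the max, so the two backups give different values whenever more than one action has a finite Q-value.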
Also, the initialization seems a bit odd. At least for gridworld settings, only the final state should be initialized to 0 and all other states to -infinity, but value_iteration.optimal_value seems to initialize every state to 0. Is there a reason for this discrepancy?
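The initialization described above can be sketched as a finite-horizon soft value iteration. This assumes a single absorbing terminal state whose value stays pinned at 0. `soft_value_iteration` and its parameters are hypothetical, and the sketch is not claimed to be a line-for-line transcription of Algorithm 9.1. A state whose value is still -inf has no path to the terminal state within the steps taken so far.

```python
import numpy as np

def soft_value_iteration(transition, reward, terminal, horizon, discount=1.0):
    """Finite-horizon soft value iteration (sketch under stated assumptions).

    transition: (n_states, n_actions, n_states) array of P(s' | s, a)  [assumed layout]
    reward:     (n_states,) state rewards
    terminal:   index of the goal/terminal state (assumed absorbing)
    """
    n_states = reward.shape[0]
    # Only the terminal state starts at 0; every other state starts at -inf,
    # meaning "the goal has not been reached from here yet".
    v = np.full(n_states, -np.inf)
    v[terminal] = 0.0
    for _ in range(horizon):
        # Expected next-state value. Entries with P = 0 are replaced by 0 first,
        # so that 0 * -inf does not produce NaN.
        masked_v = np.where(transition > 0, v, 0.0)
        expected_v = (transition * masked_v).sum(axis=2)
        q = reward[:, None] + discount * expected_v
        # Soft maximum over actions: V(s) = log sum_a exp Q(s, a).
        v = np.logaddexp.reduce(q, axis=1)
        # Keep the terminal state pinned at 0 (assumption: absorbing goal).
        v[terminal] = 0.0
    return v
```

On a three-state chain where state 0 is two steps from the goal, state 0 still has value -inf after one iteration and only becomes finite after two. Initializing every state to 0 would instead give it a finite value from the first backup, as though it could end there at no cost.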
Code for reference: https://github.com/MatthewJA/Inverse-Reinforcement-Learning/blob/master/irl/value_iteration.py#L63