maxent seems to be using max instead of softmax for V_soft?

In the backward pass of MaxEnt (Algorithm 9.1 in Brian Ziebart's thesis), the soft value function `V` is updated with a softmax (log-sum-exp) over actions. However, `maxent.py` appears to call `value_iteration.optimal_value`, which computes the hard value function: it takes a `max` over actions instead of a softmax. This looks like a bug.
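For reference, here is a minimal NumPy sketch of one Bellman backup in both forms. It assumes a tabular MDP with a per-state reward vector and a transition array indexed as `[state, action, next_state]`. The `backup` helper and the toy two-state MDP are hypothetical and are not taken from the repository. The two branches differ only in how they collapse `Q(s, a)` over actions: `max` versus a numerically stable log-sum-exp.

```python
import numpy as np


def backup(reward, transition_probability, value, discount, soft):
    """One Bellman backup on a tabular MDP (hypothetical helper).

    reward: shape (n_states,)
    transition_probability: shape (n_states, n_actions, n_states)
    value: shape (n_states,)
    """
    # Q(s, a) = r(s) + gamma * sum_s' P(s' | s, a) * V(s')
    q = reward[:, None] + discount * (transition_probability @ value)
    if soft:
        # Soft backup: V(s) = log sum_a exp Q(s, a), computed stably.
        return np.logaddexp.reduce(q, axis=1)
    # Hard backup: V(s) = max_a Q(s, a), as in ordinary value iteration.
    return q.max(axis=1)


# Toy MDP (assumption): action 0 always leads to state 0, action 1 to state 1.
n_states, n_actions = 2, 2
P = np.zeros((n_states, n_actions, n_states))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0
r = np.zeros(n_states)
v = np.array([1.0, 0.0])  # so Q(s, :) = [1, 0] in every state

hard = backup(r, P, v, discount=1.0, soft=False)  # max(1, 0) = 1 in each state
soft = backup(r, P, v, discount=1.0, soft=True)   # log(e + 1) in each state
```

Because log-sum-exp equals the max plus a positive correction whenever two or more actions have finite Q-values, the soft values here are strictly larger than the hard ones, so the two backups do not produce the same `V`.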

The initialization also seems odd. At least in gridworld settings, only the terminal state should be initialized to 0 and all other states to -infinity, but `value_iteration.optimal_value` initializes every state to 0. Is there a reason for this discrepancy?
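A minimal sketch of the two initializations, run through soft value iteration on a hypothetical 1-D deterministic gridworld. The chain layout, the -1 step reward, and treating the goal as absorbing with value fixed at 0 are all assumptions for illustration, not details taken from the repository or the thesis. Deterministic transitions turn the expectation over next states into a table lookup, which also avoids computing `0 * -inf` for states that still hold -infinity.

```python
import numpy as np


def chain_soft_value_iteration(n_states, goal, n_iterations, init):
    """Soft value iteration on a hypothetical 1-D deterministic chain.

    Actions: 0 = left, 1 = right (clipped at the ends). Every non-goal
    state has reward -1; the goal is absorbing with value fixed at 0.
    """
    states = np.arange(n_states)
    # next_state[s, a]: deterministic successor of state s under action a.
    next_state = np.stack(
        [np.maximum(states - 1, 0), np.minimum(states + 1, n_states - 1)], axis=1
    )
    reward = np.full(n_states, -1.0)

    if init == "terminal_only":
        # Only the goal is a valid place for a trajectory to end.
        value = np.full(n_states, -np.inf)
    elif init == "zeros":
        # Every state is treated as if a trajectory could end there.
        value = np.zeros(n_states)
    else:
        raise ValueError(f"unknown init: {init!r}")
    value[goal] = 0.0

    for _ in range(n_iterations):
        # Deterministic transitions: V(s') is a lookup, shape (n_states, 2).
        q = reward[:, None] + value[next_state]
        value = np.logaddexp.reduce(q, axis=1)  # soft max over actions
        value[goal] = 0.0  # absorbing goal keeps value 0
    return value


v_term = chain_soft_value_iteration(4, goal=3, n_iterations=1, init="terminal_only")
v_zero = chain_soft_value_iteration(4, goal=3, n_iterations=1, init="zeros")
```

After one sweep, the -infinity initialization gives a finite value only to state 2, the one state that can reach the goal in a single step (its value is -1), while states 0 and 1 stay at -infinity. With the all-zeros initialization, every non-goal state already has the finite value -1 + log 2, as though a trajectory could stop anywhere.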

Code for reference: https://github.com/MatthewJA/Inverse-Reinforcement-Learning/blob/master/irl/value_iteration.py#L63
