Description
Hello Robin,
This is a feature request, not a bug report.
I'd like the output from `history$get_data_table()` to include a column for the predicted value of the chosen arm at each step. For `EpsilonGreedyPolicy`, for example, this would simply be `self$theta$mean[[chosen_arm]]`, which I realise is already available by setting `save_theta = TRUE` in `Simulator$new`. If I also set `save_context = TRUE`, the predicted value of the chosen action can be obtained. (Although I have to account for the fact that the saved `theta` values are one time step ahead of the current context-arm pair: they are not the predicted values used when the arm was chosen, but the values computed after the reward for that context-arm pair was observed.)
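For what it's worth, the workaround I have in mind looks roughly like this. It is only a sketch, and the column names (`theta`, `choice`) and the list structure of the logged `theta` are assumptions on my part; the key point is shifting the `theta` log back one step so each row lines up with the prediction that was actually in force when the arm was chosen:

```r
# Sketch (structure assumed): recover per-step predicted values for the
# chosen arm from a run with save_theta = TRUE. The theta logged at step t
# was updated AFTER observing reward t, so the prediction actually used
# at step t is the theta from step t - 1.
dt <- history$get_data_table()
predicted <- rep(NA_real_, nrow(dt))
for (t in seq_len(nrow(dt))) {
  if (t == 1) next                       # no prior theta at the first step
  theta_prev <- dt$theta[[t - 1]]        # theta as it stood before step t
  predicted[t] <- theta_prev$mean[[dt$choice[t]]]
}
dt$predicted <- predicted
```

This works for me for the epsilon-greedy case, but having the value saved by the package itself would obviously be cleaner.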
With other policies, such as `ContextualEpsilonGreedyPolicy`, using the output from `history$get_data_table()` to compute the expected reward for the current action before it is taken is not so straightforward. I see in `policy_cmab_lin_epsilon_greedy.R` that you compute `expected_rewards[arm]`, but you don't seem to save those values for output later on. It is exactly `expected_rewards[arm]` that I would like `history$get_data_table()` to include in its output. Having it for just the chosen arm would be enough for my current needs, but having it for all arms might be useful in future.
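If I understand the linear policies correctly, the per-arm expected reward is a ridge-regression estimate of the form X' A^-1 b, so in principle it could also be reconstructed from a run with `save_theta = TRUE` and `save_context = TRUE`, again shifting `theta` back one step. A rough sketch, with the per-arm `A`/`b` structure and the `context`/`choice` column names all assumed rather than checked:

```r
# Sketch (structure assumed): reconstruct the pre-decision expected reward
# of the chosen arm for a linear policy, using theta from the PREVIOUS
# step (the theta logged at step t already includes step t's reward).
expected_reward_at <- function(dt, t) {
  theta_prev <- dt$theta[[t - 1]]        # theta before step t's update
  arm        <- dt$choice[t]
  A          <- theta_prev$A[[arm]]      # assumed d x d per-arm matrix
  b          <- theta_prev$b[[arm]]      # assumed d x 1 per-arm vector
  X          <- dt$context[[t]][, arm]   # assumed context column for arm
  as.numeric(t(X) %*% solve(A) %*% b)    # ridge estimate of the reward
}
```

Even if this reconstruction is roughly right, it duplicates work the policy has already done, which is why I'd prefer `expected_rewards[arm]` to be saved at the point where it is computed.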
I had a look at `history.R` to see if I could work out how to save the values of `expected_rewards` myself, but it looks rather complicated to me, and my R is nowhere near as good as yours :-).
Thanks,
Paul