
Save predicted reward for chosen arm (feature request) #10

Open

@pstansell

Hello Robin,

This is a feature request, not a bug report.

I'd like the output from history$get_data_table() to include a column with the predicted value of the chosen arm at each step.

For example, for EpsilonGreedyPolicy it would simply be self$theta$mean[[chosen_arm]], which I realise is available by setting save_theta = TRUE in Simulator$new. If I also set save_context = TRUE, the predicted value of the chosen action can be recovered after the fact. (The only subtlety is that the saved theta values run one time step ahead of the current context-arm pair: by the time they are written out they have already been updated with the reward from that pair, so the prediction for step t has to be read from the theta values saved at step t - 1.)
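
For what it's worth, this is roughly the post-hoc computation I do now. It is only a minimal sketch: I am assuming get_data_table() returns columns named sim, choice, and a theta list-column when save_theta = TRUE, and those names are my guesses at the layout rather than something I have checked against history.R.

```r
library(contextual)
library(data.table)

# ... after running Simulator$new(..., save_theta = TRUE)$run() ...
dt <- history$get_data_table()

# The theta saved at step t already includes the update from step t's
# reward, so the value that was predicted *for* step t sits in the
# theta saved at step t - 1.
dt[, predicted := mapply(
  function(th_prev, arm) {
    if (is.list(th_prev)) th_prev$mean[[arm]] else NA_real_  # first step has no lag
  },
  shift(theta, 1L),  # lag the theta list-column by one step
  choice
), by = sim]         # keep the lag within each simulation run
```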

With other policies, such as ContextualEpsilonGreedyPolicy, it is not so straightforward to use the output from history$get_data_table() to compute the expected reward of the current action before it is taken. I see in policy_cmab_lin_epsilon_greedy.R that you compute expected_rewards[arm], but the values don't seem to be saved for output later on. It is exactly expected_rewards[arm] that I would like history$get_data_table() to include. Having it for just the chosen arm would be enough for my current needs, but having expected_rewards[arm] for all arms might be useful in the future.
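
To make the request concrete, the change I have in mind would sit inside get_action, something like the sketch below. I am paraphrasing the policy code from memory, so treat the surrounding lines as approximate; predicted_reward is a hypothetical field name, and I don't know whether history.R would pick up an extra field on the action list without further changes.

```r
# Sketch of get_action in policy_cmab_lin_epsilon_greedy.R (paraphrased,
# not the exact package code). The two marked lines stash the expected
# reward of the chosen arm on the action list.
get_action = function(t, context) {
  if (runif(1) > self$epsilon) {
    expected_rewards <- rep(0.0, context$k)
    for (arm in 1:context$k) {
      Xa    <- get_arm_context(context, arm)
      A_inv <- self$theta$A_inv[[arm]]
      expected_rewards[arm] <- Xa %*% (A_inv %*% self$theta$b[[arm]])
    }
    action$choice <- which_max_tied(expected_rewards)
    # feature request: keep the prediction for the chosen arm ...
    action$predicted_reward <- expected_rewards[action$choice]
  } else {
    action$choice <- sample.int(context$k, 1)
    # ... and mark explore steps, where no prediction is computed
    action$predicted_reward <- NA_real_
  }
  action
}
```

Saving the whole expected_rewards vector instead of just the chosen entry would then cover the "all arms" variant of the request too.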

I had a look at history.R to see if I could work out how to save the values of expected_rewards, but it looks rather complicated to me and my R is nowhere near as good as yours :-).

Thanks,

Paul
