The current implementation doesn't work as advertised. Let's dissect it:
```r
sample(
  c(high_prob_arms, low_prob_arms), 1,
  prob = c(
    rep(
      p / length(high_prob_arms),
      length(high_prob_arms)
    ),
    rep(
      (1 - p) / length(low_prob_arms),
      length(low_prob_arms)
    )
  )
)
```

What happens if we pass p = 0 (i.e. complete randomness)? It turns out it will always pick one of low_prob_arms.
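To see the problem concretely, here is a minimal reproduction. The arm vectors ("A" through "D") are hypothetical placeholders, not names from the actual code:

```r
# Hypothetical arms, two per group.
high_prob_arms <- c("A", "B")
low_prob_arms  <- c("C", "D")
p <- 0  # complete randomness requested

probs <- c(
  rep(p / length(high_prob_arms), length(high_prob_arms)),
  rep((1 - p) / length(low_prob_arms), length(low_prob_arms))
)
probs  # c(0, 0, 0.5, 0.5): the high-probability arms get weight 0

# With zero weight, sample() can never return a high-probability arm.
draws <- replicate(1000, sample(c(high_prob_arms, low_prob_arms), 1, prob = probs))
table(draws)  # only "C" and "D" ever appear
```

So instead of being uniform over all four arms, p = 0 excludes the best arms entirely.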
What you probably intended is this: give every arm the same probability of being picked for p = 0 (50%:50% in the two-arm case), always pick one of the best arms for p = 1 (100%:0% in the two-arm case), and scale linearly for all values in between.
How can we express this differently so that we arrive at the correct solution?
Let's rephrase it. Let the weights have two components: a random one and a deterministic one. If p = 0, we use 100% of the random component for all weights. If p = 1, we use 100% of the deterministic component, which is non-zero for the high-probability arms only. To cover everything in between, we mix the two components in the appropriate proportion.
Summing up the previous paragraph, the values should look more or less as follows:
```r
# high prob arm weight
hpa_weight_deterministic <- p * 1
hpa_weight_random <- (1 - p) * 1
hpa_weight <- hpa_weight_deterministic + hpa_weight_random

# low prob arm weight
lpa_weight_deterministic <- p * 0
lpa_weight_random <- (1 - p) * 1
lpa_weight <- lpa_weight_deterministic + lpa_weight_random
```

It would be silly to multiply by 1 or 0, so it simplifies to the following:
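The simplification is easy to verify: the high-probability weight collapses to p + (1 - p) = 1 and the low-probability weight to 0 + (1 - p) = 1 - p, for any p. A quick sanity check:

```r
# Check that the two-component weights equal the simplified forms
# for a few values of p in [0, 1].
for (p in seq(0, 1, by = 0.25)) {
  hpa_weight <- p * 1 + (1 - p) * 1  # deterministic + random
  lpa_weight <- p * 0 + (1 - p) * 1
  stopifnot(hpa_weight == 1, lpa_weight == 1 - p)
}
```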
```r
hpa_weight <- 1
lpa_weight <- 1 - p
```

Now, the best part is that you can plug these into sample() without standardizing. sample() normalizes the prob vector internally, so you can write:
```r
sample(
  c(high_prob_arms, low_prob_arms), 1,
  prob = c(
    rep(
      1,
      length(high_prob_arms)
    ),
    rep(
      1 - p,
      length(low_prob_arms)
    )
  )
)
```

and not care about the weights not summing to 1.
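As an empirical check, here is a sketch wrapping the corrected call in a helper (pick() and the arm names are my own, for illustration) and confirming the two boundary cases behave as intended:

```r
# Hypothetical arms, two per group.
high_prob_arms <- c("A", "B")
low_prob_arms  <- c("C", "D")

# Corrected sampler: weight 1 for best arms, 1 - p for the rest.
pick <- function(p) {
  sample(
    c(high_prob_arms, low_prob_arms), 1,
    prob = c(
      rep(1, length(high_prob_arms)),
      rep(1 - p, length(low_prob_arms))
    )
  )
}

set.seed(42)
# p = 1: the low-probability arms get weight 0, so only the best arms appear.
stopifnot(all(replicate(1000, pick(1)) %in% high_prob_arms))
# p = 0: all arms get weight 1, so draws are roughly uniform over all four.
table(replicate(4000, pick(0)))
```

At p = 0 every arm now has equal weight, which is exactly the behavior the original code failed to deliver.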