Skip to content

Example

Matt Wheeler edited this page Dec 13, 2023 · 6 revisions

Robust Bayesian Inference

Removing Data Points Examples

There has been some question about the "informative" nature of the priors used in dose-response analysis. The default prior used in ToxicR, EFSA, and the US EPA make inference more reliable. The unknown secret of maximum likelihood estimation (MLE) for dose-response is that some data sets don't inform all parameters for every model. As a result, inference using ML may be seen as suspect for these datasets.

To illustrate what happens and what doesn't happen for Bayesian inference, I look at datasets in [5], which are also in the dataset ``dichotomousDR.'' One can analyze the dataset using the default Bayesian approach and the ML approach. Then, one can remove a couple of data points and look at the differences in inference. These data points are removed to intentionally make ML fail and compare the differences in the inference related to the BMD.

Example 1

These data come from the study "The carcinogenicity of dichloroacetic acid in the male Fischer 344 rat" [2]. The nice thing about this study is there are five non-zero dose groups, but only two of these dose-groups are not at 100% response. So, we can fit the model and see what answer we would get with and without these datapoints. For these data, I will use the Weibull dose-response model, which models the probability of response (in this case a tumor) as:

$$\pi(dose) = a + (1-a)(1-\exp[-\beta\times dose^d]).$$

library(tidyverse)
library(ToxicR)
require(gridExtra)
data_example <- ToxicR::dichotomousDR %>% filter(ID==695) %>% select(dose,n,obs)
#remove intermediate doses
data_example2 <- data_example[-(3:4),]

# Fit the default Bayesian model
fit_a <- single_dichotomous_fit(data_example$dose,data_example$obs,data_example$n,
                                model_type="weibull")
summary(fit_a)
## Summary of single model fit (Bayesian-MAP) using ToxicR
## 
## 
## BMD: 26.74 (8.69, 53.35) 90.0% CI
## 
## Model GOF
## --------------------------------------------------
##                 X^2 P-Value
## Test: X^2 GOF 0.226   0.982
# Fit the default MLE
fit_aa <- single_dichotomous_fit(data_example$dose,data_example$obs,data_example$n,
                                         model_type="weibull",fit_type="mle")
summary(fit_aa)
## Summary of single model fit (MLE) using ToxicR
## Model: Weibull 
## 
## BMD: 47.02 (15.91, 102.96) 90.0% CI
## 
## Model GOF
## --------------------------------------------------
##                 X^2 P-Value
## Test: X^2 GOF 0.056   0.997
grid.arrange(plot(fit_a)+scale_x_continuous(trans="pseudo_log") + ggtitle("Bayesian") ,
             plot(fit_aa)+scale_x_continuous(trans="pseudo_log")+ ggtitle("MLE"), ncol=2)

Here, the ML and Bayesian methods have similar BMDs. The BMDL for the ML model is 15.9 mg/kg per day. The BMDL for the Bayesian method is 8.7 mg/kg day, so the Bayesian model says the point of departure is about 50% that of the ML model. What happens when we remove the intermediate data?

## Summary of single model fit (Bayesian-MAP) using ToxicR
## 
## 
## BMD: 17.62 (2.90, 75.51) 90.0% CI
## 
## Model GOF
## --------------------------------------------------
##                 X^2 P-Value
## Test: X^2 GOF 0.169   0.862
## Summary of single model fit (MLE) using ToxicR
## Model: Weibull 
## 
## BMD: 50.29 (3.07, 268.84) 90.0% CI
## 
## Model GOF
## --------------------------------------------------
##                X^2 P-Value
## Test: X^2 GOF 0.04       1

Example 2

This issue is not only a Dichotomous/Weibull Model problem. It occurs in most of the standard models. Here we do the same thing as above for the log-probit model. $$\pi(dose) = a + (1-a)\Phi(b + d\times \log[dose]) $$ In this example, we look at the data from [3], which is an NTP technical report. there are several dose groups that have 100% and a couple of intermediate dose groups.

##########
data_one <- ToxicR::dichotomousDR %>% group_by(ID) %>% filter(length(ID)>5)

data_example <- ToxicR::dichotomousDR %>% filter(ID==642) %>% select(dose,n,obs)

fit_a <- single_dichotomous_fit(data_example$dose,data_example$obs,data_example$n,
                                model_type="log-probit", fit_type ="laplace")

summary(fit_a)
## Summary of single model fit (Bayesian-MAP) using ToxicR
## 
## 
## BMD: 5.76 (0.79, 17.20) 90.0% CI
## 
## Model GOF
## --------------------------------------------------
##                 X^2 P-Value
## Test: X^2 GOF 0.972   0.878
fit_aa <- single_dichotomous_fit(data_example$dose,data_example$obs,data_example$n,
                                model_type="log-probit",fit_type ="mle")

summary(fit_aa)
## Summary of single model fit (MLE) using ToxicR
## Model:  Log-Probit 
## 
## BMD: 4.21 (0.01, 16.67) 90.0% CI
## 
## Model GOF
## --------------------------------------------------
##                 X^2 P-Value
## Test: X^2 GOF 0.165   0.983
grid.arrange(plot(fit_a)+scale_x_continuous(trans="pseudo_log") + ggtitle("Bayesian") ,
             plot(fit_aa)+scale_x_continuous(trans="pseudo_log")+ ggtitle("MLE"), ncol=2)

Notice that both models fit the data, which we look at visually and using an approximate goodness of fit test. Nothing stands out as obviously problematic. The Bayesian MAP estimate has a BMD of 5.76 (0.79, 17.20), whereas the ML estimate is 4.21 (0.01, 16.67). Unlike the last example, the Bayesian method predicts a higher BMD. Again, we can remove two dose groups and see what happens. Here, the ML gets mixed up in the opposite direction.

data_example2 <- data_example[-(2:3),]

fit_b <- single_dichotomous_fit(data_example2$dose,data_example2$obs,data_example2$n,
                                model_type="log-probit", fit_type ="laplace")
summary(fit_b)
## Summary of single model fit (Bayesian-MAP) using ToxicR
## 
## 
## BMD: 9.91 (0.42, 42.86) 90.0% CI
## 
## Model GOF
## --------------------------------------------------
##                 X^2 P-Value
## Test: X^2 GOF 1.043   0.546
fit_c <- single_dichotomous_fit(data_example2$dose,data_example2$obs,data_example2$n,
                                model_type="log-probit",fit_type = "mle")
summary(fit_c)
## Summary of single model fit (MLE) using ToxicR
## Model:  Log-Probit 
## 
## BMD: 0.00 (0.00, NaN) 90.0% CI
## 
## Model GOF
## --------------------------------------------------
##                 X^2 P-Value
## Test: X^2 GOF 0.069   0.793
grid.arrange(plot(fit_a)+scale_x_continuous(trans="pseudo_log") + ggtitle("Full data - Log-Probit") ,
             plot(fit_b)+scale_x_continuous(trans="pseudo_log")+ ggtitle("Removed data - Log-Probit"), ncol=2)

The Bayesian approach (left) still generally reproduces the same curve and the same BMD 9.91 (0.42, 42.86), whereas the ML estimate blows up and the BMD is no longer calculated correctly.

grid.arrange(plot(fit_b)+scale_x_continuous(trans="pseudo_log") + ggtitle("Bayesian - Log-Probit") ,
             plot(fit_c)+scale_x_continuous(trans="pseudo_log")+ ggtitle("MLE - Log-Probit"), ncol=2)

Visually, the failure of the ML estimate to model a reasonable fit is observed in the above figures. What the ML estimate is trying to do is fit an on/off switch. The response is 100% after a point (actually a single molecule!). The Bayesian model adds information that weakly informs the problem parameter ``d.'' The CI on the BMD is wider, which one expects due to the reduction in information.

Example 3

We have looked at dichotomous data, but the same problems exist with continuous data. Here, I look at a case like Example 1, where no 'intermediate' response exists. We will fit a Hill model:

$$f(dose) = a + \frac{b\times dose^d}{c^d + dose^d}$$

Like the Weibull model in example 1, $d$ is the problem parameter.

resp  <- Y <- c(5.890,6.740,4.976,5.990,6.608,6.527,6.783,5.360,6.606,
       6.272,5.909,4.912,5.486,3.860,4.877,4.641,5.670,
       5.591,4.840,5.801,5.822,6.413,5.708,7.372,6.552,
       5.954,6.253,7.415,7.541,7.490,9.825,9.391,9.469,
       10.966,10.785,10.704,11.716,10.584,10.883)

doses <- c(0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,    
           0.0000,0.0000,0.0000,9.4905,9.4905,9.4905,30.0116,30.0116,
           30.0116,94.9050,94.9050,94.9050,300.1160, 300.1160,300.1160,
           949.0500,949.0500,949.0500,3001.1597,3001.1597,3001.1597,9490.5002,
           9490.5002,  9490.5002, 30011.5969, 30011.5969, 30011.5969, 94905.0023, 
           94905.0023, 94905.0023)

udose <- unique(doses)[-(5:8)]

bayes_a <- single_continuous_fit(doses,resp,model_type = "hill",distribution="normal",fit_type="laplace",BMR=1)
mle_a   <- single_continuous_fit(doses,resp,model_type = "hill",distribution="normal",fit_type="mle",BMR=1)


sub_resp <- resp[doses %in% udose]
sub_dose <- doses[doses %in% udose]


bayes_b <- single_continuous_fit(sub_dose,sub_resp,model_type = "hill",distribution="normal",fit_type="laplace",BMR=1)
mle_b   <- single_continuous_fit(sub_dose,sub_resp,model_type = "hill",distribution="normal",fit_type="mle",BMR=1)

grid.arrange(plot(bayes_a)+scale_x_continuous(trans="pseudo_log"),
             plot(bayes_b)+scale_x_continuous(trans="pseudo_log"))
grid.arrange(plot(mle_a)+scale_x_continuous(trans="pseudo_log"),
              plot(mle_b)+scale_x_continuous(trans="pseudo_log"))
summary(bayes_a)
## Summary of single model fit (Bayesian-MAP) using ToxicR
## 
## 
## BMD: 1269.21 (607.29, 2375.91) 90.0% CI
## 
## Model GOF
## --------------------------------------------------
##                               -2LL    DF P-Value
## Test: Mean Adequate          10.49  6.22   0.117
## Test: Mean/Variance Adequate 22.14 15.22   0.111
summary(bayes_b)
## Summary of single model fit (Bayesian-MAP) using ToxicR
## 
## 
## BMD: 849.19 (136.66, 6272.16) 90.0% CI
## 
## Model GOF
## --------------------------------------------------
##                               -2LL   DF P-Value
## Test: Mean Adequate           7.88 2.92   0.045
## Test: Mean/Variance Adequate 11.93 7.92   0.150
summary(mle_a)
## Summary of single model fit (MLE) using ToxicR
## Model: Hill Distribution: Normal 
## 
## BMD: 1063.02 (526.48, 2068.49) 90.0% CI
## 
## Model GOF
## --------------------------------------------------
##                               -2LL DF P-Value
## Test: Mean Adequate           9.46  6   0.149
## Test: Mean/Variance Adequate 21.11 15   0.133
summary(mle_b)
## Summary of single model fit (MLE) using ToxicR
## Model: Hill Distribution: Normal 
## 
## BMD: 4403.70 (116.35, 22522.88) 90.0% CI
## 
## Model GOF
## --------------------------------------------------
##                               -2LL DF P-Value
## Test: Mean Adequate           6.21  2   0.045
## Test: Mean/Variance Adequate 10.25  7   0.175
bayes_p <- cbind(bayes_a$parameters,bayes_b$parameters)
mle_p <- cbind(mle_a$parameters,mle_b$parameters)
rownames(bayes_p) <- c("a","b","c","d","log(sigma)")
rownames(mle_p) <- c("a","b","c","d","log(sigma)")
bayes_p[3:4,]
##          [,1]        [,2]
## c 5010.831097 2609.014133
## d    1.322293    1.414496
mle_p[3:4,]
##          [,1]         [,2]
## c 5179.200892 14046.706631
## d    1.280015     4.008817

Here, the results are not as stark, but we can observe that the BMD estimate is much more stable. It increases by 25% when the data are removed, but it increases by 415% in the case of the MLE. Further, the parameters of interest, e.g., d, do not change that much vs. the MLE. The Bayes estimates are 1.3 vs. 1.4, and the MLE estimates are 1.3 vs. 4.0. Again, there is more stability in the Bayes approach. Notably, the Bayesian fits are far more consistent with the "full" data MLE estimates when most information is removed from the analysis!

Why?

It is not enough to see that the problem exists; one should understand why it happens. Fortunately, we can readily diagnose why it happens and state why the MLE is inappropriate for inference in these cases. Simply put, the data poorly inform the likelihood, and the central limit theorem does not really ``kick-in.'' Further, there are 100's of examples I can point to that have the same behavior. The Bayesian methodology, with the chosen priors, does a much better job at estimating the dose-response curve and the BMD even when the "important" doses are removed!

require(numDeriv)
set.seed(123)

#Randomly Sample 20 data points
gamma_data <- rgamma(20,10,3)

gamma_log_like <-function(beta){
  like_vals <- dgamma(gamma_data,10,beta,log=TRUE)
  return (-sum(like_vals))
}
log_like <- Vectorize(gamma_log_like)
beta_values <- seq(3.08-0.5,3.08+0.5,0.02)
log_l_values <- log_like(beta_values)
plot(beta_values,log_l_values,type='l',col=1,lwd=2,lty=2)
##################
# Find the Optimum
MLE <- optim(c(3),log_like,method="Brent",lower = c(0), upper = c(10))$par
VAR <- 1/hessian(log_like,MLE)
theory <- 1/(2*VAR)*(beta_values-MLE)^2 + optim(c(3),log_like,method="Brent",lower = c(0), upper = c(10))$value
lines(beta_values,theory,col=2)
######

We will first look at a straightforward example, approximating a parameter from the gamma distribution of what inference should look like, and then extrapolate this to our case. Here, we simulate data from a gamma random variable with $\alpha=10$ and $\beta=3.$ We are interested in estimating $\beta$ and more interested in what the log-likelihood looks like. All this code does is calculate the negative of the log-likelihood over a range of values and plot it. It then compares this to the theoretical values given by the central limit theorem. Here, the black dashed line represents the log-likelihood, and the red line represents the theoretical value the central limit theorem predicts. Notice that one gets an approximate parabola, and the two lines are very close with $n=20.$

You should see an approximate parabola when the central limit theorem (CLT) works well. However, you never see a parabola in a typical Toxicology dose-response data set, but you can still get reasonable approximations using a profile likelihood or other CLT based approximation. Note: The profile likelihood is not a get-out-of-jail-free card with the central limit theorem. Theoretically, every approximation uses the same asymptotic expansion, which implies they both "contract" (i.e., result in the inference) at the same rate. There are some differences in small samples, and it is easier to guarantee things like $BMD \geq 0$ using the profile likelihood method. Some methods work effectively (e.g. [4]) by using Wald based approximations (e.g. direct CLT Normal Theory), and this is because the theory is all based upon the same math.

obs <- c(8,8,8,8)
y   <- c(6,5,7,8)
doses <- c(0,0.5,1,2)/2

weibull_model <- function(parms,doses){
  g <- parms[1]
  b <- parms[2]
  d <- parms[3]
  
  p <- g + (1-g)*(1-exp(-b*doses^d))
  return(p)
}

log_likelihood <- function(parms){
  p <- weibull_model(parms,doses)
  log_like <- dbinom(y,obs,p,log=TRUE)
  return(-sum(log_like))
}

opt_p <- optim(c(0.5,1.22,0.4181133),log_likelihood, method="L-BFGS-B",
               lower=c(1e-8,0.001,0.001),upper=c(1-1e-8,200,18))
MLE <- opt_p$par
OPTV <- opt_p$value
values <- seq(0.001,18,0.001)*0
ii = 1
for (profile_value in seq(0.001,18,0.001)){

  prof_log_likelihood <- function(parms){
    p <- weibull_model(c(parms,profile_value),doses)
    log_like <- dbinom(y,obs,p,log=TRUE)
    return(-sum(log_like))
  }
  
  opt_p <- optim(c(0.5,1.22),prof_log_likelihood, method="L-BFGS-B",
                 lower=c(1e-8,0.00001),upper=c(1-1e-8,200))
  
  values[ii] <- opt_p$value
  ii <- ii + 1
  
}

In this example, we fit the Weibull model using ML to data from [1]. Here, we look at skeletal malformations in pups where the dams were exposed to 0, 0.5, 1.0, and 2.0 mg/kg per day of BDE-99. Here, 6/8, 5/8, 7/8, and 8/8 litters contained pups with skeletal malformations.

Unlike the previous 'deleted' data cases, this is actual data, but notice a background response and a 'high' saturated response. This data issue causes a problem with parameter `d.'

plot(seq(0.001,18,0.001),values,type='l',ylab="-log-likelihood",xlab="values of 'd'")
abline(v=MLE[3],col=2,lty=2)

In the plot, we see there is very little information on parameter `d', where the MLE occurs at 7.78. From a profile likelihood perspective, you can notice two things:

  • The plateau to the right implies essentially no upper bound on the BMDL.
  • The lowest estimate doesn't have a likelihood value greater than 6.22, which implies that we can't compute the BMDL.
  • The lowest numerical value $d=0.001$ gives a BMDL that would be essentially zero.

Using the lowest value of d that was still numerically stable $d=0.01$ using double precision computer arithmetic. The profile likelihood has $\beta = 0.4072,$ and we get the BMDL of less than $4.96e-61.$ This value implies one molecule could increase the risk of skeletal malformation by 10% over background!

require(ToxicR)
model_fit <- single_dichotomous_fit(doses,y,obs,model_type="weibull",fit_type='mcmc',samples=50000)
summary(model_fit)
## Summary of single model fit (MCMC) using ToxicR
## Model: Weibull 
## 
## BMD: 0.20 (0.04, 0.66) 90.0% CI
## 
## Convergence Diagnostics on BMD
## --------------------------------------------------
## Effective Sample Size: 4419.37
## 
## Geweke Z-score that mean of first 30% of 
## MCMC chain is different from last 40%
## Z-Score: -0.979  P-value 0.328

Using the Bayesian approach we have 0.20 (0.04, 0.68). I think the BMDL of 0.04 mg/kg per day is much more reasonable than 1-molecule. We can go further and use model averaging:

model_fit <- ma_dichotomous_fit(doses,y,obs,fit_type='mcmc',samples=50000,
                                threads = 12)
summary(model_fit)
## Summary of single MA BMD
## 
## Individual Model BMDS
## Model                                                            		 BMD (BMDL, BMDU)	Pr(M|Data)
## ___________________________________________________________________________________________
##  Quantal-Linear                                               			0.08 (0.03 ,0.34) 	 0.294
## Weibull                                                       			0.20 (0.04 ,0.65) 	 0.208
##  Multistage                                                   			0.09 (0.05 ,0.16) 	 0.164
##  Log-Probit                                                   			0.39 (0.11 ,0.84) 	 0.101
##  Log-Logistic                                                 			0.29 (0.06 ,0.70) 	 0.098
##  Gamma                                                        			0.21 (0.05 ,2.02) 	 0.079
##  Hill                                                         			1.31 (0.08 ,NaN) 	 0.025
##  Probit                                                       			0.10 (0.04 ,1.00) 	 0.017
##  Logistic                                                     			0.08 (0.04 ,0.81) 	 0.014
## ___________________________________________________________________________________________
## Model Average BMD: 0.13 (0.04, 0.75) 90.0% CI
plot(model_fit)

When you do that, you tend to get the same answer. In this case, the Bayesian approach provides a superior answer.

References

[1] Blanco, Jordi, Miquel Mulero, José L Domingo, and Domènec J Sánchez. 2012. “Gestational Exposure to BDE-99 Produces Toxicity Through Upregulation of CYP Isoforms and ROS Production in the Fetal Rat Liver.” Toxicological Sciences 127 (1): 296–302.

[2] DeAngelo, Anthony B, F Bernard Daniel, Bernard M Most, and Greg R Olson. 1996. “The Carcinogenicity of Dichloroacetic Acid in the Male Fischer 344 Rat.” Toxicology 114 (3): 207–21.

[3] Dunnick, June K. 1992. NTP Technical Report on Toxicity Studies of o-, m-, and p-Nitrotoluenes (Cas Nos.: 88-72-2, 99-01-1, 99-99-0) Administered in Dosed Feed to F344 n Rats and B6C3F 1 Mice. United States Department of Health; Human Services.

[4] Ewald, Jessica, Othman Soufan, Jianguo Xia, and Niladri Basu. 2021. “FastBMD: An Online Tool for Rapid Benchmark Dose–Response Analysis of Transcriptomics Data.” Bioinformatics 37 (7): 1035–36.

[5] Wheeler, Matthew W, Walter W Piegorsch, and Albert John Bailer. 2019. “Quantal Risk Assessment Database: A Database for Exploring Patterns in Quantal Dose-Response Data in Risk Assessment and Its Application to Develop Priors for Bayesian Dose-Response Analysis.” Risk Analysis 39 (3): 616–29.