Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sampling bias correction in Hmsc (Is it possible to provide a weight to sampling units) #196

Open
elgabbas opened this issue Jul 29, 2024 · 1 comment

Comments

@elgabbas
Copy link

Hello,

I am using Hmsc with opportunistic data (mainly from GBIF; the model does not employ trait info). As expected, the data shows signs of sampling bias. I only used presence-(?)absence information per grid cell of 10 kilometres. This spatial thinning can, to some extent, reduce the effect of sampling bias, but this is clearly not sufficient.

To correct for sampling bias, I use a layer representing sampling efforts (sampling intensity of all GBIF data of higher taxonomic level than the species of interest) as a predictor of the model. A similar method is used in many cases in the literature (including Hmsc; e.g. section 7.9 of Ovaskainen & Abrego (2020)). However, I see some concern that the predictions will reflect the sampling bias in the data rather than the predicted biological information (i.e., intensively sampled areas will be likely to be predicted with high probability; as expected, almost all species show a positive effect and positive response curve to the sampling intensity predictor). Although it is possible to use Warton et al.'s approach to correct for it by assigning all sampling units the same value representing optimum (e.g. median) sampling efforts during prediction, my previous experience with this method showed overprediction.

I question whether it is possible to consider the sampling intensity differently in Hmsc? In glm, for example, it is possible to use sampling efforts as weights to the model. It seems that this is not directly allowed using a weights argument. Is it possible to use such info as informed priors or offset? And if so, please give me a snippet of code on how this should be identified.

Thanks in advance,
Ahmed

@elgabbas
Copy link
Author

Here is an example data. Source

require(Hmsc)
require(MASS)
set.seed(6)
n = 100
ns = 5
beta1 = c(-2,-1,0,1,2)
alpha = rep(0,ns)
beta = cbind(alpha,beta1)
x = cbind(rep(1,n),rnorm(n))
Lf = x%*%t(beta)
xycoords = matrix(runif(2*n),ncol = 2)
colnames(xycoords) = c("x-coordinate","y-coordinate")
rownames(xycoords) = 1:n
sigma.spatial = c(2)
alpha.spatial = c(0.35)
Sigma = sigma.spatial^2*exp(-as.matrix(dist(xycoords))/alpha.spatial)
eta1 = mvrnorm(mu = rep(0,n), Sigma = Sigma)
lambda1 = c(1,2,-2,-1,0)
Lr = eta1%*%t(lambda1)
L = Lf + Lr
y = as.matrix(L + matrix(rnorm(n*ns),ncol = ns))
yprob = 1*((L +matrix(rnorm(n*ns),ncol = ns))>0)
XData = data.frame(x1 = x[,2])
studyDesign = data.frame(sample = as.factor(1:n))


studyDesign = data.frame(sample = as.factor(1:n))
rL.spatial = HmscRandomLevel(sData = xycoords)
rL.spatial = setPriors(rL.spatial,nfMin=1,nfMax=1)
m.spatial = Hmsc(Y=yprob, XData=XData, XFormula=~x1,
                    studyDesign=studyDesign, 
                    ranLevels=list("sample"=rL.spatial),distr="probit")

thin = 10
samples = 100
transient = 10
verbose = 0

m.spatial = sampleMcmc(m.spatial, thin = thin, samples = samples, transient = transient,
nChains = nChains, nParallel = nChains, verbose = verbose)

How to include the following sampling efforts vector into the model as a weight or informed prior.

set.seed(8)
SampIntensity <- abs(rnorm(nrow(xycoords), mean = 200, sd = 10))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant