Marginal survey indices? #314

MikkoVihtakari · 2024-03-06T08:28:06Z

In cases where a model has a covariate, say value ~ geartype + s(depth), my understanding is that the standard approach is to make a prediction grid with all levels of geartype, run it through predict.sdmTMB(..., return_tmb_object = TRUE), and pass the result to get_index(). As I understand it, this gives a survey index summed over all gear types. One could avoid this by including only one level of geartype in the prediction grid, which makes the survey index conditional on that gear type. If the gear types have different selectivities, one might instead want to average over the gear types (i.e., a marginal index).

Is this possible using get_index() or get_index_sims()?

Thus far I have not managed to do this using the index functions in sdmTMB and have calculated the marginal index manually using something like the following (including center of gravity and depth):

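library(dplyr)

# grid_side_size: side length of a square prediction-grid cell
predict_output$data %>% 
        # average over gear types (and other factor levels) within each cell
        group_by(year, depth, X, Y) %>% 
        reframe(est = mean(est)) %>% 
        # then summarise over cells within each year
        group_by(year) %>% 
        reframe(
          Y = weighted.mean(Y, est),      # center of gravity (northing)
          X = weighted.mean(X, est),      # center of gravity (easting)
          depth = weighted.mean(depth, est),
          est = sum(est)*grid_side_size^2
        )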

I also use predict(nsim = ...) variations of that idea to calculate uncertainty. The tidyverse way I have implemented the double summation over groups is inefficient and makes the uncertainty simulation take a very long time. If there is no more elegant "standard" way of doing this, I will need to make the code more efficient using data.table and/or vectorization.
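One way to avoid the grouped double summation is to work directly on the simulation matrix with base R's `rowsum()`. This is a minimal sketch, assuming the draws are log-scale densities with one row per grid row (cell x year x gear type) and one column per draw, a balanced grid (every cell repeated once per gear type), and a constant cell area; `marginal_index_sims()`, `cell_area` and the toy data are hypothetical. On a balanced grid, averaging over gear types within a cell and then summing over cells equals summing everything and dividing by the number of gear types, so one grouped sum per year covers all draws at once.

```r
# Toy stand-in for predict(m, newdata = nd, nsim = n_sim): log-scale draws,
# one row per grid cell x year x gear type, one column per draw
set.seed(1)
nd <- expand.grid(cell = 1:3, year = c(2011, 2013), geartype = c("a", "b"))
n_sim <- 5
sims <- matrix(rnorm(nrow(nd) * n_sim), nrow = nrow(nd))
cell_area <- 25 # hypothetical area of one 5 km x 5 km cell

# Hypothetical helper: marginal (gear-averaged) index for every draw at once
marginal_index_sims <- function(sims, year, geartype, area) {
  n_gear <- length(unique(geartype))
  # Area-weighted densities summed over cells and gear types within each
  # year (a single grouped sum for all columns), then averaged over gears
  rowsum(exp(sims) * area, group = year) / n_gear
}

index_sims <- marginal_index_sims(sims, nd$year, nd$geartype, cell_area)
# Median and 95% quantile interval of the draws for each year
t(apply(index_sims, 1, quantile, probs = c(0.025, 0.5, 0.975)))
```

If some cells lack some gear types, the two orders of operation no longer agree and a two-step grouped sum would be needed.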

Answered by seananderson

Mar 8, 2024

seananderson (Maintainer) · 2024-03-08T20:02:51Z

Gear type would usually be considered a catchability covariate and so you would predict for a single reference gear type. As long as gear type is not in the model as a spatially varying coefficient or interacting with some variable that is changing through time, the indexes that you get out of the various gear types should be the same, just shifted up or down for catchability. If the index was being used in a stock assessment, then typically you would either consider it a relative index and estimate catchability (in which case which gear type you pick shouldn't matter unless you are putting a prior on it) or you would treat it as an absolute index, in which case I assume you would want it to reflect some specific gear type that you think has 100% catchability.
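A short derivation of the "shifted up or down" statement, assuming a single-part model with a log link in which gear type g enters only as an additive term \(\beta_g\), \(f(\mathbf{s}, t)\) collects all other terms of the linear predictor, and \(A_s\) is the area of grid cell s:

$$
I_{g,t} = \sum_{s} A_s \exp\big(\beta_g + f(\mathbf{s}, t)\big)
        = e^{\beta_g} \sum_{s} A_s \exp\big(f(\mathbf{s}, t)\big),
\qquad
\frac{I_{b,t}}{I_{a,t}} = e^{\beta_b - \beta_a} \quad \text{for every } t.
$$

The exact proportionality relies on the log link; a component with a logit link does not factor out this way.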

If you really did want to calculate an average index, could you not include all gear types and then divide the total by the number of gear types? Again, I'm not sure why you would do this, though, and it would be much slower to calculate.

Separately, it's still not clear how much we should trust the output of get_index_sims(), or similar calculations done yourself, unless the draws come from MCMC/tmbstan instead of the implied joint precision matrix.

3 replies

MikkoVihtakari Mar 9, 2024
Author

Thanks for the reply, Sean,

The reason I am asking is that I make models for different length groups and then use those to calculate an index for each length group. Some length groups are caught better with one gear type, while others are caught better with another. Hence I thought averaging would provide some kind of common ground. These indices depend on gear selectivity and cannot be summed, but when comparing to old survey indices, I would like the index to represent the region rather than the number of gear types (and seasons, which I also have in the model). Hence, I think it makes sense to average. In such comparisons, the level of the index also carries some information (e.g., the size of the region covered, or whether the finer resolution better captures distribution patterns). For the assessment, these indices will be treated as relative.

I had not quite thought through the division. Since there are no interactions, one should be able to make the summed index match the averaged one by dividing by the right values (the numbers of levels of the two covariates). I was not sure whether the CIs would follow the same logic, but I will experiment.
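On the CI question: get_index() appears to build its interval symmetrically on the log scale from log_est and se, so dividing a summed index by a constant n only shifts log_est by log(n) and leaves se unchanged, and the bounds can be divided by the same n. A minimal sketch, where `summed_to_marginal()` is a hypothetical helper, the input is assumed to be get_index() output for a balanced grid, and the toy values are made up:

```r
# Hypothetical helper: convert a summed index (grid repeats every cell once
# per covariate-level combination) into the averaged (marginal) index
summed_to_marginal <- function(index, n_combinations) {
  out <- index
  out$est <- index$est / n_combinations
  out$lwr <- index$lwr / n_combinations
  out$upr <- index$upr / n_combinations
  # log(est / n) = log(est) - log(n); se is on the log scale, so unchanged
  out$log_est <- index$log_est - log(n_combinations)
  out
}

# Toy input with the same columns as get_index() output (made-up values)
est <- c(400, 500)
se <- c(0.15, 0.14)
summed <- data.frame(year = c(2011, 2013), est = est,
                     lwr = exp(log(est) - qnorm(0.975) * se),
                     upr = exp(log(est) + qnorm(0.975) * se),
                     log_est = log(est), se = se)
n_gear <- 2; n_season <- 2 # hypothetical numbers of levels
marginal <- summed_to_marginal(summed, n_gear * n_season)
marginal
```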

get_index(bias_correct = TRUE) does not work for my models, which use a large, detailed grid (5 km cell side, covering the Norwegian coast and the Barents Sea). A detailed grid is necessary because I am working with Greenland halibut, whose spawning area lies along the steep continental slope. Even 5 km sometimes seems too coarse. I think the get_index() crash is some sort of TMB-related memory issue. I can use get_index(bias_correct = FALSE) if you think that is more trustworthy than get_index_sims(). In my experiments so far, these two options appear to produce relatively similar CIs.

seananderson Mar 11, 2024
Maintainer

Again, I think the common approach would be to treat length-specific catchability, i.e. selectivity, as a catchability covariate and produce an index for a reference gear type. If selectivity is being estimated within an assessment from age or length composition data, those would typically come from a specific survey that is associated with the reference index.

However, more generally, if memory is an issue, you could try calculating the index for a year (or small group of years) at a time. E.g.

library(sdmTMB)

m <- sdmTMB(
  density ~ 0 + factor(year),
  data = pcod_2011,
  time = "year",
  spatiotemporal = "off", # faster example
  mesh = pcod_mesh_2011,
  family = delta_gamma()
)

# purrr example
index <- purrr::map_dfr(unique(pcod_2011$year), \(y) {
  nd <- replicate_df(qcs_grid, "year", y)
  p <- predict(m, newdata = nd, return_tmb_object = TRUE)
  get_index(p, bias_correct = TRUE)
})

# base R example
index <- lapply(unique(pcod_2011$year), \(y) {
  nd <- replicate_df(qcs_grid, "year", y)
  p <- predict(m, newdata = nd, return_tmb_object = TRUE)
  get_index(p, bias_correct = TRUE)
})
index <- do.call(rbind, index)

index
#>   year      est      lwr      upr  log_est        se
#> 1 2011 286527.7 213616.4 384325.0 12.56559 0.1498258
#> 2 2013 298703.7 235809.9 378372.0 12.60721 0.1206277
#> 3 2015 353797.9 274388.0 456189.6 12.77648 0.1296874
#> 4 2017 190378.2 139694.3 259451.2 12.15677 0.1579397

Created on 2024-03-11 with reprex v2.1.0

seananderson Mar 11, 2024
Maintainer

Also, try the latest version I just pushed to GitHub. I did some testing and I believe this new configuration of bias correction is faster and more memory efficient for most models.

MikkoVihtakari (Author) · 2024-03-15T08:33:17Z

Apologies for the long response time, and thanks for your reply. I appreciate it, as well as your effort with sdmTMB. This tool is taking us to the next level in stock assessment.

I have multiple goals: 1) to make an sdmTMB model for a manuscript studying the fine-scale distribution of the species over time and space; 2) in the same manuscript, to examine whether length-binned indices correlate with fishing, climate and other fish indices; and 3) to make an index for the Greenland halibut assessment (which does not have to be the same as the one used in the manuscript). These indices will be passed into an assessment model and can be relative. I can send you the manuscript draft for feedback once it is ready, if you wish (Eric and Jim are already on the list).

This post has multiple diverging issues. Let's break them down:

Marginal, conditional and summed indices

Obviously, you are right, but I am still trying to wrap my head around this and hence keep coming back to it. Using the example below, I now understand that the summed and marginal indices are simply the sums and means of the conditional indices, respectively. Confidence intervals differ, however, and one cannot simply sum or average them (as expected). Confidence intervals obtained by summing the conditional bounds appear wider in the example than the ones calculated by get_index(). Hence it is best to select one gear type to calculate indices, as you say. Since the gear-type effect is additive on the link scale, as specified in the sdmTMB formula, it does not matter which gear type one selects for reporting the indices. Both will be distorted by gear selectivity anyway.

# Index type example ####

library(sdmTMB)
library(tidyverse)
library(cowplot)
#> 
#> Attaching package: 'cowplot'
#> The following object is masked from 'package:lubridate':
#> 
#>     stamp

tmp_dt <- pcod_2011 %>% 
  mutate(geartype = rep(letters[1:2], length.out = n()))

m <- sdmTMB(
  density ~ 0 + geartype + poly(log(depth), 2),
  data = tmp_dt,
  time = "year",
  mesh = pcod_mesh_2011,
  family = delta_gamma()
)

nd <- qcs_grid %>% 
  replicate_df("year", unique(tmp_dt$year)) %>% 
  replicate_df("geartype", unique(tmp_dt$geartype))

p_summed <- predict(m, newdata = nd, return_tmb_object = TRUE, 
                    type = "response")
p_conditional_a <- predict(m, newdata = nd %>% filter(geartype == "a"),
                           return_tmb_object = TRUE)
p_conditional_b <- predict(m, newdata = nd %>% filter(geartype == "b"),
                           return_tmb_object = TRUE)

indices <- 
  bind_rows(
    get_index(p_summed, bias_correct = FALSE) %>% 
      mutate(type = "summed"),
    
    get_index(p_conditional_a, bias_correct = FALSE) %>% 
      mutate(type = "conditional a"),
    
    get_index(p_conditional_b, bias_correct = FALSE) %>% 
      mutate(type = "conditional b"),
    
    p_summed$data %>% 
      group_by(year, depth, X, Y) %>% 
      reframe(est = mean(est)) %>% 
      group_by(year) %>% 
      reframe(est = sum(est)) %>% 
      mutate(lwr = NA, upr = NA, log_est = log(est), se = NA) %>% 
      mutate(type = "marginal")
  ) %>% 
  mutate(method = "predictions")
#> Bias correction is turned off.
#> It is recommended to turn this on for final inference.
#> Bias correction is turned off.
#> It is recommended to turn this on for final inference.
#> Bias correction is turned off.
#> It is recommended to turn this on for final inference.

## Fixed effect vs index ratio ####

fixef <- 
  lapply(seq_along(m$family$family), function(i) {
    tidy(m, conf.int = TRUE, model = i) %>% 
      mutate(family = m$family$family[i],
             link = m$family$link[i], 
             .before = 1)
  }) %>% 
  bind_rows()

## Ratio between fixed effects for a and b:
(fixef %>% 
    filter(family == "binomial", term == "geartypeb") %>% 
    pull(estimate) %>% 
    m$family[[1]]$linkinv() *
    fixef %>% 
    filter(family == "Gamma", term == "geartypeb") %>% 
    pull(estimate) %>% 
    m$family[[2]]$linkinv()) /
  (fixef %>% 
     filter(family == "binomial", term == "geartypea") %>% 
     pull(estimate) %>% 
     m$family[[1]]$linkinv() *
     fixef %>% 
     filter(family == "Gamma", term == "geartypea") %>% 
     pull(estimate) %>% 
     m$family[[2]]$linkinv())
#> [1] 0.8633095

## Ratio between index estimates for a and b:
indices %>%
  filter(grepl("conditional", type)) %>% 
  group_by(year) %>% 
  reframe(ratio = est[type == "conditional b"] / est[type == "conditional a"])
#> # A tibble: 4 × 2
#>    year ratio
#>   <int> <dbl>
#> 1  2011 0.882
#> 2  2013 0.889
#> 3  2015 0.887
#> 4  2017 0.882

## Plot ####

p1 <- indices %>% 
  ggplot(
    aes(x = year, y = est, ymin = lwr, ymax = upr, color = type, 
        fill = type)
  ) +
  geom_ribbon(color = NA, alpha = 0.3) +
  geom_path(linewidth = 1) +
  labs(color = "Index type", fill = "Index type") +
  theme_classic()

p2 <- bind_rows(
  indices %>% filter(type %in% c("summed", "marginal")),
  
  indices %>%
    filter(grepl("conditional", type)) %>% 
    group_by(year) %>% 
    reframe(est = sum(est), lwr = sum(lwr), upr = sum(upr)) %>% 
    mutate(type = "summed",
           method = "recalculated"),
  
  indices %>%
    filter(grepl("conditional", type)) %>% 
    group_by(year) %>% 
    reframe(est = mean(est), lwr = mean(lwr), upr = mean(upr)) %>% 
    mutate(type = "marginal",
           method = "recalculated")
) %>% 
ggplot(
  aes(x = year, y = est, ymin = lwr, ymax = upr, color = method, 
      fill = method, linetype = method)
) +
  geom_ribbon(color = NA, alpha = 0.3) +
  geom_path(linewidth = 1, alpha = 0.5) +
  facet_wrap(~type, scales = "free_y") +
  labs(color = "Method", fill = "Method") +
  theme_classic()

cowplot::plot_grid(p1, p2, ncol = 1)
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf

Created on 2024-03-15 with reprex v2.1.0
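One likely reason the fixed-effect ratio (0.863) differs from the conditional-index ratios (0.882 to 0.889), and why the latter change slightly by year, is the logit link of the binomial component: the ratio of encounter probabilities for two gear types depends on the rest of the linear predictor, whereas the ratio of the log-link Gamma means does not. Passing only the gear-type coefficients through the inverse link, as in the fixed-effect calculation above, corresponds to setting all other terms to zero. A minimal illustration with made-up coefficients (`beta_a`, `beta_b`, `gamma_a`, `gamma_b` and `eta` are hypothetical):

```r
# Made-up gear-type coefficients for the two parts of a delta model
beta_a <- 0.2; beta_b <- -0.1   # binomial part (logit link)
gamma_a <- 1.0; gamma_b <- 0.95 # positive part (log link)

# Ratio of expected density (probability x positive mean), gear b vs gear a,
# as a function of the remaining binomial linear predictor `eta`
# (the remaining terms of the positive part cancel in the ratio)
gear_ratio <- function(eta) {
  (plogis(beta_b + eta) * exp(gamma_b)) / (plogis(beta_a + eta) * exp(gamma_a))
}

eta <- c(-3, -1, 0, 1, 3)
round(gear_ratio(eta), 3)
# The positive-part ratio alone is constant
exp(gamma_b - gamma_a)
```

Because the index sums densities over cells with different values of eta, its ratio between gear types is a weighted blend of these values and can shift from year to year.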

Bias correction

Unfortunately, bias correction still leads to a crash in my high-resolution models on a server with 503 GB of RAM and 71 cores. As far as I understand, splitting by year is not an option, because I have gear type, season and depth as covariates in the model. These would be estimated differently for different years. Hence I will probably need to manage without bias correction for now, but I will need to get the indices bias corrected eventually. I examined the differences among the methods below.

Index calculation methods

Based on experimentation with my high-resolution model, the choice of method for calculating the index leads to approximately similar central estimates (setting aside the index type examined above), but the confidence intervals may differ. Based on experimentation with the example dataset supplied in sdmTMB, this is not always the case (see the example below). get_index(bias_correct = FALSE) sums the prediction grid by year (I am not sure how it calculates the CIs, hence they are excluded from "sum predict"). Its values seem to be consistently lower than those from get_index(bias_correct = TRUE). In the example, the difference is 10% on average, which is quite a lot for a survey index, although the trends are almost identical and hence the choice of method might not matter that much. get_index_sims() does not seem to produce stable values even when I increase the number of simulations. In the example, I stabilized the results by setting a seed, and the average difference from the bias-corrected index is a few percent. Sometimes it was more. Differences were larger when year was treated as a spatiotemporal effect rather than a fixed effect.
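The direction of the difference between bias_correct = FALSE and TRUE can plausibly be understood through Jensen's inequality: when the log density is uncertain, exponentiating its central value gives less than its expected value on the natural scale. The sketch below shows only this lognormal intuition with a made-up mean and standard deviation; it is not how sdmTMB's epsilon bias correction is computed.

```r
set.seed(42)
mu <- 0; sigma <- 0.5                  # made-up log-scale mean and SD
x <- rnorm(1e6, mean = mu, sd = sigma) # draws of a log density

exp(mean(x))          # plug-in estimate, about exp(mu) = 1
mean(exp(x))          # Monte Carlo mean on the natural scale
exp(mu + sigma^2 / 2) # analytical lognormal mean, about 1.133
```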

This shows that, as you say, one should do the indices with bias correction if possible. Currently in my case that is not possible, but I will continue experimenting.

Year as fixed or spatiotemporal effect?

My high-resolution model does not converge when year is used as both a fixed and a spatiotemporal effect. This seems to be the case for the example dataset too. So, one has to choose. For my dataset, I would be inclined to use year as a spatiotemporal effect, because there are large differences in both sampling coverage and distribution among years. I need to experiment with both options; so far, I have only used the spatiotemporal option. Maybe bias correction would work with year as a fixed effect, although I am afraid that would simplify the model too much.

# Index calculation method example ####

library(sdmTMB)
library(tidyverse)
library(cowplot)
#> 
#> Attaching package: 'cowplot'
#> The following object is masked from 'package:lubridate':
#> 
#>     stamp

## Set seed to make get_index_sims() results reproducible
set.seed(1)

## Model with year as spatiotemporal random field
m <- sdmTMB(
  density ~ 0 + poly(log(depth), 2),
  data = pcod_2011,
  time = "year",
  mesh = pcod_mesh_2011,
  family = delta_gamma()
)

nd <- qcs_grid %>% replicate_df("year", unique(pcod_2011$year))
p <- predict(m, newdata = nd, return_tmb_object = TRUE, type = "response")

## Model with year as a fixed effect
m2 <- sdmTMB(
  density ~ 0 + factor(year) + poly(log(depth), 2),
  data = pcod_2011,
  time = "year",
  spatiotemporal = "off",
  mesh = pcod_mesh_2011,
  family = delta_gamma()
)

p2 <- predict(m2, newdata = nd, return_tmb_object = TRUE, type = "response")

## Compile the indices ####
indices <- 
  bind_rows(
    bind_rows(
      get_index(p, bias_correct = FALSE) %>% 
        mutate(type = "get_index(\nbias_correct = F)"),
      
      get_index(p, bias_correct = TRUE) %>%
        mutate(type = "get_index(\nbias_correct = T)"),
      
      p$data %>% 
        group_by(year, depth, X, Y) %>% 
        reframe(est = mean(est)) %>% 
        group_by(year) %>% 
        reframe(est = sum(est)) %>% 
        mutate(lwr = NA, upr = NA, log_est = log(est), se = NA) %>% 
        mutate(type = "sum predict"),
      
      get_index_sims(obj = predict(m, newdata = nd, nsim = 100)) %>% 
        mutate(type = "get_index_sims") 
    ) %>% 
      mutate(year_effect = "spatiotemporal"),
    
    bind_rows(
      get_index(p2, bias_correct = FALSE) %>%
        mutate(type = "get_index(\nbias_correct = F)"),
      
      get_index(p2, bias_correct = TRUE) %>%
        mutate(type = "get_index(\nbias_correct = T)"),
      
      p2$data %>% 
        group_by(year, depth, X, Y) %>% 
        reframe(est = mean(est)) %>% 
        group_by(year) %>% 
        reframe(est = sum(est)) %>% 
        mutate(lwr = NA, upr = NA, log_est = log(est), se = NA) %>% 
        mutate(type = "sum predict"),
      
      get_index_sims(obj = predict(m2, newdata = nd, nsim = 100)) %>% 
        mutate(type = "get_index_sims") 
    ) %>% 
      mutate(year_effect = "fixed")
  )
#> Bias correction is turned off.
#> It is recommended to turn this on for final inference.
#> We generally recommend using `get_index(..., bias_correct = TRUE)`
#> rather than `get_index_sims()`.
#> Bias correction is turned off.
#> It is recommended to turn this on for final inference.
#> We generally recommend using `get_index(..., bias_correct = TRUE)`
#> rather than `get_index_sims()`.

## Line plot ####

pl <- ggplot(
  data = indices,
  aes(x = year, y = est, ymin = lwr, ymax = upr, color = type, fill = type)
) +
  geom_ribbon(color = NA, alpha = 0.3) +
  geom_path(linewidth = 1) +
  geom_path(
    data = indices %>% 
      mutate(type2 = type, year_effect2 = year_effect) %>% 
      dplyr::select(-type, -year_effect),
    aes(x = year, y = est, color = type2, linetype = year_effect2),  
    inherit.aes = FALSE) +
  scale_linetype_manual(values = c(2,3)) +
  labs(color = "Index method", linetype = "Year effect", fill = "Index method") +
  facet_grid(year_effect~type) +
  theme_classic()

## Point plot ####

pp1 <- indices %>% 
  group_by(type, year_effect) %>% 
  reframe(est_sd = sd(est), est = mean(est), lwr = mean(lwr), upr = mean(upr), 
          se = mean(se)) %>% 
  mutate(pr_est = 100*est/mean(est), pr_upr = 100*upr/mean(est), 
         pr_lwr = 100*lwr/mean(est)) %>% 
  ggplot(aes(type, pr_est, ymax = pr_upr, ymin = pr_lwr, color = year_effect)) +
  geom_hline(yintercept = 100, color = "grey") +
  geom_errorbar(width = 0.2, position = position_dodge(width = 0.3)) +
  geom_point(position = position_dodge(width = 0.3)) + 
  labs(y = "Percentage of mean estimate", color = "Year effect", x = "Index method") +
  theme_classic()

pp2 <- indices %>% 
  group_by(type, year_effect) %>% 
  reframe(est_sd = sd(est), est = mean(est), lwr = mean(lwr), upr = mean(upr), 
          se = mean(se)) %>% 
  mutate(pr_est = 100*est/mean(est), pr_upr = 100*upr/mean(est), 
         pr_lwr = 100*lwr/mean(est)) %>% 
  ggplot(aes(year_effect, pr_est, color = type)) +
  geom_hline(yintercept = 100, color = "grey") +
  geom_point(position = position_dodge(width = 0.1)) +
  expand_limits(y = 106) +
  labs(y = "Percentage of mean estimate", color = "Index method", x = "Year effect") +
  theme_classic()

cowplot::plot_grid(pl, pp1, pp2, ncol = 1, labels = "auto")
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf

Created on 2024-03-15 with reprex v2.1.0

seananderson (Maintainer) · 2024-03-15T23:06:06Z

A few quick answers:

My high-resolution model does not converge when using year as both fixed and spatiotemporal effect.

That could be. That's not a general feature though. In fact, having both is probably the most common index standardization configuration. The pcod_2011 dataset is fairly small with a coarse mesh. Here's an example with the full pcod data frame:

> library(sdmTMB)
> m2 <- sdmTMB(
+     density ~ 0 + factor(year),
+     data = pcod,
+     time = "year",
+     spatiotemporal = "iid",
+     mesh = make_mesh(pcod, c("X", "Y"), cutoff = 10),
+     family = delta_gamma()
+ )
> sanity(m2)
✔ Non-linear minimizer suggests successful convergence
✔ Hessian matrix is positive definite
✔ No extreme or very small eigenvalues detected
✔ No gradients with respect to fixed effects are >= 0.001
✔ No fixed-effect standard errors are NA
✔ No standard errors look unreasonably large
✔ No sigma parameters are < 0.01
✔ No sigma parameters are > 100
✔ Range parameters don't look unreasonably large

get_index(bias_correct = FALSE) sums up the prediction grid by year (but I am not sure how it calculates the CIs, hence excluded from sum predict). The values seem to be consequently lower than with get_index(bias_correct = TRUE). In the example, the difference is on average 10%, which is quite a lot for a survey index

get_index() uses the generalized delta method to calculate the standard error irrespective of the bias correction argument. The bias correction argument implements the generic 'epsilon' bias correction from this paper to adjust the mean (likely/always upwards in this case) to account for the exp() transformation on the random effects. The implementation costs a call to TMB::MakeADFun() and a gradient evaluation.
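Once the standard error of the log index is available, the reported interval appears to be symmetric on the log scale. The 2011 row of the year-by-year get_index() output earlier in the thread can be reproduced this way, assuming the default 95% level:

```r
# Values copied from the 2011 row of the get_index() output above
log_est <- 12.56559
se <- 0.1498258
z <- qnorm(0.975) # 95% interval

ci <- c(est = exp(log_est),
        lwr = exp(log_est - z * se),
        upr = exp(log_est + z * se))
ci # approximately est 286528, lwr 213616, upr 384325
```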

Unfortunately bias correction still leads to a crash in my high-resolution models on a server with 503G ram and 71 cores. As far as I understand, splitting by year is not an option, because I have gear type, season and depth as covariates in the model. These would get estimated differently for different years.

You would fit with all your data, but in my example above you would predict and integrate over area with get_index() time slice by time slice (or for some subset of time slices). That should be fine here. My implementation could be slightly more efficient (it includes fake missing years, which get bias corrected), but this should still use far less memory than predicting on the grid for every year at the same time. Also, be sure you're using the latest GitHub version, which I recently made more efficient by changing the internal argument on MakeADFun.
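The same idea can be wrapped so that a few time slices are handled at a time. A minimal sketch: `index_in_chunks()` is a hypothetical helper, and the commented call assumes a fitted model `m` and a grid such as `qcs_grid`; the `geartype` line is only relevant for a model with a gear-type covariate and is an assumption about how the reference level would be set.

```r
# Hypothetical helper: build an index a few time slices at a time so that
# only part of the prediction grid is held in memory at once.
# `index_fun` receives a vector of years and must return a data frame of
# index rows for those years.
index_in_chunks <- function(years, chunk_size, index_fun) {
  years <- sort(unique(years))
  chunks <- split(years, ceiling(seq_along(years) / chunk_size))
  out <- do.call(rbind, lapply(chunks, index_fun))
  rownames(out) <- NULL
  out
}

# Possible use (not run here):
# index <- index_in_chunks(pcod_2011$year, chunk_size = 2, function(yrs) {
#   nd <- replicate_df(qcs_grid, "year", yrs)
#   nd$geartype <- "a" # reference gear type (hypothetical column)
#   p <- predict(m, newdata = nd, return_tmb_object = TRUE)
#   get_index(p, bias_correct = TRUE)
# })
```

Larger chunks need fewer model calls but more memory, so chunk_size trades speed against peak memory use.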

I haven't carefully worked through all the code above, but I don't think you would want to add the confidence intervals. You would have to add the variances/covariances from the covariance matrix to combine the uncertainty and get back to the standard error (or add them into the template and call ADREPORT on them or sample from them and add them or maybe even use tmbprofile to combine them, but that might be slow here).
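A minimal sketch of the first option: the delta method gives the standard error of log(I_a + I_b) from the log-scale estimates of two conditional indices and their covariance matrix. The estimates, standard errors and correlation below are made up; obtaining that covariance matrix from a fitted model is not shown and is assumed.

```r
# Log-scale estimates of two conditional indices (made-up values)
log_idx <- c(a = log(300), b = log(260))
se <- c(a = 0.15, b = 0.15)
rho <- 0.9 # made-up correlation between the two log indices
Sigma <- diag(se) %*% matrix(c(1, rho, rho, 1), 2) %*% diag(se)

# Delta method for g(x) = log(sum(exp(x))); gradient is exp(x) / sum(exp(x))
total <- sum(exp(log_idx))
grad <- exp(log_idx) / total
se_log_total <- sqrt(drop(t(grad) %*% Sigma %*% grad))

z <- qnorm(0.975)
c(est = total,
  lwr = total * exp(-z * se_log_total),
  upr = total * exp(z * se_log_total))

# Naively adding the conditional bounds gives a different (here wider) interval
c(lwr = sum(exp(log_idx - z * se)), upr = sum(exp(log_idx + z * se)))
```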

1 reply

MikkoVihtakari Mar 16, 2024
Author

I think I am using the new version. Actually, the model no longer crashes. With a 5 km grid, get_index() with bias correction returns only NaNs for all values. Without bias correction, the function returns values as expected. With a 10 km grid, bias correction also returns values. I can compare the 5 and 10 km indices without bias correction and, if there are no large differences, use the 10 km grid with bias correction.
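For that comparison, one way to build a 10 km grid from the 5 km one, so that both cover exactly the same area, is to snap cell centres to the coarser lattice, sum the cell areas, and take an area-weighted mean of depth. A minimal sketch: `coarsen_grid()` and the `area` column are hypothetical, X and Y are assumed to be in km, and averaging depth over a steep slope will blur the very gradients that motivated the fine grid. The summed areas could then be passed to get_index() through its `area` argument.

```r
# Hypothetical helper: aggregate a fine prediction grid to coarser square cells.
# Assumes columns X, Y (cell centres, km), depth, and area (cell area, km^2).
coarsen_grid <- function(grid, cell_size) {
  # Centre of the coarse cell that each fine cell falls into
  X <- (floor(grid$X / cell_size) + 0.5) * cell_size
  Y <- (floor(grid$Y / cell_size) + 0.5) * cell_size
  key <- paste(X, Y)
  area <- tapply(grid$area, key, sum)
  # Area-weighted mean depth within each coarse cell
  depth <- tapply(grid$depth * grid$area, key, sum) / area
  cells <- unique(data.frame(key, X, Y))
  data.frame(X = cells$X, Y = cells$Y,
             depth = as.numeric(depth[cells$key]),
             area = as.numeric(area[cells$key]))
}

# Toy 5 km grid: four fine cells that fall into one 10 km cell
fine <- data.frame(X = c(2.5, 7.5, 2.5, 7.5), Y = c(2.5, 2.5, 7.5, 7.5),
                   depth = c(100, 200, 300, 400), area = 25)
coarse <- coarsen_grid(fine, cell_size = 10)
coarse # one 10 km cell: X = 5, Y = 5, depth = 250, area = 100
```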
