# Hidden Markov models {#hmmcapturerecapture}
## Introduction
In this third chapter, you will learn the basics of Markov models and how to fit them to longitudinal data using NIMBLE. In real life, however, individuals may go undetected and their status remain unknown. You will therefore also learn how to work with the extension of Markov models to hidden states, so-called hidden Markov models.
## Longitudinal data
Let's get back to our survival example, and denote $z_i$ the state of individual $i$ with $z_i = 1$ if alive and $z_i = 0$ if dead. We have a total of $z = \displaystyle{\sum_{i=1}^{n}{z_i}}$ survivors out of $n$ released animals with winter survival probability $\phi$. Our model so far is a combination of a binomial likelihood and a Beta prior with parameters 1 and 1, which is also a uniform distribution between 0 and 1. It can be written as:
\begin{align*}
z &\sim \text{Binomial}(n, \phi) &\text{[likelihood]}
\\
\phi &\sim \text{Beta}(1, 1) &\text{[prior for }\phi \text{]} \\
\end{align*}
Because the binomial distribution is just a sum of independent Bernoulli outcomes, you can rewrite this model as:
\begin{align*}
z_i &\sim \text{Bernoulli}(\phi), \; i = 1, \ldots, N &\text{[likelihood]}
\\
\phi &\sim \text{Beta}(1, 1) &\text{[prior for }\phi \text{]} \\
\end{align*}
It is like flipping a coin for each individual, getting a survivor with probability $\phi$.
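To see this equivalence in action, here is a minimal sketch contrasting the two ways of simulating the number of survivors (using $n = 57$ and $\phi = 0.8$, the illustrative values used later in this chapter):

```{r eval = FALSE}
set.seed(2022)
n <- 57    # number of released animals
phi <- 0.8 # winter survival probability
sum(rbinom(n, size = 1, prob = phi)) # sum of n Bernoulli coin flips
rbinom(1, size = n, prob = phi)      # a single Binomial draw
```

Both quantities are draws from the same Binomial$(n, \phi)$ distribution.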
In this setup, we consider a single winter. But for many species, we need to collect data over the long term to get a representative estimate of survival. What if we had, say, $T = 5$ winters?
Let us denote $z_{i,t} = 1$ if individual $i$ is alive at winter $t$, and $z_{i,t} = 2$ if it is dead. Longitudinal data then look like the table below. Each row is an individual $i$, and columns are winters $t$, or sampling occasions. The variable $z$ is indexed by both $i$ and $t$, and takes value 1 if individual $i$ is alive in winter $t$, and 2 otherwise.
```{r echo = FALSE}
# 1 = alive, 2 = dead
nind <- 57
nocc <- 5
phi <- 0.8 # survival probability
delta <- c(1,0) # (Pr(alive at t = 1), Pr(dead at t = 1))
Gamma <- matrix(NA, 2, 2) # transition matrix
Gamma[1,1] <- phi # Pr(alive t -> alive t+1)
Gamma[1,2] <- 1 - phi # Pr(alive t -> dead t+1)
Gamma[2,1] <- 0 # Pr(dead t -> alive t+1)
Gamma[2,2] <- 1 # Pr(dead t -> dead t+1)
z <- matrix(NA, nrow = nind, ncol = nocc)
set.seed(2022)
for (i in 1:nind){
z[i,1] <- nimble::rcat(n = 1, prob = delta) # 1 for sure
for (t in 2:nocc){
z[i,t] <- nimble::rcat(n = 1, prob = Gamma[z[i,t-1],1:2])
}
}
colnames(z) <- paste0("winter ", 1:nocc)
z %>%
as_tibble() %>%
add_column(id = 1:nind, .before = "winter 1") #%>%
# kableExtra::kable() %>%
# kableExtra::scroll_box(width = "100%", height = "400px")
# kableExtra::kable_styling(font_size = 8,
# latex_options = "scale_down")
```
## A Markov model for longitudinal data
Let's think of a model for these data. The objective remains the same, estimating survival. To build this model, we'll make assumptions, go through its components and write down its likelihood. Note that we have already encountered Markov models in Section \@ref(markovmodelmcmc).
### Assumptions
First, we assume that the state of an animal in a given winter, alive or dead, is only dependent on its state the winter before. In other words, the future depends only on the present, not the past. This is a Markov process.
Second, if an animal is alive in a given winter, the probability it survives to the next winter is $\phi$. The probability it dies is $1 - \phi$.
Third, if an animal is dead in a given winter, it remains dead, unless you believe in zombies.
Our Markov process can be represented this way:
```{r, engine = 'tikz', echo = FALSE}
\usetikzlibrary{arrows, fit, positioning, automata}
\begin{tikzpicture}[node distance = 2cm]
\tikzset{state/.style = {circle, draw, minimum size = 30pt, scale = 3, line width=1pt}}
\node [state,fill=lightgray!75] (6) [] {$z_{i,3}$};
\node [state,fill=lightgray!75] (5) [left = 20mm of 6] {$z_{i,2}$};
\node [state,fill=lightgray!75] (4) [left = 20mm of 5] {$z_{i,1}$};
\node [state,fill=lightgray!75] (7) [right = 20mm of 6] {$z_{i,4}$};
\node [state,fill=lightgray!75] (8) [right = 20mm of 7] {$z_{i,5}$};
\draw[->,black, line width=0.25mm,-latex] (4) to (5);
\draw[->,black, line width=0.25mm,-latex] (5) to (6);
\draw[->,black, line width=0.25mm,-latex] (6) to (7);
\draw[->,black, line width=0.25mm,-latex] (7) to (8);
\end{tikzpicture}
```
An example realization of this Markov process is:
```{r, engine = 'tikz', echo = FALSE}
\usetikzlibrary{arrows, fit, positioning, automata}
\begin{tikzpicture}[node distance = 2cm]
\tikzset{state/.style = {circle, draw, minimum size = 30pt, scale = 3, line width=1pt}}
\node [state,fill=lightgray!75] (6) [] {$1$};
\node [state,fill=lightgray!75] (5) [left = 20mm of 6] {$1$};
\node [state,fill=lightgray!75] (4) [left = 20mm of 5] {$1$};
\node [state,fill=lightgray!75] (7) [right = 20mm of 6] {$2$};
\node [state,fill=lightgray!75] (8) [right = 20mm of 7] {$2$};
\draw[->,black, line width=0.25mm,-latex] (4) -- node[above=3mm, align=center] {\huge $\phi$} (5);
\draw[->,black, line width=0.25mm,-latex] (5) -- node[above=3mm, align=center] {\huge $\phi$} (6);
\draw[->,black, line width=0.25mm,-latex] (6) -- node[above=3mm, align=center] {\huge $1 - \phi$} (7);
\draw[->,black, line width=0.25mm,-latex] (7) -- node[above=3mm, align=center] {\huge 1} (8);
\end{tikzpicture}
```
Here the animal remains alive over the first two time intervals $(z_{i,1} = z_{i,2} = z_{i,3} = 1)$, each with probability $\phi$, until it dies over the third time interval $(z_{i,4} = 2)$ with probability $1-\phi$, then remains dead from then onwards $(z_{i,5} = 2)$ with probability 1.
### Transition matrix
You might have figured it out already (if not, no problem): the core of our Markov process is made of transition probabilities between the states alive and dead. For example, the probability of transitioning from state alive at $t-1$ to state alive at $t$ is $\Pr(z_{i,t} = 1 | z_{i,t-1} = 1) = \gamma_{1,1}$. It is the survival probability $\phi$. The probability of dying over the interval $(t-1, t)$ is $\Pr(z_{i,t} = 2 | z_{i,t-1} = 1) = \gamma_{1,2} = 1 - \phi$. Now if an animal is dead at $t-1$, then $\Pr(z_{i,t} = 1 | z_{i,t-1} = 2) = \gamma_{2,1} = 0$ and $\Pr(z_{i,t} = 2 | z_{i,t-1} = 2) = \gamma_{2,2} = 1$.
We can gather these probabilities of transition between states from one occasion to the next in a matrix, say $\mathbf{\Gamma}$, which we will call the transition matrix:
\begin{align*}
\mathbf{\Gamma} =
\left(\begin{array}{cc}
\gamma_{1,1} & \gamma_{1,2}\\
\gamma_{2,1} & \gamma_{2,2}
\end{array}\right) =
\left(\begin{array}{cc}
\phi & 1 - \phi\\
0 & 1
\end{array}\right)
\end{align*}
To try and remember that the states at $t-1$ are in rows, and the states at $t$ are in columns, I will often write:
$$\begin{matrix}
& \\
\mathbf{\Gamma} =
\left ( \vphantom{ \begin{matrix} 12 \\ 12 \end{matrix} } \right .
\end{matrix}
\hspace{-1.2em}
\begin{matrix}
z_t=1 & z_t=2 \\ \hdashline
\phi & 1-\phi \\
0 & 1
\end{matrix}
\hspace{-0.2em}
\begin{matrix}
& \\
\left . \vphantom{ \begin{matrix} 12 \\ 12 \end{matrix} } \right )
\begin{matrix}
z_{t-1}=1 \; \mbox{(alive)} \\ z_{t-1}=2 \; \mbox{(dead)}
\end{matrix}
\end{matrix}$$
Take the time you need to navigate through this matrix and get familiar with it. For example, you may start alive at $t-1$ (first row) then end up dead at $t$ (second column) with probability $1-\phi$.
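As a quick sanity check, each row of $\mathbf{\Gamma}$ is a probability distribution over the states at $t$ and must sum to 1. A minimal sketch in R (with an arbitrary $\phi = 0.8$; note that `matrix()` fills column by column by default):

```{r eval = FALSE}
phi <- 0.8
Gamma <- matrix(c(phi, 0, 1 - phi, 1), nrow = 2) # states at t-1 in rows, states at t in columns
rowSums(Gamma) # returns 1 1
```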
### Initial states
A Markov process has to start somewhere. We need the probabilities of initial states, i.e. the states of an individual at $t = 1$. We will gather the probabilities of being in each state (1 for alive, 2 for dead) in the first winter in a vector $\mathbf{\delta} = \left(\Pr(z_{i,1} = 1), \Pr(z_{i,1} = 2)\right)$. For simplicity, we will assume that all individuals are marked and released in the first winter, hence alive when first captured, which means that they are all in state alive (1) for sure. Therefore we have $\mathbf{\delta} = \left(1, 0\right)$.
### Likelihood
Now that we have built a Markov model, we need its likelihood to apply the Bayes theorem. The likelihood is the probability of the data, given the model. Here the data are the $z$, therefore we need $\Pr(\mathbf{z}) = \Pr(z_1, z_2, \ldots, z_{T-2}, z_{T-1}, z_T)$.
We are going to work backward, starting from the last sampling occasion. Using conditional probabilities, the likelihood can be written as the probability of $z_T$, i.e. whether you are alive or not on the last occasion, given your past history (the states at all previous occasions), times the probability of that past history:
\begin{align*}
\Pr(\mathbf{z}) &= \Pr(z_T, z_{T-1}, z_{T-2}, \ldots, z_1) \color{white}{\Pr(z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)}\\
&= \color{blue}{\Pr(z_T | z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-1}, z_{T-2},\ldots, z_1)} \\
\end{align*}
Then, because we have a Markov model, we are memoryless: the probability of the next state, here $z_T$, depends only on the current state, $z_{T-1}$, and not on the previous states:
\begin{align*}
\Pr(\mathbf{z}) &= \Pr(z_T, z_{T-1}, z_{T-2}, \ldots, z_1) \color{white}{\Pr(z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)}\\
&= \Pr(z_T | z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\
&= \color{blue}{\Pr(z_T | z_{T-1})} \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\
\end{align*}
You can apply the same reasoning to $T-1$. First use conditional probabilities:
\begin{align*}
\Pr(\mathbf{z}) &= \Pr(z_T, z_{T-1}, z_{T-2}, \ldots, z_1) \color{white}{\Pr(z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)}\\
&= \Pr(z_T | z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\
&= \Pr(z_T | z_{T-1}) \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\
&= \Pr(z_T | z_{T-1}) \color{blue}{\Pr(z_{T-1} | z_{T-2}, \ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)}\\
\end{align*}
Then apply the Markovian property:
\begin{align*}
\Pr(\mathbf{z}) &= \Pr(z_T, z_{T-1}, z_{T-2}, \ldots, z_1) \color{white}{\Pr(z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)}\\
&= \Pr(z_T | z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\
&= \Pr(z_T | z_{T-1}) \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\
&= \Pr(z_T | z_{T-1}) \Pr(z_{T-1} | z_{T-2}, \ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)\\
&= \Pr(z_T | z_{T-1}) \color{blue}{\Pr(z_{T-1} | z_{T-2})} \Pr(z_{T-2}, \ldots, z_1)\\
\end{align*}
And so on up to $z_2$. You end up with this expression for the likelihood:
\begin{align*}
\Pr(\mathbf{z}) &= \Pr(z_T, z_{T-1}, z_{T-2}, \ldots, z_1) \color{white}{\Pr(z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)}\\
&= \Pr(z_T | z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\
&= \Pr(z_T | z_{T-1}) \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\
&= \Pr(z_T | z_{T-1}) \Pr(z_{T-1} | z_{T-2}, \ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)\\
&= \Pr(z_T | z_{T-1}) \Pr(z_{T-1} | z_{T-2}) \Pr(z_{T-2}, \ldots, z_1)\\
&= \ldots \\
&= \color{blue}{\Pr(z_T | z_{T-1}) \Pr(z_{T-1} | z_{T-2}) \ldots \Pr(z_{2} | z_{1}) \Pr(z_{1})}\\
\end{align*}
This is a product of conditional probabilities of states given previous states, and the probability of initial states $\Pr(z_1)$. Using a more compact notation for the product of conditional probabilities, we get:
\begin{align*}
\Pr(\mathbf{z}) &= \Pr(z_T, z_{T-1}, z_{T-2}, \ldots, z_1) \color{white}{\Pr(z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)}\\
&= \Pr(z_T | z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\
&= \Pr(z_T | z_{T-1}) \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\
&= \Pr(z_T | z_{T-1}) \Pr(z_{T-1} | z_{T-2}, \ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)\\
&= \Pr(z_T | z_{T-1}) \Pr(z_{T-1} | z_{T-2}) \Pr(z_{T-2}, \ldots, z_1)\\
&= \ldots \\
&= \Pr(z_T | z_{T-1}) \Pr(z_{T-1} | z_{T-2}) \ldots \Pr(z_{2} | z_{1}) \Pr(z_{1})\\
&= \color{blue}{\Pr(z_{1}) \prod_{t=2}^T{\Pr(z_{t} | z_{t-1})}}\\
\end{align*}
In the product, you can recognize the transition parameters $\gamma$ we defined above, so that the likelihood of a Markov model can be written as:
\begin{align*}
\Pr(\mathbf{z}) &= \Pr(z_T, z_{T-1}, z_{T-2}, \ldots, z_1) \color{white}{\Pr(z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)}\\
&= \Pr(z_{1}) \prod_{t=2}^T{\gamma_{z_{t-1},z_{t}}}\\
\end{align*}
<!-- ## Matrix formulation of the likelihood -->
<!-- \begin{align*} -->
<!-- \Pr(\mathbf{z}) &= \Pr(z_T, z_{T-1}, z_{T-2}, \ldots, z_1) \color{white}{\Pr(z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)}\\ -->
<!-- &= \Pr(z_{1}) \prod_{t=2}^T{\gamma_{z_{t-1},z_{t}}}\\ -->
<!-- &= \mathbf{\delta} \; \mathbf{\Gamma} \cdots \mathbf{\Gamma} -->
<!-- \end{align*} -->
### Example
I realise these calculations are a bit difficult to follow. Let's take an example to fix ideas. Assume an animal is alive at time 1, still alive at time 2, then dead at time 3. We have $\mathbf{z} = (1, 1, 2)$. What is the contribution of this animal to the likelihood? Let's apply the formula we just derived:
\begin{align*}
\Pr(\mathbf{z} = (1, 1, 2)) &= \Pr(z_1 = 1) \; \gamma_{z_{1} = 1, z_{2} = 1} \; \gamma_{z_{2} = 1, z_{3} = 2}\\
&= 1 \; \phi \; (1 - \phi).
\end{align*}
The probability of the sequence alive, alive, dead is the probability of being alive first, then staying alive, then eventually dying. Since the probability of being alive at the first occasion is 1, the contribution of this individual to the likelihood is $\phi (1 - \phi)$.
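This contribution can be computed directly from $\delta$ and $\mathbf{\Gamma}$ by walking along the sequence of states; here is a minimal sketch, with $\phi = 0.8$ as an illustrative value:

```{r eval = FALSE}
phi <- 0.8
delta <- c(1, 0)                                 # initial state probabilities
Gamma <- matrix(c(phi, 0, 1 - phi, 1), nrow = 2) # transition matrix
z <- c(1, 1, 2)                                  # alive, alive, dead
lik <- delta[z[1]]
for (t in 2:length(z)) lik <- lik * Gamma[z[t-1], z[t]]
lik             # 0.16
phi * (1 - phi) # matches: 0.8 * 0.2 = 0.16
```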
## Bayesian formulation
Before implementing this model in NIMBLE, we provide a Bayesian formulation of our model. We first note that the likelihood is a product of conditional probabilities of binary events (alive or dead). Binary events are usually associated with the Bernoulli distribution. Here, however, we will use its extension to several outcomes (from a coin with two sides to a die with more than two faces), known as the categorical distribution. The categorical distribution is a multinomial distribution with a single draw. To get a better idea of how the categorical distribution works, let's simulate from it with the `rcat()` function. Consider for example a random value drawn from a categorical distribution with probabilities 0.1, 0.3 and 0.6. Think of a die with three faces: face 1 has probability 0.1 of occurring, face 2 probability 0.3, and face 3 probability 0.6, these probabilities summing to 1. We expect to get a 3 more often than a 2, and rarely a 1:
```{r}
rcat(n = 1, prob = c(0.1, 0.3, 0.6))
```
Alternatively, you can use the `sample()` function: `sample(x = 1:3, size = 1, replace = FALSE, prob = c(0.1, 0.3, 0.6))`. Here is another example in which we sample 20 times from a categorical distribution with probabilities 0.1, 0.1, 0.4, 0.2 and 0.2, hence a die with 5 faces:
```{r}
rcat(n = 20, prob = c(0.1, 0.1, 0.4, 0.2, 0.2))
```
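With many draws, the empirical frequencies should approach the specified probabilities; a quick check (assuming `nimble` is loaded so that `rcat()` is available):

```{r eval = FALSE}
set.seed(2022)
draws <- rcat(n = 10000, prob = c(0.1, 0.1, 0.4, 0.2, 0.2))
table(draws) / 10000 # frequencies close to 0.1, 0.1, 0.4, 0.2, 0.2
```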
In this chapter, you will familiarise yourself with the categorical distribution in binary situations, which should make the transition to more states than just alive and dead smoother in the next chapters.
The initial state is a categorical random variable with probability vector $\delta$. That is, you have a die with two faces, or a coin: with some probability the animal is alive, and with one minus that probability it is dead. Of course, if you want your Markov chain to start, you'd better say it's alive, so that $\delta$ is just $(1,0)$:
\begin{align*}
z_1 &\sim \text{Categorical}(\delta) &\text{[likelihood, }t = 1 \text{]}\\
\end{align*}
Now the main part is the dynamics of the states. The state $z_t$ at $t$ depends only on the state $z_{t-1}$ at $t-1$, and is a categorical random variable whose probabilities are given by row $z_{t-1}$ of the transition matrix $\mathbf{\Gamma}$, i.e. the $\gamma_{z_{t-1},z_{t}}$:
\begin{align*}
z_1 &\sim \text{Categorical}(\delta) &\text{[likelihood, }t = 1 \text{]}\\
z_t | z_{t-1} &\sim \text{Categorical}(\gamma_{z_{t-1},z_{t}}) &\text{[likelihood, }t > 1 \text{]}\\
\end{align*}
For example, if individual $i$ is alive at $t-1$, i.e. $z_{t-1} = 1$, we need the first row of $\mathbf{\Gamma}$,
\begin{align*}
\mathbf{\Gamma} =
\left(\begin{array}{cc}
\color{blue}{\phi} & \color{blue}{1 - \phi}\\
0 & 1
\end{array}\right)
\end{align*}
that is $\color{blue}{\gamma_{z_{t-1} = 1,z_{t}} = (\phi, 1-\phi)}$ and $z_t | z_{t-1} = 1 \sim \text{Categorical}((\phi, 1-\phi))$.
Otherwise, if individual $i$ is dead at $t-1$, i.e. $z_{t-1} = 2$, we need the second row of $\mathbf{\Gamma}$:
\begin{align*}
\mathbf{\Gamma} =
\left(\begin{array}{cc}
\phi & 1 - \phi\\
\color{blue}{0} & \color{blue}{1}
\end{array}\right)
\end{align*}
that is $\color{blue}{\gamma_{z_{t-1} = 2,z_{t}} = (0, 1)}$ and $z_t | z_{t-1} = 2 \sim \text{Categorical}((0, 1))$ (if the individual is dead, it remains dead with probability 1).
We also need a prior on survival. Without surprise, we will use a uniform distribution between 0 and 1, which is also a Beta distribution with parameters 1 and 1. Overall our model is:
\begin{align*}
z_1 &\sim \text{Categorical}(\delta) &\text{[likelihood, }t = 1 \text{]}\\
z_t | z_{t-1} &\sim \text{Categorical}(\gamma_{z_{t-1},z_{t}}) &\text{[likelihood, }t > 1 \text{]}\\
\phi &\sim \text{Beta}(1, 1) &\text{[prior for }\phi \text{]} \\
\end{align*}
## NIMBLE implementation
How do we implement in NIMBLE the Markov model we just built? We need to put a few bricks in place before running our model. Let's start with the prior on survival, the vector of initial state probabilities, and the transition matrix:
```{r eval = FALSE}
markov.survival <- nimbleCode({
phi ~ dunif(0, 1) # prior
delta[1] <- 1 # Pr(alive t = 1) = 1
delta[2] <- 0 # Pr(dead t = 1) = 0
gamma[1,1] <- phi # Pr(alive t -> alive t+1)
gamma[1,2] <- 1 - phi # Pr(alive t -> dead t+1)
gamma[2,1] <- 0 # Pr(dead t -> alive t+1)
gamma[2,2] <- 1 # Pr(dead t -> dead t+1)
...
```
Alternatively, you can define vectors and matrices in NIMBLE as you would in R. You can write:
```{r eval = FALSE}
markov.survival <- nimbleCode({
phi ~ dunif(0, 1) # prior
delta[1:2] <- c(1, 0) # vector of initial state probabilities
gamma[1:2,1:2] <- matrix( c(phi, 0, 1 - phi, 1), nrow = 2) # transition matrix
...
```
Now there are two important dimensions to our model, along which we need to repeat tasks, namely individual and time. As for time, we describe the successive events of survival using the categorical distribution `dcat()`, say for individual $i$:
```{r eval = FALSE}
z[i,1] ~ dcat(delta[1:2]) # t = 1
z[i,2] ~ dcat(gamma[z[i,1], 1:2]) # t = 2
z[i,3] ~ dcat(gamma[z[i,2], 1:2]) # t = 3
...
z[i,T] ~ dcat(gamma[z[i,T-1], 1:2]) # t = T
```
There is a more efficient way to write this piece of code, using a for loop, i.e. a sequence of instructions that we repeat. Here, we condense the previous code into:
```{r eval = FALSE}
z[i,1] ~ dcat(delta[1:2]) # t = 1
for (t in 2:T){ # loop over time t
z[i,t] ~ dcat(gamma[z[i,t-1], 1:2]) # t = 2,...,T
}
```
Now we just need to do the same for all individuals. We use another loop:
```{r eval = FALSE}
for (i in 1:N){ # loop over individual i
z[i,1] ~ dcat(delta[1:2]) # t = 1
for (j in 2:T){ # loop over time t
z[i,j] ~ dcat(gamma[z[i,j-1], 1:2]) # t = 2,...,T
} # t
} # i
```
Putting everything together, the NIMBLE code for our Markov model is:
```{r}
markov.survival <- nimbleCode({
phi ~ dunif(0, 1) # prior
delta[1] <- 1 # Pr(alive t = 1) = 1
delta[2] <- 0 # Pr(dead t = 1) = 0
gamma[1,1] <- phi # Pr(alive t -> alive t+1)
gamma[1,2] <- 1 - phi # Pr(alive t -> dead t+1)
gamma[2,1] <- 0 # Pr(dead t -> alive t+1)
gamma[2,2] <- 1 # Pr(dead t -> dead t+1)
# likelihood
for (i in 1:N){ # loop over individual i
z[i,1] ~ dcat(delta[1:2]) # t = 1
for (j in 2:T){ # loop over time t
z[i,j] ~ dcat(gamma[z[i,j-1], 1:2]) # t = 2,...,T
} # t
} # i
})
```
Note that in this example, $\delta$ is used as a placeholder for more complex models we will build in chapters to come. Here, you could simply write `z[i,1] <- 1`.
Now we're ready to resume our NIMBLE workflow. First we read in data. Because we have loops and indices that do not change, we use constants as explained in Section \@ref(start-nimble):
```{r}
my.constants <- list(N = 57, T = 5)
my.data <- list(z = z)
```
We also specify initial values for survival with a function:
```{r}
initial.values <- function() list(phi = runif(1,0,1))
initial.values()
```
There is a single parameter to monitor:
```{r}
parameters.to.save <- c("phi")
parameters.to.save
```
We run 2 chains with 5000 iterations including 1000 iterations as burnin:
```{r}
n.iter <- 5000
n.burnin <- 1000
n.chains <- 2
```
Let's run NIMBLE:
```{r, eval=FALSE}
mcmc.output <- nimbleMCMC(code = markov.survival,
constants = my.constants,
data = my.data,
inits = initial.values,
monitors = parameters.to.save,
niter = n.iter,
nburnin = n.burnin,
nchains = n.chains)
```
```{r, message=FALSE, echo = FALSE, cache = F}
mcmc.output <- nimbleMCMC(code = markov.survival,
constants = my.constants,
data = my.data,
inits = initial.values,
monitors = parameters.to.save,
niter = n.iter,
nburnin = n.burnin,
nchains = n.chains,
progressBar = FALSE)
```
Let's calculate the usual posterior numerical summaries for survival:
```{r}
MCMCsummary(mcmc.output, round = 2)
```
The posterior mean and median are close to $0.8$. This is reassuring since the data were simulated with true survival $\phi = 0.8$. The code I used was:
```{r echo = TRUE, eval = TRUE}
# 1 = alive, 2 = dead
nind <- 57
nocc <- 5
phi <- 0.8 # survival probability
delta <- c(1,0) # (Pr(alive at t = 1), Pr(dead at t = 1))
Gamma <- matrix(NA, 2, 2) # transition matrix
Gamma[1,1] <- phi # Pr(alive t -> alive t+1)
Gamma[1,2] <- 1 - phi # Pr(alive t -> dead t+1)
Gamma[2,1] <- 0 # Pr(dead t -> alive t+1)
Gamma[2,2] <- 1 # Pr(dead t -> dead t+1)
z <- matrix(NA, nrow = nind, ncol = nocc)
set.seed(2022)
for (i in 1:nind){
z[i,1] <- rcat(n = 1, prob = delta) # 1 for sure
for (t in 2:nocc){
z[i,t] <- rcat(n = 1, prob = Gamma[z[i,t-1],1:2])
}
}
head(z)
```
We could replace `dcat()` with `dbern()` everywhere in the code because we have binary events alive/dead. Would it make any difference? Although `dcat()` uses less efficient samplers than `dbern()`, `dcat()` is convenient for model building because it accommodates more than two outcomes, a feature that will come in handy in the next chapters.
## Hidden Markov models
### Capture-recapture data {#capturerecapturedata}
:::: {.blackbox data-latex=""}
Unfortunately, the data with alive and dead states is the data we wish we had. In real life, animals cannot be monitored exhaustively, like humans in a medical trial. This is why we use capture-recapture protocols, in which animals are captured, individually marked, and released alive. Then, these animals may be detected again, or go undetected. Whenever animals go undetected, it might be that they were alive but missed, or because they were dead and therefore could not be detected. This issue is usually referred to as that of imperfect detection. As a consequence of imperfect detection, the Markov process for survival is only partially observed: You know an animal is alive when you detect it, but when an animal goes undetected, whether it is alive or dead is unknown to you. This is where hidden Markov models (HMMs) come in.
::::
Let's get back to the data we had in the previous section. The truth is in $z$ which contains the fate of all individuals with $z = 1$ for alive, and $z = 2$ for dead:
```{r echo = FALSE}
colnames(z) <- paste0("winter ", 1:nocc)
z %>%
as_tibble() %>%
add_column(id = 1:nind, .before = "winter 1")
```
Unfortunately, we have only partial access to $z$. What we do observe is $y$ the detections and non-detections. How are $z$ and $y$ connected?
The easiest connection is with dead animals, which go undetected for sure. When an animal is dead, i.e. $z = 2$, it cannot be detected, therefore $y = 0$:
```{r echo = FALSE}
z %>%
as_tibble() %>%
replace(. == 2, 0) %>%
add_column(id = 1:nind, .before = "winter 1")
```
Now alive animals may be detected or not. If an animal is alive ($z = 1$), it is detected ($y = 1$) with probability $p$, or missed ($y = 0$) with probability $1-p$. In our example, first detection coincides with the first winter for all individuals.
```{r echo = FALSE}
set.seed(2022)
nocc <- 5
nind <- 57
p <- 0.6
phi <- 0.8
delta <- c(1,0)
Gamma <- matrix(NA, 2, 2)
Omega <- matrix(NA, 2, 2)
Gamma[1,1] <- phi # Pr(alive t -> alive t+1)
Gamma[1,2] <- 1 - phi # Pr(alive t -> dead t+1)
Gamma[2,1] <- 0 # Pr(dead t -> alive t+1)
Gamma[2,2] <- 1 # Pr(dead t -> dead t+1)
Omega[1,1] <- 1 - p # Pr(alive t -> non-detected t)
Omega[1,2] <- p # Pr(alive t -> detected t)
Omega[2,1] <- 1 # Pr(dead t -> non-detected t)
Omega[2,2] <- 0 # Pr(dead t -> detected t)
z <- matrix(NA, nrow = nind, ncol = nocc)
y <- z
y[,1] <- 2 # all animals detected in first winter
for (i in 1:nind){
z[i,1] <- nimble::rcat(n = 1, prob = delta) # 1 for sure
for (t in 2:nocc){
z[i,t] <- nimble::rcat(n = 1, prob = Gamma[z[i,t-1],1:2])
y[i,t] <- nimble::rcat(n = 1, prob = Omega[z[i,t],1:2])
}
}
y <- y - 1 # non-detection = 0, detection = 1
colnames(y) <- paste0("winter ", 1:nocc)
nobs <- sum(apply(y,1,sum) != 0)
y <- y[apply(y,1,sum) !=0, ] # remove rows w/ non-detections only
y %>%
as_tibble() %>%
add_column(id = 1:nobs, .before = "winter 1")
```
Compare with the previous $z$ table. Some 1's for alive have become 0's for non-detection; other 1's for alive have remained 1's for detection. This $y$ table is what we observe in real life. I hope I have convinced you that to make the connection between the observations $y$ and the true states $z$, we need to describe how observations are made (or emitted, in the HMM terminology) from the states.
### Observation matrix
The novelty in HMMs is the link between observations and states. This link is made through observation probabilities. For example, the probability of detecting an animal $i$ at $t$ given it is alive at $t$ is $\Pr(y_{i,t}=2|z_{i,t}=1)=\omega_{1,2}$. It is the detection probability $p$. If individual $i$ is dead at $t$, then it is missed for sure, and $\Pr(y_{i,t}=1|z_{i,t}=2)=\omega_{2,1}=1$.
We can gather these observation probabilities into an observation matrix $\mathbf{\Omega}$. In rows we have the states alive $z = 1$ and dead $z = 2$, while in columns we have the observations non-detected $y = 1$ and detected $y = 2$ (previously coded 0 and 1 respectively):
\begin{align*}
\mathbf{\Omega} =
\left(\begin{array}{cc}
\omega_{1,1} & \omega_{1,2}\\
\omega_{2,1} & \omega_{2,2}
\end{array}\right) =
\left(\begin{array}{cc}
1 - p & p\\
1 & 0
\end{array}\right)
\end{align*}
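As with the transition matrix, each row of $\mathbf{\Omega}$ is a probability distribution, here over the observations, and must sum to 1; a minimal sketch (with an arbitrary $p = 0.6$, filling the matrix column by column):

```{r eval = FALSE}
p <- 0.6
Omega <- matrix(c(1 - p, 1, p, 0), nrow = 2) # states in rows, observations in columns
rowSums(Omega) # returns 1 1
```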
In the survival models we will use throughout this book, we condition the fate of individuals on their first detection, which boils down to setting the corresponding detection probability to 1.
The observation matrix is:
$$\begin{matrix}
& \\
\mathbf{\Omega} =
\left ( \vphantom{ \begin{matrix} 12 \\ 12 \end{matrix} } \right .
\end{matrix}
\hspace{-1.2em}
\begin{matrix}
y_t=1 & y_t=2 \\
\mbox{(non-detected)} & \mbox{(detected)} \\ \hdashline
1 - p & p\\
1 & 0\\
\end{matrix}
\hspace{-0.2em}
\begin{matrix}
& \\
\left . \vphantom{ \begin{matrix} 12 \\ 12 \end{matrix} } \right )
\begin{matrix}
z_{t}=1 \; \mbox{(alive)}\\ z_{t}=2 \; \mbox{(dead)}
\end{matrix}
\end{matrix}$$
### Hidden Markov model
Our hidden Markov model can be represented this way:
```{r, engine = 'tikz', echo = FALSE}
\usetikzlibrary{arrows, fit, positioning, automata}
\begin{tikzpicture}[node distance = 2cm]
\tikzset{state/.style = {circle, draw, minimum size = 30pt, scale = 3, line width=1pt}}
\node [state,fill=lightgray!75] (6) [] {$z_{3}$};
\node [state,fill=lightgray!75] (5) [left = 20mm of 6] {$z_{2}$};
\node [state,fill=lightgray!75] (4) [left = 20mm of 5] {$z_{1} = 1$};
\node [state,fill=lightgray!75] (7) [right = 20mm of 6] {$z_{4}$};
\node [state,fill=lightgray!75] (8) [right = 20mm of 7] {$z_{5}$};
\node [state,fill=white] (16) [above = 20mm of 6] {$y_{3}$};
\node [state,fill=white] (15) [above = 20mm of 5] {$y_{2}$};
\node [state,fill=white] (14) [above = 20mm of 4] {$y_{1} = 2$};
\node [state,fill=white] (17) [above = 20mm of 7] {$y_{4}$};
\node [state,fill=white] (18) [above = 20mm of 8] {$y_{5}$};
\draw[->,black, line width=0.25mm,-latex] (4) to (5);
\draw[->,black, line width=0.25mm,-latex] (5) to (6);
\draw[->,black, line width=0.25mm,-latex] (6) to (7);
\draw[->,black, line width=0.25mm,-latex] (7) to (8);
\draw[->,black, line width=0.25mm,-latex] (4) to (14);
\draw[->,black, line width=0.25mm,-latex] (5) to (15);
\draw[->,black, line width=0.25mm,-latex] (6) to (16);
\draw[->,black, line width=0.25mm,-latex] (7) to (17);
\draw[->,black, line width=0.25mm,-latex] (8) to (18);
\end{tikzpicture}
```
States $z$ are in gray. Observations $y$ are in white. All individuals are first captured in the first winter $t = 1$, and are therefore all alive $z_1 = 1$ and detected $y_1 = 2$.
:::: {.blackbox data-latex=""}
A hidden Markov model is just two time series running in parallel: one for the states, with the Markovian property, and the other for the observations generated from the states. HMMs are a special case of state-space models in which the latent states are discrete.
::::
Have a look at the example below, in which an individual is detected at the first sampling occasion, detected again, then missed for the rest of the study. While on occasion $t=3$ that individual was alive $z_3=1$ and went undetected $y_3=1$, on occasions $t=4$ and $t=5$ it went undetected $y_4=y_5=1$ because it was dead $z_4=z_5=2$. Because we condition on first detection, the link between state and observation at $t=1$ is deterministic and $p = 1$.
```{r, engine = 'tikz', echo = FALSE}
\usetikzlibrary{arrows, fit, positioning, automata}
\begin{tikzpicture}[node distance = 2cm]
\tikzset{state/.style = {circle, draw, minimum size = 30pt, scale = 3, line width=1pt}}
\node [state,fill=lightgray!75] (6) [] {$1$};
\node [state,fill=lightgray!75] (5) [left = 20mm of 6] {$1$};
\node [state,fill=lightgray!75] (4) [left = 20mm of 5] {$1$};
\node [state,fill=lightgray!75] (7) [right = 20mm of 6] {$2$};
\node [state,fill=lightgray!75] (8) [right = 20mm of 7] {$2$};
\node [state,fill=white] (16) [above = 20mm of 6] {$1$};
\node [state,fill=white] (15) [above = 20mm of 5] {$2$};
\node [state,fill=white] (14) [above = 20mm of 4] {$2$};
\node [state,fill=white] (17) [above = 20mm of 7] {$1$};
\node [state,fill=white] (18) [above = 20mm of 8] {$1$};
\draw[->,black, line width=0.25mm,-latex] (4) -- node[above=3mm, align=center] {\huge $\phi$} (5);
\draw[->,black, line width=0.25mm,-latex] (5) -- node[above=3mm, align=center] {\huge $\phi$} (6);
\draw[->,black, line width=0.25mm,-latex] (6) -- node[above=3mm, align=center] {\huge $1 - \phi$} (7);
\draw[->,black, line width=0.25mm,-latex] (7) -- node[above=3mm, align=center] {\huge $1$} (8);
\draw[->,black, line width=0.25mm,-latex] (4) -- node[left=3mm, align=center] {\huge $1$} (14);
\draw[->,black, line width=0.25mm,-latex] (5) -- node[left=3mm, align=center] {\huge $p$} (15);
\draw[->,black, line width=0.25mm,-latex] (6) -- node[left=3mm, align=center] {\huge $1 - p$} (16);
\draw[->,black, line width=0.25mm,-latex] (7) -- node[left=3mm, align=center] {\huge $1$} (17);
\draw[->,black, line width=0.25mm,-latex] (8) -- node[left=3mm, align=center] {\huge $1$} (18);
\end{tikzpicture}
```
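To see these two parallel chains in action, here is a minimal R sketch (the variable names `z.one` and `y.one` and the values $\phi = 0.8$, $p = 0.6$ are ours) that simulates states then observations for one individual over 5 occasions, conditioning on first detection:

```{r}
# Simulate the state chain first, then the observation chain generated
# from it, for a single individual; illustrative values phi = 0.8, p = 0.6.
set.seed(42)  # arbitrary seed
phi <- 0.8; p <- 0.6
Gamma <- matrix(c(phi, 1 - phi,
                  0,   1), nrow = 2, byrow = TRUE)
Omega <- matrix(c(1 - p, p,
                  1,     0), nrow = 2, byrow = TRUE)
z.one <- y.one <- numeric(5)
z.one[1] <- 1  # alive at first occasion
y.one[1] <- 2  # detected at first occasion (we condition on first detection)
for (t in 2:5) {
  z.one[t] <- sample(1:2, size = 1, prob = Gamma[z.one[t - 1], ])  # Markov step
  y.one[t] <- sample(1:2, size = 1, prob = Omega[z.one[t], ])      # obs given state
}
rbind(z = z.one, y = y.one)
```

Because $\gamma_{2,1} = 0$ and $\omega_{2,2} = 0$, any simulated history has a non-decreasing state sequence (dead is absorbing) and no detections after death.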
### Likelihood {#likelihoodhmm}
In the Bayesian framework, we usually work with the so-called complete likelihood, that is the probability of the observed data $y$ and the latent states $z$ given the parameters of our model, here the survival and detection probabilities $\phi$ and $p$. The complete likelihood for individual $i$ is:
\begin{align*}
\Pr(\mathbf{y}_i, \mathbf{z}_i) &= \Pr(y_{i,1}, y_{i,2}, \ldots, y_{i,T}, z_{i,1}, z_{i,2}, \ldots, z_{i,T})\\
\end{align*}
Using the definition of a conditional probability, we have:
\begin{align*}
\Pr(\mathbf{y}_i, \mathbf{z}_i) &= \Pr(y_{i,1}, y_{i,2}, \ldots, y_{i,T}, z_{i,1}, z_{i,2}, \ldots, z_{i,T})\\
&= \color{blue}{\Pr(y_{i,1}, y_{i,2}, \ldots, y_{i,T} | z_{i,1}, z_{i,2}, \ldots, z_{i,T}) \Pr(z_{i,1}, z_{i,2}, \ldots, z_{i,T})}\\
\end{align*}
Then by using the independence of the $y$ conditional on the $z$, and the likelihood of a Markov chain, we get that:
\begin{align*}
\Pr(\mathbf{y}_i, \mathbf{z}_i) &= \Pr(y_{i,1}, y_{i,2}, \ldots, y_{i,T}, z_{i,1}, z_{i,2}, \ldots, z_{i,T})\\
&= \Pr(y_{i,1}, y_{i,2}, \ldots, y_{i,T} | z_{i,1}, z_{i,2}, \ldots, z_{i,T}) \Pr(z_{i,1}, z_{i,2}, \ldots, z_{i,T})\\
&= \color{blue}{\left(\prod_{t=1}^T{\Pr{(y_{i,t} | z_{i,t})}}\right) \left(\Pr(z_{i,1}) \prod_{t=2}^T{\Pr{(z_{i,t} | z_{i,t-1})}}\right)}\\
\end{align*}
Finally, by recognizing the observation and transition probabilities, we have that the complete likelihood for individual $i$ is:
\begin{align*}
\Pr(\mathbf{y}_i, \mathbf{z}_i) &= \Pr(y_{i,1}, y_{i,2}, \ldots, y_{i,T}, z_{i,1}, z_{i,2}, \ldots, z_{i,T})\\
&= \Pr(y_{i,1}, y_{i,2}, \ldots, y_{i,T} | z_{i,1}, z_{i,2}, \ldots, z_{i,T}) \Pr(z_{i,1}, z_{i,2}, \ldots, z_{i,T})\\
&= \color{blue}{\left(\prod_{t=1}^T{\omega_{z_{i,t}, y_{i,t}}}\right) \left(\Pr(z_{i,1}) \prod_{t=2}^T{\gamma_{z_{i,t-1},z_{i,t}}}\right)}\\
\end{align*}
To obtain the complete likelihood of the whole dataset, we multiply the individual contributions over all animals: $\displaystyle{\prod_{i=1}^N{\Pr(\mathbf{y}_i,\mathbf{z}_i)}}$. When several individuals share the same contribution, calculating it only once can greatly reduce the computational burden, as illustrated in Section \@ref(pooled-likelihood).
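To make the product above concrete, here is a small R sketch of one individual's complete likelihood contribution (the function name `complete_lik` and the example values are ours):

```{r}
# Complete likelihood contribution of one individual: product of the
# observation terms, the initial state probability, and the transition terms.
complete_lik <- function(y, z, phi, p, delta = c(1, 0)) {
  Gamma <- matrix(c(phi, 1 - phi,
                    0,   1), nrow = 2, byrow = TRUE)
  Omega <- matrix(c(1 - p, p,
                    1,     0), nrow = 2, byrow = TRUE)
  T <- length(y)
  prod(Omega[cbind(z, y)]) *          # observation terms omega_{z_t, y_t}
    delta[z[1]] *                     # initial state term Pr(z_1)
    prod(Gamma[cbind(z[-T], z[-1])])  # transition terms gamma_{z_{t-1}, z_t}
}
# detected, detected, non-detected with states alive, alive, dead:
complete_lik(y = c(2, 2, 1), z = c(1, 1, 2), phi = 0.8, p = 0.6)
# p^2 * phi * (1 - phi) = 0.0576
```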
The Bayesian approach with MCMC methods allows us to treat the latent states $z_{i,t}$ as if they were parameters, and to estimate them as such. However, with a large number of latent states $z_{i,t}$ the likelihood becomes rather complex, which comes with computational costs and slow mixing. There are situations where the latent states are the focus of ecological inference and need to be estimated (see Suggested reading below). However, if they are not needed, you might want to get rid of the latent states and rely on the so-called marginal likelihood. By doing so, you avoid sampling the latent states and focus on the ecological parameters, which often speeds up computations and improves mixing, as shown in Section \@ref(marginalization). You can even estimate the latent states afterwards, as illustrated in Section \@ref(decoding).
<!-- It has a matrix formulation: -->
<!-- \begin{align*} -->
<!-- \Pr(\mathbf{y}) &= \mathbf{\delta} \; \mathbf{\Omega} \; \mathbf{\Gamma} \cdots \mathbf{\Omega} \; \mathbf{\Gamma} \; \mathbf{\Omega} \; \mathbb{1} -->
<!-- \end{align*} -->
<!-- ### Example -->
<!-- Let assume an animal is detected, then missed. We have $\mathbf{y} = (2, 1)$. What is the contribution of this animal to the likelihood? -->
<!-- \begin{align*} -->
<!-- \Pr(\mathbf{y} = (2, 1)) &= \sum_{z_1 = 1}^2 \; \sum_{z_2 = 1}^2 w_{z_1, y_1 = 2} w_{z_2, y_2 = 1} \Pr(z_1) \gamma_{z_1, z_2} \color{white}{\Pr(z_{T-1}, z_{T-2},\ldots, z_1, z_1, z_1, z_1)}\\ -->
<!-- &= \sum_{z_1 = 1}^2 \left( w_{z_1, y_1 = 2} w_{z_2 = 1, y_2 = 1} \Pr(z_1) \gamma_{z_1, z_2 = 1} + w_{z_1, y_1 = 2} w_{z_2 = 2, y_2 = 1} \Pr(z_1) \gamma_{z_1, z_2 = 2} \right) \\ -->
<!-- &= w_{z_1 = 1, y_1 = 2} w_{z_2 = 1, y_2 = 1} \delta_1 \gamma_{z_1 = 1, z_2 = 1} + w_{z_1 = 1, y_1 = 2} w_{z_2 = 2, y_2 = 1} \delta_1 \gamma_{z_1 = 1, z_2 = 2}\\ -->
<!-- &= (1 - p) \phi + (1-\phi) -->
<!-- \end{align*} -->
<!-- Note: $w_{z_1 = 1, y_1 = 2} = \Pr(y_1 = 2 | z_1 = 1) = 1$ because we condition on first capture. -->
## Fitting HMM with NIMBLE {#fittinghmmnimble}
If we denote *first* the occasion of first detection, then our model so far is written as follows:
\begin{align*}
z_{\text{first}} &\sim \text{Categorical}(\delta) &\text{[likelihood]}\\
z_t | z_{t-1} &\sim \text{Categorical}(\gamma_{z_{t-1}}) &\text{[likelihood, t>first]}\\
y_t | z_{t} &\sim \text{Categorical}(\omega_{z_{t}}) &\text{[likelihood, t>first]}\\
\phi &\sim \text{Beta}(1, 1) &\text{[prior for }\phi \text{]} \\
p &\sim \text{Beta}(1, 1) &\text{[prior for }p \text{]} \\
\end{align*}
where $\gamma_{z_{t-1}}$ is row $z_{t-1}$ of the transition matrix $\mathbf{\Gamma}$, and $\omega_{z_t}$ is row $z_t$ of the observation matrix $\mathbf{\Omega}$.
The model has an observation layer for the $y$'s, conditional on the $z$'s. We also use uniform priors for the detection and survival probabilities. How do we implement this model in NIMBLE?
```{r, echo=FALSE}
hmm.survival <- nimbleCode({
phi ~ dunif(0, 1) # prior survival
p ~ dunif(0, 1) # prior detection
# likelihood
gamma[1,1] <- phi # Pr(alive t -> alive t+1)
gamma[1,2] <- 1 - phi # Pr(alive t -> dead t+1)
gamma[2,1] <- 0 # Pr(dead t -> alive t+1)
gamma[2,2] <- 1 # Pr(dead t -> dead t+1)
delta[1] <- 1 # Pr(alive t = 1) = 1
delta[2] <- 0 # Pr(dead t = 1) = 0
omega[1,1] <- 1 - p # Pr(alive t -> non-detected t)
omega[1,2] <- p # Pr(alive t -> detected t)
omega[2,1] <- 1 # Pr(dead t -> non-detected t)
omega[2,2] <- 0 # Pr(dead t -> detected t)
for (i in 1:N){
z[i,1] ~ dcat(delta[1:2])
for (j in 2:T){
z[i,j] ~ dcat(gamma[z[i,j-1], 1:2])
y[i,j] ~ dcat(omega[z[i,j], 1:2])
}
}
})
```
We start with priors for survival and detection probabilities:
```{r eval=FALSE}
hmm.survival <- nimbleCode({
phi ~ dunif(0, 1) # prior survival
p ~ dunif(0, 1) # prior detection
...
```
Then we define the initial state vector and the transition and observation matrices:
```{r eval=FALSE}
...
# parameters
delta[1] <- 1 # Pr(alive t = first) = 1
delta[2] <- 0 # Pr(dead t = first) = 0
gamma[1,1] <- phi # Pr(alive t -> alive t+1)
gamma[1,2] <- 1 - phi # Pr(alive t -> dead t+1)
gamma[2,1] <- 0 # Pr(dead t -> alive t+1)
gamma[2,2] <- 1 # Pr(dead t -> dead t+1)
omega[1,1] <- 1 - p # Pr(alive t -> non-detected t)
omega[1,2] <- p # Pr(alive t -> detected t)
omega[2,1] <- 1 # Pr(dead t -> non-detected t)
omega[2,2] <- 0 # Pr(dead t -> detected t)
...
```
Then the likelihood:
```{r eval=FALSE}
...
# likelihood
for (i in 1:N){
z[i,1] ~ dcat(delta[1:2])
for (j in 2:T){
z[i,j] ~ dcat(gamma[z[i,j-1], 1:2])
y[i,j] ~ dcat(omega[z[i,j], 1:2])
}
}
})
```
The loop over time for each individual, `for (j in 2:T){}`, starts at the occasion following first detection (time 2 here, since all individuals are first detected at time 1), because we work conditional on the first detection.
Overall, the code looks like:
```{r eval = FALSE}
hmm.survival <- nimbleCode({
phi ~ dunif(0, 1) # prior survival
p ~ dunif(0, 1) # prior detection
# likelihood
delta[1] <- 1 # Pr(alive t = first) = 1
delta[2] <- 0 # Pr(dead t = first) = 0
gamma[1,1] <- phi # Pr(alive t -> alive t+1)
gamma[1,2] <- 1 - phi # Pr(alive t -> dead t+1)
gamma[2,1] <- 0 # Pr(dead t -> alive t+1)
gamma[2,2] <- 1 # Pr(dead t -> dead t+1)
omega[1,1] <- 1 - p # Pr(alive t -> non-detected t)
omega[1,2] <- p # Pr(alive t -> detected t)
omega[2,1] <- 1 # Pr(dead t -> non-detected t)
omega[2,2] <- 0 # Pr(dead t -> detected t)
for (i in 1:N){
z[i,1] ~ dcat(delta[1:2])
for (j in 2:T){
z[i,j] ~ dcat(gamma[z[i,j-1], 1:2])
y[i,j] ~ dcat(omega[z[i,j], 1:2])
}
}
})
```
Now we specify the constants:
```{r}
my.constants <- list(N = nrow(y), T = 5)
my.constants
```
The data are made of 0's for non-detections and 1's for detections. To use the categorical distribution, we need them coded as 1's and 2's. We simply add 1 to get the correct format, that is $y = 1$ for non-detection and $y = 2$ for detection:
```{r}
my.data <- list(y = y + 1)
```
Now let's write a function for the initial values:
```{r}
zinits <- y + 1          # y: 0/1 -> 1/2; non-detections (1) start as alive
zinits[zinits == 2] <- 1 # set detections (2) to 1 so everyone starts alive
initial.values <- function() list(phi = runif(1,0,1),
p = runif(1,0,1),
z = zinits)
```
As initial values for the latent states, we initialize all individuals as alive on all occasions: `zinits <- y + 1` turns non-detections into 1's (alive) and detections into 2's, then `zinits[zinits == 2] <- 1` turns those 2's into 1's, so that every individual starts alive.
We specify the parameters we'd like to monitor:
```{r}
parameters.to.save <- c("phi", "p")
parameters.to.save
```
We provide MCMC details:
```{r}
n.iter <- 5000
n.burnin <- 1000
n.chains <- 2
```
At last, we're ready to run NIMBLE:
```{r, message=FALSE, eval = FALSE}
start_time <- Sys.time()
mcmc.output <- nimbleMCMC(code = hmm.survival,
constants = my.constants,
data = my.data,
inits = initial.values,
monitors = parameters.to.save,
niter = n.iter,
nburnin = n.burnin,
nchains = n.chains)
end_time <- Sys.time()
end_time - start_time
```
```{r, message=FALSE, echo = FALSE}
start_time <- Sys.time()
mcmc.output <- nimbleMCMC(code = hmm.survival,
constants = my.constants,
data = my.data,
inits = initial.values,
monitors = parameters.to.save,
niter = n.iter,
nburnin = n.burnin,
nchains = n.chains,
progressBar = FALSE)
end_time <- Sys.time()
end_time - start_time
```
We can have a look at the numerical summaries:
```{r}
MCMCsummary(mcmc.output, round = 2)
```
The estimates for survival and detection are close to the true values $\phi = 0.8$ and $p = 0.6$ with which we simulated the data. The code I used is:
```{r eval = FALSE}
set.seed(2022) # for reproducibility
nocc <- 5 # nb of winters or sampling occasions
nind <- 57 # nb of animals
p <- 0.6 # detection prob
phi <- 0.8 # survival prob
# Vector of initial states probabilities
delta <- c(1,0) # all individuals are alive in first winter
# Transition matrix
Gamma <- matrix(NA, 2, 2)
Gamma[1,1] <- phi # Pr(alive t -> alive t+1)
Gamma[1,2] <- 1 - phi # Pr(alive t -> dead t+1)
Gamma[2,1] <- 0 # Pr(dead t -> alive t+1)
Gamma[2,2] <- 1 # Pr(dead t -> dead t+1)
# Observation matrix
Omega <- matrix(NA, 2, 2)
Omega[1,1] <- 1 - p # Pr(alive t -> non-detected t)
Omega[1,2] <- p # Pr(alive t -> detected t)
Omega[2,1] <- 1 # Pr(dead t -> non-detected t)
Omega[2,2] <- 0 # Pr(dead t -> detected t)
# Matrix of states
z <- matrix(NA, nrow = nind, ncol = nocc)
y <- z
y[,1] <- 2 # all individuals are detected in first winter, as we condition on first detection
for (i in 1:nind){
z[i,1] <- rcat(n = 1, prob = delta) # 1 for sure
for (t in 2:nocc){
# state at t given state at t-1
z[i,t] <- rcat(n = 1, prob = Gamma[z[i,t-1],1:2])
# observation at t given state at t
y[i,t] <- rcat(n = 1, prob = Omega[z[i,t],1:2])
}
}
y
y <- y - 1 # non-detection = 0, detection = 1
```
## Marginalization {#marginalization}
In some situations, you will not be interested in inferring the hidden states $z_{i,t}$, so why bother estimating them? The good news is that you can get rid of the states, so that the marginal likelihood is a function of survival and detection probabilities $\phi$ and $p$ only.
### Brute-force approach
Using the law of total probability, you get the marginal likelihood by summing the complete likelihood over all possible state sequences:
\begin{align*}
\Pr(\mathbf{y}_i) &= \Pr(y_{i,1}, y_{i,2}, \ldots, y_{i,T})\\
&= \sum_{\mathbf{z}_i} \Pr(\mathbf{y}_i, \mathbf{z}_i)\\
&= \sum_{z_{i,1}} \cdots \sum_{z_{i,T}} \Pr(y_{i,1}, y_{i,2}, \ldots, y_{i,T}, z_{i,1}, z_{i,2}, \ldots, z_{i,T})\\
\end{align*}
Going through the same steps as for deriving the complete likelihood, we obtain the marginal likelihood:
\begin{align*}
\Pr(\mathbf{y}_i) &= \sum_{z_{i,1}} \cdots \sum_{z_{i,T}} \left(\prod_{t=1}^T{\omega_{z_{i,t}, y_{i,t}}}\right) \left(\Pr(z_{i,1}) \prod_{t=2}^T{\gamma_{z_{i,t-1},z_{i,t}}}\right)\\
\end{align*}
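A small R sketch of this brute-force sum (the helper name `marginal_lik_bruteforce` is ours): enumerate all $2^T$ state sequences and add up their complete-likelihood contributions.

```{r}
# Brute-force marginal likelihood: enumerate all 2^T state sequences
# and sum their complete-likelihood contributions.
marginal_lik_bruteforce <- function(y, phi, p, delta = c(1, 0)) {
  Gamma <- matrix(c(phi, 1 - phi,
                    0,   1), nrow = 2, byrow = TRUE)
  Omega <- matrix(c(1 - p, p,
                    1,     0), nrow = 2, byrow = TRUE)
  T <- length(y)
  paths <- as.matrix(expand.grid(rep(list(1:2), T)))  # all 2^T state sequences
  total <- 0
  for (r in seq_len(nrow(paths))) {
    z <- paths[r, ]
    total <- total + prod(Omega[cbind(z, y)]) *  # observation terms
      delta[z[1]] *                              # initial state term
      prod(Gamma[cbind(z[-T], z[-1])])           # transition terms
  }
  total
}
# detected, detected, non-detected:
marginal_lik_bruteforce(y = c(2, 2, 1), phi = 0.8, p = 0.6)
```

This is fine for two states and a handful of occasions, but the number of terms grows as $2^T$, which is why recursions that sum over states one occasion at a time are preferred in practice.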
Let's go through an example. Imagine we have $T = 3$ winters, and we'd like to write the likelihood for an individual with encounter history detected, detected, then non-detected. Remember that non-detected is coded 1 and detected is coded 2, while alive is coded 1 and dead is coded 2. We need to calculate $\Pr(y_1 = 2, y_2 = 2, y_3 = 1)$ which, according to the formula above, is given by:
\begin{align*}
\begin{split}
\Pr(y_1 = 2, y_2 = 2, y_3 = 1) &= \sum_{i=1}^{2} \sum_{j=1}^{2} \sum_{k=1}^{2} \Pr(y_1 = 2 | z_1 = i) \Pr(y_2 = 2 | z_2 = j) \Pr(y_3 = 1 | z_3 = k) \\
& \qquad \Pr(z_1=i) \Pr(z_2 = j | z_1 = i) \Pr(z_3 = k | z_2 = j)\\
\end{split}
\end{align*}
Writing out all the sums in $\Pr(y_1 = 2, y_2 = 2, y_3 = 1)$ explicitly, we get the long and ugly expression:
\begin{align*}
\begin{split}
\Pr(y_1 = 2, y_2 = 2, y_3 = 1) &= \\
& \Pr(y_1 = 2 | z_1 = 1) \Pr(y_2 = 2 | z_2 = 1) \Pr(y_3 = 1 | z_3 = 1) \times \\
& \qquad \Pr(z_1 = 1) \Pr(z_2 = 1 | z_1 = 1) \Pr(z_3 = 1 | z_2 = 1) +\\
& \Pr(y_1 = 2 | z_1 = 2) \Pr(y_2 = 2 | z_2 = 1) \Pr(y_3 = 1 | z_3 = 1) \times\\
& \qquad \Pr(z_1 = 2) \Pr(z_2 = 1 | z_1 = 2) \Pr(z_3 = 1 | z_2 = 1) +\\
& \Pr(y_1 = 2 | z_1 = 1) \Pr(y_2 = 2 | z_2 = 2) \Pr(y_3 = 1 | z_3 = 1) \times\\
& \qquad \Pr(z_1 = 1) \Pr(z_2 = 2 | z_1 = 1) \Pr(z_3 = 1 | z_2 = 2) +\\
& \Pr(y_1 = 2 | z_1 = 2) \Pr(y_2 = 2 | z_2 = 2) \Pr(y_3 = 1 | z_3 = 1) \times\\
& \qquad \Pr(z_1 = 2) \Pr(z_2 = 2 | z_1 = 2) \Pr(z_3 = 1 | z_2 = 2) +\\
& \Pr(y_1 = 2 | z_1 = 1) \Pr(y_2 = 2 | z_2 = 1) \Pr(y_3 = 1 | z_3 = 2) \times\\
& \qquad \Pr(z_1 = 1) \Pr(z_2 = 1 | z_1 = 1) \Pr(z_3 = 2 | z_2 = 1) +\\
& \Pr(y_1 = 2 | z_1 = 2) \Pr(y_2 = 2 | z_2 = 1) \Pr(y_3 = 1 | z_3 = 2) \times\\
& \qquad \Pr(z_1 = 2) \Pr(z_2 = 1 | z_1 = 2) \Pr(z_3 = 2 | z_2 = 1) +\\
& \Pr(y_1 = 2 | z_1 = 1) \Pr(y_2 = 2 | z_2 = 2) \Pr(y_3 = 1 | z_3 = 2) \times\\
& \qquad \Pr(z_1 = 1) \Pr(z_2 = 2 | z_1 = 1) \Pr(z_3 = 2 | z_2 = 2) +\\
& \Pr(y_1 = 2 | z_1 = 2) \Pr(y_2 = 2 | z_2 = 2) \Pr(y_3 = 1 | z_3 = 2) \times\\
& \qquad \Pr(z_1 = 2) \Pr(z_2 = 2 | z_1 = 2) \Pr(z_3 = 2 | z_2 = 2)\\
\end{split}
\end{align*}
You can simplify this expression by noticing that i) all individuals are alive for sure when marked and released in the first winter, that is $\Pr(z_1=2) = 0$, and ii) dead individuals are non-detected for sure, that is $\Pr(y_t = 2|z_t = 2) = 0$, which leads to:
\begin{align*}
\begin{split}
\Pr(y_1 = 2, y_2 = 2, y_3 = 1) &= \\
& \Pr(y_1 = 2 | z_1 = 1) \Pr(y_2 = 2 | z_2 = 1) \Pr(y_3 = 1 | z_3 = 1) \times \\
& \qquad \Pr(z_1 = 1) \Pr(z_2 = 1 | z_1 = 1) \Pr(z_3 = 1 | z_2 = 1) +\\
& \Pr(y_1 = 2 | z_1 = 1) \Pr(y_2 = 2 | z_2 = 1) \Pr(y_3 = 1 | z_3 = 2) \times\\
& \qquad \Pr(z_1 = 1) \Pr(z_2 = 1 | z_1 = 1) \Pr(z_3 = 2 | z_2 = 1)\\
\end{split}
\end{align*}
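We can check this simplification numerically; a quick sketch with illustrative values $\phi = 0.8$ and $p = 0.6$ (our choice), comparing the full eight-term sum with the two surviving terms:

```{r}
# Numerical check: the full 8-term sum over (z1, z2, z3) equals the
# two surviving terms. Values phi = 0.8 and p = 0.6 are illustrative.
phi <- 0.8; p <- 0.6
delta <- c(1, 0)
Gamma <- matrix(c(phi, 1 - phi,
                  0,   1), nrow = 2, byrow = TRUE)
Omega <- matrix(c(1 - p, p,
                  1,     0), nrow = 2, byrow = TRUE)
full <- 0
for (i in 1:2) for (j in 1:2) for (k in 1:2) {
  full <- full + Omega[i, 2] * Omega[j, 2] * Omega[k, 1] *  # y = (2, 2, 1)
    delta[i] * Gamma[i, j] * Gamma[j, k]
}
two_terms <- Omega[1, 2] * Omega[1, 2] * Omega[1, 1] *
               delta[1] * Gamma[1, 1] * Gamma[1, 1] +       # (alive, alive, alive)
             Omega[1, 2] * Omega[1, 2] * Omega[2, 1] *
               delta[1] * Gamma[1, 1] * Gamma[1, 2]         # (alive, alive, dead)
c(full = full, two_terms = two_terms)  # both equal 0.14976
```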