-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
364 lines (260 loc) · 13.1 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# MoNAn <img src="https://raw.githubusercontent.com/stocnet/MoNAn/main/MoNAn_logo.png" align="right" width="150"/>
<!-- badges: start -->
<!-- badges: end -->
MoNAn is the software implementation of the statistical model outlined in:
Block, P., Stadtfeld, C., & Robins, G. (2022). A statistical model for the analysis of mobility
tables as weighted networks with an application to faculty hiring networks. Social Networks, 68, 264-278.
[Link to the study in Social Networks](https://doi.org/10.1016/j.socnet.2021.08.003).
[Pre-print with minor differences to the journal version](https://osf.io/preprints/socarxiv/n86rx/).
The model allows for the analysis of emergent structures (known from network analysis) in mobility tables, alongside exogenous predictors. It combines features from classical log-linear models for mobility tables and Exponential Random Graph Models (ERGMs). In the absence of emergent structures, the models reduces to the former, when ignoring characteristics of individuals, it is an ERGM for weighted networks.
Announcements about workshops etc. can be found [here](https://www.suz.uzh.ch/de/institut/professuren/block/monan_software.html).
The **MoNAn Manual** is available here on github in the manual folder or on [SocArXiv](https://osf.io/preprints/socarxiv/8q2xu/)
## Note from the developers
This is version 1.1.0 of MoNAn (Sep 2024). The update to version 1.0.0 (March 2024) included many changes, including some that mean code written for older versions will not work anymore!
Code use for version 1.0.0 has been simplified; the most important changes are:
- the cache is now hidden from the user and does not need to be create or specified in functions anymore
- there is a new way to specify the model with createEffects and addEffect
- in gofMobilityNetwork there is no need to specify "simulations" anymore
- new way to generate the process state using monanDataCreate
As of 30 Aug 2023, we have a CRAN release of the package.
Nevertheless, we cannot guarantee that the package and the documentation is
free of bugs or errors; further, you might
not be able to do what you want. In that case, or if you are unsure
please write the package
maintainer under his institutional email address.
We are currently (Sep 2024) on version 1.1.0 on github.
Version 1.1.0 will be released to CRAN in Sep 2024.
# Installation
You can install MoNAn either from CRAN or from GitHub.
The advantage of installing from github is that you will always stay up-to-date
with the latest developments, in particular, new functionality and effects.
You can install the current version of MoNAn from GitHub using:
``` r
# install.packages("remotes")
remotes::install_github("stocnet/MoNAn")
```
However, you can also get the (often slightly older) CRAN version with:
``` r
install.packages("MoNAn")
```
# Example
In this section, we outline a simple example with synthetic data stored in the MoNAn package.
```{r}
library(MoNAn)
```
## The Data
The example case uses synthetic data that represents mobility of 742 individuals between 17 organisations. The artificial data includes an edgelist, i.e., a list of origins (column 1) and destinations (column 2) of all individuals sorted by origin.
```{r}
mobilityEdgelist[1:10,]
```
The data further includes (artificial) characteristics of individuals, which we will call sex. However, using continuous covariates is also possible, although for some effects this would not be meaningful.
```{r}
indSex[1:10]
```
Characteristics of the organisations are the region in which they are located (binary, northern/southern island) and the organisations' size that represents a composite measure of the number of workers, assets, and revenue.
```{r}
orgRegion
orgSize
```
## Working with the package
### Preparing the data
First, we create MoNAn data objects from the introduced data files. A necessary part is that we define and name the nodesets, i.e., we name who ("people") is mobile between what ("organisations").
```{r}
# extract number of individuals and organisations from the mobility data
N_ind <- nrow(mobilityEdgelist)
N_org <- length(unique(as.numeric(mobilityEdgelist)))
# create monan objects
people <- monanEdges(N_ind)
organisations <- monanNodes(N_org)
transfers <- monanDependent(mobilityEdgelist,
nodes = "organisations",
edges = "people")
sameRegion <- outer(orgRegion, orgRegion, "==") * 1
sameRegion <- dyadicCovar(sameRegion, nodeSet = c("organisations", "organisations"))
region <- monadicCovar(orgRegion, nodeSet = "organisations")
size <- monadicCovar(orgSize, nodeSet = "organisations")
sex <- monadicCovar(indSex, nodeSet = "people", addSame = F, addSim = F)
```
We combine the data objects into the process state, i.e., a MoNAn object that stores all information about the data that will be used in the estimation later. We must include one outcome variable (transfers) and two nodeSets (people, organisations).
```{r}
myState <- monanDataCreate(transfers,
people,
organisations,
sameRegion,
region,
size,
sex)
# inspect the created object
myState
```
### Specifying the model
The predictors in the model are called "Effects". To specify a model, first an empty effects object
is created that only uses the state, and then effects are added one-by-one. Adding
an effect requires the effects name and any additional parameters it needs.
```{r}
# create an effects object
myEffects <- createEffects(myState)
myEffects <- addEffect(myEffects, loops)
myEffects <- addEffect(myEffects, concentration_AC, alpha = 4)
myEffects <- addEffect(myEffects, reciprocity_AC, alpha = 4)
myEffects <- addEffect(myEffects, dyadic_covariate, attribute.index = "sameRegion")
myEffects <- addEffect(myEffects, alter_covariate, attribute.index = "size")
myEffects <- addEffect(myEffects, resource_covar_to_node_covar,
attribute.index = "region",
resource.attribute.index = "sex")
myEffects <- addEffect(myEffects, loops_resource_covar, resource.attribute.index = "sex")
# There is also a simpler way using pipes (|>) and using the more intuitive
# node.attribute & edge.attribute instead of the older
# attribute.index & resource.attribute.index:
myEffects <- createEffects(myState) |>
addEffect(loops) |>
addEffect(concentration_AC, alpha = 4) |>
addEffect(reciprocity_AC, alpha = 4) |>
addEffect(dyadic_covariate, node.attribute = "sameRegion") |>
addEffect(alter_covariate, node.attribute = "size") |>
addEffect(resource_covar_to_node_covar,
node.attribute = "region",
edge.attribute = "sex") |>
addEffect(loops_resource_covar, edge.attribute = "sex")
# inspect the created object
myEffects
```
### Optional: Pre-estimation
We can run a pseudo-likelihood estimation that gives a (biased) guess of the model results. We use this to get improved initial estimates, which increases the chances of model convergence in the first run of the estimation considerably. To get pseudo-likelihood estimates, we need to use functions from other libraries to estimate a multinomial logit model (e.g., "dfidx" and "mlogit")
```{r}
# create multinomial statistics object pseudo-likelihood estimation
myStatisticsFrame <- getMultinomialStatistics(myState, myEffects)
### additional script to get pseudo-likelihood estimates, requires the dfidx and mlogit package
# library(dfidx)
# library(mlogit)
# my.mlogit.dataframe <- dfidx(myStatisticsFrame,
# shape = "long",
# choice = "choice")
#
# colnames(my.mlogit.dataframe) <- gsub(" ", "_", colnames(my.mlogit.dataframe))
#
# IVs <- (colnames(my.mlogit.dataframe)[2:(ncol(myStatisticsFrame)-2)])
#
# form <- as.formula(paste("choice ~ 1 + ", paste(IVs, collapse = " + "), "| 0"))
#
# my.mlogit.results <- mlogit(formula = eval(form), data = my.mlogit.dataframe,
# heterosc = FALSE)
#
# summary(my.mlogit.results)
#
# initEst <- my.mlogit.results$coefficients[1:length(IVs)]
```
### Specifying the estimation algorithm
The estimation algorithm requires multiple parameters that guide the simulation of the chain. Most values are generated by default from the state and the effects object.
```{r}
myAlg <- monanAlgorithmCreate(myState, myEffects, nsubN2 = 3,
multinomialProposal = FALSE)
```
### Estimation
Now we can estimate the model. The first two lines indicate the data (state), effects, and algorithm. The third line specifies the initial estimates, where the previously obtained pseudo-likelihood estimates can be used. The remaining lines define whether parallel computing should be used, how much console output is desired during the estimation, and whether the simulations from phase 3 should be stored.
Running the model takes a while (up to 10 minutes for this data with parallel computing).
```{r, eval=FALSE}
myResDN <- monanEstimate(
myState, myEffects,
myAlg,
initialParameters = NULL,
parallel = TRUE, cpus = 4,
verbose = TRUE,
returnDeps = TRUE,
fish = FALSE
)
```
In case pseudo-likelihood estimates have been obtained previously, this can be specified by
```{r, eval=FALSE}
myResDN <- monanEstimate(
myState, myEffects, myAlg,
initialParameters = initEst,
parallel = TRUE, cpus = 4,
verbose = TRUE,
returnDeps = TRUE,
fish = FALSE
)
```
Check convergence to see whether the results are reliable. In case the maximum convergence ratio is above 0.1 (or 0.2 for less precise estimates), another run is necessary.
```{r}
max(abs(myResDN$convergenceStatistics))
```
If convergence is too high, update algorithm, re-run estimation with previous results as starting values and check convergence:
```{r, eval=FALSE}
# estimate mobility network model again based on previous results to improve convergence
# with an adjusted algorithm
myAlg <- monanAlgorithmCreate(myState, myEffects, multinomialProposal = TRUE,
initialIterationsN2 = 200, nsubN2 = 1, initGain = 0.02, iterationsN3 = 1000)
# for users of other stocnet packages, we can also use monan07 to run an estimation
# (it is an alias for estimateMobilityNetwork)
myResDN <- monan07(
myState, myEffects, myAlg,
prevAns = myResDN,
parallel = TRUE, cpus = 4,
verbose = TRUE,
returnDeps = TRUE,
fish = FALSE
)
```
```{r}
# check convergence
max(abs(myResDN$convergenceStatistics))
```
In case convergence is still poor, updating the algorithm might be necessary. Otherwise, we can view the results. The first column is the estimate, the second column the standard error, and the third column the convergence ratio. All values in the final column should be below 0.1 (see above).
```{r}
myResDN
```
## Diagnostics of the estimated model
The following two diagnostics indicate the extent to which the chain mixes (i.e., whether the thinning was chosen appropriately). The autoCorrelationTest indicates the degree to which the values of the dependent variable of consecutive draws from the chain in phase 3 are correlated. Here lower values are better. Values above 0.5 are very problematic and indicate that a higher thinning is needed.
```{r}
autoCorrelationTest(myResDN)
```
The output of extractTraces indicates the correlation of statistics between subsequent draws from the chain in phase 3. The plot should show data points randomly scattered around the target line, as shown below. If patterns in the traces are discernible, a higher thinning is needed.
```{r}
traces <- extractTraces(myResDN, myEffects)
par(mfrow = c(1,2))
plot(traces)
```
## Score-tests to check model specification
Based on an estimated model, a score-type test is available that shows whether statistics representing non-included effects are well represented. If this is not the case, it is likely that including them will result in significant estimates.
```{r}
myEffects2 <- createEffects(myState) |>
addEffect(transitivity_min)
test_ME.2 <- scoreTest(myResDN, myEffects2)
test_ME.2
```
The interpretation is that there appears to be some transitive clustering in the data that the model does not account for in its current form.
## GOF testing
Akin to ERGMs, goodness of fit testing is available to see whether auxiliary statistics are well captured by the model.
```{r}
myGofIndegree <- monanGOF(ans = myResDN, gofFunction = getIndegree, lvls = 1:70)
plot(myGofIndegree, lvls = 20:70)
myGofTieWeight <- monanGOF(ans = myResDN, gofFunction = getTieWeights, lvls = 1:20)
plot(myGofTieWeight, lvls = 1:20)
```
## Simulation
The package also provides the possibility to exemplarily simulate mobility networks based on the data and the specified effects and parameters.
```{r, eval=FALSE}
mySimDN <- monanSimulate(
myState,
myEffects,
parameters = c(2, 1, 1.5, 0.5, 0.1, -1, -0.5),
allowLoops = TRUE,
burnin = 45000,
thinning = 15000,
nSimulations = 10
)
mySimDN[[1]]
```