Reorganizing pages

gvegayon · May 11, 2024 · 85d712d
1 parent 7d8a489
commit 85d712d
Showing 9 changed files with 311 additions and 295 deletions.
diff --git a/03.rda b/03.rda
diff --git a/_quarto.yaml b/_quarto.yaml
@@ -9,38 +9,46 @@ book:
   google-analytics: UA-40608305-4
   title: Applied Network Science with R
   author: George G. Vega Yon, Ph.D.
-  date: 2024-05-07
+  date: today
   cover-image: img/front-page-dalle.png
   cover-image-alt: 'An AI image generated with Bing: Draw an image of a social network. Include a person examining the network and holding a laptop in one hand. The laptop should have the logo of the R programming language.'
   page-footer: Applied Network Science with R - [https://ggvy.cl](https://ggvy.cl){target="_blank"}
   repo-url: https://github.com/gvegayon/appliedsnar
   repo-branch: master
   repo-actions: [edit]
+  image: img/front-page-dalle.png
+  twitter-card:
+    description: 'This (WIP) book is a collection of examples using the R programming language for network science. It includes examples of network data processing, visualization, simulation, and modeling.'
+    creator: "@gvegayon"
+  site-url: https://book.ggvy.cl
+  sharing: [twitter, linkedin]
   navbar:
-    background: light
+    # background: light
     search: true
 
   chapters: 
     - index.qmd
-    - part: Applications
+    - part-01-01-intro.qmd
+    - part-01-02-the-basics.qmd
+    - part: "**Applications**"
       chapters:
-        - part-01-01-intro.qmd
-        - part-01-02-the-basics.qmd
         - part-01-03-week-1-sns-study.qmd
+        - part-01-06-network-simulation-and-viz.qmd
+        - part-01-07-egonets.qmd
+        - part-01-09-netdiffuser.qmd
+    - part: "**Statistical inference**"
+      chapters: 
         - part-01-04-ergms.qmd
         - part-01-05-ergms-constrains.qmd
         - part-01-05-stergm.qmd
-        - part-01-06-network-simulation-and-viz.qmd
-        - part-01-07-egonets.qmd
         - part-01-08-netboot.qmd
-        - part-01-09-netdiffuser.qmd
         - part-01-10-siena.qmd
         - part-01-11-power.qmd
-    - part: Statistical Foundations
+    - part: "**Foundations**"
       chapters:
         - part-02-10-statistical-foundations.qmd
         - part-02-11-power.qmd
-    - part: Appendix
+    - part: "**Appendix**"
       chapters:
         - part-03-12-data-appendix.qmd
     - references.qmd
@@ -52,11 +60,12 @@ bibliography: book.bib
 
 biblio-style: apalike
 
-format:
+format: 
   html:
     html-math-method: mathjax
     toc: true
-    number-sections: true
+    number-sections: false
+    theme: cerulean
   pdf:
     geometry: 
       - top=1in

diff --git a/book.bib b/book.bib
@@ -309,4 +309,12 @@ @Manual{R
     address = {Vienna, Austria},
     year = {2024},
     url = {https://www.R-project.org/},
-  }
+  }
+
+@Manual{R-latticeExtra,
+  title = {latticeExtra: Extra Graphical Utilities Based on Lattice},
+  author = {Deepayan Sarkar and Felix Andrews},
+  year = {2022},
+  note = {R package version 0.6-30},
+  url = {http://latticeextra.r-forge.r-project.org/},
+}
diff --git a/ergm.rda b/ergm.rda
diff --git a/part-01-03-week-1-sns-study.qmd b/part-01-03-week-1-sns-study.qmd
@@ -1,10 +1,10 @@
 ---
-date-modified: 2024-05-09
+date-modified: 2024-05-10
 ---
 
-# Network Nomination Data
+# School networks
 
-This chapter provides a start-to-finish example for processing survey-type data in R. The chapter features the Social Network Study [SNS] dataset. You can download the data for this chapter [here](https://cdn.rawgit.com/gvegayon/appliedsnar/fdc0d26f/03-sns.dta); and the codebook for the data provided here is in [the appendix](#sns-data).
+This chapter provides a start-to-finish example for processing survey-type data in R. The chapter features the Social Network Study [SNS] dataset. You can download the data for this chapter [here](https://cdn.rawgit.com/gvegayon/appliedsnar/fdc0d26f/03-sns.dta), and the codebook for the data provided here is in [the appendix](#sns-data).
 
 The goals for this chapter are:
 

diff --git a/part-01-04-ergms.qmd b/part-01-04-ergms.qmd
@@ -1,6 +1,10 @@
+---
+date-modified: 2024-05-11
+---
+
 # Exponential Random Graph Models
 
-I strongly suggest reading the vignette included in the `ergm` R package.
+I strongly suggest reading the vignette in the `ergm` R package.
 
 :::{.content-hidden}
 {{< include math.tex >}}
@@ -44,13 +48,13 @@ $$
 
 ,is the normalizing factor that ensures that equation @eq-main-ergm is a legitimate probability distribution. Even after fixing $\mathcal{Y}$ to be all the networks that have size $n$, the size of $\mathcal{Y}$ makes this type of statistical model hard to estimate as there are $N = 2^{n(n-1)}$ possible networks! [@Hunter2008]
 
-Recent developments include new forms of dependency structures to take into account more general neighborhood effects. These models relax the one-step Markovian dependence assumptions, allowing investigation of longer-range configurations, such as longer paths in the network or larger cycles (Pattison and Robins 2002). Models for bipartite (Faust and Skvoretz 1999) and tripartite (Mische and Robins 2000) network structures have been developed. [@Hunter2008 p. 9]
+Later developments include new dependency structures to consider more general neighborhood effects. These models relax the one-step Markovian dependence assumptions, allowing investigation of longer-range configurations, such as longer paths in the network or larger cycles (Pattison and Robins 2002). Models for bipartite (Faust and Skvoretz 1999) and tripartite (Mische and Robins 2000) network structures have been developed. [@Hunter2008 p. 9]
 
 ## A naïve example
 
-In the simplest case, ERGMs equate a logistic regression. By simple, I mean cases in which there are no Markovian terms--motifs involving more than one edge--for example, the Bernoulli graph. In the Bernoulli graph, ties are independent of each other, so the presence/absence of a tie between nodes $i$ and $j$ won't affect the presence/absence of a tie between nodes $k$ and $l$.
+In the simplest case, ERGMs are equivalent to a logistic regression. By simple, I mean cases with no Markovian terms--motifs involving more than one edge--for example, the Bernoulli graph. In the Bernoulli graph, ties are independent, so the presence/absence of a tie between nodes $i$ and $j$ won't affect the presence/absence of a tie between nodes $k$ and $l$.
 
-Let's fit an ERGM using the `sampson` dataset included in the `ergm` package.
+Let's fit an ERGM using the `sampson` dataset in the `ergm` package.
 
 
 ```{r part-01-04-loading-data, echo=TRUE, collapse=TRUE, message=FALSE}
@@ -100,7 +104,7 @@ Again, the same result. The Bernoulli graph is not the only ERGM model that can
 
 ## Estimation of ERGMs
 
-The ultimate goal is to perform statistical inference on the proposed model. In a *standard* setting, we would be able to use Maximum-Likelihood-Estimation (MLE), which consists of finding the model parameters $\theta$ that, given the observed data, maximize the likelihood of the model. For the latter, we generally use [Newton's method](https://en.wikipedia.org/wiki/Newton%27s_method_in_optimization). Newton's method requires been able to compute the log-likelihood of the model, which in ERGMs can be challenging.
+The ultimate goal is to perform statistical inference on the proposed model. In a *standard* setting, we could use Maximum Likelihood Estimation (MLE), which consists of finding the model parameters $\theta$ that, given the observed data, maximize the likelihood of the model. For the latter, we generally use [Newton's method](https://en.wikipedia.org/wiki/Newton%27s_method_in_optimization). Newton's method requires computing the model's log-likelihood, which can be challenging in ERGMs.
 
 For ERGMs, since part of the likelihood involves a normalizing constant that is a function of all possible networks, this is not as straightforward as in the regular setting. Because of this, most estimation methods rely on simulations.
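
To make the size of the normalizing constant concrete, here is a minimal base-R sketch that enumerates every directed network on three nodes for an edges-only model. The value of `theta` and all object names are assumptions chosen for illustration; nothing here comes from the `ergm` package.

```r
# Brute-force normalizing constant for an edges-only ERGM on n = 3 nodes.
n     <- 3
dyads <- n * (n - 1)            # ordered pairs (i, j) with i != j: 6
theta <- -0.5                   # hypothetical parameter value

# Each row is one directed network, coded as 0/1 indicators over the dyads
networks <- as.matrix(expand.grid(rep(list(0:1), dyads)))
n_edges  <- rowSums(networks)   # sufficient statistic: the edge count

kappa <- sum(exp(theta * n_edges))     # normalizing constant
probs <- exp(theta * n_edges) / kappa  # probability of each network

nrow(networks)                            # 2^(n(n-1)) networks in the support
all.equal(sum(probs), 1)                  # probabilities add up to one
all.equal(kappa, (1 + exp(theta))^dyads)  # closed form for this simple model

# Size of the support for n = 3, ..., 6: enumeration quickly becomes hopeless
2^(sapply(3:6, function(k) k * (k - 1)))
```

The closed form only exists because ties are independent in this model; with dependence terms the sum has to be approximated, which is why simulation is used.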
 
@@ -144,15 +148,15 @@ For more details, see [@Hunter2008]. A sketch of the algorithm follows:
 
 1.  Initialize the algorithm with an initial guess of $\theta$, call it $\theta^{(t)}$ (must be a rather OK guess)
 
-2.  While (no convergence) do:
-    
-    a.  Using $\theta^{(t)}$, simulate $M$ networks by means of small changes in the $\mathbf{Y}_{obs}$ (the observed network). This part is done by using an importance-sampling method which weights each proposed network by its likelihood conditional on $\theta^{(t)}$
+2. While (no convergence) do:
     
-    b.  With the networks simulated, we can do the Newton step to update the parameter $\theta^{(t)}$ (this is the iteration part in the `ergm` package): $\theta^{(t)}\to\theta^{(t+1)}$.
-    
-    c.  If convergence has been reached (which usually means that $\theta^{(t)}$ and $\theta^{(t + 1)}$ are not very different), then stop; otherwise, go to step a.
+  a. Using $\theta^{(t)}$, simulate $M$ networks by means of small changes in the $\mathbf{Y}_{obs}$ (the observed network). This part is done by using an importance-sampling method which weights each proposed network by its likelihood conditional on $\theta^{(t)}$
+  
+  b. With the networks simulated, we can do the Newton step to update the parameter $\theta^{(t)}$ (this is the iteration part in the `ergm` package): $\theta^{(t)}\to\theta^{(t+1)}$.
+  
+  c. If convergence has been reached (which usually means that $\theta^{(t)}$ and $\theta^{(t + 1)}$ are not very different), then stop; otherwise, go to step a.
 
-For more details see [@lusher2012;@admiraal2006;@Snijders2002;@Wang2009] provides details on the algorithm used by PNet (which is the same as the one used in `RSiena`). [@lusher2012] provides a short discussion on the differences between `ergm` and `PNet`. 
+[@lusher2012;@admiraal2006;@Snijders2002;@Wang2009] provide details on the algorithm used by PNet (the same as the one used in `RSiena`), and [@lusher2012] provides a short discussion of the differences between `ergm` and `PNet`.
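
The loop sketched in the steps above can be illustrated with a deliberately simplified base-R example. It assumes an edges-only model, where the edge count can be drawn exactly from a binomial distribution, so it skips the MCMC proposals and importance weights that `ergm` relies on. The values of `n`, `s_obs`, and `M` are hypothetical.

```r
set.seed(1231)

n     <- 18                 # number of nodes (hypothetical)
dyads <- n * (n - 1)        # number of directed dyads
s_obs <- 88                 # hypothetical observed edge count
M     <- 5000               # networks simulated per iteration

theta <- 0                  # step 1: initial guess
for (iter in 1:50) {
  # Step 2a (simplified): in the edges-only model the edge count of a
  # simulated network is binomial, so we draw it directly
  s_sim <- rbinom(M, size = dyads, prob = plogis(theta))

  # Step 2b: Newton step; gradient = s_obs - E[s], Hessian = -Var[s]
  theta_new <- theta + (s_obs - mean(s_sim)) / var(s_sim)

  # Step 2c: stop when successive estimates barely differ
  converged <- abs(theta_new - theta) < 1e-3
  theta     <- theta_new
  if (converged) break
}

# The simulated estimate should sit close to the closed-form MLE
c(simulated = theta, closed_form = qlogis(s_obs / dyads))
```

Because the simulated moments are noisy, the two numbers agree only approximately; increasing `M` tightens the match.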
 
 
 ## The `ergm` package
@@ -391,56 +395,65 @@ sample_uncentered <- coda::mcmc.list(sample_uncentered)
 
 Under the hood:
 
-1.  _Empirical means and sd, and quantiles_: 
-    ```{r coda-summary}
-    summary(sample_uncentered)
-    ```
-2.  _Cross correlation_: 
-    ```{r coda-corr}
-    coda::crosscorr(sample_uncentered)
-    ```
-3.  _Autocorrelation_: For now, we will only look at autocorrelation for chain one. Autocorrelation should be small (in a general MCMC setting). If autocorrelation is high, then it means that your sample is not idd (no Markov property). A way out to solve this is *thinning* the sample.
-    ```{r coda-autocorr}
-    coda::autocorr(sample_uncentered)[[1]]
-    ```
-4.  _Geweke Diagnostic_: From the function's help file:
-    
-    > "If the samples are drawn from the stationary distribution of the chain, the two means are equal and Geweke's statistic has an asymptotically standard normal distribution. [...]
-    The Z-score is calculated under the assumption that the two parts of the chain are asymptotically independent, which requires that the sum of frac1 and frac2 be strictly less than 1.""
-    >
-    > ---?coda::geweke.diag 
-    
-    Let's take a look at a single chain:
-    
-    ```{r coda-geweke.diag}
-    coda::geweke.diag(sample_uncentered)[[1]]
-    ```
-5.  _(not included) Gelman Diagnostic_: From the function's help file:
-    
-    > Gelman and Rubin (1992) propose a general approach to monitoring convergence of MCMC output in which m > 1 parallel chains are run with starting values that are overdispersed relative to the posterior distribution. Convergence is diagnosed when the chains have ‘forgotten’ their initial values, and the output from all chains is indistinguishable. The gelman.diag diagnostic is applied to a single variable from the chain. It is based a comparison of within-chain and between-chain variances, and is similar to a classical analysis of variance.
-    > ---?coda::gelman.diag
-    
-    As a difference from the previous diagnostic statistic, this uses all chains simulatenously:
-    
-    ```{r coda-gelman.diag}
-    coda::gelman.diag(sample_uncentered)
-    ```
+1. _Empirical means and sd, and quantiles_: 
+
+  ```{r coda-summary}
+  summary(sample_uncentered)
+  ```
+
+2. _Cross correlation_: 
+
+  ```{r coda-corr}
+  coda::crosscorr(sample_uncentered)
+  ```
+
+3. _Autocorrelation_: For now, we will only look at autocorrelation for chain one. Autocorrelation should be small (in a general MCMC setting). If autocorrelation is high, your sample is not iid (the draws are strongly dependent). One way to address this is *thinning* the sample.
+
+  ```{r coda-autocorr}
+  coda::autocorr(sample_uncentered)[[1]]
+  ```
+
+4. _Geweke Diagnostic_: From the function's help file:
+  
+  > "If the samples are drawn from the stationary distribution of the chain, the two means are equal and Geweke's statistic has an asymptotically standard normal distribution. [...]
+  The Z-score is calculated under the assumption that the two parts of the chain are asymptotically independent, which requires that the sum of frac1 and frac2 be strictly less than 1."
+  >
+  > ---?coda::geweke.diag 
+  
+  Let's take a look at a single chain:
+  
+  ```{r coda-geweke.diag}
+  coda::geweke.diag(sample_uncentered)[[1]]
+  ```
+
+5. _(not included) Gelman Diagnostic_: From the function's help file:
     
-    As a rule of thumb, values that are in the $[.9,1.1]$ are good.
+  > Gelman and Rubin (1992) propose a general approach to monitoring convergence of MCMC output in which m > 1 parallel chains are run with starting values that are overdispersed relative to the posterior distribution. Convergence is diagnosed when the chains have ‘forgotten’ their initial values, and the output from all chains is indistinguishable. The gelman.diag diagnostic is applied to a single variable from the chain. It is based a comparison of within-chain and between-chain variances, and is similar to a classical analysis of variance.
+  > ---?coda::gelman.diag
+  
+  Unlike the previous diagnostic, this one uses all chains simultaneously:
+  
+  ```{r coda-gelman.diag}
+  coda::gelman.diag(sample_uncentered)
+  ```
+  
+  As a rule of thumb, values in the range $[0.9, 1.1]$ are good.
  
-One nice feature of the `mcmc.diagnostics` function is the nice trace and posterior distribution plots that it generates. If you have the R package `latticeExtra` [@R-latticeExtra], the function will override the default plots used by `coda::plot.mcmc` and use lattice instead, creating a nicer looking plots. The next code chunk calls the `mcmc.diagnostic` function, but we suppress the rest of the output (see figure \@ref(fig:coda-plots)).
+One nice feature of the `mcmc.diagnostics` function is the trace and posterior distribution plots it generates. If you have the R package `latticeExtra` [@R-latticeExtra], the function will override the default plots used by `coda::plot.mcmc` and use lattice instead, creating nicer-looking plots. The next code chunk calls the `mcmc.diagnostics` function, but we suppress the rest of the output (see figure @fig-coda-plots).
 
 
-```{r coda-plots, fig.align='center', fig.height=8, cache=FALSE, echo=TRUE, results='hide', warning=FALSE, fig.cap=c("Trace and posterior distribution of sampled network statistics.", "Trace and posterior distribution of sampled network statistics (cont'd)."), fig.pos='!h'}
+```{r}
+#| label: fig-coda-plots
+#| fig-align: center
+#| fig-cap: "Trace and posterior distribution of sampled network statistics."
 # [2022-03-13] This line is failing because of what could be an ergm bug
 # mcmc.diagnostics(ans0, center = FALSE) # Suppressing all the output
 ```
 
 
 If we call the function `mcmc.diagnostics`, this message appears at the end:
 
->
-MCMC diagnostics shown here are from the last round of simulation, prior to computation of final parameter estimates. Because the final estimates are refinements of those used for this simulation run, these diagnostics may understate model performance. To directly assess the performance of the final model on in-model statistics, please use the GOF command: gof(ergmFitObject, GOF=~model).
+> MCMC diagnostics shown here are from the last round of simulation, prior to computation of final parameter estimates. Because the final estimates are refinements of those used for this simulation run, these diagnostics may understate model performance. To directly assess the performance of the final model on in-model statistics, please use the GOF command: gof(ergmFitObject, GOF=~model).
 >
 > ---`mcmc.diagnostics(ans0)`
 

diff --git a/part-01-05-stergm.qmd b/part-01-05-stergm.qmd
@@ -1,3 +1,3 @@
-# (Separable) Temporal Exponential Family Random Graph Models
+# Temporal Exponential Family Random Graph Models
 
-This tutorial is great! https://statnet.org/trac/raw-attachment/wiki/Sunbelt2016/tergm_tutorial.pdf
+This tutorial is great! [https://statnet.org/trac/raw-attachment/wiki/Sunbelt2016/tergm_tutorial.pdf](https://statnet.org/trac/raw-attachment/wiki/Sunbelt2016/tergm_tutorial.pdf){target="_blank"}
diff --git a/part-01-06-network-simulation-and-viz.qmd b/part-01-06-network-simulation-and-viz.qmd
@@ -1,4 +1,4 @@
-# Simulating and visualizing networks
+# Simulation and visualization
 
 In this chapter, we will build and visualize artificial networks using Exponential
 Random Graph Models [ERGMs]. Together with chapter 3, this will be an extended