Using causaleffect and igraph.Rmd

---
title: "Generating and calculating causal models"
author: "Cory Whitney"
date: "`r Sys.Date()`"
output: 
 html_document:
      toc: true
      toc_float: true
      toc_collapsed: false
bibliography: references/Causal_refs.bib
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(bnlearn)
library(causaleffect)
library(igraph)
```

# Causal maps

Make a causal map in `igraph` [@igraph]. For example the notation X -+ Z identifies and draws a directed edge (a causal arrow) from X to Z. These can be words or phrases but this does not look very good yet and we probably need to do a bit of work learning and 'hacking' igraph if we want these to be very verbose text boxes.

```{r test_fig}
test_fig <- graph.formula(W -+ X, W -+ Z, X -+ Z, Z -+ Y, X -+ Y, Y -+ X, simplify = FALSE)

plot(test_fig)
```

The fifth and sixth arrow of the "graph.formula" (connecting X and Y) are bidirected. We therefore have to add the following "set.edge.attribute" command:

```{r fig1 ammend}
test_fig_ammend <- set.edge.attribute(graph = test_fig, name = "description", index = c(5,6), value = "U")

plot(test_fig_ammend)
```

# Mathematical expressions

Get a mathematical expression for the 'interventional distribution' (do-Calculus) using the `causal.effect` function from the causaleffect library [@tikkaIdentifyingCausalEffects2017].

```{r ce1}
ce_formula <- causal.effect(y = "Y", x = "X", z = NULL, G = test_fig_ammend, expr = TRUE)

ce_formula
```

The result of this function is readable in Latex. We can replace the double slash '\\' with a single '\' so that it is also readable in Rmarkdown.

Or we can avoid that manual work and use the 'Concatenate and Print' `cat()` function. 

```{r cat ce1}
cat(ce_formula)
```

The results can be added to Rmd files when wrapped in '$' symbols in R markdown. This results in a LateX formula looks like this: 

$\sum_{W,Z}\left(\sum_{X}P(Y|W,X,Z)P(X|W)\right)P(Z|W,X)P(W)$

# Nutrition example

We generate a logical causal impact pathway for an intervention to planting trees for better nutrition [c.f. @whitneyIntegrationFruitTrees2018]. 

```{r message=FALSE, warning=FALSE}
#decision impact pathway

nutrition_decision <- graph.formula(PT -+  FY, #plant_trees -+ fruit_yield, 
                                    PT -+  C, #plant_trees -+ costs,
                                    FY -+  EF, #fruit_yield -+ eat_fruit,
                                    NK -+  EF, #nutrition_knowledge -+ eat_fruit,
                                    FY -+  S, #fruit_yield -+ sales,
                                    EF -+  N, #eat_fruit -+ nutrition,
                                    C -+  I, #costs -+ income,
                                    S -+  I, #sales -+ income,
                                    I -+  N, #income -+ nutrition,
                                    NN -+  N, #nutrition_needs -+ nutrition,
                   simplify = FALSE)

plot(nutrition_decision)
```

Use the `igraph` model to get the formula for the causal effect of the nutrition intervention to plant trees `PT` on nutrition outcomes `N`. Concatenate and print the formula output from the `causal.effect` output with `cat` from base R. 

```{r get_nutrition_formula, message=FALSE, warning=FALSE, results='hide'}
nutrition_formula <- causal.effect(y = "N", x = "PT", z = NULL, G = nutrition_decision, expr = TRUE)

cat(nutrition_formula)
```

$\sum_{NK,NN,FY,EF,S,I}P(N|PT,NK,NN,FY,EF,S,I)P(I|PT,FY,S)P(S|PT,FY)P(EF|PT,NK,FY)P(FY|PT)P(NN)P(NK)$


Use the `bnlearn` library to assess the `nutrition_decision_data` model. The first step is to create an empty graph with as a 'bn' object.More examples of this process are given in the [bnlearn blog](http://www.bnlearn.com/examples/dag/).

```{r empty_bn}
nutrition_graph = bnlearn::empty.graph(c("PT", "FY", #plant_trees, fruit_yield, 
                                         "C", "EF",#costs, eat_fruit,
                                         "NK", "S", #nutrition_knowledge, sales, 
                                         "N", "I", #nutrition, income,
                                         "NN"), #nutrition_needs
                                       num = 1)
```

Create a matrix of edges (arrows) for the BN. 

```{r nutrition_edges_matrix}
nutrition_edges = matrix(c("PT", "FY", #plant_trees -+ fruit_yield, 
                           "PT", "C", #plant_trees -+ costs,
                           "FY", "EF",#fruit_yield -+ eat_fruit,
                           "NK", "EF", #nutrition_knowledge -+ eat_fruit,
                           "FY", "S", #fruit_yield -+ sales,
                           "EF", "N",#eat_fruit -+ nutrition,
                           "C", "I", #costs -+ income,
                           "S", "I", #sales -+ income,
                           "I", "N",#income -+ nutrition,
                           "NN", "N"),#nutrition_needs -+ nutrition,
                         ncol = 2, byrow = TRUE,
                         dimnames = list(NULL, c("from", "to")))
 nutrition_edges
```

Add edges to the BN and plot the resulting model. 

```{r create_arcs} 
 bnlearn::arcs(nutrition_graph) = nutrition_edges
 
 plot(nutrition_graph)
```

Test that the graph is acyclic using the `bnlearn` `acyclic` function. 

```{r}
 acyclic(nutrition_graph)
```

To test this model we can generate some random data. 

```{r create_conservation_decision_data}
nutrition_decision_data <- data.frame(
            PT = sample(c("No","Yes"), 50, replace = TRUE),#plant_trees
            C = sample(c("Low", "Affordable", "High"), 50, replace = TRUE),#costs
            FY = sample(c("Low","High"), 50, replace = TRUE),#fruit_yield
            EF = sample(c("No","Yes"), 50, replace = TRUE),#eat_fruit
            NK = sample(c("Low","High"), 50, replace = TRUE),#nutrition_knowledge
            S = sample(c("No","Yes"), 50, replace = TRUE),#sales
            N = sample(c("Insufficient", "Moderate", "Sufficient"), 50, replace = TRUE),#nutrition
            I = sample(c("Low", "Sufficient", "High"), 50, replace = TRUE),#income
            NN  = sample(c("Low","High"), 50, replace = TRUE))#nutrition_needs
```

Create a fitted BN with the `bn.fit` function from `bnlearn`.

```{r}
#bn.fit runs the ensemble EM algorithm to learn CPT
fittedbn <- bn.fit(nutrition_graph, data = nutrition_decision_data)
```

Now we can infer from the BN. We ask for the chance that a farmer with high nutrition knowledge plants and has planted trees will have sufficient nutrition.

```{r query_1, message=TRUE, warning=FALSE}
cpquery(fittedbn, event = (N=="Sufficient"), 
        evidence = ( PT=="Yes" & NK=="High" ) )
```

We then ask for the chance that a farmer with low nutrition knowledge plants and has planted trees will have sufficient nutrition.

```{r query_2, message=TRUE, warning=FALSE}
cpquery(fittedbn, event = (N=="Sufficient"), 
        evidence = ( PT=="No" & NK=="Low" ) )
```

Because the data is randomly generated with the `sample` function we do not get a very sensible response at the moment. The next step is to build the conditional probability tables that are in teh backgroud of the BN. We can do this with the built in features of the `bnlearn` package or use the `make_CPT` function of the `decisionSupport` package.

# References