-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathUsing causaleffect and igraph.Rmd
176 lines (128 loc) · 7.09 KB
/
Using causaleffect and igraph.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
---
title: "Generating and calculating causal models"
author: "Cory Whitney"
date: "`r Sys.Date()`"
output:
html_document:
toc: true
toc_float: true
toc_collapsed: false
bibliography: references/Causal_refs.bib
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(bnlearn)
library(causaleffect)
library(igraph)
```
# Causal maps
Make a causal map in `igraph` [@igraph]. For example the notation X -+ Z identifies and draws a directed edge (a causal arrow) from X to Z. These can be words or phrases but this does not look very good yet and we probably need to do a bit of work learning and 'hacking' igraph if we want these to be very verbose text boxes.
```{r test_fig}
test_fig <- graph.formula(W -+ X, W -+ Z, X -+ Z, Z -+ Y, X -+ Y, Y -+ X, simplify = FALSE)
plot(test_fig)
```
The fifth and sixth arrow of the "graph.formula" (connecting X and Y) are bidirected. We therefore have to add the following "set.edge.attribute" command:
```{r fig1 ammend}
test_fig_ammend <- set.edge.attribute(graph = test_fig, name = "description", index = c(5,6), value = "U")
plot(test_fig_ammend)
```
# Mathematical expressions
Get a mathematical expression for the 'interventional distribution' (do-Calculus) using the `causal.effect` function from the causaleffect library [@tikkaIdentifyingCausalEffects2017].
```{r ce1}
ce_formula <- causal.effect(y = "Y", x = "X", z = NULL, G = test_fig_ammend, expr = TRUE)
ce_formula
```
The result of this function is readable in Latex. We can replace the double slash '\\' with a single '\' so that it is also readable in Rmarkdown.
Or we can avoid that manual work and use the 'Concatenate and Print' `cat()` function.
```{r cat ce1}
cat(ce_formula)
```
The results can be added to Rmd files when wrapped in '$' symbols in R markdown. This results in a LateX formula looks like this:
$\sum_{W,Z}\left(\sum_{X}P(Y|W,X,Z)P(X|W)\right)P(Z|W,X)P(W)$
# Nutrition example
We generate a logical causal impact pathway for an intervention to planting trees for better nutrition [c.f. @whitneyIntegrationFruitTrees2018].
```{r message=FALSE, warning=FALSE}
#decision impact pathway
nutrition_decision <- graph.formula(PT -+ FY, #plant_trees -+ fruit_yield,
PT -+ C, #plant_trees -+ costs,
FY -+ EF, #fruit_yield -+ eat_fruit,
NK -+ EF, #nutrition_knowledge -+ eat_fruit,
FY -+ S, #fruit_yield -+ sales,
EF -+ N, #eat_fruit -+ nutrition,
C -+ I, #costs -+ income,
S -+ I, #sales -+ income,
I -+ N, #income -+ nutrition,
NN -+ N, #nutrition_needs -+ nutrition,
simplify = FALSE)
plot(nutrition_decision)
```
Use the `igraph` model to get the formula for the causal effect of the nutrition intervention to plant trees `PT` on nutrition outcomes `N`. Concatenate and print the formula output from the `causal.effect` output with `cat` from base R.
```{r get_nutrition_formula, message=FALSE, warning=FALSE, results='hide'}
nutrition_formula <- causal.effect(y = "N", x = "PT", z = NULL, G = nutrition_decision, expr = TRUE)
cat(nutrition_formula)
```
$\sum_{NK,NN,FY,EF,S,I}P(N|PT,NK,NN,FY,EF,S,I)P(I|PT,FY,S)P(S|PT,FY)P(EF|PT,NK,FY)P(FY|PT)P(NN)P(NK)$
Use the `bnlearn` library to assess the `nutrition_decision_data` model. The first step is to create an empty graph with as a 'bn' object.More examples of this process are given in the [bnlearn blog](http://www.bnlearn.com/examples/dag/).
```{r empty_bn}
nutrition_graph = bnlearn::empty.graph(c("PT", "FY", #plant_trees, fruit_yield,
"C", "EF",#costs, eat_fruit,
"NK", "S", #nutrition_knowledge, sales,
"N", "I", #nutrition, income,
"NN"), #nutrition_needs
num = 1)
```
Create a matrix of edges (arrows) for the BN.
```{r nutrition_edges_matrix}
nutrition_edges = matrix(c("PT", "FY", #plant_trees -+ fruit_yield,
"PT", "C", #plant_trees -+ costs,
"FY", "EF",#fruit_yield -+ eat_fruit,
"NK", "EF", #nutrition_knowledge -+ eat_fruit,
"FY", "S", #fruit_yield -+ sales,
"EF", "N",#eat_fruit -+ nutrition,
"C", "I", #costs -+ income,
"S", "I", #sales -+ income,
"I", "N",#income -+ nutrition,
"NN", "N"),#nutrition_needs -+ nutrition,
ncol = 2, byrow = TRUE,
dimnames = list(NULL, c("from", "to")))
nutrition_edges
```
Add edges to the BN and plot the resulting model.
```{r create_arcs}
bnlearn::arcs(nutrition_graph) = nutrition_edges
plot(nutrition_graph)
```
Test that the graph is acyclic using the `bnlearn` `acyclic` function.
```{r}
acyclic(nutrition_graph)
```
To test this model we can generate some random data.
```{r create_conservation_decision_data}
nutrition_decision_data <- data.frame(
PT = sample(c("No","Yes"), 50, replace = TRUE),#plant_trees
C = sample(c("Low", "Affordable", "High"), 50, replace = TRUE),#costs
FY = sample(c("Low","High"), 50, replace = TRUE),#fruit_yield
EF = sample(c("No","Yes"), 50, replace = TRUE),#eat_fruit
NK = sample(c("Low","High"), 50, replace = TRUE),#nutrition_knowledge
S = sample(c("No","Yes"), 50, replace = TRUE),#sales
N = sample(c("Insufficient", "Moderate", "Sufficient"), 50, replace = TRUE),#nutrition
I = sample(c("Low", "Sufficient", "High"), 50, replace = TRUE),#income
NN = sample(c("Low","High"), 50, replace = TRUE))#nutrition_needs
```
Create a fitted BN with the `bn.fit` function from `bnlearn`.
```{r}
#bn.fit runs the ensemble EM algorithm to learn CPT
fittedbn <- bn.fit(nutrition_graph, data = nutrition_decision_data)
```
Now we can infer from the BN. We ask for the chance that a farmer with high nutrition knowledge plants and has planted trees will have sufficient nutrition.
```{r query_1, message=TRUE, warning=FALSE}
cpquery(fittedbn, event = (N=="Sufficient"),
evidence = ( PT=="Yes" & NK=="High" ) )
```
We then ask for the chance that a farmer with low nutrition knowledge plants and has planted trees will have sufficient nutrition.
```{r query_2, message=TRUE, warning=FALSE}
cpquery(fittedbn, event = (N=="Sufficient"),
evidence = ( PT=="No" & NK=="Low" ) )
```
Because the data is randomly generated with the `sample` function we do not get a very sensible response at the moment. The next step is to build the conditional probability tables that are in teh backgroud of the BN. We can do this with the built in features of the `bnlearn` package or use the `make_CPT` function of the `decisionSupport` package.
# References