# (PART\*) Statistical Theory {-}
# More on random variables {#more-on-random-variables}
```{r setup6, include=FALSE}
knitr::opts_chunk$set(echo = FALSE,
prompt = FALSE,
tidy = TRUE,
collapse = TRUE)
library("tidyverse")
```
In our first few chapters, we developed the fundamental tools for understanding
statistics: the practical skills of cleaning data and calculating statistics,
and the basic theoretical concepts for thinking about random events and random
variables. Over the next few chapters, we will connect these tools by
conceptualizing data sets and the statistics calculated from them as random variables.
This connection is what makes statistics a tool for *science* and not just a
set of calculation procedures. The first step in building that connection is
to extend our theory of random variables from a
[single discrete random variable](#random-variables) to a wider range of
possibilities.
This chapter extends our theory to both continuous random variables and
pairs or groups of random variables.
::: {.goals data-latex=""}
**Chapter goals**
In this chapter, we will learn how to:
1. Interpret the CDF and PDF of a continuous random variable.
2. Know and use the key properties of the uniform distribution.
3. Derive the distribution for a linear function of a uniform random variable.
4. Know and use the key properties of the normal distribution.
5. Derive the distribution for a linear function of a normal random variable.
6. Calculate the joint PDF of two discrete random variables from the
probability distribution of a random outcome.
7. Calculate a marginal PDF from a (discrete) joint PDF.
8. Calculate a conditional PDF from a (discrete) joint PDF.
9. Interpret joint, marginal, and conditional distributions.
10. Determine whether two random variables are independent.
11. Calculate the covariance of two discrete random variables from their joint
PDF.
12. Calculate the covariance of two random variables using the expected value
formula.
13. Calculate the correlation of two random variables from their covariance.
14. Calculate the covariance and correlation of two independent random
variables.
15. Calculate the expected value of a linear function of two or more random
variables.
16. Interpret covariances and correlations.
:::
To prepare for this chapter, please review the introductory chapter on
[random variables](#random-variables).
## Continuous random variables {#continuous-random-variables}
Many random variables of interest have a continuous support. That is, they
can take on any value in a particular range. Examples of such variables
include:
- Physical quantities such as distance, mass, volume, or temperature.
- Time values such as the current time or the time it takes to drive to school
from your home.
Because continuous random variables can take on any value in a particular
range, the chance that they take on any specific value is very low (in fact,
it is zero). This makes the math for continuous random variables a little
harder, which is why we started with discrete random variables.
::: example
**Labour force participation**
The labour force participation rate is defined as:
$$(\textrm{LFP rate}) = \frac{(\textrm{labour force})}{(\textrm{population})} \times 100\%$$
It can be any number between 0\% and 100\%:
$$S_{\textrm{LFP rate}} = [0\%,100\%]$$
so it is a continuous random variable.
:::
### General properties
We can describe the general properties of a continuous random variable by
comparing them to the properties of a discrete random variable.
We learned in an earlier chapter that the support of a discrete random variable
typically includes a *finite* number of values, each of which has strictly
*positive* probability, and most formulas for probabilities (including PDFs and
CDFs) and expected values use just *addition and subtraction*.
In contrast, the support of a continuous random variable includes an *infinite*
number of values, each of which has *zero* probability, and most formulas for
probabilities and expected values use *calculus*.
::: {.sfu data-latex=""}
**ECON 233 calculus prerequisites**
Differential calculus (MATH 151 or 157) is a prerequisite for ECON 233, but
integral calculus (MATH 152 or 158) is not. I will not assume you know how to
interpret or calculate an integral, and will not require you to do so in any
assigned or graded work in ECON 233.
:::
But deep down, there is no important *practical* difference between continuous
and discrete random variables. The intuition for this is that you can closely
approximate any continuous random variable by rounding it. The rounded variable
will be discrete, and our earlier results for discrete random variables apply.
With only a few exceptions, everything that is true for discrete random
variables is also true for continuous ones.
::: example
**Rounding a continuous variable to make it discrete**
Suppose you round the labour force participation rate to the nearest percentage
point. The rounded LFP rate is a discrete random variable with support:
$$S_x = \{0\%, 1\%, \ldots, 99\%, 100\% \}$$
Alternatively, we could round to the nearest 1/100th of a percentage point,
to the nearest 1/1,000,000th of a percentage point, etc. As we round to a
higher and higher precision, the approximation gets closer and closer.
:::
As a result, our coverage of continuous random variables will be brief and will
mostly avoid calculus.
::: {.fyi data-latex=""}
**Formulas using integrals**
When a relevant mathematical formula uses integrals, I will put it in an "FYI"
box like this one. This means I am providing the formula to show you that it
exists, but do not expect you to understand, remember, or perform any
calculation using the formula.
If you *do* know some integral calculus, you might notice that the formulas for
continuous random variables look just like the ones for discrete random
variables, but with sums replaced by integrals. This should not be surprising
since an integral *is* a sum, or at least the limit of a sequence of sums.
:::
### The continuous CDF {#continuous-pdf-and-cdf}
The CDF of a continuous random variable $x$ is defined exactly the same way as
for the discrete case:
$$F_x(a) = \Pr(x \leq a)$$
The only difference is how it looks. If you recall, the CDF of a discrete random
variable takes on a stair-step form: increasing in discrete jumps at every point
in the discrete support, and flat everywhere else. In contrast, the CDF of a
continuous random variable increases smoothly over its support. It can have
flat parts, but it never jumps.
::: example
**The standard uniform distribution**
Consider a random variable $x$ that has continuous support:
$$S_x = [0, 1]$$
and CDF:
$$F_x(a) = \Pr(x \leq a) = \begin{cases} 0 & a < 0 \\ a & a \in [0,1] \\1 & a > 1 \\ \end{cases}$$
This particular probability distribution is called the
***standard uniform distribution*** and will be discussed in more detail later.
```{r StdUniformCDF, fig.cap = "*CDF for the standard uniform distribution*"}
UniformDist <- tibble(a=seq(from=-2,to=2,length.out=100),
Fa=punif(seq(from=-2,to=2,length.out=100)),
fa=dunif(seq(from=-2,to=2,length.out=100)))
ggplot(data=UniformDist,mapping=aes(x=a,y=Fa)) +
geom_line(col = "blue") +
geom_text(x=1,y=0.6,col="blue",label="F_x(a)") +
xlab("a") +
ylab("F(a)") +
labs(title = "Cumulative distribution function (CDF)",
subtitle = "Standard uniform",
caption = "",
tag = "")
```
Figure \@ref(fig:StdUniformCDF) shows the CDF of the standard uniform
distribution. As you can see, this CDF is smoothly increasing over the support
between zero and one, and is flat everywhere else.
:::
Section \@ref(the-cdf) describes the properties of a CDF, and these properties
apply to continuous random variables too. In addition, interval probabilities
are easier to calculate for continuous random variables: the probability of any
specific value is zero, so it does not matter whether inequalities are strict
($<$) or weak ($\leq$).
::: example
**Interval probabilities for the standard uniform**
Suppose that $x$ has the standard uniform distribution. What is the probability
that $x$ is *strictly* between 0.65 and 0.70?
We can use our usual formula for interval probabilities to get:
\begin{align*}
\Pr(0.65 < x < 0.70) &= \underbrace{\Pr(0.65 < x \leq 0.70)}_{=F_x(0.70)-F_x(0.65)}
- \underbrace{\Pr(x = 0.70)}_{=0} \\
&= (0.70 - 0.65) - 0 \\
&= 0.05
\end{align*}
So a standard uniform random variable has a 5\% chance of being between 0.65 and
0.70.
:::
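If you want to check this kind of calculation in R, the built-in function
`punif()` computes the uniform CDF (by default the standard uniform):

```{r UnifIntervalCheck, echo=TRUE}
# Pr(0.65 < x < 0.70) = F_x(0.70) - F_x(0.65) for x ~ U(0,1)
punif(0.70) - punif(0.65)
```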
### The continuous PDF {#continuous-pdf}
While the CDF has the same definition whether the random variable is discrete or
continuous, the same does not hold for the PDF.
- In the discrete case, the PDF $f_x(a)$ is defined as the size of the "jump" in
the CDF at $a$, or (equivalently) the probability $\Pr(x=a)$ of observing that
particular value.
- In the continuous case, there are no jumps, and the probability of observing
any specific value is always zero. So a PDF based on $\Pr(x=a)$ would be
useless in describing the probability distribution of a continuous random
variable.
Instead, the ***PDF of a continuous random variable*** $x$ is defined as the
slope or derivative of the CDF:
$$f_x(a) = \frac{d F_x(a)}{da}$$
In other words, instead of the *amount* the CDF increases (jumps) at $a$, it is
the *rate* at which it (smoothly) increases.
::: example
**The PDF of the standard uniform distribution**
The PDF of a standard uniform random variable is:
$$f_x(a) = \begin{cases} 0 & a < 0 \\ 1 & a \in [0,1] \\ 0 & a > 1 \\ \end{cases}$$
which looks like this:
```{r StdUniformPDF, fig.cap = "*PDF for the standard uniform distribution*"}
ggplot(data=UniformDist,mapping=aes(x=a,y=fa)) +
geom_step(col = "blue") +
geom_text(x=1.25,y=0.8,col="blue",label="f_x(a)") +
xlab("a") +
ylab("f(a)") +
labs(title = "Probability density function (PDF)",
subtitle = "Standard uniform",
caption = "",
tag = "")
```
:::
The PDF of a continuous random variable is a good way to visualize its
probability distribution, and this is about the only way we will use
the continuous PDF in this class (since everything else requires integration).
::: example
**Interpreting the standard uniform PDF**
The standard uniform PDF shows the key feature of this distribution: in some
loose sense, all values in the support are "equally likely", much like in the
[discrete uniform distribution](#discrete-uniform) described earlier. In fact,
if you round a uniform random variable, you get a discrete uniform random
variable.
:::
Like the discrete PDF, the continuous PDF is always non-negative:
\begin{equation*}
f_x(a) \geq 0 \qquad \textrm{for all $a \in \mathbb{R}$}
\end{equation*}
and is strictly positive on the support:
\begin{equation*}
f_x(a) > 0 \qquad \textrm{for all $a \in S_x$}
\end{equation*}
But unlike the discrete PDF, the continuous PDF is *not* a probability. In
particular, it can be greater than one.
::: {.fyi data-latex=""}
**Additional properties of the continuous PDF**
If you recall, we can calculate probabilities from the discrete PDF by addition.
We can use this property to derive the CDF and show that the discrete PDF sums
to one.
Similarly, we can calculate probabilities from the continuous PDF by
integrating:
$$\Pr(a < x < b) = \int_a^b f_x(v)dv$$
which implies that the CDF can be derived from the PDF:
$$F_x(a) = \int_{-\infty}^a f_x(v)dv$$
and that the PDF integrates to one:
$$\int_{-\infty}^{\infty} f_x(v)dv = 1$$
Unless you have taken a course in integral calculus, you may have no idea what
these formulas mean or how to solve them. That's OK! All you need to know is
that they *can* be solved.
:::
### Quantiles {#continuous-quantiles}
The quantiles of a random variable have the same
[definition, interpretation, and properties](#quantiles-and-percentiles)
whether the random variable is continuous or discrete. The same applies to
percentiles and the median since they are also quantiles. Quantiles are
usually easier to calculate for continuous random variables.
::: example
**Quantiles for the standard uniform**
Suppose that $x$ has the standard uniform distribution. The $q$ quantile of $x$
is:
\begin{align}
F_x^{-1}(q) &= \min \{a \in S_x : F_x(a) \geq q\} \\
&= \min \{a \in [0,1] : a \geq q\} \\
&= \min [q,1] \\
&= q
\end{align}
For example, the median of $x$ is 0.5, the 10th percentile is 0.10, the 75th
percentile is 0.75, etc.
:::
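R's `qunif()` function inverts the uniform CDF, so we can confirm that the
$q$ quantile of a standard uniform random variable is just $q$:

```{r UnifQuantileCheck, echo=TRUE}
# 10th percentile, median, and 75th percentile of U(0,1)
qunif(c(0.10, 0.50, 0.75))
```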
### Expected values {#continuous-expected-values}
The expected value also has the same [interpretation](#the-expected-value) and
[properties](#properties-of-the-expected-value) whether the random variable
is continuous or discrete. The definition is slightly different, and includes
an integral.
::: {.fyi data-latex=""}
**The expected value for a continuous random variable**
When $x$ is continuous, its expected value is defined as:
$$E(x) = \int_{-\infty}^{\infty} af_x(a)da$$
Notice that this looks just like the definition for the discrete case, but with
the sum replaced by an integral sign.
:::
The variance and standard deviation are both defined in terms of expected
values, so they also have the same
[interpretation and properties](#variance-and-standard-deviation)
whether the random variable is continuous or discrete.
## The uniform distribution {#uniform-and-standard-uniform}
The ***uniform*** distribution is a continuous probability distribution that is
usually written:
$$x \sim U(L,H)$$
where $L$ and $H$ are numbers such that $L < H$.
The $U(0,1)$ distribution is also known as the ***standard uniform***
distribution.
### The uniform PDF
The uniform distribution has continuous support:
$$S_x = [L,H]$$
and continuous PDF:
$$f_x(a) = \begin{cases}\frac{1}{H-L} & a \in S_x \\ 0 & \textrm{otherwise} \\ \end{cases}$$
The uniform distribution can be interpreted as placing equal probability on all
values between $L$ and $H$.
::: example
**The PDF of the $U(2,5)$ distribution**
Suppose that $x \sim U(2,5)$.
Its support is the range of all values from 2 to 5, and its PDF looks like this:
```{r UniformPDF, fig.cap = "*PDF for the U(2,5) distribution*"}
UniformDist <- tibble(a=seq(from=-6,to=6,length.out=100),
Fa=punif(seq(from=-6,to=6,length.out=100),min=2,max=5),
fa=dunif(seq(from=-6,to=6,length.out=100),min=2,max=5))
ggplot(data=UniformDist,mapping=aes(x=a,y=fa)) +
geom_step(col = "blue") +
geom_text(x=1.25,y=0.8,col="blue",label="f_x(a)") +
xlab("a") +
ylab("f(a)") +
labs(title = "Probability density function (PDF)",
subtitle = "U(2,5)",
caption = "",
tag = "")
```
:::
### The uniform CDF
The CDF of the $U(L,H)$ distribution is:
$$F_x(a) = \begin{cases}
0 & a \leq L \\
\frac{a-L}{H-L} & L < a < H \\
1 & a \geq H \\
\end{cases}$$
::: example
**The CDF of the $U(2,5)$ distribution**
If $x \sim U(2,5)$, its CDF looks like this:
```{r UniformCDF, fig.cap = "*CDF for the U(2,5) distribution*"}
UniformDist <- tibble(a=seq(from=-6,to=6,length.out=100),
Fa=punif(seq(from=-6,to=6,length.out=100),min=2,max=5),
fa=dunif(seq(from=-6,to=6,length.out=100),min=2,max=5))
ggplot(data=UniformDist,mapping=aes(x=a,y=Fa)) +
geom_line(col = "blue") +
geom_text(x=3,y=0.6,col="blue",label="F_x(a)") +
xlab("a") +
ylab("F(a)") +
labs(title = "Cumulative distribution function (CDF)",
subtitle = "U(2,5)",
caption = "",
tag = "")
```
:::
### Quantiles {#uniform-quantiles}
Like any other random variable, we can calculate the quantiles of a uniform
random variable by inverting the CDF. That is:
$$F_x^{-1}(q) = L + q(H-L)$$
is the $q$ quantile of a $U(L,H)$ random variable.
The median of $x \sim U(L,H)$ is:
$$Med(x) = F_x^{-1}(0.5) = 0.5(L+H)$$
i.e., the midpoint of the support.
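The `qunif()` function implements this inverse CDF. For example, the median of
the $U(2,5)$ distribution should be the midpoint $0.5(2+5) = 3.5$:

```{r UnifMedianCheck, echo=TRUE}
# F_x^{-1}(0.5) = L + 0.5*(H - L) = 3.5 for x ~ U(2,5)
qunif(0.5, min = 2, max = 5)
```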
### Expected values {#uniform-mean-and-variance}
Integral calculus is required to calculate the mean, variance and standard
deviation of the uniform distribution, so I report them below for reference:
\begin{align*}
E(x) &= 0.5(L+H) \\
var(x) &= \frac{(H-L)^2}{12} \\
sd(x) &= \sqrt{\frac{(H-L)^2}{12}}
\end{align*}
This is one advantage of using standard distributions: you can look up results
when they are difficult to calculate.
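If you would rather trust a simulation than a lookup table, you can
approximate these values in R (a rough check based on random draws, so the
numbers will not match exactly):

```{r UnifMomentCheck, echo=TRUE}
set.seed(1)
# One million draws from U(2,5); E(x) = 3.5 and var(x) = 9/12 = 0.75
x <- runif(1e6, min = 2, max = 5)
c(mean(x), var(x))
```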
### Functions of a uniform {#uniform-functions}
Any linear function of a uniform random variable also has a uniform
distribution. That is, if $x \sim U(L,H)$ and $y = a + bx$ where^[if $b <0$,
then $y \sim U(a + bH, a + bL)$.] $b > 0$, then:
$$y \sim U(a + bL, a + bH)$$
Nonlinear functions of a uniform random variable are generally not uniform.
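A quick simulation illustrates the linear case (a sketch, with arbitrary
constants $a = 2$ and $b = 3$):

```{r UnifLinearCheck, echo=TRUE}
set.seed(1)
x <- runif(1e5)            # x ~ U(0,1)
y <- 2 + 3 * x             # so y should be U(2, 5)
quantile(y, c(0, 0.5, 1))  # roughly 2, 3.5, and 5
```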
::: {.fyi data-latex=""}
**Uniform distributions in video games**
Uniform distributions are important in many computer applications including
video games. Games need to be at least somewhat unpredictable in order to stay
interesting.
It is easy for a computer to generate a random number from the $U(0,1)$
distribution, and that distribution has the unusual feature that its $q$
quantile is equal to $q$.
As a result, you can generate a random variable with any probability
distribution you like by following these steps:
1. Let $F_{w}(\cdot)$ be the CDF of the distribution you want.
2. Generate a random variable $q \sim U(0,1)$.
3. Calculate $x = F_{w}^{-1}(q)$, where $F_{w}^{-1}(\cdot)$ is the inverse of
$F_{w}(\cdot)$.
Then $x$ is a random variable with CDF $F_w(\cdot)$.
Any modern video game is constantly generating and transforming $U(0,1)$
random numbers to determine the behavior of non-player characters, the location
of weapons and other resources, or the results of a particular player action.
:::
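A minimal R sketch of these steps, using the $U(2,5)$ distribution as the
target (any distribution with a computable inverse CDF would work):

```{r InverseCDFSim, echo=TRUE}
set.seed(1)
q <- runif(5)                    # step 2: generate U(0,1) draws
x <- qunif(q, min = 2, max = 5)  # step 3: apply the inverse CDF of U(2,5)
x                                # x now behaves like U(2,5) draws
```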
## The normal distribution {#normal-and-standard-normal}
The ***normal distribution*** is typically written as:
$$ x \sim N(\mu,\sigma^2)$$
where $\mu$ and $\sigma^2 > 0$ are numbers.
The normal distribution is also called the ***Gaussian*** distribution, and the
$N(0,1)$ distribution is called the ***standard normal*** distribution.
::: {.fyi data-latex=""}
**The central limit theorem**
An important result called the central limit theorem implies that many "real
world" random variables tend to be normally distributed. We will discuss the
central limit theorem in much more detail later.
:::
### The normal PDF
The $N(\mu,\sigma^2)$ distribution is a continuous distribution with support
$S_x = \mathbb{R}$ and PDF:
$$f_x(a) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(a-\mu)^2}{2\sigma^2}}$$
The Excel function `NORM.DIST()` can be used to calculate this PDF.
The $N(\mu,\sigma^2)$ distribution is bell-shaped and symmetric around $\mu$,
with the "spread" of the distribution depending on the value of $\sigma^2$:
```{r NormalPDF, fig.cap = "*PDF for several normal distributions*"}
NormalDist <- tibble(a=seq(from=-6,to=6,length.out=100),
Fa=pnorm(seq(from=-6,to=6,length.out=100)),
fa1=dnorm(seq(from=-6,to=6,length.out=100),mean=1),
fa2=dnorm(seq(from=-6,to=6,length.out=100),sd=2),
fa=dnorm(seq(from=-6,to=6,length.out=100)))
ggplot(data=NormalDist,mapping=aes(x=a,y=fa)) +
geom_line(col = "blue") +
geom_line(aes(y=fa1),col = "red") +
geom_line(aes(y=fa2),col = "purple") +
geom_text(x=-2,y=0.3,col="blue",label="N(0,1)") +
geom_text(x=3,y=0.3,col="red",label="N(1,1)") +
geom_text(x=-4.5,y=0.05,col="purple",label="N(0,4)") +
xlab("a") +
ylab("f(a)") +
labs(title = "Probability density function (PDF)",
subtitle = "",
caption = "",
tag = "")
```
### The normal CDF
The CDF of the normal distribution can be derived by integrating the PDF. There
is no simple closed-form expression for this CDF, but it is easy to calculate
with a computer. The Excel function `NORM.DIST()` can be used to calculate this
CDF.
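In R, the analogous built-in functions are `pnorm()` for the CDF and `dnorm()`
for the PDF:

```{r NormCDFCheck, echo=TRUE}
# Pr(x <= 1) and f_x(1) for x ~ N(0,1)
pnorm(1)
dnorm(1)
```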
The normal CDF is S-shaped, running smoothly from nearly-zero to nearly-one.
```{r NormalCDF, fig.cap = "*CDF for several normal distributions*"}
NormalDist <- tibble(a=seq(from=-6,to=6,length.out=100),
Fa=pnorm(seq(from=-6,to=6,length.out=100)),
Fa1=pnorm(seq(from=-6,to=6,length.out=100),mean=1),
Fa2=pnorm(seq(from=-6,to=6,length.out=100),sd=2),
fa=dnorm(seq(from=-6,to=6,length.out=100)))
ggplot(data=NormalDist,mapping=aes(x=a,y=Fa)) +
geom_line(col = "blue") +
geom_line(aes(y=Fa1),col = "red") +
geom_line(aes(y=Fa2),col = "purple") +
geom_text(x=0,y=0.8,col="blue",label="N(0,1)") +
geom_text(x=1,y=0.3,col="red",label="N(1,1)") +
geom_text(x=-3.2,y=0.13,col="purple",label="N(0,4)") +
xlab("a") +
ylab("f(a)") +
labs(title = "Cumulative distribution function (CDF)",
subtitle = "",
caption = "",
tag = "")
```
### Quantiles {#normal-quantiles}
Quantiles of the normal distribution can be calculated using the Excel function
`NORM.INV()`.
The median of a $N(\mu,\sigma^2)$ random variable is $\mu$.
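In R, the analogous function is `qnorm()`. Note that R parameterizes the
normal distribution by its standard deviation rather than its variance:

```{r NormQuantileCheck, echo=TRUE}
# Median of N(3, 4): qnorm() takes sd = 2, not the variance 4
qnorm(0.5, mean = 3, sd = 2)
```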
### Expected values {#normal-means}
Integral calculus is required to calculate the mean, variance and standard
deviation of the normal distribution, so I report them below for reference:
$$E(x) = \mu$$
$$var(x) = \sigma^2$$
$$sd(x) = \sigma$$
### Functions of a normal
As discussed in an earlier chapter, all random variables have the property that
$E(a +bx) = a + bE(x)$ and $var(a+bx) = b^2 var(x)$ for any constants $a$ and
$b$.
Normal random variables have an additional property: any linear function of a
normal random variable is also normal. That is, if:
$$x \sim N(\mu,\sigma^2)$$
Then for any constants $a$ and $b$:
$$a + bx \sim N(a + b\mu, b^2\sigma^2)$$
Nonlinear functions of a normal random variable are generally *not* normal.
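A simulation can make the linear case concrete (a sketch with arbitrary
constants $a = 3$ and $b = 2$):

```{r NormLinearCheck, echo=TRUE}
set.seed(1)
x <- rnorm(1e5, mean = 1, sd = 2)  # x ~ N(1, 4)
y <- 3 + 2 * x                     # should be N(3 + 2*1, 2^2*4) = N(5, 16)
c(mean(y), var(y))                 # roughly 5 and 16
```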
::: {.fyi data-latex=""}
**Other distributions based on the normal**
There are many other standard distributions that are based on functions
of one or more normal random variables. For example, if you draw $k$
independent $N(0,1)$ random variables, square them, and add them up, the
sum has a distribution called the $\chi^2(k)$ distribution.
Other such distributions include the $F$ distribution and the $T$ distribution.
All of these standard distributions have important applications in statistical
analysis and would be covered in a more advanced course.
:::
### Standardization
We earlier defined the standardized version of a random variable $x$ as the
following linear function of $x$:
$$z = \frac{x-E(x)}{sd(x)}$$
and showed that $E(z) = 0$ and $var(z) = sd(z) = 1$.
We can standardize any random variable, but standardization is particularly
convenient for normal random variables. If $x$ has a normal distribution,
then its standardized value $z$ has the standard normal distribution:
\begin{align}
x \sim N(\mu, \sigma^2) &\Rightarrow z \sim N\left(\frac{\mu-\mu}{\sigma}, \left(\frac{1}{\sigma}\right)^2 \sigma^2\right) \\
&\Rightarrow z \sim N(0,1)
\end{align}
The standard normal distribution is so useful that we have a special symbol for
its PDF:
$$\phi(a) = \frac{1}{\sqrt{2\pi}} e^{-\frac{a^2}{2}}$$
and its CDF:
$$\Phi(a) = \int_{-\infty}^a \phi(b)db$$
$\phi$ is the lower-case Greek letter *phi*, and $\Phi$ is the upper-case *phi*.
We can take advantage of standardization to express the CDF of any normal
random variable in terms of the standard normal CDF. That is, suppose that:
$$x \sim N(\mu, \sigma^2)$$
Then we can prove that its CDF is:
\begin{align}
F_x(a) &= \Phi\left(\frac{a-\mu}{\sigma}\right)
\end{align}
The standard normal CDF is available as a built-in function in every statistical
package including Excel and R, so we can use this result to calculate the CDF
for any normally distributed random variable.
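For example, in R both of these calls return $\Pr(x \leq 4)$ for
$x \sim N(3,4)$:

```{r StdNormCheck, echo=TRUE}
pnorm((4 - 3) / 2)          # standardize, then use the standard normal CDF
pnorm(4, mean = 3, sd = 2)  # or use the general normal CDF directly
```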
::: {.fyi data-latex=""}
**The normal and standard normal CDF**
The result that any normal random variable $x \sim N(\mu,\sigma^2)$ has CDF
$\Phi\left(\frac{a-\mu}{\sigma}\right)$ can be proved as follows.
First, define:
$$z = \frac{x - \mu}{\sigma}$$
Since $z$ is a linear function of $x$, it is also normally distributed:
$$z \sim N\left(\frac{\mu-\mu}{\sigma},
\left(\frac{1}{\sigma}\right)^2 \sigma^2\right)$$
or equivalently:
$$z \sim N(0,1)$$
Then the CDF of $x$ is:
\begin{align}
F_x(a) &= \Pr\left(x \leq a\right) \\
&= \Pr\left( \frac{x-\mu}{\sigma} \leq \frac{a-\mu}{\sigma}\right)\\
&= \Pr\left( z \leq \frac{a-\mu}{\sigma}\right) \\
&= \Phi\left(\frac{a-\mu}{\sigma}\right)
\end{align}
:::
## Multiple random variables
Almost all interesting data sets have multiple observations and multiple
variables. So before we start talking about data, we need to develop some tools
and terminology for thinking about multiple random variables.
To keep things simple, most of the definitions and examples will be stated in
terms of *two* discrete random variables. The extension to more than two random
variables is conceptually straightforward but will be skipped.
### Joint distribution
Let $x$ and $y$ be two random variables defined in terms of the same underlying
random outcome. Their ***joint probability distribution*** assigns a value to
all joint probabilities of the form:
$$\Pr(x \in A \cap y \in B)$$
for any sets $A, B \subset \mathbb{R}$.
The joint distribution is the key to talking about $x$, $y$ and how they are
related. Every concept introduced in this section - marginal distributions,
conditional distributions, expected values, covariance, correlation, and
independence - can be defined in terms of the joint distribution.
::: example
**Three joint distributions**
The scatter plots in Figure \@ref(fig:JointIsNotMarginal) below depict
simulation results for a pair of random variables $(x,y)$, with a different
joint distribution in each graph.
```{r JointIsNotMarginal, fig.cap = "*x and y are drawn from a different joint distribution in each graph.*"}
simdata <- tibble(x = rnorm(100),
y1 = rnorm(100),
y2 = (-x+0.5*y1)/sqrt(1.25),
y3 = x)
p1 <- ggplot(data = simdata, mapping = aes(x = x)) +
geom_point(aes(y=y1),col="blue") +
xlab("x") +
ylab("y")
p2 <- ggplot(data = simdata, mapping = aes(x = x)) +
geom_point(aes(y=y2),col="blue") +
xlab("x") +
ylab("y")
p3 <- ggplot(data = simdata, mapping = aes(x = x)) +
geom_point(aes(y=y3),col="blue") +
xlab("x") +
ylab("y")
library("cowplot")
plot_grid(p1,p2,p3,ncol=3,nrow=1)
```
As you can see, the relationship between the two variables differs in the three
cases.
:::
The joint distribution of any two *discrete* random variables can be fully
described by their ***joint PDF***:
$$f_{x,y}(a,b) = \Pr(x = a \cap y = b)$$
The joint PDF can be calculated from the probability distribution of the
underlying outcome.
::: example
**The joint PDF in roulette**
In our roulette example, both $w_{red}$ and $w_{14}$ depend on the original
outcome $b$. For convenience, I have created a table below that shows every
value of $b$ in the sample space, its probability, and the associated values
of $w_{red}$ and $w_{14}$.
| $b$ | $\Pr(b)$ | Color | $w_{red}$ | $w_{14}$ |
|:----|:--------:|:-----:|:---------:|:--------:|
| 0 | 1/37 | Green | -1 | -1 |
| 1 | 1/37 | Red | 1 | -1 |
| 2 | 1/37 | Black | -1 | -1 |
| 3 | 1/37 | Red | 1 | -1 |
| 4 | 1/37 | Black | -1 | -1 |
| 5 | 1/37 | Red | 1 | -1 |
| 6 | 1/37 | Black | -1 | -1 |
| 7 | 1/37 | Red | 1 | -1 |
| 8 | 1/37 | Black | -1 | -1 |
| 9 | 1/37 | Red | 1 | -1 |
| 10 | 1/37 | Black | -1 | -1 |
| 11 | 1/37 | Black | -1 | -1 |
| 12 | 1/37 | Red | 1 | -1 |
| 13 | 1/37 | Black | -1 | -1 |
| 14 | 1/37 | Red | 1 | 35 |
| 15 | 1/37 | Black | -1 | -1 |
| 16 | 1/37 | Red | 1 | -1 |
| 17 | 1/37 | Black | -1 | -1 |
| 18 | 1/37 | Red | 1 | -1 |
| 19 | 1/37 | Red | 1 | -1 |
| 20 | 1/37 | Black | -1 | -1 |
| 21 | 1/37 | Red | 1 | -1 |
| 22 | 1/37 | Black | -1 | -1 |
| 23 | 1/37 | Red | 1 | -1 |
| 24 | 1/37 | Black | -1 | -1 |
| 25 | 1/37 | Red | 1 | -1 |
| 26 | 1/37 | Black | -1 | -1 |
| 27 | 1/37 | Red | 1 | -1 |
| 28 | 1/37 | Black | -1 | -1 |
| 29 | 1/37 | Black | -1 | -1 |
| 30 | 1/37 | Red | 1 | -1 |
| 31 | 1/37 | Black | -1 | -1 |
| 32 | 1/37 | Red | 1 | -1 |
| 33 | 1/37 | Black | -1 | -1 |
| 34 | 1/37 | Red | 1 | -1 |
| 35 | 1/37 | Black | -1 | -1 |
| 36 | 1/37 | Red | 1 | -1 |
We can construct the joint PDF by simply adding up over all possible outcomes.
There is one outcome ($b=14$) in which both red and 14 win, and it has
probability $1/37$:
\begin{align}
f_{red,14}(1,35) &= \Pr(w_{red}=1 \cap w_{14} = 35) \\
&= \Pr(b \in \{14\}) = 1/37
\end{align}
There are 17 outcomes in which red wins and 14 loses, and each has probability
$1/37$:
\begin{align}
f_{red,14}(1,-1) &= \Pr(w_{red} = 1 \cap w_{14} = -1) \\
&= \Pr\left(b \in \left\{
\begin{gathered}
1,3,5,7,9,12,16,18,19,21,\\
23,25,27,30,32,34,36
\end{gathered}\right\}\right) \\
&= 17/37
\end{align}
There are 19 outcomes in which both red and 14 lose, and each has probability
$1/37$:
\begin{align}
f_{red,14}(-1,-1) &= \Pr(w_{red} = -1 \cap w_{14} = -1) \\
&= \Pr\left(b \in \left\{
\begin{gathered}
0,2,4,6,8,10,11,13,15,17, \\
20,22,24,26,28,29,31,33,35
\end{gathered}\right\}\right) \\
&= 19/37
\end{align}
There are no other outcomes, so all other combinations have probability zero.
Therefore, the joint PDF of $w_{red}$ and $w_{14}$ is:
\begin{align}
f_{red,14}(a,b) &= \begin{cases}
19/37 & \textrm{if $a = -1$ and $b = -1$} \\
17/37 & \textrm{if $a = 1$ and $b = -1$} \\
1/37 & \textrm{if $a = 1$ and $b = 35$} \\
0 & \textrm{otherwise} \\
\end{cases} \nonumber
\end{align}
Creating and using this table is something of a "brute force" approach: it is
time consuming but requires little thought and will always get the right
answer. You may be able to figure out a quicker approach.
:::
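The brute-force approach is also easy to automate. A sketch in R (the object
names here are just illustrative):

```{r RouletteJointPDF, echo=TRUE}
red <- c(1,3,5,7,9,12,14,16,18,19,21,23,25,27,30,32,34,36)
b <- 0:36                           # the 37 equally-likely outcomes
w_red <- ifelse(b %in% red, 1, -1)  # payout for a bet on red
w_14 <- ifelse(b == 14, 35, -1)     # payout for a bet on 14
table(w_red, w_14) / 37             # the joint PDF of (w_red, w_14)
```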
::: {.fyi data-latex=""}
**Other ways of describing a joint distribution**
The joint distribution of any two (discrete or continuous) random variables can
be fully described by their joint CDF:
$$F_{x,y}(a,b) = \Pr(x \leq a \cap y \leq b)$$
Similarly, the joint distribution of any two continuous random variables can be
fully described by their (continuous) joint PDF:
$$f_{x,y}(a,b) = \frac{\partial^2 F_{x,y}(a,b)}{\partial a \, \partial b}$$
:::
### Marginal distributions
When two random variables have a joint distribution, we call the probability
distribution of each individual random variable its ***marginal distribution***. Both marginal
distributions are part of the joint distribution:
\begin{align}
\Pr(x \in A) &= \Pr(x \in A \cap y \in \mathbb{R}) \\
\Pr(y \in A) &= \Pr(x \in \mathbb{R} \cap y \in A)
\end{align}
Note that there is no difference between a random variable's "marginal
distribution" and its "distribution". We just add the word "marginal" in this
context to distinguish it from the joint distribution.
The marginal distribution is fully described by the corresponding
***marginal PDF***, which can be derived from the joint PDF. Let $x$ and $y$ be
two discrete random variables with joint PDF $f_{x,y}$. Then their marginal PDFs
are:
\begin{align}
f_x(a) &= \sum_{b \in S_y} f_{x,y}(a,b) \nonumber \\
f_y(b) &= \sum_{a \in S_x} f_{x,y}(a,b) \nonumber
\end{align}
Pay close attention to where the $a$ and $b$ are located when you use these
formulas.
::: example
**Deriving marginal PDF from joint PDF**
We earlier found the joint PDF of $w_{red}$ and $w_{14}$:
\begin{align}
f_{red,14}(a,b) &= \begin{cases}
19/37 & \textrm{if $a = b = -1$} \\
17/37 & \textrm{if $a = 1$ and $b = -1$} \\
1/37 & \textrm{if $a = 1$ and $b = 35$} \\
0 & \textrm{otherwise} \\
\end{cases} \nonumber
\end{align}
Then the marginal PDF of $w_{red}$ is:
\begin{align}
f_{red}(a) &= \sum_{b \in \{-1,35\}} f_{red,14}(a,b) \\
&= f_{red,14}(a,-1) + f_{red,14}(a,35)
\end{align}
Plugging in values in the support of $w_{red}$ we get:
\begin{align}
f_{red}(-1) &= f_{red,14}(-1,-1) + f_{red,14}(-1,35) \\
&= 19/37 + 0 \\
&= 19/37 \\
f_{red}(1) &= f_{red,14}(1,-1) + f_{red,14}(1,35) \\
&= 17/37 + 1/37 \\
&= 18/37
\end{align}
We can summarize this as:
\begin{align}
f_{red}(a) &= \begin{cases}
19/37 & \textrm{if $a = -1$} \\
18/37 & \textrm{if $a = 1$} \\
0 & \textrm{otherwise} \\
\end{cases}
\end{align}
Note that this is the same PDF we found in an earlier chapter.
:::
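When the joint PDF is laid out as a table, the marginal PDFs are just its row
and column sums. A sketch of the same calculation in R:

```{r MarginalFromJoint, echo=TRUE}
# Joint PDF of (w_red, w_14): rows index a, columns index b
joint <- matrix(c(19/37, 0,
                  17/37, 1/37),
                nrow = 2, byrow = TRUE,
                dimnames = list(w_red = c(-1, 1), w_14 = c(-1, 35)))
rowSums(joint)  # marginal PDF of w_red
colSums(joint)  # marginal PDF of w_14
```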
While you can always derive the two marginal distributions from the joint
distribution, you cannot derive the joint distribution from the two marginal
distributions. A given pair of marginal distributions is typically consistent
with an infinite number of joint distributions. For example, the three joint
distributions shown in Figure \@ref(fig:JointIsNotMarginal) all depict random
variables with the same marginal distribution (both $x$ and $y$ have the
standard normal distribution in all three graphs) but very different joint
distributions.
::: example
**Two joint distributions with identical marginal distributions**
Suppose Al, Betty, and Carl each place a bet on the same roulette game. Al
and Betty both bet on red, and Carl bets on black. Let $w_{Al}$,
$w_{Betty}$, and $w_{Carl}$ be their respective winnings.
All three players have the same marginal distribution of winnings:
\begin{align}
f_{Al}(a) = f_{Betty}(a) = f_{Carl}(a) &= \begin{cases}
19/37 & \textrm{if $a = -1$} \\
18/37 & \textrm{if $a = 1$} \\
0 & \textrm{otherwise} \\
\end{cases}
\end{align}
since both red and black have an 18/37 chance of winning.
But the joint distribution of $w_{Al}$ and $w_{Betty}$:
\begin{align}
f_{Al,Betty}(a,b) &= \begin{cases}
19/37 & \textrm{if $a = -1$ and $b = -1$} \\
18/37 & \textrm{if $a = 1$ and $b = 1$} \\
0 & \textrm{otherwise} \\
\end{cases}
\end{align}
is very different from the joint distribution of $w_{Al}$ and $w_{Carl}$:
\begin{align}
f_{Al,Carl}(a,b) &= \begin{cases}
1/37 & \textrm{if $a = -1$ and $b = -1$} \\
18/37 & \textrm{if $a = -1$ and $b = 1$} \\
18/37 & \textrm{if $a = 1$ and $b = -1$} \\
0 & \textrm{otherwise} \\
\end{cases}
\end{align}
For example, Betty always *wins* when Al wins, but Carl always *loses* when Al
wins.
:::
We will soon develop several useful ways of describing the relationship between
two random variables including conditional distribution, covariance,
correlation, and independence.
::: {.fyi data-latex=""}
**Other ways of deriving a marginal distribution**
The marginal CDFs of any two (discrete or continuous) random variables
can be derived from their joint CDF:
\begin{align}
F_x(a) &= \lim_{b \rightarrow \infty} F_{x,y}(a,b) \\
F_y(b) &= \lim_{a \rightarrow \infty} F_{x,y}(a,b)
\end{align}
and the marginal PDFs of any two continuous random variables can be derived
from their joint PDF:
\begin{align}
f_x(a) &= \int_{-\infty}^{\infty} f_{x,y}(a,b) db \\
f_y(b) &= \int_{-\infty}^{\infty} f_{x,y}(a,b) da
\end{align}
:::
### Conditional distribution
The ***conditional distribution*** of a random variable $y$ given another random
variable $x$ assigns values to all conditional probabilities of the form:
$$\Pr(y \in A| x \in B) = \frac{\Pr(y \in A \cap x \in B)}{\Pr(x \in B)}$$
Since a conditional probability is just the ratio of the joint probability
to the marginal probability, the conditional distribution can always be derived
from the joint distribution.
The conditional distributions of any two discrete random variables $x$ and $y$
can be fully described by the ***conditional PDFs***:
\begin{align}
f_{x|y}(a,b) &= \Pr(x=a|y=b) \\
&= \frac{f_{x,y}(a,b)}{f_y(b)} \\
f_{y|x}(a,b) &= \Pr(y=a|x=b) \\
&= \frac{f_{x,y}(b,a)}{f_x(b)}
\end{align}
Pay close attention to where the $a$ and $b$ are located when you use these
formulas.
::: example
**Conditional PDFs in roulette**
The conditional PDF of the payout for a bet on red given the payout for a bet
on 14 is defined as:
\begin{align}
f_{red|14}(a,b) &= \Pr(w_{red} = a| w_{14} = b) \\
&= \frac{f_{red,14}(a,b)}{f_{14}(b)} \\
&= \begin{cases}
(19/37)/(36/37) & \textrm{if $a = -1$ and $b = -1$} \\
(17/37)/(36/37) & \textrm{if $a = 1$ and $b = -1$} \\
(1/37)/(1/37) & \textrm{if $a = 1$ and $b = 35$} \\
0 & \textrm{otherwise} \\
\end{cases} \\
&= \begin{cases}
19/36 & \textrm{if $a = -1$ and $b = -1$} \\
17/36 & \textrm{if $a = 1$ and $b = -1$} \\
1 & \textrm{if $a = 1$ and $b = 35$} \\
0 & \textrm{otherwise} \\
\end{cases}
\end{align}
The conditional PDF of the payout for a bet on 14 given the payout for a bet on
red is defined as:
\begin{align}
f_{14|red}(a,b) &= \Pr(w_{14} = a| w_{red} = b) \\
&= \frac{f_{red,14}(b,a)}{f_{red}(b)} \\
&= \begin{cases}
(19/37)/(19/37) & \textrm{if $a = -1$ and $b = -1$} \\
(17/37)/(18/37) & \textrm{if $a = -1$ and $b = 1$} \\
(1/37)/(18/37) & \textrm{if $a = 35$ and $b = 1$} \\
0 & \textrm{otherwise} \\
\end{cases} \\
&= \begin{cases}
1 & \textrm{if $a = -1$ and $b = -1$} \\
17/18 & \textrm{if $a = -1$ and $b = 1$} \\
1/18 & \textrm{if $a = 35$ and $b = 1$} \\
0 & \textrm{otherwise} \\
\end{cases}
\end{align}
:::
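In table form, conditioning on $w_{14}$ amounts to dividing each column of the
joint PDF by its column sum. A sketch in R:

```{r ConditionalFromJoint, echo=TRUE}
joint <- matrix(c(19/37, 0,
                  17/37, 1/37),
                nrow = 2, byrow = TRUE,
                dimnames = list(w_red = c(-1, 1), w_14 = c(-1, 35)))
# Divide each column by the marginal probability of that w_14 value
sweep(joint, 2, colSums(joint), "/")  # each column now sums to one
```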
As we said earlier, you cannot derive the joint distribution from the two
marginal distributions. However, you can derive it by combining a conditional
distribution with the corresponding marginal distribution. For example:
\begin{align}
\underbrace{\Pr(x \in A \cap y \in B)}_{\textrm{joint}} &=
\underbrace{\Pr(x \in A | y \in B)}_{\textrm{conditional}}
\underbrace{\Pr(y \in B)}_{\textrm{marginal}}
\end{align}
A similar result applies to joint, conditional and marginal PDFs.
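For example, we can rebuild the joint PDF table of $(w_{red}, w_{14})$ by
multiplying each column of the conditional PDF $f_{red|14}$ by the
corresponding marginal probability of $w_{14}$:

```{r JointFromConditional, echo=TRUE}
cond <- matrix(c(19/36, 0,
                 17/36, 1),
               nrow = 2, byrow = TRUE,
               dimnames = list(w_red = c(-1, 1), w_14 = c(-1, 35)))
marg <- c(36/37, 1/37)     # marginal PDF of w_14
sweep(cond, 2, marg, "*")  # recovers the joint PDF
```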
::: {.fyi data-latex=""}
**Other ways of deriving a conditional distribution**