# (PART\*) Statistical Theory {-}
# More on random variables {#more-on-random-variables}
```{r setup6, include=FALSE}
knitr::opts_chunk$set(echo = FALSE,
prompt = FALSE,
tidy = TRUE,
collapse = TRUE)
library("tidyverse")
```
In our first few chapters, we developed the fundamental tools for understanding
statistics: the practical skills of cleaning data and calculating statistics,
and the basic theoretical concepts for thinking about random events and random
variables. Over the next few chapters, we will connect these tools by
conceptualizing data sets and the statistics calculated from them as random variables.
This connection is what makes statistics a tool for *science* and not just a
set of calculation procedures. The first step in building that connection is
to extend our theory of random variables from a
[single discrete random variable](#random-variables) to a wider range of
possibilities.
This chapter extends our theory to both continuous random variables and
pairs or groups of random variables.
::: {.goals data-latex=""}
**Chapter goals**
In this chapter, we will learn how to:
1. Interpret the CDF and PDF of a continuous random variable.
2. Know and use the key properties of the uniform distribution.
3. Derive the distribution for a linear function of a uniform random variable.
4. Know and use the key properties of the normal distribution.
5. Derive the distribution for a linear function of a normal random variable.
6. Calculate the joint PDF of two discrete random variables from the
probability distribution of a random outcome.
7. Calculate a marginal PDF from a (discrete) joint PDF.
8. Calculate a conditional PDF from a (discrete) joint PDF.
9. Interpret joint, marginal, and conditional distributions.
10. Determine whether two random variables are independent.
11. Calculate the covariance of two discrete random variables from their joint
PDF.
12. Calculate the covariance of two random variables using the expected value
formula.
13. Calculate the correlation of two random variables from their covariance.
14. Calculate the covariance and correlation of two independent random
variables.
15. Calculate the expected value of a linear function of two or more random
variables.
16. Interpret covariances and correlations.
:::
To prepare for this chapter, please review the introductory chapter on
[random variables](#random-variables).
## Continuous random variables {#continuous-random-variables}
Many random variables of interest have a continuous support. That is, they
can take on any value in a particular range. Examples of such variables
include:
- Physical quantities such as distance, mass, volume, or temperature.
- Time values such as the current time or the time it takes to drive to school
from your home.
Because continuous random variables can take on any value in a particular
range, the chance that they take on any specific value is very low (in fact,
it is zero). This makes the math for continuous random variables a little
harder, which is why we started with discrete random variables.
::: example
**Labour force participation**
The labour force participation rate is defined as:
$$(\textrm{LFP rate}) = \frac{(\textrm{labour force})}{(\textrm{population})} \times 100\%$$
It can be any number between 0\% and 100\%:
$$S_{\textrm{LFP rate}} = [0\%,100\%]$$
so it is a continuous random variable.
:::
### General properties
We can describe the general properties of a continuous random variable by
comparing them to the properties of a discrete random variable.
We learned in an earlier chapter that the support of a discrete random variable
typically includes a *finite* number of values, each of which has strictly
*positive* probability, and most formulas for probabilities (including PDFs and
CDFs) and expected values use just *addition and subtraction*.
In contrast, the support of a continuous random variable includes an *infinite*
number of values, each of which has *zero* probability, and most formulas for
probabilities and expected values use *calculus*.
::: {.sfu data-latex=""}
**ECON 233 calculus prerequisites**
Differential calculus (MATH 151 or 157) is a prerequisite for ECON 233, but
integral calculus (MATH 152 or 158) is not. I will not assume you know how to
interpret or calculate an integral, and will not require you to do so in any
assigned or graded work in ECON 233.
:::
But deep down, there is no important *practical* difference between continuous
and discrete random variables. The intuition for this is that you can closely
approximate any continuous random variable by rounding it. The rounded variable
will be discrete, and our earlier results for discrete random variables apply.
With only a few exceptions, everything that is true for discrete random
variables is also true for continuous ones.
::: example
**Rounding a continuous variable to make it discrete**
Suppose you round the labour force participation rate to the nearest percentage
point. The rounded LFP rate is a discrete random variable with support:
$$S_x = \{0\%, 1\%, \ldots, 99\%, 100\% \}$$
Alternatively, we could round to the nearest 1/100th of a percentage point,
to the nearest 1/1,000,000th of a percentage point, etc. As we round to a
higher and higher precision, the approximation gets closer and closer.
:::
As a result, our coverage of continuous random variables will be brief and will
mostly avoid calculus.
::: {.fyi data-latex=""}
**Formulas using integrals**
When a relevant mathematical formula uses integrals, I will put it in an "FYI"
box like this one. This means I am providing the formula to show you that it
exists, but do not expect you to understand, remember, or perform any
calculation using the formula.
If you *do* know some integral calculus, you might notice that the formulas for
continuous random variables look just like the ones for discrete random
variables, but with sums replaced by integrals. This should not be surprising
since an integral *is* a sum, or at least the limit of a sequence of sums.
:::
### The continuous CDF {#continuous-pdf-and-cdf}
The CDF of a continuous random variable $x$ is defined exactly the same way as
for the discrete case:
$$F_x(a) = \Pr(x \leq a)$$
The only difference is how it looks. If you recall, the CDF of a discrete random
variable takes on a stair-step form: increasing in discrete jumps at every point
in the discrete support, and flat everywhere else. In contrast, the CDF of a
continuous random variable increases smoothly over its support. It can have
flat parts, but it never jumps.
::: example
**The standard uniform distribution**
Consider a random variable $x$ that has continuous support:
$$S_x = [0, 1]$$
and CDF:
$$F_x(a) = \Pr(x \leq a) = \begin{cases} 0 & a < 0 \\ a & a \in [0,1] \\1 & a > 1 \\ \end{cases}$$
This particular probability distribution is called the
***standard uniform distribution*** and will be discussed in more detail later.
```{r StdUniformCDF, fig.cap = "*CDF for the standard uniform distribution*"}
UniformDist <- tibble(a=seq(from=-2,to=2,length.out=100),
Fa=punif(seq(from=-2,to=2,length.out=100)),
fa=dunif(seq(from=-2,to=2,length.out=100)))
ggplot(data=UniformDist,mapping=aes(x=a,y=Fa)) +
geom_line(col = "blue") +
geom_text(x=1,y=0.6,col="blue",label="F_x(a)") +
xlab("a") +
ylab("F(a)") +
labs(title = "Cumulative distribution function (CDF)",
subtitle = "Standard uniform",
caption = "",
tag = "")
```
Figure \@ref(fig:StdUniformCDF) shows the CDF of the standard uniform
distribution. As you can see, this CDF is smoothly increasing over the support
between zero and one, and is flat everywhere else.
:::
Section \@ref(the-cdf) describes the properties of a CDF, and these properties
apply to continuous random variables too. In addition, interval probabilities
are easier to calculate for continuous random variables: the probability of any
specific value is zero, so it does not matter whether inequalities are strict
($<$) or weak ($\leq$).
::: example
**Interval probabilities for the standard uniform**
Suppose that $x$ has the standard uniform distribution. What is the probability
that $x$ is *strictly* between 0.65 and 0.70?
We can use our usual formula for interval probabilities to get:
\begin{align*}
\Pr(0.65 < x < 0.70) &= \underbrace{\Pr(0.65 < x \leq 0.70)}_{=F_x(0.70)-F_x(0.65)}
- \underbrace{\Pr(x = 0.70)}_{=0} \\
&= (0.70 - 0.65) - 0 \\
&= 0.05
\end{align*}
So a standard uniform random variable has a 5\% chance of being between 0.65 and
0.70.
:::
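If you want to check this kind of calculation in R, the built-in function
`punif()` computes the uniform CDF (by default the standard uniform):

```{r UnifIntervalCheck, echo=TRUE}
# Pr(0.65 < x < 0.70) = F_x(0.70) - F_x(0.65) for x ~ U(0,1)
punif(0.70) - punif(0.65)
```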
### The continuous PDF {#continuous-pdf}
While the CDF has the same definition whether the random variable is discrete or
continuous, the same does not hold for the PDF.
- In the discrete case, the PDF $f_x(a)$ is defined as the size of the "jump" in
the CDF at $a$, or (equivalently) the probability $\Pr(x=a)$ of observing that
particular value.
- In the continuous case, there are no jumps, and the probability of observing
any specific value is always zero. So a PDF based on $\Pr(x=a)$ would be
useless in describing the probability distribution of a continuous random
variable.
Instead, the ***PDF of a continuous random variable*** $x$ is defined as the
slope or derivative of the CDF:
$$f_x(a) = \frac{d F_x(a)}{da}$$
In other words, instead of the *amount* the CDF increases (jumps) at $a$, it is
the *rate* at which it (smoothly) increases.
::: example
**The PDF of the standard uniform distribution**
The PDF of a standard uniform random variable is:
$$f_x(a) = \begin{cases} 0 & a < 0 \\ 1 & a \in [0,1] \\ 0 & a > 1 \\ \end{cases}$$
which looks like this:
```{r StdUniformPDF, fig.cap = "*PDF for the standard uniform distribution*"}
ggplot(data=UniformDist,mapping=aes(x=a,y=fa)) +
geom_step(col = "blue") +
geom_text(x=1.25,y=0.8,col="blue",label="f_x(a)") +
xlab("a") +
ylab("f(a)") +
labs(title = "Probability density function (PDF)",
subtitle = "Standard uniform",
caption = "",
tag = "")
```
:::
The PDF of a continuous random variable is a good way to visualize its
probability distribution, and this is about the only way we will use
the continuous PDF in this class (since everything else requires integration).
::: example
**Interpreting the standard uniform PDF**
The standard uniform PDF shows the key feature of this distribution: in some
loose sense, all values in the support are "equally likely", much like in the
[discrete uniform distribution](#discrete-uniform) described earlier. In fact,
if you round a uniform random variable, you get a discrete uniform random
variable.
:::
Like the discrete PDF, the continuous PDF is always non-negative:
\begin{equation*}
f_x(a) \geq 0 \qquad \textrm{for all $a \in \mathbb{R}$}
\end{equation*}
and is strictly positive on the support:
\begin{equation*}
f_x(a) > 0 \qquad \textrm{for all $a \in S_x$}
\end{equation*}
But unlike the discrete PDF, the continuous PDF is *not* a probability. In
particular, it can be greater than one.
::: {.fyi data-latex=""}
**Additional properties of the continuous PDF**
If you recall, we can calculate probabilities from the discrete PDF by addition.
We can use this property to derive the CDF and show that the discrete PDF sums
to one.
Similarly, we can calculate probabilities from the continuous PDF by
integrating:
$$\Pr(a < x < b) = \int_a^b f_x(v)dv$$
which implies that the CDF can be derived from the PDF:
$$F_x(a) = \int_{-\infty}^a f_x(v)dv$$
and that the PDF integrates to one:
$$\int_{-\infty}^{\infty} f_x(v)dv = 1$$
Unless you have taken a course in integral calculus, you may have no idea what
these formulas mean or how to solve them. That's OK! All you need to know is
that they *can* be solved.
:::
### Quantiles {#continuous-quantiles}
The quantiles of a random variable have the same
[definition, interpretation, and properties](#quantiles-and-percentiles)
whether the random variable is continuous or discrete. The same applies to
percentiles and the median since they are also quantiles. Quantiles are
usually easier to calculate for continuous random variables.
::: example
**Quantiles for the standard uniform**
Suppose that $x$ has the standard uniform distribution. The $q$ quantile of $x$
is:
\begin{align}
F_x^{-1}(q) &= \min \{a \in S_x : F_x(a) \geq q\} \\
&= \min \{a \in [0,1] : a \geq q\} \\
&= \min [q,1] \\
&= q
\end{align}
For example, the median of $x$ is 0.5, the 10th percentile is 0.10, the 75th
percentile is 0.75, etc.
:::
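R's `qunif()` function inverts the uniform CDF, so we can confirm that the
$q$ quantile of a standard uniform random variable is just $q$:

```{r UnifQuantileCheck, echo=TRUE}
# 10th percentile, median, and 75th percentile of U(0,1)
qunif(c(0.10, 0.50, 0.75))
```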
### Expected values {#continuous-expected-values}
The expected value also has the same [interpretation](#the-expected-value) and
[properties](#properties-of-the-expected-value) whether the random variable
is continuous or discrete. The definition is slightly different, and includes
an integral.
::: {.fyi data-latex=""}
**The expected value for a continuous random variable**
When $x$ is continuous, its expected value is defined as:
$$E(x) = \int_{-\infty}^{\infty} af_x(a)da$$
Notice that this looks just like the definition for the discrete case, but with
the sum replaced by an integral sign.
:::
The variance and standard deviation are both defined in terms of expected
values, so they also have the same
[interpretation and properties](#variance-and-standard-deviation)
whether the random variable is continuous or discrete.
## The uniform distribution {#uniform-and-standard-uniform}
The ***uniform*** distribution is a continuous probability distribution that is
usually written:
$$x \sim U(L,H)$$
where $L$ and $H$ are numbers such that $L < H$.
The $U(0,1)$ distribution is also known as the ***standard uniform***
distribution.
### The uniform PDF
The uniform distribution has continuous support:
$$S_x = [L,H]$$
and continuous PDF:
$$f_x(a) = \begin{cases}\frac{1}{H-L} & a \in S_x \\ 0 & \textrm{otherwise} \\ \end{cases}$$
The uniform distribution can be interpreted as placing equal probability on all
values between $L$ and $H$.
::: example
**The PDF of the $U(2,5)$ distribution**
Suppose that $x \sim U(2,5)$.
Its support is the range of all values from 2 to 5, and its PDF looks like this:
```{r UniformPDF, fig.cap = "*PDF for the U(2,5) distribution*"}
UniformDist <- tibble(a=seq(from=-6,to=6,length.out=100),
Fa=punif(seq(from=-6,to=6,length.out=100),min=2,max=5),
fa=dunif(seq(from=-6,to=6,length.out=100),min=2,max=5))
ggplot(data=UniformDist,mapping=aes(x=a,y=fa)) +
geom_step(col = "blue") +
geom_text(x=1.25,y=0.8,col="blue",label="f_x(a)") +
xlab("a") +
ylab("f(a)") +
labs(title = "Probability density function (PDF)",
subtitle = "U(2,5)",
caption = "",
tag = "")
```
:::
### The uniform CDF
The CDF of the $U(L,H)$ distribution is:
$$F_x(a) = \begin{cases}
0 & a \leq L \\
\frac{a-L}{H-L} & L < a < H \\
1 & a \geq H \\
\end{cases}$$
::: example
**The CDF of the $U(2,5)$ distribution**
If $x \sim U(2,5)$, its CDF looks like this:
```{r UniformCDF, fig.cap = "*CDF for the U(2,5) distribution*"}
UniformDist <- tibble(a=seq(from=-6,to=6,length.out=100),
Fa=punif(seq(from=-6,to=6,length.out=100),min=2,max=5),
fa=dunif(seq(from=-6,to=6,length.out=100),min=2,max=5))
ggplot(data=UniformDist,mapping=aes(x=a,y=Fa)) +
geom_line(col = "blue") +
geom_text(x=3,y=0.6,col="blue",label="F_x(a)") +
xlab("a") +
ylab("F(a)") +
labs(title = "Cumulative distribution function (CDF)",
subtitle = "U(2,5)",
caption = "",
tag = "")
```
:::
### Quantiles {#uniform-quantiles}
Like any other random variable, we can calculate the quantiles of a uniform
random variable by inverting the CDF. That is:
$$F_x^{-1}(q) = L + q(H-L)$$
is the $q$ quantile of a $U(L,H)$ random variable.
The median of $x \sim U(L,H)$ is:
$$Med(x) = F_x^{-1}(0.5) = 0.5(L+H)$$
i.e., the midpoint of the support.
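The `qunif()` function implements this inverse CDF. For example, the median of
the $U(2,5)$ distribution should be the midpoint $0.5(2+5) = 3.5$:

```{r UnifMedianCheck, echo=TRUE}
# F_x^{-1}(0.5) = L + 0.5*(H - L) = 3.5 for x ~ U(2,5)
qunif(0.5, min = 2, max = 5)
```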
### Expected values {#uniform-mean-and-variance}
Integral calculus is required to calculate the mean, variance and standard
deviation of the uniform distribution, so I report them below for reference:
\begin{align*}
E(x) &= 0.5(L+H) \\
var(x) &= \frac{(H-L)^2}{12} \\
sd(x) &= \sqrt{\frac{(H-L)^2}{12}}
\end{align*}
This is one advantage of using standard distributions: you can look up results
when they are difficult to calculate.
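If you would rather trust a simulation than a lookup table, you can
approximate these values in R (a rough check based on random draws, so the
numbers will not match exactly):

```{r UnifMomentCheck, echo=TRUE}
set.seed(1)
# One million draws from U(2,5); E(x) = 3.5 and var(x) = 9/12 = 0.75
x <- runif(1e6, min = 2, max = 5)
c(mean(x), var(x))
```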
### Functions of a uniform {#uniform-functions}
Any linear function of a uniform random variable also has a uniform
distribution. That is, if $x \sim U(L,H)$ and $y = a + bx$ where^[if $b <0$,
then $y \sim U(a + bH, a + bL)$.] $b > 0$, then:
$$y \sim U(a + bL, a + bH)$$
Nonlinear functions of a uniform random variable are generally not uniform.
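A quick simulation illustrates the linear case (a sketch, with arbitrary
constants $a = 2$ and $b = 3$):

```{r UnifLinearCheck, echo=TRUE}
set.seed(1)
x <- runif(1e5)            # x ~ U(0,1)
y <- 2 + 3 * x             # so y should be U(2, 5)
quantile(y, c(0, 0.5, 1))  # roughly 2, 3.5, and 5
```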
::: {.fyi data-latex=""}
**Uniform distributions in video games**
Uniform distributions are important in many computer applications including
video games. Games need to be at least somewhat unpredictable in order to stay
interesting.
It is easy for a computer to generate a random number from the $U(0,1)$
distribution, and that distribution has the unusual feature that its $q$
quantile is equal to $q$.
As a result, you can generate a random variable with any probability
distribution you like by following these steps:
1. Let $F_{w}(\cdot)$ be the CDF of the distribution you want.
2. Generate a random variable $q \sim U(0,1)$.
3. Calculate $x = F_{w}^{-1}(q)$, where $F_{w}^{-1}(\cdot)$ is the inverse of
$F_{w}(\cdot)$.
Then $x$ is a random variable with CDF $F_w(\cdot)$.
Any modern video game is constantly generating and transforming $U(0,1)$
random numbers to determine the behavior of non-player characters, the location
of weapons and other resources, or the results of a particular player action.
:::
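A minimal R sketch of these steps, using the $U(2,5)$ distribution as the
target (any distribution with a computable inverse CDF would work):

```{r InverseCDFSim, echo=TRUE}
set.seed(1)
q <- runif(5)                    # step 2: generate U(0,1) draws
x <- qunif(q, min = 2, max = 5)  # step 3: apply the inverse CDF of U(2,5)
x                                # x now behaves like U(2,5) draws
```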
## The normal distribution {#normal-and-standard-normal}
The ***normal distribution*** is typically written as:
$$ x \sim N(\mu,\sigma^2)$$
where $\mu$ and $\sigma^2 > 0$ are numbers.
The normal distribution is also called the ***Gaussian*** distribution, and the
$N(0,1)$ distribution is called the ***standard normal*** distribution.
::: {.fyi data-latex=""}
**The central limit theorem**
An important result called the central limit theorem implies that many "real
world" random variables tend to be normally distributed. We will discuss the
central limit theorem in much more detail later.
:::
### The normal PDF
The $N(\mu,\sigma^2)$ distribution is a continuous distribution with support
$S_x = \mathbb{R}$ and PDF:
$$f_x(a) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(a-\mu)^2}{2\sigma^2}}$$
The Excel function `NORM.DIST()` can be used to calculate this PDF.
The $N(\mu,\sigma^2)$ distribution is bell-shaped and symmetric around $\mu$,
with the "spread" of the distribution depending on the value of $\sigma^2$:
```{r NormalPDF, fig.cap = "*PDF for several normal distributions*"}
NormalDist <- tibble(a=seq(from=-6,to=6,length.out=100),
Fa=pnorm(seq(from=-6,to=6,length.out=100)),
fa1=dnorm(seq(from=-6,to=6,length.out=100),mean=1),
fa2=dnorm(seq(from=-6,to=6,length.out=100),sd=2),
fa=dnorm(seq(from=-6,to=6,length.out=100)))
ggplot(data=NormalDist,mapping=aes(x=a,y=fa)) +
geom_line(col = "blue") +
geom_line(aes(y=fa1),col = "red") +
geom_line(aes(y=fa2),col = "purple") +
geom_text(x=-2,y=0.3,col="blue",label="N(0,1)") +
geom_text(x=3,y=0.3,col="red",label="N(1,1)") +
geom_text(x=-4.5,y=0.05,col="purple",label="N(0,4)") +
xlab("a") +
ylab("f(a)") +
labs(title = "Probability density function (PDF)",
subtitle = "",
caption = "",
tag = "")
```
### The normal CDF
The CDF of the normal distribution can be derived by integrating the PDF. There
is no simple closed-form expression for this CDF, but it is easy to calculate
with a computer. The Excel function `NORM.DIST()` can be used to calculate this
CDF.
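In R, the analogous built-in functions are `pnorm()` for the CDF and `dnorm()`
for the PDF:

```{r NormCDFCheck, echo=TRUE}
# Pr(x <= 1) and f_x(1) for x ~ N(0,1)
pnorm(1)
dnorm(1)
```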
The normal CDF is S-shaped, running smoothly from nearly-zero to nearly-one.
```{r NormalCDF, fig.cap = "*CDF for several normal distributions*"}
NormalDist <- tibble(a=seq(from=-6,to=6,length.out=100),
Fa=pnorm(seq(from=-6,to=6,length.out=100)),
Fa1=pnorm(seq(from=-6,to=6,length.out=100),mean=1),
Fa2=pnorm(seq(from=-6,to=6,length.out=100),sd=2),
fa=dnorm(seq(from=-6,to=6,length.out=100)))
ggplot(data=NormalDist,mapping=aes(x=a,y=Fa)) +
geom_line(col = "blue") +
geom_line(aes(y=Fa1),col = "red") +
geom_line(aes(y=Fa2),col = "purple") +
geom_text(x=0,y=0.8,col="blue",label="N(0,1)") +
geom_text(x=1,y=0.3,col="red",label="N(1,1)") +
geom_text(x=-3.2,y=0.13,col="purple",label="N(0,4)") +
xlab("a") +
ylab("f(a)") +
labs(title = "Cumulative distribution function (CDF)",
subtitle = "",
caption = "",
tag = "")
```
### Quantiles {#normal-quantiles}
Quantiles of the normal distribution can be calculated using the Excel function
`NORM.INV()`.
The median of a $N(\mu,\sigma^2)$ random variable is $\mu$.
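In R, the analogous function is `qnorm()`. Note that R parameterizes the
normal distribution by its standard deviation rather than its variance:

```{r NormQuantileCheck, echo=TRUE}
# Median of N(3, 4): qnorm() takes sd = 2, not the variance 4
qnorm(0.5, mean = 3, sd = 2)
```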
### Expected values {#normal-means}
Integral calculus is required to calculate the mean, variance and standard
deviation of the normal distribution, so I report them below for reference:
$$E(x) = \mu$$
$$var(x) = \sigma^2$$
$$sd(x) = \sigma$$
### Functions of a normal
As discussed in an earlier chapter, all random variables have the property that
$E(a +bx) = a + bE(x)$ and $var(a+bx) = b^2 var(x)$ for any constants $a$ and
$b$.
Normal random variables have an additional property: any linear function of a
normal random variable is also normal. That is, if:
$$x \sim N(\mu,\sigma^2)$$
Then for any constants $a$ and $b$:
$$a + bx \sim N(a + b\mu, b^2\sigma^2)$$
Nonlinear functions of a normal random variable are generally *not* normal.
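A simulation can make the linear case concrete (a sketch with arbitrary
constants $a = 3$ and $b = 2$):

```{r NormLinearCheck, echo=TRUE}
set.seed(1)
x <- rnorm(1e5, mean = 1, sd = 2)  # x ~ N(1, 4)
y <- 3 + 2 * x                     # should be N(3 + 2*1, 2^2*4) = N(5, 16)
c(mean(y), var(y))                 # roughly 5 and 16
```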
::: {.fyi data-latex=""}
**Other distributions based on the normal**
There are many other standard distributions that are based on functions
of one or more normal random variables. For example, if you draw $k$
independent $N(0,1)$ random variables, square them, and add them up, the
sum has a distribution called the $\chi^2(k)$ distribution.
Other such distributions include the $F$ distribution and the $T$ distribution.
All of these standard distributions have important applications in statistical
analysis and would be covered in a more advanced course.
:::
### Standardization
We earlier defined the standardized version of a random variable $x$ as the
following linear function of $x$:
$$z = \frac{x-E(x)}{sd(x)}$$
and showed that $E(z) = 0$ and $var(z) = sd(z) = 1$.
We can standardize any random variable, but standardization is particularly
convenient for normal random variables. If $x$ has a normal distribution,
then its standardized value $z$ has the standard normal distribution:
\begin{align}
x \sim N(\mu, \sigma^2) &\Rightarrow z \sim N\left(\frac{\mu-\mu}{\sigma}, \left(\frac{1}{\sigma}\right)^2 \sigma^2\right) \\
&\Rightarrow z \sim N(0,1)
\end{align}
The standard normal distribution is so useful that we have a special symbol for
its PDF:
$$\phi(a) = \frac{1}{\sqrt{2\pi}} e^{-\frac{a^2}{2}}$$
and its CDF:
$$\Phi(a) = \int_{-\infty}^a \phi(b)db$$
$\phi$ is the lower-case Greek letter *phi*, and $\Phi$ is the upper-case *phi*.
We can take advantage of standardization to express the CDF of any normal
random variable in terms of the standard normal CDF. That is, suppose that:
$$x \sim N(\mu, \sigma^2)$$
Then we can prove that its CDF is:
\begin{align}
F_x(a) &= \Phi\left(\frac{a-\mu}{\sigma}\right)
\end{align}
The standard normal CDF is available as a built-in function in every statistical
package including Excel and R, so we can use this result to calculate the CDF
for any normally distributed random variable.
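For example, in R both of these calls return $\Pr(x \leq 4)$ for
$x \sim N(3,4)$:

```{r StdNormCheck, echo=TRUE}
pnorm((4 - 3) / 2)          # standardize, then use the standard normal CDF
pnorm(4, mean = 3, sd = 2)  # or use the general normal CDF directly
```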
::: {.fyi data-latex=""}
**The normal and standard normal CDF**
The result that any normal random variable $x \sim N(\mu,\sigma^2)$ has CDF
$\Phi\left(\frac{a-\mu}{\sigma}\right)$ can be proved as follows.
First, define:
$$z = \frac{x - \mu}{\sigma}$$
Since $z$ is a linear function of $x$, it is also normally distributed:
$$z \sim N\left(\frac{\mu-\mu}{\sigma},
\left(\frac{1}{\sigma}\right)^2 \sigma^2\right)$$
or equivalently:
$$z \sim N(0,1)$$
Then the CDF of $x$ is:
\begin{align}
F_x(a) &= \Pr\left(x \leq a\right) \\
&= \Pr\left( \frac{x-\mu}{\sigma} \leq \frac{a-\mu}{\sigma}\right)\\
&= \Pr\left( z \leq \frac{a-\mu}{\sigma}\right) \\
&= \Phi\left(\frac{a-\mu}{\sigma}\right)
\end{align}
:::
## Multiple random variables
Almost all interesting data sets have multiple observations and multiple
variables. So before we start talking about data, we need to develop some tools
and terminology for thinking about multiple random variables.
To keep things simple, most of the definitions and examples will be stated in
terms of *two* discrete random variables. The extension to more than two random
variables is conceptually straightforward but will be skipped.
### Joint distribution
Let $x$ and $y$ be two random variables defined in terms of the same underlying
random outcome. Their ***joint probability distribution*** assigns a value to
all joint probabilities of the form:
$$\Pr(x \in A \cap y \in B)$$
for any sets $A, B \subset \mathbb{R}$.
The joint distribution is the key to talking about $x$, $y$ and how they are
related. Every concept introduced in this section - marginal distributions,
conditional distributions, expected values, covariance, correlation, and
independence - can be defined in terms of the joint distribution.
::: example
**Three joint distributions**
The scatter plots in Figure \@ref(fig:JointIsNotMarginal) below depict
simulation results for a pair of random variables $(x,y)$, with a different
joint distribution in each graph.
```{r JointIsNotMarginal, fig.cap = "*x and y are drawn from a different joint distribution in each graph.*"}
simdata <- tibble(x = rnorm(100),
y1 = rnorm(100),
y2 = (-x+0.5*y1)/sqrt(1.25),
y3 = x)
p1 <- ggplot(data = simdata, mapping = aes(x = x)) +
geom_point(aes(y=y1),col="blue") +
xlab("x") +
ylab("y")
p2 <- ggplot(data = simdata, mapping = aes(x = x)) +
geom_point(aes(y=y2),col="blue") +
xlab("x") +
ylab("y")
p3 <- ggplot(data = simdata, mapping = aes(x = x)) +
geom_point(aes(y=y3),col="blue") +
xlab("x") +
ylab("y")
library("cowplot")
plot_grid(p1,p2,p3,ncol=3,nrow=1)
```
As you can see, the relationship between the two variables differs in the three
cases.
:::
The joint distribution of any two *discrete* random variables can be fully
described by their ***joint PDF***:
$$f_{x,y}(a,b) = \Pr(x = a \cap y = b)$$
The joint PDF can be calculated from the probability distribution of the
underlying outcome.
::: example
**The joint PDF in roulette**
In our roulette example, both $w_{red}$ and $w_{14}$ depend on the original
outcome $b$. For convenience, I have created a table below that shows every
value of $b$ in the sample space, its probability, and the associated values
of $w_{red}$ and $w_{14}$.
| $b$ | $\Pr(b)$ | Color | $w_{red}$ | $w_{14}$ |
|:----|:--------:|:-----:|:---------:|:--------:|
| 0 | 1/37 | Green | -1 | -1 |
| 1 | 1/37 | Red | 1 | -1 |
| 2 | 1/37 | Black | -1 | -1 |
| 3 | 1/37 | Red | 1 | -1 |
| 4 | 1/37 | Black | -1 | -1 |
| 5 | 1/37 | Red | 1 | -1 |
| 6 | 1/37 | Black | -1 | -1 |
| 7 | 1/37 | Red | 1 | -1 |
| 8 | 1/37 | Black | -1 | -1 |
| 9 | 1/37 | Red | 1 | -1 |
| 10 | 1/37 | Black | -1 | -1 |
| 11 | 1/37 | Black | -1 | -1 |
| 12 | 1/37 | Red | 1 | -1 |
| 13 | 1/37 | Black | -1 | -1 |
| 14 | 1/37 | Red | 1 | 35 |
| 15 | 1/37 | Black | -1 | -1 |
| 16 | 1/37 | Red | 1 | -1 |
| 17 | 1/37 | Black | -1 | -1 |
| 18 | 1/37 | Red | 1 | -1 |
| 19 | 1/37 | Red | 1 | -1 |
| 20 | 1/37 | Black | -1 | -1 |
| 21 | 1/37 | Red | 1 | -1 |
| 22 | 1/37 | Black | -1 | -1 |
| 23 | 1/37 | Red | 1 | -1 |
| 24 | 1/37 | Black | -1 | -1 |
| 25 | 1/37 | Red | 1 | -1 |
| 26 | 1/37 | Black | -1 | -1 |
| 27 | 1/37 | Red | 1 | -1 |
| 28 | 1/37 | Black | -1 | -1 |
| 29 | 1/37 | Black | -1 | -1 |
| 30 | 1/37 | Red | 1 | -1 |
| 31 | 1/37 | Black | -1 | -1 |
| 32 | 1/37 | Red | 1 | -1 |
| 33 | 1/37 | Black | -1 | -1 |
| 34 | 1/37 | Red | 1 | -1 |
| 35 | 1/37 | Black | -1 | -1 |
| 36 | 1/37 | Red | 1 | -1 |
We can construct the joint PDF by simply adding up over all possible outcomes.
There is one outcome ($b=14$) in which both red and 14 win, and it has
probability $1/37$:
\begin{align}
f_{red,14}(1,35) &= \Pr(w_{red}=1 \cap w_{14} = 35) \\
&= \Pr(b \in \{14\}) = 1/37
\end{align}
There are 17 outcomes in which red wins and 14 loses, and each has probability
$1/37$:
\begin{align}
f_{red,14}(1,-1) &= \Pr(w_{red} = 1 \cap w_{14} = -1) \\
&= \Pr\left(b \in \left\{
\begin{gathered}
1,3,5,7,9,12,16,18,19,21,\\
23,25,27,30,32,34,36
\end{gathered}\right\}\right) \\
&= 17/37
\end{align}
There are 19 outcomes in which both red and 14 lose, and each has probability
$1/37$:
\begin{align}
f_{red,14}(-1,-1) &= \Pr(w_{red} = -1 \cap w_{14} = -1) \\
&= \Pr\left(b \in \left\{
\begin{gathered}
0,2,4,6,8,10,11,13,15,17, \\
20,22,24,26,28,29,31,33,35
\end{gathered}\right\}\right) \\
&= 19/37
\end{align}
There are no other outcomes, so all other combinations have probability zero.
Therefore, the joint PDF of $w_{red}$ and $w_{14}$ is:
\begin{align}
f_{red,14}(a,b) &= \begin{cases}
19/37 & \textrm{if $a = -1$ and $b = -1$} \\
17/37 & \textrm{if $a = 1$ and $b = -1$} \\
1/37 & \textrm{if $a = 1$ and $b = 35$} \\
0 & \textrm{otherwise} \\
\end{cases} \nonumber
\end{align}
Creating and using this table is something of a "brute force" approach: it is
time consuming but requires little thought and will always get the right
answer. You may be able to figure out a quicker approach.
:::
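The brute-force approach is also easy to automate. A sketch in R (the object
names here are just illustrative):

```{r RouletteJointPDF, echo=TRUE}
red <- c(1,3,5,7,9,12,14,16,18,19,21,23,25,27,30,32,34,36)
b <- 0:36                           # the 37 equally-likely outcomes
w_red <- ifelse(b %in% red, 1, -1)  # payout for a bet on red
w_14 <- ifelse(b == 14, 35, -1)     # payout for a bet on 14
table(w_red, w_14) / 37             # the joint PDF of (w_red, w_14)
```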
::: {.fyi data-latex=""}
**Other ways of describing a joint distribution**
The joint distribution of any two (discrete or continuous) random variables can
be fully described by their joint CDF:
$$F_{x,y}(a,b) = \Pr(x \leq a \cap y \leq b)$$
Similarly, the joint distribution of any two continuous random variables can be
fully described by their (continuous) joint PDF:
$$f_{x,y}(a,b) = \frac{\partial^2 F_{x,y}(a,b)}{\partial a \, \partial b}$$
:::
### Marginal distributions
When two random variables have a joint distribution, we call the probability
distribution of each individual random variable its ***marginal distribution***. Both marginal
distributions are part of the joint distribution:
\begin{align}
\Pr(x \in A) &= \Pr(x \in A \cap y \in \mathbb{R}) \\
\Pr(y \in A) &= \Pr(x \in \mathbb{R} \cap y \in A)
\end{align}
Note that there is no difference between a random variable's "marginal
distribution" and its "distribution". We just add the word "marginal" in this
context to distinguish it from the joint distribution.
The marginal distribution is fully described by the corresponding
***marginal PDF***, which can be derived from the joint PDF. Let $x$ and $y$ be
two discrete random variables with joint PDF $f_{x,y}$. Then their marginal PDFs
are:
\begin{align}
f_x(a) &= \sum_{b \in S_y} f_{x,y}(a,b) \nonumber \\
f_y(b) &= \sum_{a \in S_x} f_{x,y}(a,b) \nonumber
\end{align}
Pay close attention to where the $a$ and $b$ are located when you use these
formulas.
::: example
**Deriving marginal PDF from joint PDF**
We earlier found the joint PDF of $w_{red}$ and $w_{14}$:
\begin{align}
f_{red,14}(a,b) &= \begin{cases}
19/37 & \textrm{if $a = b = -1$} \\
17/37 & \textrm{if $a = 1$ and $b = -1$} \\
1/37 & \textrm{if $a = 1$ and $b = 35$} \\
0 & \textrm{otherwise} \\
\end{cases} \nonumber
\end{align}
Then the marginal PDF of $w_{red}$ is:
\begin{align}
f_{red}(a) &= \sum_{b \in \{-1,35\}} f_{red,14}(a,b) \\
&= f_{red,14}(a,-1) + f_{red,14}(a,35)
\end{align}
Plugging in values in the support of $w_{red}$ we get:
\begin{align}
f_{red}(-1) &= f_{red,14}(-1,-1) + f_{red,14}(-1,35) \\
&= 19/37 + 0 \\
&= 19/37 \\
f_{red}(1) &= f_{red,14}(1,-1) + f_{red,14}(1,35) \\
&= 17/37 + 1/37 \\
&= 18/37
\end{align}
We can summarize this as:
\begin{align}
f_{red}(a) &= \begin{cases}
19/37 & \textrm{if $a = -1$} \\
18/37 & \textrm{if $a = 1$} \\
0 & \textrm{otherwise} \\
\end{cases}
\end{align}
Note that this is the same PDF we found in an earlier chapter.
:::
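When the joint PDF is laid out as a table, the marginal PDFs are just its row
and column sums. A sketch of the same calculation in R:

```{r MarginalFromJoint, echo=TRUE}
# Joint PDF of (w_red, w_14): rows index a, columns index b
joint <- matrix(c(19/37, 0,
                  17/37, 1/37),
                nrow = 2, byrow = TRUE,
                dimnames = list(w_red = c(-1, 1), w_14 = c(-1, 35)))
rowSums(joint)  # marginal PDF of w_red
colSums(joint)  # marginal PDF of w_14
```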
While you can always derive the two marginal distributions from the joint
distribution, you cannot derive the joint distribution from the two marginal
distributions. A given pair of marginal distributions is typically consistent
with an infinite number of joint distributions. For example, the three joint
distributions shown in Figure \@ref(fig:JointIsNotMarginal) all depict random
variables with the same marginal distribution (both $x$ and $y$ have the
standard normal distribution in all three graphs) but very different joint
distributions.
::: example
**Two joint distributions with identical marginal distributions**
Suppose Al, Betty, and Carl each place a bet on the same roulette game. Al
and Betty both bet on red, and Carl bets on black. Let $w_{Al}$,
$w_{Betty}$, and $w_{Carl}$ be their respective winnings.
All three players have the same marginal distribution of winnings:
\begin{align}
f_{Al}(a) = f_{Betty}(a) = f_{Carl}(a) &= \begin{cases}
19/37 & \textrm{if $a = -1$} \\
18/37 & \textrm{if $a = 1$} \\
0 & \textrm{otherwise} \\
\end{cases}
\end{align}
since both red and black have an 18/37 chance of winning.
But the joint distribution of $w_{Al}$ and $w_{Betty}$:
\begin{align}
f_{Al,Betty}(a,b) &= \begin{cases}
19/37 & \textrm{if $a = -1$ and $b = -1$} \\
18/37 & \textrm{if $a = 1$ and $b = 1$} \\
0 & \textrm{otherwise} \\
\end{cases}
\end{align}
is very different from the joint distribution of $w_{Al}$ and $w_{Carl}$:
\begin{align}
f_{Al,Carl}(a,b) &= \begin{cases}
1/37 & \textrm{if $a = -1$ and $b = -1$} \\
18/37 & \textrm{if $a = -1$ and $b = 1$} \\
18/37 & \textrm{if $a = 1$ and $b = -1$} \\
0 & \textrm{otherwise} \\
\end{cases}
\end{align}
For example, Betty always *wins* when Al wins, but Carl always *loses* when Al
wins.
:::
We will soon develop several useful ways of describing the relationship between
two random variables including conditional distribution, covariance,
correlation, and independence.
::: {.fyi data-latex=""}
**Other ways of deriving a marginal distribution**
The marginal CDFs of any two (discrete or continuous) random variables
can be derived from their joint CDF:
\begin{align}
F_x(a) &= \lim_{b \rightarrow \infty} F_{x,y}(a,b) \\
F_y(b) &= \lim_{a \rightarrow \infty} F_{x,y}(a,b)
\end{align}
and the marginal PDFs of any two continuous random variables can be derived
from their joint PDF:
\begin{align}
f_x(a) &= \int_{-\infty}^{\infty} f_{x,y}(a,b) db \\
f_y(b) &= \int_{-\infty}^{\infty} f_{x,y}(a,b) da
\end{align}
:::
### Conditional distribution
The ***conditional distribution*** of a random variable $y$ given another random
variable $x$ assigns values to all conditional probabilities of the form:
$$\Pr(y \in A| x \in B) = \frac{\Pr(y \in A \cap x \in B)}{\Pr(x \in B)}$$
Since a conditional probability is just the ratio of the joint probability
to the marginal probability, the conditional distribution can always be derived
from the joint distribution.
The conditional distributions of any two discrete random variables $x$ and $y$
can be fully described by the ***conditional PDFs***:
\begin{align}
f_{x|y}(a,b) &= \Pr(x=a|y=b) \\
&= \frac{f_{x,y}(a,b)}{f_y(b)} \\
f_{y|x}(a,b) &= \Pr(y=a|x=b) \\
&= \frac{f_{x,y}(b,a)}{f_x(b)}
\end{align}
Pay close attention to where the $a$ and $b$ are located when you use these
formulas.
::: example
**Conditional PDFs in roulette**
The conditional PDF of the payout for a bet on red given the payout for a bet
on 14 is defined as:
\begin{align}
f_{red|14}(a,b) &= \Pr(w_{red} = a| w_{14} = b) \\
&= \frac{f_{red,14}(a,b)}{f_{14}(b)} \\
&= \begin{cases}
(19/37)/(36/37) & \textrm{if $a = -1$ and $b = -1$} \\
(17/37)/(36/37) & \textrm{if $a = 1$ and $b = -1$} \\
(1/37)/(1/37) & \textrm{if $a = 1$ and $b = 35$} \\
0 & \textrm{otherwise} \\
\end{cases} \\
&= \begin{cases}
19/36 & \textrm{if $a = -1$ and $b = -1$} \\
17/36 & \textrm{if $a = 1$ and $b = -1$} \\
1 & \textrm{if $a = 1$ and $b = 35$} \\
0 & \textrm{otherwise} \\
\end{cases}
\end{align}
The conditional PDF of the payout for a bet on 14 given the payout for a bet on
red is defined as:
\begin{align}
f_{14|red}(a,b) &= \Pr(w_{14} = a| w_{red} = b) \\
&= \frac{f_{red,14}(b,a)}{f_{red}(b)} \\
&= \begin{cases}
(19/37)/(19/37) & \textrm{if $a = -1$ and $b = -1$} \\
(17/37)/(18/37) & \textrm{if $a = -1$ and $b = 1$} \\
(1/37)/(18/37) & \textrm{if $a = 35$ and $b = 1$} \\
0 & \textrm{otherwise} \\
\end{cases} \\
&= \begin{cases}
1 & \textrm{if $a = -1$ and $b = -1$} \\
17/18 & \textrm{if $a = -1$ and $b = 1$} \\
1/18 & \textrm{if $a = 35$ and $b = 1$} \\
0 & \textrm{otherwise} \\
\end{cases}
\end{align}
:::
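In table form, conditioning on $w_{14}$ amounts to dividing each column of the
joint PDF by its column sum. A sketch in R:

```{r ConditionalFromJoint, echo=TRUE}
joint <- matrix(c(19/37, 0,
                  17/37, 1/37),
                nrow = 2, byrow = TRUE,
                dimnames = list(w_red = c(-1, 1), w_14 = c(-1, 35)))
# Divide each column by the marginal probability of that w_14 value
sweep(joint, 2, colSums(joint), "/")  # each column now sums to one
```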
As we said earlier, you cannot derive the joint distribution from the two
marginal distributions. However, you can derive it by combining a conditional
distribution with the corresponding marginal distribution. For example:
\begin{align}
\underbrace{\Pr(x \in A \cap y \in B)}_{\textrm{joint}} &=
\underbrace{\Pr(x \in A | y \in B)}_{\textrm{conditional}}
\underbrace{\Pr(y \in B)}_{\textrm{marginal}}
\end{align}
A similar result applies to joint, conditional and marginal PDFs.
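For example, we can rebuild the joint PDF table of $(w_{red}, w_{14})$ by
multiplying each column of the conditional PDF $f_{red|14}$ by the
corresponding marginal probability of $w_{14}$:

```{r JointFromConditional, echo=TRUE}
cond <- matrix(c(19/36, 0,
                 17/36, 1),
               nrow = 2, byrow = TRUE,
               dimnames = list(w_red = c(-1, 1), w_14 = c(-1, 35)))
marg <- c(36/37, 1/37)     # marginal PDF of w_14
sweep(cond, 2, marg, "*")  # recovers the joint PDF
```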
::: {.fyi data-latex=""}
**Other ways of deriving a conditional distribution**