title: Conditional Probability
subtitle: Statistical Inference
author: Brian Caffo, Jeff Leek, Roger Peng
job: Johns Hopkins Bloomberg School of Public Health
logo: bloomberg_shield.png
framework: io2012
highlighter: highlight.js
hitheme: tomorrow
url:
  lib: ../../librariesNew
  assets: ../../assets
widgets: mathjax
mode: selfcontained

Conditional probability, motivation

  • The probability of getting a one when rolling a (standard) die is usually assumed to be one sixth
  • Suppose you were given the extra information that the die roll was an odd number (hence 1, 3 or 5)
  • Conditional on this new information, the probability of a one is now one third

Conditional probability, definition

  • Let $B$ be an event so that $P(B) > 0$
  • Then the conditional probability of an event $A$ given that $B$ has occurred is $$ P(A | B) = \frac{P(A \cap B)}{P(B)} $$
  • Notice that if $A$ and $B$ are independent, then $$ P(A | B) = \frac{P(A) P(B)}{P(B)} = P(A) $$

Example

  • Consider our die roll example
  • $B = \{1, 3, 5\}$
  • $A = \{1\}$ $$ \begin{eqnarray*} P(\mbox{one given that roll is odd}) & = & P(A | B) \\ \\ & = & \frac{P(A \cap B)}{P(B)} \\ \\ & = & \frac{P(A)}{P(B)} \\ \\ & = & \frac{1/6}{3/6} = \frac{1}{3} \end{eqnarray*} $$
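
The same calculation can be checked by enumerating the faces of the die. This is a minimal sketch, assuming a fair six-sided die with equally likely faces; the helper names `prob` and `cond_prob` are illustrative and not part of the original slides.

```python
from fractions import Fraction

# Sample space of one fair six-sided die; each face is equally likely.
outcomes = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a set of faces) under equally likely outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); only defined when P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("conditioning event must have positive probability")
    return prob(a & b) / prob(b)

A = {1}        # the roll is a one
B = {1, 3, 5}  # the roll is odd

print(prob(A))          # 1/6
print(cond_prob(A, B))  # 1/3
```

Exact fractions are used instead of floats so the result matches the hand calculation exactly.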

Bayes' rule

$$ P(B | A) = \frac{P(A | B) P(B)}{P(A | B) P(B) + P(A | B^c)P(B^c)}. $$
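
Bayes' rule follows from the definition of conditional probability together with the law of total probability. A short derivation, assuming $P(A) > 0$ and $0 < P(B) < 1$ so that every conditional probability is defined:

$$ \begin{eqnarray*} P(B | A) & = & \frac{P(A \cap B)}{P(A)} \\ \\ & = & \frac{P(A | B) P(B)}{P(A \cap B) + P(A \cap B^c)} \\ \\ & = & \frac{P(A | B) P(B)}{P(A | B) P(B) + P(A | B^c)P(B^c)} \end{eqnarray*} $$

The numerator uses $P(A \cap B) = P(A | B) P(B)$, and the denominator splits $A$ into the disjoint pieces $A \cap B$ and $A \cap B^c$.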


Diagnostic tests

  • Let $+$ and $-$ be the events that the result of a diagnostic test is positive or negative respectively
  • Let $D$ and $D^c$ be the events that the subject of the test has or does not have the disease respectively
  • The sensitivity is the probability that the test is positive given that the subject actually has the disease, $P(+ | D)$
  • The specificity is the probability that the test is negative given that the subject does not have the disease, $P(- | D^c)$

More definitions

  • The positive predictive value is the probability that the subject has the disease given that the test is positive, $P(D | +)$
  • The negative predictive value is the probability that the subject does not have the disease given that the test is negative, $P(D^c | -)$
  • The prevalence of the disease is the marginal probability of disease, $P(D)$

More definitions

  • The diagnostic likelihood ratio of a positive test, labeled $DLR_+$, is $P(+ | D) / P(+ | D^c)$, which is $$ \mbox{sensitivity} / (1 - \mbox{specificity}) $$
  • The diagnostic likelihood ratio of a negative test, labeled $DLR_-$, is $P(- | D) / P(- | D^c)$, which is $$ (1 - \mbox{sensitivity}) / \mbox{specificity} $$

Example

  • A study comparing the efficacy of HIV tests reports on an experiment that concluded that HIV antibody tests have a sensitivity of 99.7% and a specificity of 98.5%
  • Suppose that a subject, from a population with a 0.1% prevalence of HIV, receives a positive test result. What is the probability that this subject has HIV?
  • Mathematically, we want $P(D | +)$ given the sensitivity, $P(+ | D) = .997$, the specificity, $P(- | D^c) =.985$, and the prevalence $P(D) = .001$

Using Bayes' formula

$$ \begin{eqnarray*} P(D | +) & = & \frac{P(+ | D)P(D)}{P(+ | D)P(D) + P(+ | D^c)P(D^c)} \\ \\ & = & \frac{P(+ | D)P(D)}{P(+ | D)P(D) + \{1 - P(- | D^c)\}\{1 - P(D)\}} \\ \\ & = & \frac{.997 \times .001}{.997 \times .001 + .015 \times .999} \\ \\ & = & .062 \end{eqnarray*} $$

  • In this population a positive test result only suggests a 6% probability that the subject has the disease
  • (The positive predictive value is 6% for this test)
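
The arithmetic in the Bayes' formula calculation can be reproduced in a few lines. This is a minimal sketch using the sensitivity, specificity and prevalence quoted in the example; the function name `positive_predictive_value` is illustrative and not from the slides.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(D | +) computed with Bayes' rule."""
    true_pos = sensitivity * prevalence               # P(+ | D) P(D)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(+ | D^c) P(D^c)
    return true_pos / (true_pos + false_pos)

ppv = positive_predictive_value(sensitivity=0.997,
                                specificity=0.985,
                                prevalence=0.001)
print(round(ppv, 3))  # 0.062
```

The false-positive term (.015 × .999) is roughly fifteen times the true-positive term (.997 × .001), which is why the result is so small.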

More on this example

  • The low positive predictive value is due to low prevalence of disease and the somewhat modest specificity
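  • Suppose it were known that the subject was an intravenous drug user and routinely had intercourse with an HIV-infected partner; the relevant pre-test probability would then be far higher than .1%, and so would the positive predictive value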
  • Notice that the evidence implied by a positive test result does not change because of the prevalence of disease in the subject's population, only our interpretation of that evidence changes
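
To see how the interpretation shifts with the pre-test probability while the evidence stays fixed, the sketch below recomputes the positive predictive value for several prevalences. Only the .1% prevalence comes from the example; the other prevalence values are hypothetical, chosen purely for illustration, and the function name `ppv` is likewise an assumption.

```python
SENSITIVITY = 0.997  # P(+ | D), from the HIV example
SPECIFICITY = 0.985  # P(- | D^c), from the HIV example

def ppv(prevalence, sensitivity=SENSITIVITY, specificity=SPECIFICITY):
    """Positive predictive value P(D | +) by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# The likelihood ratio depends only on the test, not on the population.
dlr_positive = SENSITIVITY / (1 - SPECIFICITY)

# 0.001 is the prevalence from the example; the rest are hypothetical.
prevalences = [0.001, 0.01, 0.1, 0.5]
ppvs = [ppv(p) for p in prevalences]

for p, value in zip(prevalences, ppvs):
    print(f"prevalence {p}: PPV = {value:.3f}, DLR+ = {dlr_positive:.1f}")
```

The positive predictive value rises steadily with prevalence, whereas $DLR_+$ is the same on every line.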

Likelihood ratios

  • Using Bayes' rule, we have $$ P(D | +) = \frac{P(+ | D)P(D)}{P(+ | D)P(D) + P(+ | D^c)P(D^c)} $$ and $$ P(D^c | +) = \frac{P(+ | D^c)P(D^c)}{P(+ | D)P(D) + P(+ | D^c)P(D^c)}. $$

Likelihood ratios

  • Therefore $$ \frac{P(D | +)}{P(D^c | +)} = \frac{P(+ | D)}{P(+ | D^c)}\times \frac{P(D)}{P(D^c)} $$ i.e. $$ \mbox{post-test odds of }D = DLR_+\times\mbox{pre-test odds of }D $$
  • Similarly, $DLR_-$ relates the decrease in the odds of the disease after a negative test result to the odds of disease prior to the test.
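
The odds identity can be verified exactly with rational arithmetic. This is a minimal sketch, assuming the sensitivity, specificity and prevalence from the HIV example; the variable names are illustrative.

```python
from fractions import Fraction

sens = Fraction(997, 1000)  # P(+ | D)
spec = Fraction(985, 1000)  # P(- | D^c)
prev = Fraction(1, 1000)    # P(D)

# Posterior probabilities from Bayes' rule (they share one denominator).
denom = sens * prev + (1 - spec) * (1 - prev)
p_d_given_pos = sens * prev / denom
p_dc_given_pos = (1 - spec) * (1 - prev) / denom

# Left-hand side: post-test odds computed from the posteriors.
post_test_odds = p_d_given_pos / p_dc_given_pos

# Right-hand side: likelihood ratio times pre-test odds.
dlr_pos = sens / (1 - spec)
pre_test_odds = prev / (1 - prev)

print(post_test_odds == dlr_pos * pre_test_odds)  # True
```

The shared denominator cancels when the two posteriors are divided, which is exactly why the identity holds.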

HIV example revisited

  • Suppose a subject has a positive HIV test
  • $DLR_+ = .997 / (1 - .985) \approx 66$
  • The result of the positive test is that the odds of disease are now 66 times the pretest odds
  • Or, equivalently, the hypothesis of disease is 66 times more supported by the data than the hypothesis of no disease

HIV example revisited

  • Suppose that a subject has a negative test result
  • $DLR_- = (1 - .997) / .985 \approx .003$
  • Therefore, the post-test odds of disease are now $.3\%$ of the pretest odds given the negative test.
  • Or, the hypothesis of disease is supported $.003$ times that of the hypothesis of absence of disease given the negative test result
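
Both likelihood ratios, and the post-test probabilities they imply, can be computed from the numbers in the HIV example. This is a minimal sketch; the helpers `odds` and `prob_from_odds` are illustrative names, and the .1% prevalence is reused from the earlier example as the pre-test probability.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def prob_from_odds(o):
    """Convert odds back to a probability."""
    return o / (1 + o)

sens, spec, prev = 0.997, 0.985, 0.001

dlr_pos = sens / (1 - spec)  # P(+ | D) / P(+ | D^c)
dlr_neg = (1 - sens) / spec  # P(- | D) / P(- | D^c)

# Post-test odds = DLR x pre-test odds, then convert back to probabilities.
post_prob_pos = prob_from_odds(dlr_pos * odds(prev))
post_prob_neg = prob_from_odds(dlr_neg * odds(prev))

print(f"DLR+ = {dlr_pos:.1f}")  # DLR+ = 66.5
print(f"DLR- = {dlr_neg:.3f}")  # DLR- = 0.003
print(f"P(D | +) = {post_prob_pos:.3f}")  # P(D | +) = 0.062
print(f"P(D | -) = {post_prob_neg:.2e}")
```

Going through the odds gives the same 6% post-test probability as applying Bayes' formula directly.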