Fix typos in chpt 09 10 and 13

Wu Jianxiao · commit 532e7f661ad4 · 2020-11-12T23:17:02.000+01:00
diff --git a/notebooks/09-StatisticalPower.ipynb b/notebooks/09-StatisticalPower.ipynb
@@ -37,7 +37,7 @@
    "source": [
     "## Power analysis\n",
     "\n",
-    "We can compute a power analysis using functions from the `statsmodels.stats.power` package. Let's focus on the power for an independent samples t-test in order to determine a difference in the mean between two groups.  Let's say that we think than an effect size of Cohen's d=0.5 is realistic for the study in question (based on previous research) and would be of scientific interest.  We wish to have 80% power to find the effect if it exists.  We can compute the sample size needed for adequate power using the `TTestIndPower()` function:"
+    "We can compute a power analysis using functions from the `statsmodels.stats.power` package. Let's focus on the power for an independent samples t-test in order to determine a difference in the mean between two groups.  Let's say that we think that an effect size of Cohen's d=0.5 is realistic for the study in question (based on previous research) and would be of scientific interest.  We wish to have 80% power to find the effect if it exists.  We can compute the sample size needed for adequate power using the `TTestIndPower()` function:"
    ]
   },
   {
diff --git a/notebooks/09-StatisticalPower.py b/notebooks/09-StatisticalPower.py
@@ -33,7 +33,7 @@
 # %% [markdown]
 # ## Power analysis
 #
-# We can compute a power analysis using functions from the `statsmodels.stats.power` package. Let's focus on the power for an independent samples t-test in order to determine a difference in the mean between two groups.  Let's say that we think than an effect size of Cohen's d=0.5 is realistic for the study in question (based on previous research) and would be of scientific interest.  We wish to have 80% power to find the effect if it exists.  We can compute the sample size needed for adequate power using the `TTestIndPower()` function:
+# We can compute a power analysis using functions from the `statsmodels.stats.power` package. Let's focus on the power for an independent samples t-test in order to determine a difference in the mean between two groups.  Let's say that we think that an effect size of Cohen's d=0.5 is realistic for the study in question (based on previous research) and would be of scientific interest.  We wish to have 80% power to find the effect if it exists.  We can compute the sample size needed for adequate power using the `TTestIndPower()` function:
 
 # %%
 
diff --git a/notebooks/10-BayesianStatistics.ipynb b/notebooks/10-BayesianStatistics.ipynb
@@ -9,7 +9,7 @@
     "\n",
     "## Applying Bayes' theorem: A simple example\n",
     "TBD: MOVE TO MULTIPLE TESTING EXAMPLE SO WE CAN USE BINOMIAL LIKELIHOOD\n",
-    "A person has a cough and flu-like symptoms, and gets a PCR test for COVID-19, which comes back postiive.  What is the likelihood that they actually have COVID-19, as opposed a regular cold or flu?  We can use Bayes' theorem to compute this.  Let's say that the local rate of symptomatic individuals who actually are infected with COVID-19 is 7.4% (as [reported](https://twitter.com/Bob_Wachter/status/1281792549309386752/photo/1) on July 10, 2020 for San Francisco); thus, our prior probability that someone with symptoms actually has COVID-19 is .074.  The RT-PCR test used to identify COVID-19 RNA is highly specific (that is, it very rarelly reports the presence of the virus when it is not present); for our example, we will say that the specificity is 99%.  Its sensitivity is not known, but probably is no higher than 90%.  \n",
+    "A person has a cough and flu-like symptoms, and gets a PCR test for COVID-19, which comes back postiive.  What is the likelihood that they actually have COVID-19, as opposed to a regular cold or flu?  We can use Bayes' theorem to compute this.  Let's say that the local rate of symptomatic individuals who actually are infected with COVID-19 is 7.4% (as [reported](https://twitter.com/Bob_Wachter/status/1281792549309386752/photo/1) on July 10, 2020 for San Francisco); thus, our prior probability that someone with symptoms actually has COVID-19 is .074.  The RT-PCR test used to identify COVID-19 RNA is highly specific (that is, it very rarelly reports the presence of the virus when it is not present); for our example, we will say that the specificity is 99%.  Its sensitivity is not known, but probably is no higher than 90%.  \n",
     "First let's look at the probability of disease given a single positive test."
    ]
   },
@@ -29,6 +29,8 @@
     "marginal_likelihood = sensitivity * prior + (1 - specificity) * (1 - prior)\n",
     "posterior = (likelihood * prior) / marginal_likelihood\n",
     "posterior\n",
+    "\n",
+    "\n",
     "\n"
    ]
   },
diff --git a/notebooks/10-BayesianStatistics.py b/notebooks/10-BayesianStatistics.py
@@ -19,7 +19,7 @@
 #
 # ## Applying Bayes' theorem: A simple example
 # TBD: MOVE TO MULTIPLE TESTING EXAMPLE SO WE CAN USE BINOMIAL LIKELIHOOD
-# A person has a cough and flu-like symptoms, and gets a PCR test for COVID-19, which comes back postiive.  What is the likelihood that they actually have COVID-19, as opposed a regular cold or flu?  We can use Bayes' theorem to compute this.  Let's say that the local rate of symptomatic individuals who actually are infected with COVID-19 is 7.4% (as [reported](https://twitter.com/Bob_Wachter/status/1281792549309386752/photo/1) on July 10, 2020 for San Francisco); thus, our prior probability that someone with symptoms actually has COVID-19 is .074.  The RT-PCR test used to identify COVID-19 RNA is highly specific (that is, it very rarelly reports the presence of the virus when it is not present); for our example, we will say that the specificity is 99%.  Its sensitivity is not known, but probably is no higher than 90%.  
+# A person has a cough and flu-like symptoms, and gets a PCR test for COVID-19, which comes back postiive.  What is the likelihood that they actually have COVID-19, as opposed to a regular cold or flu?  We can use Bayes' theorem to compute this.  Let's say that the local rate of symptomatic individuals who actually are infected with COVID-19 is 7.4% (as [reported](https://twitter.com/Bob_Wachter/status/1281792549309386752/photo/1) on July 10, 2020 for San Francisco); thus, our prior probability that someone with symptoms actually has COVID-19 is .074.  The RT-PCR test used to identify COVID-19 RNA is highly specific (that is, it very rarelly reports the presence of the virus when it is not present); for our example, we will say that the specificity is 99%.  Its sensitivity is not known, but probably is no higher than 90%.  
 # First let's look at the probability of disease given a single positive test.
 
 # %%
@@ -36,6 +36,8 @@
 
 
 
+
+
 # %% [markdown]
 # The high specificity of the test, along with the relatively high base rate of the disease, means that most people who test positive actually have the disease. 
 # Now let's plot the posterior as a function of the prior.  Let's first create a function to compute the posterior, and then apply this with a range of values for the prior.
diff --git a/notebooks/13-GeneralLinearModel.ipynb b/notebooks/13-GeneralLinearModel.ipynb
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# The General Linear Model in R\n",
+    "# The General Linear Model\n",
     "In this chapter we will explore how to fit general linear models in Python.  We will focus on the tools provided by the `statsmodels` package."
    ]
   },
diff --git a/notebooks/13-GeneralLinearModel.py b/notebooks/13-GeneralLinearModel.py
@@ -14,7 +14,7 @@
 # ---
 
 # %% [markdown]
-# # The General Linear Model in R
+# # The General Linear Model
 # In this chapter we will explore how to fit general linear models in Python.  We will focus on the tools provided by the `statsmodels` package.
 
 # %%
@@ -95,7 +95,7 @@ def generate_linear_data(slope, intercept,
 import seaborn as sns
 import scipy.stats
 
-scipy.stats.probplot(ols_result.resid, plot=sns.mpl.pyplot)
+_ = scipy.stats.probplot(ols_result.resid, plot=sns.mpl.pyplot)
 
 # %% [markdown]
 # This looks pretty good, in the sense that the residual data points fall very close to the unit line.  This is not surprising, since we generated the data with normally distributed noise.  We should also plot the predicted (or *fitted*) values against the residuals, to make sure that the model does work systematically better for some predicted values versus others.
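
Note on the chapter 10 hunk: the posterior it computes can be checked with a minimal standalone sketch. The formulas are the ones visible in the diff context. The function name `compute_posterior`, the choice of 0.90 for sensitivity (the text only says "no higher than 90%"), and setting `likelihood` equal to the sensitivity are assumptions made for illustration, not taken from the notebook.

```python
def compute_posterior(prior, sensitivity, specificity):
    """Probability of disease given one positive test, via Bayes' theorem."""
    # Assumption: the likelihood P(positive | disease) is the test's sensitivity.
    likelihood = sensitivity
    # P(positive) = true positives + false positives, as in the diff context.
    marginal_likelihood = sensitivity * prior + (1 - specificity) * (1 - prior)
    return (likelihood * prior) / marginal_likelihood


# Values from the chapter text: prior 7.4%, specificity 99%.
# Sensitivity 0.90 is an assumed upper bound, per the text.
posterior = compute_posterior(prior=0.074, sensitivity=0.90, specificity=0.99)
print(round(posterior, 3))  # 0.878
```

With these inputs roughly 88% of positive tests correspond to true infections, which matches the chapter's remark that most people who test positive actually have the disease.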
