# Introduction {#introduction}
::: goals
**Chapter goals**
In this chapter we will:
- Review educational goals and establish course expectations.
- Gather resources and tools, including all needed computer software.
:::
## Course goals and context {#course-goals-and-context}
ECON 233 is the first course in the two-course econometrics sequence
that is required for all economics majors. If you've never seen the word
before, "econometrics" just means statistics and data analysis for
economics.
::: goals
**Course goals**
By the end of this course:
You will develop computer skills:
1. Clean and analyze data in Excel.
2. Analyze and graph data in R.
3. Follow recommended practices for data management and reproducible analysis.
You will become familiar with basic statistical concepts:
4. Calculate and interpret probabilities and expected values.
5. Explain the relationship between population and sample.
6. Describe the properties of a statistic or estimator including its
probability distribution, expected value, variance, bias, and mean
squared error.
7. State and apply the law of large numbers and central limit theorem.
You will be able to apply these skills in combination to analyze
real-world economic data:
8. Construct and interpret common charts including histograms, scatter plots,
and time-series plots.
9. Construct and interpret frequency tables and cross-tabulations.
10. Construct and interpret common univariate and bivariate statistics,
including mean, variance, standard deviation, covariance and correlation.
11. Construct and interpret hypothesis tests and confidence intervals.
We will be switching back and forth between theory, data analysis and
applications. All three skill sets are valuable.
:::
Hopefully you are in this course because you are fascinated by the course
material, and would take it even if it wasn't required for the economics
major. But that isn't the case for many of you, so I'd like to motivate
everyone to take this course as an opportunity to learn some very useful
skills.
Today's world is awash in data:
- retailers maintain databases of transactions
- manufacturers track product quality and costs
- marketers collect data on customers and potential customers
- government records everyone's interactions with schools,
tax authorities, social welfare, health care and criminal justice
- employers maintain detailed personnel records.
These databases can be linked and analyzed in various ways, and many
of the world's most successful companies rely heavily on the innovative
gathering and usage of data:
- Google's core product (the search engine) is built on the
innovative analysis of massive amounts of data.
- Both Google and the major social media companies
are based on providing valuable "free"
services in order to gather data on consumers that can
then be sold (in some form) to other businesses.
- Amazon and other retailers use what is called A/B testing
to fine-tune product descriptions and set prices
so as to maximize profits.
Some of this data analysis is done by computer scientists, but
much of it is done by economists: for example, Amazon is the
second-largest employer of PhD economists in the US
(after the Federal Reserve System).
::: economics
I always tell students thinking about the future to remember
supply and demand in the labour market. In the labour market
your skills and effort are the product, and you are the seller.
Like all sellers, you want to be expensive. This requires
that you have skills that are:
- Useful (high demand)
- Uncommon (low supply)
The ability to analyze data in a sophisticated way, and to explain
the results in written or oral presentation, is an extremely
useful and uncommon skill. Most of you do not have the technical
skills of your colleagues in Computer Science, but if you can combine
a reasonable level of computer skills with writing, knowledge
of the underlying statistical principles, and the ability to
recognize the economic considerations in a situation, you
will do quite well.
:::
All economics majors have the option of taking ECON 233 or
BUS 232, so you may be wondering what the difference is. Either course
is suitable preparation for ECON 333, but there are some key differences:
- *Tools:* ECON 233 uses both Excel and R, while BUS 232 uses Excel.
- You are likely to use R in ECON 333 and other upper-division ECON
courses, so it is nice to get used to it now.
- *Applications:* ECON 233 emphasizes economics applications, while
BUS 232 emphasizes business applications.
ECON 233 is part of the Social Data Analytics (SDA) minor; if you
are an economics student interested in that minor, you should
take ECON 233.
::: fyi
**Related courses**
ECON 333 is the second course in the two-course econometrics
sequence required for all economics majors. In ECON 333,
you will learn more advanced techniques including
linear regression, you will use R more extensively,
and you will go deeper into the theory.
If you find you enjoy and/or do well in this course, I would strongly
encourage you to take further courses in econometrics:
- [ECON 334: Data Visualization and Economic Analysis](https://www.sfu.ca/students/calendar/2021/spring/courses/econ/334.html)
is an elective focusing on exploratory data analysis and visualization
- [ECON 335: Introduction to Causal Inference and Policy Evaluation](https://www.sfu.ca/students/calendar/2021/spring/courses/econ/335.html)
is an elective focusing on the problem of inferring cause-and-effect from
economic data, and using data to forecast the effects of economic
policies.
- [ECON 433: Financial and Time Series Econometrics](https://www.sfu.ca/students/calendar/2021/spring/courses/econ/433.html) is an advanced elective focusing on techniques
for analyzing the kind of time series data that is used in macroeconomics
and financial markets.
- [ECON 435: Econometric Methods](https://www.sfu.ca/students/calendar/2021/spring/courses/econ/435.html) is an advanced course in statistics
and econometrics that is part of our honours sequence. It gives you the
opportunity and tools to write a serious empirical research
paper. Non-honours students are eligible to take it if they have
a 3.0 CGPA and the course prerequisites.
I would also encourage you to take courses outside of the economics
department, and to consider a [Statistics minor](http://www.sfu.ca/students/calendar/2021/spring/programs/statistics/minor.html) or the new
interdisciplinary [Social Data Analytics (SDA) minor](http://www.sfu.ca/students/calendar/2021/spring/programs/social-data-analytics/minor.html).
:::
## Organization and expectations {#organization-and-expectations}
The course Canvas page is available at https://canvas.sfu.ca/courses/59191. It includes information on lectures, tutorials, quizzes, and assignments.
The course is constructed under the assumptions that:
- You have taken introductory microeconomics (ECON 103 at SFU) and
introductory macroeconomics (ECON 105 at SFU).
We will use ideas from those courses in applications and examples.
- You have seen some probability and statistics content in high school,
though you may not remember much.
- You can do high-school level math including algebra and basic set theory,
and have taken or are currently taking an introductory calculus course.
- You have access to a desktop or laptop computer, and basic computer skills.
This is not a class in introductory economics, high school math, or basic
computer skills. If you are a little behind in those skills,
you will need to ask for help; I am happy to help anyone who asks.
## Computer resources {#computer-resources}
To do the computer work you will need a computer with internet access and
the following software packages installed:
- Microsoft Excel
- R
- RStudio
The required software packages are available free of charge for SFU students. They are also installed on all campus lab computers.
I will teach the course using Windows, and can provide technical support for
Windows. All of the necessary tools are also available for macOS, but my
ability to provide technical support is more limited. I'll do the best I can.
### Installing Microsoft Excel {#installing-microsoft-excel}
Microsoft Excel is a well-known spreadsheet program that is available for both Windows
and macOS. Alternatives to Excel include Google Sheets and Apple Numbers.
SFU has a licensing agreement with Microsoft that allows its students
free installation of the entire Microsoft Office suite, including Excel.
Installation instructions are available at
https://www.sfu.ca/itservices/technical/software/office365.html.
Once you have installed Excel, you should confirm that it is working by
starting the program. You should see something that looks like this:
![Excel blank workbook](bin/ExcelBlankWorkbook.png)
### Installing R and RStudio {#installing-r-and-rstudio}
Later in the semester, we will also be using a more specialized statistical
program called R, and a related program called RStudio.
- R is a programming language used for statistical analysis.
- RStudio is an "Integrated Development Environment" (IDE) for R; that is,
it is an integrated set of tools for building and running R
programs.
Both R and RStudio are open-source, and are available free of charge for both Windows and macOS. Installation instructions are available at:
https://rstudio.com/products/rstudio/download/#download.
NOTE: install R first, then RStudio.
After installing R and RStudio, you should confirm that they are working by
opening RStudio. You should see something like this:
![RStudio open screenshot](bin/RStudio open screenshot.jpg)
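
Once RStudio is open, you can also check that R itself responds by typing a few commands in the `Console` window. The lines below are only an illustrative sketch: the variable names `two_plus_two` and `x` are made up for this example, and the version text you see will depend on your installation.

```r
# Print the installed R version (the exact text varies by installation)
print(R.version.string)

# Simple arithmetic, stored in a variable and then printed
two_plus_two <- 2 + 2
print(two_plus_two)

# The mean of a small made-up vector of numbers
x <- c(1, 2, 3, 4, 5)
print(mean(x))
```

If each command prints a result and the `>` prompt comes back, R and RStudio are talking to each other.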
### Installing the Tidyverse {#installing-the-tidyverse}
One of the most useful features of R is that it allows users to write and
distribute ***packages*** that extend its capabilities.
One of the most popular and useful packages is called the ***Tidyverse***. R
is a very powerful program, but it is also a very old one: the underlying
language (called "S") was originally created in 1976. As a result,
some of the original commands are outdated in design and aren't
well suited to modern capabilities or principles of software development.
The Tidyverse solves this problem by adding new, more modern versions of these
commands. You can learn more about the Tidyverse at https://www.tidyverse.org/.
To install the Tidyverse package:
1. Open RStudio if it isn't already open.
2. Click in the `Console` window (you will see it towards the bottom of the screen).
3. Enter `install.packages("tidyverse")` (i.e., type it and hit the `<enter>` key).
Once the installation is concluded and the `>` prompt reappears you can test
to make sure the installation worked.
4. Enter `library(tidyverse)` in the console window.
- If you don't get an error message, the installation worked.
- If you get an error message, drop by office hours to get help.
If you run into trouble, don't worry. We will not need the Tidyverse for
a few weeks, so there is plenty of time to get help.
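
If you would rather check for the package without triggering an error message, something like the following sketch should work. The helper name `is_installed` is hypothetical (it is not part of R or the Tidyverse); it simply wraps base R's `requireNamespace()`, which returns `TRUE` or `FALSE` depending on whether a package can be found.

```r
# Hypothetical helper, defined here for illustration only:
# returns TRUE if the named package is installed, FALSE otherwise
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Report whether the Tidyverse is available, without raising an error
if (is_installed("tidyverse")) {
  message("tidyverse is installed")
} else {
  message("tidyverse is not installed yet")
}
```

If the message says the package is not installed, repeat step 3 above or drop by office hours.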
## Conventions of this book {#conventions-of-this-book}
This book uses consistent visual conventions to convey information.
- Organization:
- Each chapter corresponds to one week of the course.
- Typography:
- Computer code or other inputs are shown `like this`.
- When new terminology is introduced, it is shown ***like this***.
- Boxes:
- Pull-out information is shown in colored boxes.
::: example
Boxes like this are for examples.
:::
::: goals
Boxes like this are for showing course or chapter goals.
:::
::: economics
Boxes like this are for providing economic background.
:::
::: fyi
Boxes like this are for providing optional information that
might be of interest to some students.
:::