forked from alejandroschuler/r4ds-courses
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathindex.qmd
226 lines (175 loc) · 11.8 KB
/
index.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
---
title: Data Science Workshop - Fall 2024
format:
html:
css: website/styles.css
theme: default
toc: true
toc-location: left
toc-title: Navigation
page-layout: full
sidebar:
contents: auto
project-type: website
---
# Welcome!
This is the website for the Biostatistics and Data Science Workshop, organized by Division of Biostatistics at the University of California, Berkeley.
# Workshop Overview
This two-day workshop provides an immersive experience in basic data analysis using R, a popular programming language in data science. Most lecutres are largely based on the wonderful lectures prepared by Prof. Alejandro Schuler for his [R for Data Science course](https://alejandroschuler.github.io/r4ds-courses/){target="_blank"}. Participants will gain experience at data I/O, transformation, programming, and visualization in R. We will use a consistent set of packages for these tasks called the `tidyverse`.
In addition, the workshop provides:
**Research Exposure**: You will hear from faculty members and graduate students from the Biostatistics department at UC Berkeley, who will share insights into their groundbreaking research. We will also conduct in seminars on soft skill development.
**Project Experience**: At the end of the workshop, we will form groups and work together to create our own data visualization web app using the tools we have learned throughout the weekend. Everyone will have an opportunity to contribute to the project and present their app to the whole workshop.
**Networking Opportunities**: Informal networking sessions will offer the chance for participants to connect with peers, graduate students, and faculty members over food and drinks.
# Instructor Team
::: {.panel-tabset .tabset-fade .tabset-pills}
## Team
::: group-pic
<img src="website/assets/last_year_workshop.jpg" alt="Photo of last year's workshop"/>
:::
Click through the tabs to meet our instructor team for the weekend!
## Kaitlyn Lee
::: profile-pic
<img src="website/assets/kaitlyn.jpeg" alt="Kaitlyn Lee"/>
:::
Kaitlyn (she/her) is a PhD student studying Biostatistics. She is also the Diversity, Equity, Inclusion, and Belonging (DEIB) Student Fellow in the Division. Kaitlyn is interested in developing methods that use machine learning and statistics to answer questions pertaining to health and social policy. Outside of school, she likes to bake for her friends and go on hikes with her dog, Lulu.
## Alissa Gordon
::: profile-pic
<img src="website/assets/alissa.jpeg" alt="Alissa Gordon"/>
:::
Alissa is a second year MA-PhD Biostatistics student. She is interested in causal inference and machine learning methods that influence the design and analysis of various trial designs in the clinical setting. In her free time, she likes to explore the outdoors and watch movies with friends.
## Emily Hou
::: profile-pic
<img src="website/assets/emily.jpeg" alt="Emily Hou"/>
:::
Emily is a first year M.A. student in Biostatistics at UC Berkeley. She is interested in precision medicine, which uses a patient's genetics, lifestyle, and environment to determine which medical treatments will suit them best. Outside of school, Emily enjoys rock climbing, knitting, doing escape rooms with friends, and playing the New York Times games.
## Sylvia Song
::: profile-pic
<img src="website/assets/sylvia.jpeg" alt="Sylvia Song"/>
:::
My name is Sylvia Song. I am a 2nd year MS student in Biostatistics. I graduated from the University of Wisconsin-Madison and I took one gap year working at Texas Medical Center. In my free time I love to go birding and cooking. I am looking forward to meet you all!
## Yutine (Yoyo) Wu
::: profile-pic
<img src="website/assets/yoyo.jpeg" alt="Yoyo Wu"/>
:::
Yutine (Yoyo) is a first year MA student in Biostatistics at UC Berkeley. She is interested in causal inference and clinical trials. In her free time, she likes hiking and recently tried Pilates. She always excited to explore delicious food and new activities.
## Lauren Liao
::: profile-pic
<img src="website/assets/lauren.jpeg" alt="Lauren Liao"/>
:::
Lauren graduated with her PhD in Biostatistics from UC Berkeley. She enjoys tackling real-world causal inference problems in observational study design and randomized trials. Outside of statistics, she likes to play board games and make art.
:::
# Prerequisites
No prior experience with R or coding is needed nor expected. We will use R through the RStudio interface. The easiest way to access RStudio is through the cloud: [posit.cloud](https://posit.cloud){target="_blank"}. It's fast, easy, and free - just go the link, click "get started" and create an account. Once you're in, click "new project" near the upper-right and the RStudio interface will open.
Alternatively, you can install R and RStudio on your own computer:
[Follow this link](https://cran.rstudio.com/){target="_blank"} and click on the appropriate options for your operating system to install R,
then do the same to [install RStudio](https://posit.co/download/rstudio-desktop/#download){target="_blank"}.
# Learning Goals
By the end of the course, you will be able to:
* comfortably use R through the Rstudio interface
* read and write tabular data between R and flat files
* subset, transform, summarize, join, and plot data
* write reusable and readable programs
* seek out, learn, and integrate new packages and code into your analyses
# Schedule
Subject to change
## Day 1: Introduction to R and RStudio
| Time | Activity |
|----------------------|-------------------------------------------------|
| 9 AM - 10 AM | [Welcome](https://docs.google.com/presentation/d/1jvmNNnyfmf9_0mncpjdNoUjYlUGZiCXWY74cx9hQ604/edit?usp=sharing){target="_blank"} and [Introduction](https://docs.google.com/presentation/d/1q4xhvJQtOkojJnbvB78gB9YrnTTT-_eZ2plo3n77v5A/edit?usp=sharing){target="_blank"} |
| 10 AM - 11 AM | Lecture 1: [Intro and Plotting](lectures/1-intro-plotting.html){target="_blank"} |
| 11 AM - 12 PM | Faculty Panel |
| 12 PM - 12:30 PM | Lunch |
| 12:30 PM - 2 PM | Lecture 1 (cont.): [Intro and Plotting](lectures/1-intro-plotting.html){target="_blank"} |
| 2 PM - 3 PM | Campus Tour |
| 3 PM - 5 PM | Lecture 2: [R Programming](lectures/2-r-basics.html){target="_blank"} |
## Day 2: Data Wrangling and Visualization
| Time | Activity |
|---------------------|----------------------------------------------|
| 9 AM - 10 AM | Lecture 3: [Tabular Data](lectures/3-data-transformation.html){target="_blank"} |
| 10 AM - 11 AM | Graduate Student Panel |
| 11 AM - 12:30 PM | Lecture 3: [Tabular Data](lectures/3-data-transformation.html){target="_blank"} |
| 12:30 PM - 1 PM | Lunch |
| 1 PM - 2 PM | Mentorship Discussion |
| 2 PM - 3:30 PM | Lecture 4: [Intro to Shiny](lectures/4-r-shiny-basics.html) - [Shiny App Skeleton](https://github.com/UCB-Biostatistics-Workshop/f24/blob/summer-2023/ShinySkeleton.R){target="_blank"} |
| 3:30 PM - 5 PM | Group Project: Create Your Own Shiny App! |
# Slide Details
<table>
<tbody>
<tr>
<th>Module</th>
<th>Topic</th>
<th>Learning Goals</th>
<th>Packages</th>
<th>Reading</th>
</tr>
<tr>
<td>1</td>
<td>[Intro and Plotting](lectures/1-intro-plotting.html){target="_blank"}</td>
<td><ul>
<li>issue commands to R using the Rstudio REPL interface</li>
<li>load a package into R</li>
<li>read some tabular data into R</li>
<li>visualize tabular data using ggplot geoms, aesthetics, and facets</li>
</ul></td>
<td><ul>
<li><code>ggplot2</code></li>
</ul></td>
<td>R4DS ch. 1, 9-11</td>
</tr>
<tr>
<td>2</td>
<td>[R Programming](lectures/2-r-basics.html){target="_blank"}</td>
<td>
<ul>
<li>save values to variables</li>
<li>find and call R functions with multiple arguments by position and name</li>
<li>recognize and index vectors and lists</li>
<li>recognize, import, and inspect data frames</li>
<li>issue commands to R using the Rstudio script pane</li>
</ul></td>
<td><ul>
<li><code>tibble</code></li>
<li><code>readr</code></li>
</ul></td>
<td>R4DS ch. 2, 4, 6, 8, 20</td>
</tr>
<tr>
<td>3</td>
<td>[Tabular Data](lectures/3-data-transformation.html){target="_blank"}</td>
<td><ul>
<li>filter rows of a dataset based on conditions</li>
<li>arrange rows of a dataset based on one or more columns</li>
<li>select columns of a dataset</li>
<li>mutate existing columns to create new columns</li>
<li>use the pipe to combine multiple operations</li>
</ul></td>
<td><ul>
<li><code>dplyr</code></li>
</ul></td>
<td>R4DS ch. 3, 12-16, 18</td>
</tr>
<tr>
<td>4</td>
<td>[Intro to Shiny](lectures/4-r-shiny-basics.html)</td>
<td>
<ul>
<li>learn what a Shiny App is and why it is useful</li>
<li>be able to differentiate between what goes in the UI versus the server</li>
<li>use inputs, outputs, and reactive coding to create your own Shiny app</li>
</ul></td>
<td><ul>
<li><code>shiny</code></li>
</ul></td>
<td> </td>
</tr>
</tbody>
</table>
# Further learning
If you are interested in a deeper dive on the topics we will be discussing, Prof. Schuler recommends the fantastic book [R for Data Science](https://r4ds.hadley.nz/){target="_blank"} (R4DS:2e) by [Hadley Wickham](http://hadley.nz/){target="_blank"}, Mine Çetinkaya-Rundel, and Garrett Grolemund (O'Reilly Media, 2017); it is online for free and also available in hardcopy.
# Background and Contact Information
## Background
Historically underrepresented minority (URM) students often face significant obstacles in accessing equitable educational opportunities, particularly in the fields of Science, Technology, Engineering, and Mathematics (STEM). These challenges are most pronounced during the early stages of their education, notably in high school and at the start of their undergraduate studies. Early exposure to STEM disciplines, including programming and data analysis, is critical for these students as it not only provides the necessary skills and experiences for advanced research or graduate studies but also opens up new potential career paths.
With this in mind, the long-term goal of our Biostatistics Workshop is to offer URM students an opportunity for equitable exposure to programming and data analysis. By doing so, we aim to showcase how data science and biostatistics can be both engaging and valuable as potential career options. Importantly, this workshop is dedicated to addressing educational inequality, rather than simply increasing the number of URM applicants in our department.
We hosted the first iteration of this workshop in Spring 2023. Fall 2024 will be the second iteration.
## Contact Information
If you have any questions, please feel free to reach out to us at [biostat-outreach\@berkeley.edu](mailto:biostat-outreach@berkeley.edu).