# Dare To Tinker (R markdown file)
- Chen Zhu (Birmingham Law School)
- Code demonstration for UoB MSc Responsible Data Science (RDS) students (October 2022)
## exploring law-related datasets in R
### US Judges Ratings (base-R)
```{r}
library(skimr) # provides skim(); not part of base R
data() # list the datasets available in the attached packages
str (USJudgeRatings) #==> a dataframe
?USJudgeRatings
require(graphics)
pairs(USJudgeRatings, main = "USJudgeRatings data")
head (USJudgeRatings, n = 20)
tail (USJudgeRatings)
USJudgeRatings[,1] # first column (CONT)
## list judges' names by using the `rownames` function
rownames (USJudgeRatings)
## list all 12 variables (i.e. column names)
colnames (USJudgeRatings)
skim (USJudgeRatings) #==> skim through the dataset
```
### exploring `historydata` (not a base-R package)
- Lincoln Mullen, ‘Data Sets for Historians [R Package Historydata Version 0.1]’ (24 December 2014) <https://CRAN.R-project.org/package=historydata>
```{r , eval = TRUE, echo = FALSE }
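# install once if missing (the package name must be quoted)
if (!requireNamespace("historydata", quietly = TRUE)) install.packages("historydata")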
library (tidyverse)
library (historydata)
# description of the datasets
?historydata
?judges
#==> Federal judges in the United States of America since 1789; two dataframes ‘judges_people’ and ‘judges_appointments’.
#---------------------------
# how many variables in `judges_people`?
colnames (judges_people) # 13 variables
# how to find judges' age range? (birth years are the closest proxy here)
judges_people$birth_date
range (judges_people$birth_date, na.rm = TRUE) # earliest and latest birth year
## two ways of sorting the birth years
sort (judges_people$birth_date)
judges_people$birth_date %>% sort() # using pipe (from `tidyverse`)
unique (judges_people$birth_date) %>% sort()
# visualisation: histogram of judges' birth years
judges_people$birth_date %>% hist()
#==> where is the peak of this histogram? (most judges were born in which decade?)
# find a particular judge: [eg] Is Judge Alsup (Oracle vs Google) in this data?
"Alsup" %in% judges_people$name_last
alsup <- judges_people[judges_people$name_last == "Alsup", ]
str (alsup)
alsup %>% str # use pipe (from "tidyverse")
#---------------------------
# race and gender of judges
judges_people$race
judges_people$gender
# how diverse are judges' ethnic backgrounds? how many "races" are in this dataset?
judges_people$race %>% unique()
# white
judges_people[judges_people$race == "White",]
# white and female
judges_people[judges_people$race == "White" & judges_people$gender == "F", ] # white female judges
# “Black” judges (African American)
judges_people[judges_people$race == "African American",]
# female black judges
judges_people[judges_people$race == "African American" & judges_people$gender == "F", ]
```
# shell commands
## date and calendar
```{shell}
date
cal Oct 2022
# September 1752 is short: Britain dropped 11 days when it switched to the Gregorian calendar
cal sep 1752
```
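As a small follow-up to `date`, the sketch below formats the current date with a format string. The variable names `today` and `year` are illustrative only; `%Y`, `%m` and `%d` are standard `date` format specifiers.

```shell
# ISO 8601 calendar date (YYYY-MM-DD), handy for naming dated output files.
today=$(date +%Y-%m-%d)
echo "$today"

# Four-digit year on its own.
year=$(date +%Y)
echo "$year"
```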
## for loop
```{shell}
for i in $(seq 1 10); do echo "RDS$i"; done
```
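A possible variation on the loop above, sketched under the assumption that zero-padded labels are wanted so they sort correctly as text. It replaces `echo` with `printf` and collects the output in a variable; the name `labels` is illustrative only.

```shell
# Same seq-based loop, but printf pads each number to two digits (RDS01 ... RDS10).
labels=$(for i in $(seq 1 10); do printf 'RDS%02d\n' "$i"; done)

# Print the collected labels, one per line.
printf '%s\n' "$labels"
```

Without the padding, a plain text sort would place `RDS10` before `RDS2`.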
## grep
```{shell}
grep -ri law *
```
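What `grep -ri law *` returns depends on the files in the current directory, so the sketch below builds a throwaway sandbox to make the behaviour visible. The directory and the file names `modules.txt` and `other.txt` are hypothetical, created only for this illustration.

```shell
# Hypothetical sandbox: a temporary directory with two small text files.
demo_dir=$(mktemp -d)
printf 'Contract Law\nData ethics\n' > "$demo_dir/modules.txt"
printf 'no match here\n' > "$demo_dir/other.txt"

# -r: recurse into directories; -i: ignore case; -l: print only names of matching files.
matches=$(grep -ril law "$demo_dir")
echo "$matches"

# -c: count the matching lines in a single file.
count=$(grep -ic law "$demo_dir/modules.txt")
echo "$count"

# Clean up the sandbox.
rm -r "$demo_dir"
```

Note that `law` also matches inside longer words such as "lawyer" or "flaw"; adding `-w` restricts the search to whole words.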