---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  warning = FALSE,
  message = FALSE
)
```

# tidytable <img id="logo" src="man/figures/logo.png" align="right" width="17%" height="17%" />
<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/tidytable)](https://cran.r-project.org/package=tidytable)
![r-universe](https://fastverse.r-universe.dev/badges/tidytable)
[![downloads](http://cranlogs.r-pkg.org/badges/grand-total/tidytable?color=blue)](https://r-pkg.org/pkg/tidytable)
[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/last-month/tidytable?color=blue)](https://markfairbanks.github.io/tidytable/)
[![R-CMD-check](https://github.com/markfairbanks/tidytable/workflows/R-CMD-check/badge.svg)](https://github.com/markfairbanks/tidytable/actions)
<!-- badges: end -->

#### Why `tidytable`?

* `tidyverse`-like syntax
* Fast functions built on two high-performance packages: `data.table` and the `tidyverse`'s `vctrs`
* Compatibility with the tidy evaluation framework

## Installation

Install the released version from [CRAN](https://CRAN.R-project.org) with:

``` r
install.packages("tidytable")
```

Or install the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("markfairbanks/tidytable")
```

## General syntax

`tidytable` uses `verb.()` syntax to replicate `tidyverse` functions:

```{r}
library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  select.(x, y, z) %>%
  filter.(x < 4, y > 1) %>%
  arrange.(x, y) %>%
  mutate.(double_x = x * 2,
          x_plus_y = x + y)
```

A full list of functions can be found [here](https://markfairbanks.github.io/tidytable/reference/index.html).

## Using "group by"

Grouped operations are performed by passing the `.by` argument to any function that supports "by group" functionality.

* A single column can be passed with `.by = z`
* Multiple columns can be passed with `.by = c(y, z)`

```{r}
df %>%
  summarize.(avg_x = mean(x),
             count = n(),
             .by = z)
```
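
The same pattern extends to several grouping columns. Here is a minimal sketch of the `.by = c(y, z)` form from the list above; the `sales` table and its `amount`, `region`, and `channel` columns are made up for illustration:

```{r}
library(tidytable)

# Hypothetical example data with two grouping columns
sales <- data.table(
  amount = c(10, 20, 30, 40),
  region = c("east", "east", "west", "west"),
  channel = c("web", "web", "web", "store")
)

# One output row per unique (region, channel) combination
sales %>%
  summarize.(total = sum(amount),
             count = n(),
             .by = c(region, channel))
```

Each unique combination of `region` and `channel` yields one row in the result.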

### `.by` vs. `group_by()`

`tidytable` follows `data.table` semantics, where `.by` must be supplied each time you want a function to operate "by group".

The example below uses `.by` in `tidytable` and is then compared with its `dplyr` equivalent. The goal is to take the first two rows of each group using `slice.()`, then add a within-group row number column using `mutate.()`:

```{r}
library(tidytable)

df <- data.table(x = c("a", "a", "a", "b", "b"))

df %>%
  slice.(1:2, .by = x) %>%
  mutate.(group_row_num = row_number(), .by = x)
```

Note that `.by` is passed to both `slice.()` and `mutate.()`.

Compare this with a `dplyr` pipe chain that uses `group_by()`, where each function operates "by group" until `ungroup()` is called:

```{r}
library(dplyr)

df <- tibble(x = c("a", "a", "a", "b", "b"))

df %>%
  group_by(x) %>%
  slice(1:2) %>%
  mutate(group_row_num = row_number()) %>%
  ungroup()
```

No `ungroup()` call is needed in `tidytable`, because `.by` only applies to the function it is passed to.
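
Since `.by` only affects the call it is passed to, a following verb without `.by` works on the whole table again. A minimal sketch of that behavior, using a made-up `df_totals` table with a `val` column:

```{r}
library(tidytable)

# Hypothetical example data
df_totals <- data.table(
  x = c("a", "a", "a", "b", "b"),
  val = c(1, 2, 3, 4, 5)
)

df_totals %>%
  # Sums are computed within each value of x
  mutate.(group_total = sum(val), .by = x) %>%
  # No .by here, so the sum covers all rows
  mutate.(overall_total = sum(val))
```

`group_total` differs between the `"a"` and `"b"` rows, while `overall_total` is the same on every row.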

## tidyselect support

`tidytable` lets you select and drop columns just as you would in the tidyverse, using the [`tidyselect`](https://tidyselect.r-lib.org) package under the hood.

Normal selection can be mixed with all `tidyselect` helpers: `everything()`, `starts_with()`, `ends_with()`, `any_of()`, `where()`, etc.

```{r}
df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

df %>%
  select.(a, starts_with("b"))
```

To drop columns, use a `-` sign:

```{r}
df %>%
  select.(-a, -starts_with("b"))
```

The same selection syntax works wherever `tidytable` functions select columns, for example in `count.()`, `drop_na.()`, `across.()`, `pivot_longer.()`, etc.
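
As a sketch of that, the example below uses `starts_with()` inside `drop_na.()` and `where(is.numeric)` inside `across.()`. The `df_na` table and its missing values are made up for illustration, and the lambda form `~ .x * 2` assumes `across.()` accepts purrr-style formulas:

```{r}
library(tidytable)

# Hypothetical example data with some missing values
df_na <- data.table(
  a = c(1, 2, NA),
  b1 = c(4, NA, 6),
  b2 = c(7, 8, 9),
  c = c("x", "y", "z")
)

# Drop rows with a missing value in any column starting with "b"
df_na %>%
  drop_na.(starts_with("b"))

# Double every numeric column, leaving the character column alone
df_na %>%
  mutate.(across.(where(is.numeric), ~ .x * 2))
```

Only the second row is dropped, because `NA` in column `a` is ignored when `a` is not selected.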

A full overview of selection options can be found [here](https://tidyselect.r-lib.org/reference/language.html).

### Using tidyselect in `.by`

`tidyselect` helpers also work when using `.by`:

```{r}
df <- data.table(
  a = 1:3,
  b = c("a", "a", "b"),
  c = c("a", "a", "b")
)

df %>%
  summarize.(avg_a = mean(a), .by = where(is.character))
```

## Tidy evaluation compatibility

Tidy evaluation can be used to write custom functions that wrap `tidytable` functions. The embracing operator `{{ }}` works, or you can use `enquo()` with `!!` if you prefer:

```{r}
df <- data.table(x = c(1, 1, 1), y = c(1, 1, 1), z = c("a", "a", "b"))

add_one <- function(data, add_col) {
  data %>%
    mutate.(new_col = {{ add_col }} + 1)
}

df %>%
  add_one(x)
```
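
For comparison, here is a sketch of the same helper written with `enquo()` and `!!` instead of `{{ }}`. It calls `rlang::enquo()` with an explicit namespace, and `add_one_enquo()` and `df_tidy` are hypothetical names:

```{r}
library(tidytable)

df_tidy <- data.table(x = c(1, 1, 1), y = c(1, 1, 1), z = c("a", "a", "b"))

add_one_enquo <- function(data, add_col) {
  # Capture the unevaluated column name passed by the caller
  add_col <- rlang::enquo(add_col)

  data %>%
    # Unquote the captured column inside the data mask
    mutate.(new_col = (!!add_col) + 1)
}

df_tidy %>%
  add_one_enquo(y)
```

The parentheses around `!!add_col` make it explicit that only the captured column is unquoted before `1` is added.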

The `.data` and `.env` pronouns also work within `tidytable` functions:

```{r}
var <- 10

df %>%
  mutate.(new_col = .data$x + .env$var)
```
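
The pronouns matter most when a column and an environment variable share a name. In this sketch the made-up `df_pronoun` table has its own `var` column, so `.data$var` and `.env$var` resolve to different values:

```{r}
library(tidytable)

var <- 10
df_pronoun <- data.table(x = c(1, 2, 3), var = c(100, 200, 300))

df_pronoun %>%
  # .data$var is the column, .env$var is the object in the calling environment
  mutate.(from_data = .data$var,
          from_env = .env$var)
```

Here `from_data` takes the column values, while `from_env` takes the single value `10`, recycled to every row.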

A full overview of tidy evaluation can be found [here](https://rlang.r-lib.org/reference/topic-data-mask.html).

## `dt()` helper

The `dt()` function makes regular `data.table` syntax pipeable, so you can easily mix `tidytable` syntax with `data.table` syntax:

```{r}
df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  dt(, .(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, .(avg_x = mean(x)), by = z)
```
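
The chain above uses `dt()` throughout. As a sketch of mixing both styles in a single pipe, the example below filters with `filter.()`, adds a column with `data.table`'s `:=` inside `dt()`, then summarizes with `summarize.()`; `df_mix` and `avg_double_x` are illustrative names:

```{r}
library(tidytable)

df_mix <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df_mix %>%
  # tidytable verb
  filter.(x < 3) %>%
  # data.table syntax, made pipeable by dt()
  dt(, double_x := x * 2) %>%
  # back to a tidytable verb
  summarize.(avg_double_x = mean(double_x), .by = z)
```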

## Speed comparisons

For those interested in performance, speed comparisons can be found [here](https://markfairbanks.github.io/tidytable/articles/speed_comparisons.html).