-
Notifications
You must be signed in to change notification settings - Fork 1
/
05_tidymodels.Rmd
81 lines (63 loc) · 2.97 KB
/
05_tidymodels.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
---
title: "Touring tidyverse"
subtitle: "tidymodels"
output:
xaringan::moon_reader:
lib_dir: libs
nature:
highlightStyle: github
highlightLines: true
countIncrementalSlides: false
---
```{r setup, include=FALSE}
options(htmltools.dir.version = FALSE)
library(magrittr)
```
# Plan
1. Brief intro to `tidymodels`.
1. Demo of `parsnip`/`dials`.
1. Demo of `recipes` (?).
---
background-image: url(https://raw.githubusercontent.com/rstudio/hex-stickers/master/PNG/tidymodels.png)
background-size: 100px
background-position: 90% 6%
# tidymodels
tidymodels is a "meta-package" for modeling and statistical analysis that share the underlying design philosophy, grammar, and data structures of the tidyverse.
1. `broom` takes the messy output of built-in functions in R, such as lm, nls, or t.test, and turns them into tidy data frames.
1. `infer` is a modern approach to statistical inference.
1. `recipes` is a general data preprocessor with a modern interface.
1. `rsample` has infrastructure for resampling data so that models can be assessed and empirically validated.
1. `yardstick` contains tools for evaluating models (e.g. accuracy, RMSE, etc.)
1. `tidypredict` translates some model prediction equations to SQL for high-performance computing.
1. `tidyposterior` can be used to compare models using resampling and Bayesian analysis.
1. `tidytext` contains tidy tools for quantitative text analysis, including basic text summarization, sentiment analysis, and text modeling.
1. `dials` contains tools to create and manage values of tuning parameters and is designed to integrate well with the parsnip package.
---
# Current state
1. Current version - `r packageVersion("tidymodels")`.
1. https://github.com/tidymodels/
1. Developed by __Max Kuhn__, Hadley Wickham, Davis Vaughan.
1. Very early stage.
```{r, echo = FALSE}
tibble::tibble(package = c("broom", "infer", "recipes", "rsample", "yardstick", "tidypredict", "tidyposterior", "tidytext", "parsnip", "dials")) %>%
dplyr::mutate(version = purrr::map_chr(package, ~as.character(packageVersion(.x))))
```
---
# Prior art
1. [`h2o`](https://github.com/h2oai/h2o-3)
1. [`caret`](https://topepo.github.io/caret/index.html)
1. [`mlr`](https://github.com/mlr-org/mlr)
1. [`scikit-learn`](https://scikit-learn.org/stable/index.html)
1. Apache Spark with, e.g., [`sparklyr`](https://spark.rstudio.com/)
1. [MLflow](https://mlflow.org/docs/latest/tutorial.html)
1. [CRAN Machine Learning taskview](https://cran.r-project.org/web/views/MachineLearning.html)
1. More?
---
# Demo
We will have a (very) condensed walkthrough of 2-day workshop "Applied Machine Learning" by Max Kuhn that he had during `rstudio::conf(2019L)`.
This should give you an idea about what `tidymodels` is planning to be and what is possible already now.
---
# Resources
1. https://tidymodels.github.io/model-implementation-principles/index.html
1. Applied Machine Learning workshop - https://github.com/topepo/rstudio-conf-2019
1. https://github.com/tidymodels/tidymodels