Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blog post for tibble 2.0.1 #266

Merged
merged 21 commits into from
Jan 16, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
221 changes: 221 additions & 0 deletions content/articles/2019-01-tibble-2-0-1.Rmarkdown
Original file line number Diff line number Diff line change
@@ -0,0 +1,221 @@
---
title: tibble 2.0.1
slug: tibble-2.0.1
description: >
Tibbles are a modern reimagining of the data frame, keeping what time has shown to be effective, and throwing out what is not, with nicer default output too! This article describes the latest major release and provides an outlook on further developments
date: 2019-01-15
author: Kirill Müller
photo:
url: https://unsplash.com/photos/KA89yJKYtjE
author: Marcello Gennari
categories: [package]
---


```{r setup, include = FALSE}
library(tidyverse)
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", error = TRUE)
options(rlang_trace_top_env = .GlobalEnv)

options(crayon.enabled = TRUE)
```

```{r setup-hooks, comment="", results="asis", echo=FALSE}
old_hooks <- fansi::set_knit_hooks(
knitr::knit_hooks,
which = c("output", "error"),
proc.fun = function(x, class) {
x <- unlist(strsplit(x, "\n", fixed = TRUE))
fansi::html_code_block(fansi::sgr_to_html(fansi::html_esc(x)), class = class)
}
)
```


I'm pleased to announce that version 2.0.1 of the *tibble* package is on CRAN now, just in time for [rstudio::conf()](https://www.rstudio.com/conference/). Tibbles are a modern reimagining of the data frame, keeping what time has shown to be effective, and throwing out what is not, with nicer default output too! Grab the latest version with:

```r
install.packages("tibble")
```

This release required a bit of preparation, including a [pre-release blog post](https://www.tidyverse.org/articles/2018/11/tibble-2.0.0-pre-announce/) that described the breaking changes, mostly in [`as_tibble()`](https://tibble.tidyverse.org/reference/as_tibble.html), [`new_tibble()`](https://tibble.tidyverse.org/reference/new_tibble.html), [`set_tidy_names()`](https://tibble.tidyverse.org/reference/set_tidy_names.html), [`tidy_names()`](https://tibble.tidyverse.org/reference/tidy_names.html), and `names<-()`, and a patch release that fixed problems found after the initial 2.0.0 release.
In this blog post, I focus on a few user- and programmer-related changes, and give an outlook over future development:

- [`view()`](https://tibble.tidyverse.org/reference/view.html), nameless [`enframe()`](https://tibble.tidyverse.org/reference/enframe.html), 2D columns
- Lifecycle, robustness, name repair, row names, [`glimpse()`](https://tibble.tidyverse.org/reference/glimpse.html) for subclasses
- _vctrs_, dependencies, decorations

For a complete overview please see the [release notes](https://github.com/tidyverse/tibble/releases/tag/v2.0.0).

Use the [issue tracker](https://github.com/tidyverse/tibble/issues) to submit bugs or suggest ideas, your contributions are always welcome.

## Changes that affect users

### view

The experimental [`view()`](https://tibble.tidyverse.org/reference/view.html) function forwards its input to `utils::View()` (only in interactive mode) and always returns its input invisibly, which is useful for pipe-based workflows.
Currently it is unclear if this functionality should live in _tibble_ or elsewhere.

```{r view}
# This is a no-op in non-interactive mode.
# In interactive mode, a viewer window/pane will open.
iris %>%
view()
```


### Nameless enframe

The [`enframe()`](https://tibble.tidyverse.org/reference/enframe.html) function always has been a good way to convert a (named) vector to a two-column data frame.
In this version, conversion to a one-column data frame is also supported by setting the `name` argument to `NULL`.
This is now the recommended way to turn a vector to a one-column tibble, due to changes to the default implementation of [`as_tibble()`](https://tibble.tidyverse.org/reference/as_tibble.html).

```{r enframe}
enframe(letters[1:3])
enframe(letters[1:3], name = NULL)
```


### 2D columns

`tibble()` now supports columns that are matrices or data frames.
These have always been supported in data frames and are used in some modelling functions.
We are looking forward to supporting these and other exciting use cases, see also the [Matrix and data frame columns](https://adv-r.hadley.nz/vectors-chap.html#matrix-and-data-frame-columns) chapter of adv-r.
The number of rows in these objects must be consistent with the length of the other columns.
Internally, this feature required using `NROW()` instead of `length()` in a few spots, which conveniently returns the length for vectors and the number of rows for 2D objects.
The required support in _pillar_ has been added earlier last year.

```{r hierarchical}
tibble(
a = 1:3,
b = tibble(c = 4:6),
d = tibble(e = 7:9, f = tibble(g = 10, h = 11)),
i = diag(3)
)
```

## Changes that affect package developers

### Lifecycle

[![Life
cycle](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://www.tidyverse.org/lifecycle/)

All functions have been assigned a lifecycle.
The _tibble_ package has now reached the "stable" lifecycle, functions in a different lifecycle stage are marked as such in their documentation.
One example is the [`add_row()`](https://tibble.tidyverse.org/reference/add_row.html) function: it is unclear if it should ensure that all columns have length one by wrapping in a list if necessary, and a better implementation is perhaps possible once _tibble_ uses the _vctrs_ package, see below.
Therefore this function is marked "questioning".
Learn more about lifecycle in the tidyverse at https://www.tidyverse.org/lifecycle/.

### Robustness

The new `.rows` argument to [`tibble()`](https://tibble.tidyverse.org/reference/tibble.html) and [`as_tibble()`](https://tibble.tidyverse.org/reference/as_tibble.html) allows specifying the expected number of rows explicitly, even if it's evident from the data.
This supports writing even more defensive code.
The `nrow` argument to the low-level [`new_tibble()`](https://tibble.tidyverse.org/reference/new_tibble.html) constructor is now mandatory, on the other hand most expensive checks have been moved to the new [`validate_tibble()`](https://tibble.tidyverse.org/reference/validate_tibble.html) function.
This means that constructions of tibbles is now faster by default if you know that the inputs are correct, but you can always double-check if needed.
See also the [S3 classes](https://adv-r.hadley.nz/s3.html#s3-classes) chapter in adv-r for motivation.

```{r nrow}
tibble(a = 1, b = 1:3, .rows = 3)
tibble(a = 1, b = 2:3, .rows = 3)
tibble(a = 1, .rows = 3)
as_tibble(iris[1:3, ], .rows = 3)
new_tibble(list(a = 1:3), nrow = 3)
bad <- new_tibble(list(a = 1:2), nrow = 3)
validate_tibble(bad)
```


### Name repair

Column name repair has more direct support, via the new `.name_repair` argument to [`tibble()`](https://tibble.tidyverse.org/reference/tibble.html) and [`as_tibble()`](https://tibble.tidyverse.org/reference/as_tibble.html).
It takes the following values:

- `"minimal"`: No name repair or checks, beyond basic existence.
- `"unique"`: Make sure names are unique and not empty.
- `"check_unique"`: (default value), no name repair, but check they are `unique`.
- `"universal"`: Make the names `unique` and syntactic.
- a function: apply custom name repair (e.g., `.name_repair = make.names` or `.name_repair = ~make.names(., unique = TRUE)` for names in the style of base R).

```{r repair-names}
## by default, duplicate names are not allowed
tibble(`1a` = 1, `1a` = 2)

## you can authorize duplicate names
tibble(`1a` = 1, `1a` = 2, .name_repair = "minimal")
## or request that the names be made unique
tibble(`1a` = 1, `1a` = 2, .name_repair = "unique")
## or universal
tibble(`1a` = 1, `1a` = 2, .name_repair = "universal")
```

### Row names

Row name handling is stricter.
Row names were never supported in [`tibble()`](https://tibble.tidyverse.org/reference/tibble.html) and [`new_tibble()`](https://tibble.tidyverse.org/reference/new_tibble.html), and are now stripped by default in [`as_tibble()`](https://tibble.tidyverse.org/reference/as_tibble.html).
The `rownames` argument to [`as_tibble()`](https://tibble.tidyverse.org/reference/as_tibble.html) supports:

- `NULL`: remove row names (default),
- `NA`: keep row names,
- A string: the name of the new column that will contain the existing row names, which are no longer present in the result.

The old default can be restored by calling `pkgconfig::set_config("tibble::rownames", NA)`, this also works for packages that import _tibble_.

```{r row-names}
rownames(as_tibble(mtcars))
as_tibble(mtcars, rownames = "make_model")
```


### glimpse for subclasses

The [`glimpse()`](https://tibble.tidyverse.org/reference/glimpse.html) function shows information obtained from [`tbl_sum()`](https://tibble.tidyverse.org/reference/tbl_sum.html) in the header, e.g. grouping information for `grouped_df` from _dplyr_, or other information from packages that override the `tbl_df` class.

```{r}
iris %>%
group_by(Species) %>%
glimpse()
```


## Outlook

### vctrs

The plan is to [use _vctrs_](https://github.com/tidyverse/tibble/issues/521) in _tibble_ 2.1.0.
This package is a solid foundation for handling coercion, concatenation and recycling in vectors of arbitrary type.
The support provided by _vctrs_ will yield a better [`add_row()`](https://tibble.tidyverse.org/reference/add_row.html) implementation, in return name repair which is currently defined in _tibble_ should likely live in _vctrs_.

### Dependencies

Currently, installing _tibble_ can bring in almost dozen other packages:

```{r deps}
tools::package_dependencies("tibble", recursive = TRUE, which = "Imports")
```

Some of them, namely _fansi_ and _utf8_, contain code that requires compilation and are only required for optional features.
[The plan](https://github.com/tidyverse/tibble/issues/475) is to make these packages, and _crayon_, a suggested package to _cli_, and provide fallback implementations there.
When finished, taking a strong dependency on _tibble_ won't add too many new dependencies (again): _rlang_, _vctrs_ and _cli_ will be used by most of the tidyverse anyway, _pillar_ is the only truly new strong dependency.
Packages that subclass `tbl_df` should import _tibble_ to make sure that the subsetting operator `[` always behaves the same.
Constructing (subclasses of) tibbles should happen through [`new_tibble()`](https://tibble.tidyverse.org/reference/new_tibble.html) only.


### Decorations

Tibbles have a very opinionated way to print their data, not always in line with users' expectations, and sometimes clearly wrong (e.g. for numerical data where the absolute mean is much larger than the standard deviation).
It seems difficult to devise a formatting that suits all needs, especially for numbers: how do we tell if a number represents money, or perhaps is a misspecified categorical variable or a UID?
[Decorations](https://github.com/tidyverse/tibble/pull/411) are an idea that might help here.
A decoration is applied only when printing a vector, which behaves identically to a bare vector otherwise.
Decorations can be "learned" from the data (using heuristics), or specified directly after import or when creating column,
and stored in attribues like `"class"`.
It will be important to make sure that these attributes survive subsetting and perhaps some arithmetic transformations, easiest to achieve with the help of _vctrs_.


## Acknowledgments

Thanks to Brodie Gaslam ([&#x0040;brodieG](https://github.com/brodieG)) for his help with formatting this blog post and for spotting inaccurate wording.

We also received issues, pull requests, and comments from 108 people since tibble 1.4.2. Thanks to everyone:

[&#x0040;adam-gruer](https://github.com/adam-gruer), [&#x0040;aegerton](https://github.com/aegerton), [&#x0040;alaindanet](https://github.com/alaindanet), [&#x0040;alexpghayes](https://github.com/alexpghayes), [&#x0040;alexwhan](https://github.com/alexwhan), [&#x0040;alistaire47](https://github.com/alistaire47), [&#x0040;anhqle](https://github.com/anhqle), [&#x0040;batpigandme](https://github.com/batpigandme), [&#x0040;brendanf](https://github.com/brendanf), [&#x0040;brodieG](https://github.com/brodieG), [&#x0040;cfhammill](https://github.com/cfhammill), [&#x0040;christophsax](https://github.com/christophsax), [&#x0040;cimentadaj](https://github.com/cimentadaj), [&#x0040;czeildi](https://github.com/czeildi), [&#x0040;DasHammett](https://github.com/DasHammett), [&#x0040;DavisVaughan](https://github.com/DavisVaughan), [&#x0040;earowang](https://github.com/earowang), [&#x0040;Eluvias](https://github.com/Eluvias), [&#x0040;Enchufa2](https://github.com/Enchufa2), [&#x0040;esford3](https://github.com/esford3), [&#x0040;flying-sheep](https://github.com/flying-sheep), [&#x0040;gavinsimpson](https://github.com/gavinsimpson), [&#x0040;GeorgeHayduke](https://github.com/GeorgeHayduke), [&#x0040;gregorp](https://github.com/gregorp), [&#x0040;hadley](https://github.com/hadley), [&#x0040;IndrajeetPatil](https://github.com/IndrajeetPatil), [&#x0040;iron0012](https://github.com/iron0012), [&#x0040;isteves](https://github.com/isteves), [&#x0040;jeffreyhanson](https://github.com/jeffreyhanson), [&#x0040;jennybc](https://github.com/jennybc), [&#x0040;jimhester](https://github.com/jimhester), [&#x0040;JLYJabc](https://github.com/JLYJabc), [&#x0040;joranE](https://github.com/joranE), [&#x0040;jtelleriar](https://github.com/jtelleriar), [&#x0040;karldw](https://github.com/karldw), [&#x0040;kendonB](https://github.com/kendonB), [&#x0040;kevinushey](https://github.com/kevinushey), [&#x0040;kovla](https://github.com/kovla), [&#x0040;lbusett](https://github.com/lbusett), [&#x0040;lionel-](https://github.com/lionel-), [&#x0040;lorenzwalthert](https://github.com/lorenzwalthert), [&#x0040;lwiklendt](https://github.com/lwiklendt), [&#x0040;mattfidler](https://github.com/mattfidler), [&#x0040;MatthieuStigler](https://github.com/MatthieuStigler), [&#x0040;maxheld83](https://github.com/maxheld83), [&#x0040;michaelweylandt](https://github.com/michaelweylandt), [&#x0040;mingsu](https://github.com/mingsu), [&#x0040;momeara](https://github.com/momeara), [&#x0040;PalaceChan](https://github.com/PalaceChan), [&#x0040;pat-s](https://github.com/pat-s), [&#x0040;plantarum](https://github.com/plantarum), [&#x0040;prosoitos](https://github.com/prosoitos), [&#x0040;ptoche](https://github.com/ptoche), [&#x0040;QuLogic](https://github.com/QuLogic), [&#x0040;ralonso-igenomix](https://github.com/ralonso-igenomix), [&#x0040;randomgambit](https://github.com/randomgambit), [&#x0040;riccardopinosio](https://github.com/riccardopinosio), [&#x0040;romainfrancois](https://github.com/romainfrancois), [&#x0040;tomroh](https://github.com/tomroh), [&#x0040;Woosah](https://github.com/Woosah), [&#x0040;yonicd](https://github.com/yonicd), and [&#x0040;yutannihilation](https://github.com/yutannihilation).
Loading