Label dictionary #6077

Merged
merged 8 commits into from
Dec 2, 2024
Conversation

teunbrand
Collaborator

This PR aims to fix #5178.

Briefly, it adds labs(dict) that allows one to use a data-dictionary to label a plot based on variable names (rather than aesthetics).

Let's just jump into examples.
The premise of this idea is that somewhere in your analysis code, you have some nice labels about what variables in your dataset mean. For example, we could have the following for the mpg dataset.

devtools::load_all("~/packages/ggplot2")
#> ℹ Loading ggplot2

dict <- c(
  displ = "Engine Displacement",
  hwy   = "Highway miles per gallon",
  cty   = "City miles per gallon",
  drv   = "Drive train",
  manufacturer = "Manufacturer name",
  model = "Model name",
  year  = "Year of manufacture",
  cyl   = "Number of cylinders",
  trans = "Type of transmission",
  fl    = "Fuel type",
  class = "Type of car"
)

This PR lets you slap such a dictionary onto your labels, and all variable names will be translated. The benefit is that you only have to think about pretty labels for variables once and needn't worry about them again.

ggplot(mpg, aes(class, cty, fill = drv)) +
  geom_boxplot() +
  labs(dict = dict)

Notably, this doesn't work for more complex expressions, like factor(cyl) instead of cyl. In such a case, you can fall back to labelling the aesthetic, or you can add an entry like labs(dict = c(dict, "factor(cyl)" = dict[["cyl"]])); the name has to be quoted because factor(cyl) is not a syntactic R name. Also, we can reuse the dictionary here because we're using the same dataset, even though we're making a totally different plot.

ggplot(mpg, aes(displ, hwy, colour = factor(cyl))) +
  geom_point() +
  labs(dict = dict, colour = dict["cyl"])

Created on 2024-09-04 with reprex v2.1.1
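
A minimal base-R sketch of the "add an entry" fallback mentioned above. It assumes the derived label for the colour aesthetic is the expression text "factor(cyl)", and it uses a shortened copy of the dictionary; `dict_ext` is a name introduced here for illustration only.

```r
# Shortened copy of the dictionary from the example above.
dict <- c(
  displ = "Engine Displacement",
  hwy   = "Highway miles per gallon",
  cyl   = "Number of cylinders"
)

# "factor(cyl)" is not a syntactic R name, so it must be quoted
# (or backticked) when used as an element name in c().
dict_ext <- c(dict, "factor(cyl)" = dict[["cyl"]])

# The expression text now maps to the same label as the bare variable.
dict_ext[["factor(cyl)"]]
```

The extended vector could then be passed in place of the original dictionary, so the explicit colour label in the plot call would no longer be needed, provided the derived label really is "factor(cyl)".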

@larmarange

This is a nice idea. Would it be better to have a more explicit argument, i.e. dictionary instead of dict?

@teunbrand
Collaborator Author

I don't have too strong an opinion on this, but I like that both labs() and dict are terse.
Perhaps Thomas can make the call on this one.

@teunbrand
Collaborator Author

teunbrand commented Sep 10, 2024

Double check: do we overwrite labs if their value is already inside?
EDIT: No, we don't :)
TODO: rename to dictionary

devtools::load_all("~/packages/ggplot2/")
#> ℹ Loading ggplot2

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  labs(
    x = "foo",
    dictionary = c(foo = "foobar", hwy = "baz")
  )

Created on 2024-09-10 with reprex v2.1.1

@shannonpileggi

I love this idea! Thank you for initiating this work!

Just curious - have you given thought to workflows where the user doesn't actually specify a dictionary? For example, what if the data already has labels embedded, and the user wants those embedded labels to be displayed automatically? This would be a similar approach to the {gt} package.

library(gt)

# unlabelled data frame
df1 <- data.frame(x = 1)
gt(df1)
#> (rendered table: column header "x", one row with value 1)
# labelled data frame
df2 <- df1
attr(df2$x, "label") <- "variable x description"
gt(df2)
#> (rendered table: column header "variable x description", one row with value 1)

Created on 2024-10-25 with reprex v2.1.1
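
As a bridge between the two workflows, embedded "label" attributes can be collected into a named character vector of the same shape as the dictionaries used earlier in this thread. This is a sketch only: `labels_to_dict()` is a hypothetical helper written for this illustration, not part of ggplot2 or gt, and it is not how #5879 is implemented.

```r
# Hypothetical helper: gather "label" attributes into a named character vector.
labels_to_dict <- function(data) {
  labs <- lapply(data, function(col) attr(col, "label", exact = TRUE))
  # Keep only columns whose label is a single string.
  keep <- vapply(
    labs,
    function(l) is.character(l) && length(l) == 1L,
    logical(1)
  )
  if (!any(keep)) {
    return(character(0))
  }
  unlist(labs[keep])
}

df <- data.frame(x = 1, y = 2)
attr(df$x, "label") <- "variable x description"

# Only x carries a label, so the result has a single entry named "x".
dict_from_attrs <- labels_to_dict(df)
dict_from_attrs
```

Columns without a label are simply left out, so their variable names would stay untranslated.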

@teunbrand
Collaborator Author

Hi Shannon! We implemented something akin to what you describe in #5879 which is already merged in the dev version. You can give the dev version a spin if you like (and provide feedback if we can improve 😅 )

@shannonpileggi

Ohhhhh thank you for pointing that out! Will take a look!

Member

@thomasp85 thomasp85 left a comment

LGTM

Comment on lines +87 to +95
dict <- plot_labels$dictionary
if (length(dict) > 0) {
  labels <- lapply(labels, function(x) {
    dict <- dict[names(dict) %in% x]
    x[match(names(dict), x)] <- dict
    x
  })
}
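
To see what the lookup in the snippet above does, here is a standalone base-R sketch that runs the same `%in%` / `match()` logic on a plain list. The input `labels` list and the `translate()` wrapper are assumptions made for illustration; in the PR the labels come from the plot itself.

```r
# Derived labels as they might look before translation (illustrative input).
labels <- list(x = "displ", y = "hwy", colour = "factor(cyl)")

dict <- c(
  displ  = "Engine Displacement",
  hwy    = "Highway miles per gallon",
  unused = "Never shown"
)

# Hypothetical wrapper around the same lookup logic as the reviewed lines.
translate <- function(labels, dict) {
  lapply(labels, function(x) {
    # Keep only dictionary entries whose name occurs in this label.
    hit <- dict[names(dict) %in% x]
    # Overwrite the matching positions with the dictionary values.
    x[match(names(hit), x)] <- hit
    x
  })
}

translated <- translate(labels, dict)
translated$x       # translated via the "displ" entry
translated$colour  # no entry for "factor(cyl)", so left unchanged
```

Labels without a matching dictionary name pass through untouched, which is why expressions such as factor(cyl) need their own entry or an explicit label.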

Member

This is being called on all labels, right? So if a user has given a label name that just happens to be in the dictionary it will get translated

To me it should only apply to fallback labels, but I'm open to hearing against this

Member

Can't roll back my approval but let's talk this one out before merging :-)

Collaborator Author

It applies to all 'derived' labels, not labels that users give verbatim.
Is this the kind of situation you mean?

devtools::load_all("~/packages/ggplot2/")
#> ℹ Loading ggplot2

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  labs(
    x = "Engine Displacement",
    dictionary = c(displ = "NO NOT THIS ONE", hwy = "new label")
  )

Created on 2024-12-02 with reprex v2.1.1

Collaborator Author

@teunbrand teunbrand Dec 2, 2024

Also labs(x = "foo", dictionary = c(foo = "bar")) will give "foo" as label for x.
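
The precedence described in this exchange can be sketched in base R as follows. `resolve_label()` is a hypothetical helper that only models the described behaviour (explicit labels are kept verbatim and only derived labels are translated); it is not the ggplot2 implementation.

```r
# Hypothetical helper, not ggplot2 internals: pick the label for one aesthetic.
resolve_label <- function(explicit, derived, dict) {
  # A label the user typed is returned verbatim, even if it matches a key.
  if (!is.null(explicit)) {
    return(explicit)
  }
  # Otherwise translate the derived label when the dictionary has an entry.
  if (derived %in% names(dict)) dict[[derived]] else derived
}

dict <- c(foo = "bar", hwy = "new label")

explicit_x  <- resolve_label("foo", "displ", dict)  # user-supplied label
derived_y   <- resolve_label(NULL, "hwy", dict)     # derived, has an entry
derived_col <- resolve_label(NULL, "displ", dict)   # derived, no entry
```

Under these assumptions the explicit "foo" stays "foo" even though the dictionary has a foo entry, matching the behaviour stated above.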

Member

It was that last situation I was concerned about. As long as any explicitly given label is honoured, I'm happy.

@teunbrand teunbrand merged commit 4af509e into tidyverse:main Dec 2, 2024
13 checks passed
@teunbrand teunbrand deleted the label_dictionary branch December 2, 2024 13:45
Development

Successfully merging this pull request may close these issues.

Label dictionaries
4 participants