plot_env can cause duplicated serialization of ggplot data #3619

Closed
@traversc

Here is an example:

Case 1: saving a ggplot from global

library(ggplot2)
set.seed(1)
df <- data.frame(x = rnorm(1e6))
g <- ggplot(df, aes(x = x)) + geom_density()
saveRDS(g, "/tmp/temp.rds", compress = FALSE)
file.info("/tmp/temp.rds")$size
[1] 8244327

Case 2: saving a ggplot from within a function

myplotfun <- function() {
  set.seed(1)
  df <- data.frame(x = rnorm(1e6))
  g <- ggplot(df, aes(x = x)) + geom_density()
  return(g)
}
g <- myplotfun()

saveRDS(g, "/tmp/temp.rds", compress = FALSE)
file.info("/tmp/temp.rds")$size
[1] 24253766

In Case 1, I generate a ggplot from the global environment. In Case 2, I generate the same ggplot from within my custom plot function. I then save both plots to disk uncompressed and compare the file sizes. The plot created within the function is about 3x the size on disk (24253766 vs. 8244327 bytes).

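This is caused by plot_env, the environment stored in the plot object. If an environment is a special case (e.g. the global environment), R saves only a reference to it. For any other environment, such as a function's environment, R saves everything within it, which could include the original data.frame as well as the ggplot object itself (and therefore everything in it). The same data can end up being written to disk several times.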
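The mechanism can be sketched in base R, without ggplot2. This is only an illustration under stated assumptions: `make_obj` and the list it returns are hypothetical stand-ins for a plotting function and a ggplot object, and the `env` element plays the role of plot_env.

```r
# Hypothetical stand-in for a plotting function: the returned object keeps
# both the data and the environment it was created in.
make_obj <- function() {
  set.seed(1)
  df <- data.frame(x = rnorm(1e5))
  # `env` mimics plot_env; after this assignment the function environment
  # holds both `df` and `obj`.
  obj <- list(data = df, env = environment())
  obj
}

obj <- make_obj()

# Serialize in memory and count bytes. The function environment is written
# out with its contents, so the data is reachable (and written) via
# obj$data, env$df and env$obj$data.
size_with_env <- length(serialize(obj, NULL))

# Point `env` at the global environment instead, which R writes as a
# reference rather than by content.
obj$env <- globalenv()
size_without_env <- length(serialize(obj, NULL))

# Ratio of the two sizes
size_with_env / size_without_env
```

Under these assumptions the first serialization should come out roughly three times larger than the second, which mirrors the ratio seen in the two ggplot cases above. For a real plot, listing the contents of the stored environment (for example `ls(g$plot_env)`, assuming a ggplot2 version that exposes the field this way) would show which objects are being carried along.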
