autoplot
autoplot, first introduced in version 0.9.0, is an idiom for creating complete ggplot graphs that are appropriate for specific types of data. ggplot2 itself does not provide any useful methods, but declares the S3 generic for other packages to extend.
ggplot provides a framework for creating plots: starting with data, mapping data to aesthetics, scaling the data and aesthetics, and controlling plot appearance through a theming system. At the same time, many packages create specific data structures, typically S3 or S4 objects, and provide plot methods to display them graphically, typically using base graphics. These default plots implement appropriate conventions for the type of data plotted. What is missing is the ability to create such a standard plot using ggplot so that it can be further adjusted by setting scales, themes, etc. autoplot aims to fill this niche. However, the design should not be so rigid as to make adaptation impossible; a strength of ggplot is the ability to rearrange data presentation in different ways for different needs.
The plotting of specialized types of data structures can be split into two steps:
- Converting the specialized data structure into a data.frame which exposes variables in a structure appropriate for ggplot. A mechanism/idiom for this already exists in fortify.
- Defining the plot by mapping these variables to the appropriate aesthetics using existing geoms/stats/scales. This is the step autoplot should do.
Any object which has an autoplot method should also have a fortify method, which the autoplot method uses to convert the specialized data structure into a data.frame. The purpose of this separation is to be able to re-use the work of data restructuring even if the specific autoplot method is not used. The documentation for the fortify method should enumerate the variables in the returned data.frame and state how they relate to the original data, so that they can be relied on for other uses.
The autoplot method should use the appropriate fortify method to convert the data structure to a data.frame. It can then construct a ggplot object from this data.frame by creating layers (geoms or stats) with the appropriate aesthetic mappings. Documentation should state what layers/geoms/stats are created, and what aesthetic mappings are made. Or should the documentation just include the actual code?
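The split between the two methods can be sketched as follows. This is only an illustration under stated assumptions: the stepcurve class, its fields, and the choice of geom_step are invented for the example and do not come from any existing package; only the fortify and autoplot generics are ggplot2's own.

```r
library(ggplot2)

# Hypothetical S3 class "stepcurve": a list holding x positions and heights.
# The class name and its fields are assumptions made up for this sketch.
new_stepcurve <- function(x, height) {
  structure(list(x = x, height = height), class = "stepcurve")
}

# Step 1: fortify() converts the object into a data.frame.
# Returned columns (the kind of thing the method's documentation would list):
#   x      - position, copied from model$x
#   height - value at that position, copied from model$height
fortify.stepcurve <- function(model, data, ...) {
  data.frame(x = model$x, height = model$height)
}

# Step 2: autoplot() calls fortify() and maps its columns to aesthetics.
# Layers created: geom_step(). Mappings: x -> x, height -> y.
autoplot.stepcurve <- function(object, ...) {
  df <- fortify(object)
  ggplot(df, aes(x = x, y = height)) +
    geom_step()
}

curve <- new_stepcurve(x = 1:5, height = c(2, 3, 3, 5, 8))
p <- autoplot(curve)

# The result is an ordinary ggplot object, so it can still be adjusted.
print(p + theme_minimal() + labs(y = "Height"))
```

Because fortify.stepcurve is a separate method, the data.frame it returns can also be used on its own, for example to build a different plot from the same columns.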
If a package which defines these specific data structures also defines fortify and autoplot, then they are just two additional methods. Should such a package declare ggplot under Enhances or Depends?
If a separate package implements them, what should the package naming convention be? GGobject? ggobject? autoplotObject? originalpackageGG?
- S4 classes?
- What should the convention be if the data structure cannot be well represented by a single data.frame, but rather by a set of data.frames?
  - A list of data.frames. Pro: natural R idiom for collecting two or more things. Con: breaks the return value convention for fortify.
  - A "block diagonal" data.frame, that is, a single data.frame whose columns are the combination of all the columns in all the individual data.frames, but with only one set of columns filled in at a time. Pros: is a data.frame, which fortify is supposed to return; easy to create given the separate data.frames (just plyr::rbind.fill them). Cons: inelegant as a data structure; wasteful of space.
  - A data.frame with additional data.frames as attributes. Pro: is a data.frame, which fortify is supposed to return (and which won't choke ggplot/qplot). Con: sets one of the data.frames as dominant over the others; seems not all that natural.
- What should package naming conventions be?
- Extends vs depends?
- To what extent should autoplot take extra parameters to define variations on the standard plot? Example: triangle versus square lines in ggdendro. Should the fortify function pull out all possible data so that any version can be plotted?
- If calling fortify is expensive, should each autoplot function check whether its input is an object of that type (dispatched via S3 methods) or already-fortified data passed in directly (and thus already a data.frame)? If so, then the autoplot function should be exported so it can be called directly.
- Should the fortify and autoplot methods for specific data types be exported/public?
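To make the three candidate representations for a set of data.frames concrete, here is a small sketch. The segments and labels data.frames and their column names are invented for illustration. The block-diagonal form is what plyr::rbind.fill would produce; a small base R helper (rbind_fill, hypothetical) stands in for it here only to keep the example free of dependencies. It assumes R 4.0 or later, where strings are not converted to factors by default.

```r
# Two hypothetical pieces of a fortified tree-like object:
# line segments and leaf labels. Column names are assumptions.
segments <- data.frame(x = c(1, 2), y = c(0, 0), xend = c(1, 2), yend = c(1, 1))
labels   <- data.frame(x = c(1, 2), label = c("a", "b"))

# Option 1: a list of data.frames.
as_list <- list(segments = segments, labels = labels)

# Option 2: a "block diagonal" data.frame.
# plyr::rbind.fill(segments, labels) builds this in one call; the helper
# below is a base R stand-in written for this sketch.
rbind_fill <- function(...) {
  dfs <- list(...)
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- NA          # add each absent column, filled with NA
    }
    df[all_cols]               # put columns in a common order
  })
  do.call(rbind, padded)
}
block_diagonal <- rbind_fill(segments, labels)

# Option 3: one data.frame with the others attached as attributes.
with_attrs <- segments
attr(with_attrs, "labels") <- labels

str(as_list, max.level = 1)
print(block_diagonal)
str(attributes(with_attrs)$labels)
```

Options 2 and 3 both remain data.frames, so they can be passed straight to ggplot; option 1 would need the caller to pick out an element first.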
These are examples of packages or functions which create complete graphics of specific data types using ggplot, whether or not they use the autoplot mechanism.
- The original discussion was based on http://stackoverflow.com/questions/7098830/bad-idea-ggplotting-an-s3-class-object, which discussed linear regression model diagnostics and included an example of trees.
- ggdendro (CRAN page) (GitHub repo): not implemented in this way (as of 0.0-7), but has many of the pieces and some of the separation. Could be expanded/adapted if conventions are settled on.
- granovaGG (CRAN page): first release September 4, 2011.
- Survival curves: I (BrianDiggs) have some code that creates Kaplan-Meier curves from survfit objects, but it needs work; partially, I was wondering about a framework such as this.