With the popularity and power of ggplot2, some R package authors are changing
their plotting functions to output ggplot objects instead of base R plots.
This is a great idea for existing package maintainers who simply want to update
their output to a modern, flexible, and themeable plotting library.
However, I have also encountered a handful of packages that fall into the trap of creating new
monolithic ggplot functions: heavyweight, base-R-like functions with lots of parameters that output custom ggplot
objects. I imagine
(slightly facetiously) the reasoning that leads to this approach is something like:
1. I love using ggplot! It is flexible and lets me quickly specify a range of plot types.
2. My package users need to visualize some output from functions in my package.
3. Therefore, I will add functions to my package that output ggplots.
This reasoning reflects a failure to understand that the flexibility and expressive power of ggplot
comes from not having a new monolithic function for every plot type.
This is the reasoning of someone who recognizes the power of ggplot
but does not understand why it is powerful.
ggplot is powerful because you can specify a large range of plot types using simple, consistent mappings from data onto aesthetics
onto geometries. This allows you to build complex plots through recombining elements of a relatively small vocabulary, and to read and write plotting code without learning new
parameters and syntax for every new plot type.[1]
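As a rough illustration of that small vocabulary (a sketch only, using the `mtcars` dataset that ships with R; the variable names are arbitrary), a single mapping from data onto aesthetics can be recombined with different geometries and facets to produce different plot types:

```r
library(ggplot2)

# mtcars ships with R; cyl becomes a factor so it maps onto discrete colors
cars <- transform(mtcars, cyl = factor(cyl))

# One mapping from data onto aesthetics...
base <- ggplot(cars, aes(x = wt, y = mpg, color = cyl))

# ...recombined with different geometries to get different plots
scatter <- base + geom_point()
trend   <- base + geom_point() + geom_smooth(method = "lm", se = FALSE)
panels  <- base + geom_point() + facet_wrap(vars(cyl))
```

None of the three plots required learning a new function or a new set of parameters; each is a recombination of pieces a ggplot user already knows.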
Functions that output pre-made ggplot
objects are inflexible in the same way that base R plotting functions are, and bring
virtually all the same baggage and usability issues. Some[2] even
recreate ggplot
functionality with new parameters, like
mapping categories of input onto colors, or even get halfway to reimplementing faceting. I have inhabited the base R
plotting universe before; I have no desire to go back to learning a new set of parameters for every new plot type I encounter.
What is the alternative?
I would love, instead, for package authors to create functions that return data in a tidy
format, allowing users to easily visualize output using plain-vanilla ggplots. For more esoteric plot types, a package might
include new geoms or stats. Rarely should we need to go further than that.[3]
Put another way, I think that this is an anti-pattern:
some_object_from_my_package %>%
  big_function_with_a_bunch_of_parameters_that_returns_a_ggplot(...)
It is almost always better to replace it with something like this:
some_object_from_my_package %>%
  function_to_get_data_in_tidy_format() %>%
  ggplot(aes(...)) +
  some_existing_geom_or_maybe_a_new_one_designed_for_this_task(...)
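To make the second pattern concrete, here is a minimal sketch. The helper `tidy_coefs()` is hypothetical (it is not from any package) and stands in for the "function to get data in tidy format"; it uses only base R's `coef()` and `confint()`. The native pipe `|>` assumes R 4.1 or later.

```r
library(ggplot2)

# Hypothetical helper: one row per coefficient, with consistent column names
tidy_coefs <- function(model, level = 0.95) {
  ci <- confint(model, level = level)
  data.frame(
    term      = names(coef(model)),
    estimate  = unname(coef(model)),
    lower     = ci[, 1],
    upper     = ci[, 2],
    row.names = NULL
  )
}

model <- lm(mpg ~ wt + hp, data = mtcars)
coefs <- tidy_coefs(model)

# The plot itself is plain ggplot with an existing geom
p <- model |>
  tidy_coefs() |>
  ggplot(aes(x = estimate, y = term, xmin = lower, xmax = upper)) +
  geom_pointrange()
```

Because `coefs` is an ordinary data frame, a user who wants a different geom, facets, or theme can change the last few lines without touching the package function.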
As a user, this approach means:
- To customize this plot, I do not need to learn a million parameters for each package’s approach to mapping data onto visual properties. I do not need to learn how that particular function reimplements faceting (if it does). I already know how to do all that in ggplot.
- To theme this plot, I do not need to learn each package’s approach to theming (especially egregious are packages that create their own additional theming functions on top of and apart from ggplot’s existing themes: now I need to learn the function parameters and a whole new theming system!).
- If I want to compose complex custom plots, possibly remixing and recombining what would be output from multiple monolithic plotting functions in your (or others’) packages, I can do that. With two ggplot objects, what do I do?

    plot1 = some_object_from_package_1 %>% big_plotting_function_1()
    plot2 = some_object_from_package_2 %>% big_plotting_function_2()
    ????? What would I even put here ?????

  If instead I have the data and some simple geoms, I can easily compose a new plot combining the output of both. Something like this:

    data1 = some_object_from_package_1 %>% get_data_in_tidy_format_1()
    data2 = some_object_from_package_2 %>% get_data_in_tidy_format_2()
    data1 %>%
      ggplot(aes(...)) +
      geom_for_plotting_data1(...) +
      geom_for_plotting_data2(..., data = data2)
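A runnable sketch of that kind of composition follows. The two data frames are stand-ins for tidy output from two different packages (here both are derived from `mtcars` and a base R `lm` fit, purely for illustration); each layer draws from whichever data frame it needs.

```r
library(ggplot2)

model <- lm(mpg ~ wt, data = mtcars)

# Tidy output 1: raw observations (stand-in for one package's output)
observed <- mtcars[, c("wt", "mpg")]

# Tidy output 2: fitted line with a confidence band (stand-in for another's).
# predict() with interval = "confidence" returns columns fit, lwr, upr.
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50))
pred <- cbind(
  grid,
  as.data.frame(predict(model, newdata = grid, interval = "confidence"))
)

# Layers from both data frames combined in a single plot
combined <- ggplot(observed, aes(x = wt)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), data = pred, alpha = 0.25) +
  geom_line(aes(y = fit), data = pred) +
  geom_point(aes(y = mpg))
```

With two finished ggplot objects there is no comparably direct way to overlay them; with two data frames it is just one more layer with its own `data` argument.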
You might think this adds a bunch of extra typing, but in practice it doesn’t. Maybe it does in the simple cases, but as soon
as you want to customize the output, each monolithic ggplot function has to introduce myriad parameters anyway
(parameters you have to learn!). Meanwhile,
the vanilla ggplot formulation is clear and easy to understand for anyone already familiar with ggplot.
So please, please, please: I don’t want another new monolithic ggplot
function. I don’t need another new monolithic ggplot function.
What I need is the data, in a tidy format with consistent names, and maybe some geoms if you’ve got ’em.
[1] The language underlying ggplot is also a principled way of specifying plots that, once learned, allows you to reason at a higher level about plot specification.
[2] I am deliberately not calling out specific packages in this post. Suffice it to say I could point to specific package(s) for each of my examples.
[3] I admit I can think of two use cases where you might create a monolithic ggplot function: (1) The plot is simple, common, and rarely needs customization (i.e., few-to-no parameters); (2) The plot would be complex or onerous if done manually (e.g., sets of plots that would require multiple datasets and/or ggplot specifications to replicate). Most monolithic ggplot functions I have encountered meet neither criterion.