With the popularity and power of ggplot2, some R package authors are changing
their plotting functions to output
ggplot objects instead of base R plots. This is a great idea for existing package
maintainers who simply want to update their output to a modern, flexible, and themeable plotting library.
However, I have also encountered a handful of packages that fall into the trap of creating new
monolithic ggplot functions: heavyweight, base-R-like functions with lots of parameters that output custom
ggplot objects. I imagine
(slightly facetiously) the reasoning that leads to this approach is something like:
I love using
ggplot! It is flexible and lets me quickly specify a range of plot types.
My package users need to visualize some output from functions in my package.
Therefore, I will add functions to my package that output ggplot objects.
This reasoning reflects a failure to understand that the flexibility and expressive power of
ggplot come from not having a new monolithic function for every plot type.
This is the reasoning of someone who recognizes the power of
ggplot but does not understand why it is powerful.
ggplot is powerful because you can specify a large range of plot types using simple, consistent mappings from data onto aesthetics
onto geometries. This allows you to build complex plots through recombining elements of a relatively small vocabulary—to read and write plotting code without learning new
parameters and syntax for every new plot type.1
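To make that concrete, here is a minimal sketch using the built-in mtcars data (my choice for illustration, not tied to any particular package). One set of mappings is reused across several plots, and only the added pieces change:

```r
library(ggplot2)

# mtcars ships with R; cyl becomes a factor so it maps onto discrete colors
cars <- transform(mtcars, cyl = factor(cyl))

# One set of mappings from data onto aesthetics...
base <- ggplot(cars, aes(x = wt, y = mpg, color = cyl))

# ...recombined with different geometries, facets, and themes
scatter <- base + geom_point()
trends  <- base + geom_point() + geom_smooth(method = "lm", se = FALSE)
panels  <- base + geom_point() + facet_wrap(~ cyl) + theme_minimal()
```

Nothing here needed a new function or a new parameter; each variation is one more term from the same small vocabulary.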
Functions that output pre-made
ggplot objects are inflexible in the same way that base R plotting functions are, and bring
virtually all the same baggage and usability issues. Some2 even duplicate
ggplot functionality with new parameters, like
mapping categories of input onto colors, or even get halfway to reimplementing faceting. I have inhabited the base R
plotting universe before; I have no desire to go back to learning a new set of parameters for every new plot type I encounter.
What is the alternative?
I would love, instead, for package authors to create functions that return data in a tidy
format, allowing users to easily visualize output using plain-vanilla
ggplots. For more esoteric plot types, a package might supply new geoms or
stats. Rarely should we need to go further than that.3
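As a rough sketch of what I mean by a function that returns tidy data, consider the following. The helper tidy_coefs() is hypothetical (I made it up for this example), and a plain lm() fit stands in for whatever object a package produces:

```r
library(ggplot2)

# Hypothetical helper: one row per coefficient, with consistent column names
tidy_coefs <- function(model, level = 0.95) {
  ci <- confint(model, level = level)
  data.frame(
    term      = rownames(ci),
    estimate  = unname(coef(model)),
    lower     = unname(ci[, 1]),
    upper     = unname(ci[, 2]),
    row.names = NULL
  )
}

fit   <- lm(mpg ~ wt + hp, data = mtcars)
coefs <- tidy_coefs(fit)

# The user plots the result with an existing geom; no custom plot function needed
coef_plot <- ggplot(coefs, aes(x = estimate, y = term, xmin = lower, xmax = upper)) +
  geom_pointrange()
```

The function's only job is to hand back a data frame. Everything about how it looks stays in ordinary ggplot code that the user already knows how to change.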
Put another way, I think that this is an anti-pattern:
some_object_from_my_package %>%
  big_function_with_a_bunch_of_parameters_that_returns_a_ggplot(...)
And is almost always better when replaced with something like this:
some_object_from_my_package %>%
  function_to_get_data_in_tidy_format() %>%
  ggplot(aes(...)) +
  some_existing_geom_or_maybe_a_new_one_designed_for_this_task(...)
As a user, this approach means:
To customize this plot, I do not need to learn a million parameters for each package’s approach to mapping data onto visual properties. I do not need to learn how that particular function reimplements faceting (if it does). I already know how to do all that in ggplot.
To theme this plot, I do not need to learn each package’s approach to theming. Especially egregious are packages that create their own additional theming functions on top of and apart from
ggplot’s existing themes: now I need to learn the function parameters and a whole new theming system!
If I want to compose complex custom plots, possibly remixing and recombining what would be output from multiple monolithic plotting functions in your (or others’) packages, I can do that. With two
ggplot objects, what do I do?
plot1 = some_object_from_package_1 %>% big_plotting_function_1()
plot2 = some_object_from_package_2 %>% big_plotting_function_2()
# ????? What would I even put here ?????
If instead I have the data and some simple
geoms, I can easily compose a new plot combining the output of both. Something like this:
data1 = some_object_from_package_1 %>% get_data_in_tidy_format_1()
data2 = some_object_from_package_2 %>% get_data_in_tidy_format_2()
data1 %>%
  ggplot(aes(...)) +
  geom_for_plotting_data1(...) +
  geom_for_plotting_data2(..., data = data2)
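Here is one way that composition might look with real calls. The two data frames are stand-ins I built from mtcars and a plain lm() fit; in practice they would be the tidy output of two different packages:

```r
library(ggplot2)

# Stand-in for tidy output from package 1: raw observations
observed <- data.frame(x = mtcars$wt, y = mtcars$mpg)

# Stand-in for tidy output from package 2: a fitted curve with a confidence band
fit  <- lm(y ~ x, data = observed)
grid <- data.frame(x = seq(min(observed$x), max(observed$x), length.out = 50))
pred <- predict(fit, newdata = grid, interval = "confidence")
fitted_band <- data.frame(
  x     = grid$x,
  y     = pred[, "fit"],
  lower = pred[, "lwr"],
  upper = pred[, "upr"]
)

# Each layer draws from whichever data frame it needs
combined <- ggplot(observed, aes(x = x, y = y)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), data = fitted_band, alpha = 0.2) +
  geom_line(data = fitted_band) +
  geom_point()
```

Because both sources are just data frames with consistent column names, combining them is a matter of passing `data =` to the layer that needs it.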
You might think this adds a bunch of extra typing, but in practice it doesn’t. It might in the simplest cases, but as soon
as you want to customize the output, each monolithic
ggplot function has to introduce myriad parameters anyway
(parameters you have to learn!). Meanwhile,
the vanilla ggplot formulation is clear and easy to understand for anyone already familiar with ggplot.
So please, please, please—I don’t want another new monolithic
ggplot function. I don’t need another new monolithic
ggplot function. What I need is the data, in a tidy format with consistent names, and maybe some
geoms if you’ve
I am deliberately not calling out specific packages in this post. Suffice it to say I could point to specific package(s) for each of my examples.
I admit I can think of two use cases where you might create a monolithic ggplot function: (1) The plot is simple, common, and rarely needs customization (i.e. few-to-no parameters); (2) The plot would be complex or onerous if done manually (e.g. sets of plots that would require multiple datasets and/or ggplot specifications to replicate). Most monolithic ggplot functions I have encountered meet neither criterion.