I don’t want your monolithic ggplot function - Matthew Kay

With the popularity and power of ggplot2, some R package authors are changing their plotting functions to output ggplot objects instead of base R plots. This is a great idea for existing package maintainers who simply want to update their output to a modern, flexible and themeable plotting library.

However, I have also encountered a handful of packages that fall into the trap of creating new monolithic ggplot functions: heavyweight, base-R-like functions with lots of parameters that output custom ggplot objects. I imagine (slightly facetiously) the reasoning that leads to this approach is something like:

1. I love using ggplot! It is flexible and lets me quickly specify a range of plot types.
2. My package users need to visualize some output from functions in my package.
3. Therefore, I will add functions to my package that output ggplots.

This reasoning reflects a failure to understand that the flexibility and expressive power of ggplot come from not having a new monolithic function for every plot type. This is the reasoning of someone who recognizes the power of ggplot but does not understand why it is powerful.

ggplot is powerful because you can specify a large range of plot types using simple, consistent mappings from data onto aesthetics onto geometries. This allows you to build complex plots through recombining elements of a relatively small vocabulary—to read and write plotting code without learning new parameters and syntax for every new plot type.¹
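As a small illustration of that recombination, here is a sketch using the built-in `mtcars` data (my choice of dataset and variables, not anything specific to the packages discussed here). One set of mappings is reused with several geometries, and layering and faceting are added with the same vocabulary:

```r
library(ggplot2)

# One set of mappings from data onto aesthetics...
base <- ggplot(mtcars, aes(x = factor(cyl), y = mpg))

# ...recombined with different geometries to get different plot types.
p_box    <- base + geom_boxplot()
p_violin <- base + geom_violin()
p_points <- base + geom_jitter(width = 0.1, height = 0)

# Layers compose, and faceting is one more element of the same vocabulary.
p_combined <- base +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.1, height = 0) +
  facet_wrap(vars(am))
```

None of these variants needed a new function or a new parameter name; only the geom changed.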

Functions that output pre-made ggplot objects are inflexible in the same way that base R plotting functions are, and bring virtually all the same baggage and usability issues. Some² even recreate ggplot functionality with new parameters, like mapping categories of input onto colors, or even get halfway to reimplementing faceting. I have inhabited the base R plotting universe before; I have no desire to go back to learning a new set of parameters for every new plot type I encounter.

What is the alternative?

I would love, instead, for package authors to create functions that return data in a tidy format, allowing users to easily visualize output using plain-vanilla ggplots. For more esoteric plot types, a package might include new geoms or stats. Rarely should we need to go further than that.³
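For the "new geoms or stats" case, a package can often get away with very little code. The sketch below follows the convex hull example from ggplot2's own extension documentation; the names `StatChull` and `stat_chull` and the choice of `mtcars` are only for illustration, not something any particular package ships:

```r
library(ggplot2)

# A stat that reduces each group to the points on its convex hull.
StatChull <- ggproto("StatChull", Stat,
  compute_group = function(data, scales) {
    data[chull(data$x, data$y), , drop = FALSE]
  },
  required_aes = c("x", "y")
)

# The user-facing layer function; it takes the same arguments as any
# other stat_*() function, so there is nothing new to learn.
stat_chull <- function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE,
                       show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatChull, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

# It then combines with existing geoms like any built-in layer.
p_hull <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  stat_chull(fill = NA, colour = "black")
```

Because the stat is an ordinary layer, users can still add grouping, facets, scales and themes to it in the usual way.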

Put another way, I think that this is an anti-pattern:

some_object_from_my_package %>%
  big_function_with_a_bunch_of_parameters_that_returns_a_ggplot(...)

And is almost always better when replaced with something like this:

some_object_from_my_package %>%
  function_to_get_data_in_tidy_format() %>%
  ggplot(aes(...)) +
  some_existing_geom_or_maybe_a_new_one_designed_for_this_task(...)
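To make the second pattern concrete, here is a minimal sketch for a linear model. The helper `tidy_coefs()` is hypothetical (the broom package's `tidy()` offers this kind of function for real), and I use the native `|>` pipe (R 4.1 or later) so the example needs nothing beyond ggplot2:

```r
library(ggplot2)

# Hypothetical helper: coefficient estimates and confidence intervals
# from a fitted lm, returned as a plain data frame with one row per term.
tidy_coefs <- function(fit, level = 0.95) {
  ci <- confint(fit, level = level)
  data.frame(
    term      = names(coef(fit)),
    estimate  = unname(coef(fit)),
    lower     = unname(ci[, 1]),
    upper     = unname(ci[, 2]),
    row.names = NULL
  )
}

fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

# The plot itself is plain-vanilla ggplot: existing geoms, existing aesthetics.
coef_plot <- tidy_coefs(fit) |>
  ggplot(aes(x = estimate, y = term, xmin = lower, xmax = upper)) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_pointrange()
```

The package only has to maintain `tidy_coefs()`; every visual decision stays in code the user already knows how to change.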

As a user, this approach means:

- To customize this plot, I do not need to learn a million parameters for each package’s approach to mapping data onto visual properties. I do not need to learn how that particular function reimplements faceting (if it does). I already know how to do all that in ggplot.
- To theme this plot, I do not need to learn each package’s approach to theming (especially egregious are packages that create their own additional theming functions on top of and apart from ggplot’s existing themes: now I need to learn the function parameters and a whole new theming system!).
- If I want to compose complex custom plots, possibly remixing and recombining what would be output from multiple monolithic plotting functions in your (or others’) packages, I can do that. With two ggplot objects, what do I do?

plot1 = some_object_from_package_1 %>%
  big_plotting_function_1()

plot2 = some_object_from_package_2 %>%
  big_plotting_function_2()

# ????? What would I even put here ?????

If instead I have the data and some simple geoms, I can easily compose a new plot combining the output of both. Something like this:

data1 = some_object_from_package_1 %>%
  get_data_in_tidy_format_1()

data2 = some_object_from_package_2 %>%
  get_data_in_tidy_format_2()

data1 %>%
  ggplot(aes(...)) +
  geom_for_plotting_data1(...) +
  geom_for_plotting_data2(..., data = data2)
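A runnable version of that idea might look like the sketch below. The two helpers, `get_fit_curve()` and `get_group_means()`, are hypothetical stand-ins for tidy-data functions from two different packages, and `mtcars` is again just a convenient built-in dataset:

```r
library(ggplot2)

# Hypothetical stand-in for package 1: fitted values and 95% confidence
# bands from a linear model of mpg on wt, evaluated on a grid of wt values.
get_fit_curve <- function(fit, wt_grid) {
  pred <- predict(fit, newdata = data.frame(wt = wt_grid),
                  interval = "confidence")
  data.frame(
    wt        = wt_grid,
    mpg       = pred[, "fit"],
    lower     = pred[, "lwr"],
    upper     = pred[, "upr"],
    row.names = NULL
  )
}

# Hypothetical stand-in for package 2: mean wt and mpg per cylinder count.
get_group_means <- function(df) {
  aggregate(cbind(wt, mpg) ~ cyl, data = df, FUN = mean)
}

fit   <- lm(mpg ~ wt, data = mtcars)
data1 <- get_fit_curve(fit, seq(min(mtcars$wt), max(mtcars$wt), length.out = 50))
data2 <- get_group_means(mtcars)

# Both outputs share column names, so they drop into one plot.
combined <- data1 |>
  ggplot(aes(x = wt, y = mpg)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) +
  geom_line() +
  geom_point(aes(color = factor(cyl)), data = data2, size = 3)
```

The composition works because both helpers return data frames with consistent column names (`wt`, `mpg`), which is exactly what a pair of pre-built ggplot objects cannot offer.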

You might think this adds a bunch of extra typing, but in practice it doesn’t. It might in the simplest cases, but as soon as you want to customize the output, each monolithic ggplot function has to introduce myriad parameters anyway (parameters you have to learn!). Meanwhile, the vanilla ggplot formulation is clear and easy to understand for anyone already familiar with ggplot.

So please, please, please—I don’t want another new monolithic ggplot function. I don’t need another new monolithic ggplot function. What I need is the data, in a tidy format with consistent names, and maybe some geoms if you’ve got ’em.

¹ The language underlying ggplot is also a principled way of specifying plots that, once learned, allows you to reason at a higher level about plot specification.
² I am deliberately not calling out specific packages in this post. Suffice it to say I could point to specific package(s) for each of my examples.
³ I admit I can think of two use cases where you might create a monolithic ggplot function: (1) the plot is simple, common, and rarely needs customization (i.e., few-to-no parameters); (2) the plot would be complex or onerous if done manually (e.g., sets of plots that would require multiple datasets and/or ggplot specifications to replicate). Most monolithic ggplot functions I have encountered meet neither criterion.