ggplot2 3.4.0

  graphics, ggplot2

  Thomas Lin Pedersen

We’re so happy to announce the release of ggplot2 3.4.0 on CRAN. ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details. The new version can be installed from CRAN using install.packages("ggplot2").

This release is not full of exciting new features. Instead we have focused on the internals, tightening up of the API, and improving the messaging, especially when it comes to errors and warnings. While the release also contains a few new features these other aspects are the stars of this release.

You can see a full list of changes in the release notes

Hello linewidth

Arguably the biggest user.visible change in this release is the introduction of a new fundamental aesthetic. From this release on, linewidth will take over sizing of the width of lines—something that was earlier handled by size. The reason for this change is that prior to this release size was used for two related, but different, properties: the size of points (and glyphs) and the width of lines. Since one is area based and one is length based they fundamentally needs different scaling and the default size scale has always catered to area sizing, using a square root transform. This conflation has also made it hard for composite geoms like geom_pointrange() to control the line width and point size separately.

There is not much to discuss when it comes to how to use this “feature”, as it is a matter of switching out size with linewidth whenever you target stroke sizing:

ggplot(airquality) + 
  geom_line(aes(Day, Temp, linewidth = Month, group = Month)) + 
  scale_linewidth(range = c(0.5, 3))

Now, changing such a fundamental thing when a package is as old and widely used as ggplot2 is no small undertaking, and I wish it had been done earlier, but better late than never. We have gone to great lengths to ensure that old code continues to work. For the most part using size will continue to behave like before:

ggplot(airquality) + 
  geom_line(aes(Day, Temp, size = Month, group = Month)) + 
  scale_size(range = c(0.5, 3))
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#>  Please use `linewidth` instead.

As you can see you get the expected plot but also gets a deprecation warning asking you to update your code. Comparing the two legends we can also see the discrepancy in scaling that we discussed above, showing a much more even progression with linewidth.

All of this should work with all the geoms provided by ggplot2 (and we have described a clear upgrade path for extension developers to adopt this), except for a few instances where size remains a valid aesthetic for the geom. In these cases you will not get a deprecation warning and your output may change in look when running old code. The two geoms this concerns are geom_pointrange() and geom_sf() which both continues to use size to scale points. Comparing the output from e.g.  geom_pointrange() we can see how using size now only targets the point and not the line:

ggplot(airquality) + 
  geom_pointrange(aes(x = factor(Month), y = Temp), stat = "summary", size = 2)
#> No summary function supplied, defaulting to `mean_se()`

We recognize that introducing silent visual changes like this is not optimal but we weighted both sides and decided that it was better in the long run to rip the band-aid off and commit fully to the linewidth change in one release.

The switch to linewidth goes beyond aesthetics and should target every part of the API that have used size to target line width. This is mostly present in theming where element_rect() and element_line() now uses linewidth as argument instead of size. As above a deprecation warning will inform you of this change:

ggplot(mtcars) + 
  geom_point(aes(x = mpg, y = disp)) + 
  theme(panel.grid = element_line(linewidth = 0.2))

We have done our best to ensure that it is easy for our extension developers to follow the path laid out by ggplot2 when it comes to embracing the new aesthetic, but you will probably experience a period of discrepancy between some of your favorite extensions and ggplot2. I have full confidence that our amazing extension developers will adapt quickly so that period will probably be short.

On the topic of line width

We have made a few other internal changes when it comes to line widths. The biggest of these are perhaps a new default for polygon line width in geom_sf(). The change came about as we already had induced visual changes to old code due to the linewidth aesthetic introduction and based on feedback from the spatial community we saw that size was most often used to thin the polygon borders. The new default is 0.2 (down from 0.5) and hopefully strikes a nice balance:

nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
p1 <- ggplot(nc) +
  geom_sf(linewidth = 0.5) + 
  ggtitle("Old default")

p2 <- ggplot(nc) +
  geom_sf() + 
  ggtitle("New default")

p1/p2

More minor is a small fix we did to guide_colorbar() where it was brought to our attention that the ticks.linewidth and frame.linewidth weren’t given in the same unit as every other line width in ggplot2. This has been corrected and the default has been adjusted to retain the same look but if you have given these specifically in your code you are likely to notice a visual change.

Other breaking changes

In the grab-bag of breaking changes we have now formally deprecated qplot(). It will continue to work as always but will be a bit noisy about it. Don’t expect the function to disappear, but the deprecation signals that we don’t intend to do further work on qplot() to keep it current with new features etc. In the same vein, stat() and ..var.. for marking aesthetics from stats are also formally deprecated in favor of after_stat(). Again, the result is that using these old APIs will be noisy but still work.

On the topic of after_stat(), the values and computations inside of it now use the un-transformed variables rather than the transformed ones. This is a bit esoteric and only applies to aesthetics that have had a scale transformation applied to them, so you may never notice.

Lastly, we have made a switch to using rlang::hash() instead of digest::digest() which may result in the automatic ordering of legends changing, again a pretty minor change. If you care about the ordering of the legends you can always take control of it using the order argument inside the different guide_*() constructors.

Better errors

One of the most substantial changes in usability in this release is a complete rewrite of the errors and warnings. This goes deeper than changing wordings as the messaging is now based on the signal handling in the cli package that provides rich text formatting and better ways to guide the user to a resolution. Consider the following easy to make mistake of using the pipe instead of +:

ggplot(mtcars) |> 
  geom_point(aes(mpg, disp))
#> Error in `geom_point()`:
#> ! `mapping` must be created by `aes()`
#>  Did you use `%>%` or `|>` instead of `+`?

As can be seen, the error now clearly states where it is happening, then tells you what is wrong, and lastly gives you a hint at what might be the solution.

However, this is not all. One of the biggest issues with error reporting in ggplot2 is that most code is evaluated during rendering, not when the API calls are made. Because of this it has been difficult to link a user error in a geom specification to the actual error message that arises. This could send the user on a treasure hunt to identify what to change in order to fix the code. With the changes in 3.4.0 we are now much better at directing the user to the right place in their code when errors in the rendering happens:

huron <- data.frame(year = 1875:1972, level = as.vector(LakeHuron))
ggplot(huron) +
  geom_line(aes(year, level)) + 
  geom_ribbon(aes(year, xmin = level - 5, xmax = level + 5))
#> Error in `geom_ribbon()`:
#> ! Problem while setting up geom.
#>  Error occurred in the 2nd layer.
#> Caused by error in `compute_geom_1()` at ggplot2/R/ggproto.r:182:16:
#> ! `geom_ribbon()` requires the following missing aesthetics: ymin and
#>   ymax or y

We can see that the error message correctly identifies the geom responsible for the layer, communicates during what part of the rendering it happened during, and points to the index of the layer in the case that multiple layers from the same geom have been used. Lastly it shows the original error that can help you with solving the issue.

Hopefully the changes goes a long way to make ggplot2 even more welcoming to new and seasoned users alike. However, this effort is never done and we continue to appreciate issues in the github repository pointing out unhelpful errors or warnings that arises so that we may improve it further.

vctrs inside

The last part of the large housekeeping changes in this release is that ggplot2 finally embraces vctrs and uses it’s functions internally primarily for binding data together. Apart from a nice bump in rendering speed it also means that we now better support data types built upon vctrs and subscribe to the more well-defined coercion rules that it provides. The last point is a double edged sword though, as your code may contain a diverse mix of data types in different layers that worked before but doesn’t align with the strictness of vctrs. While we have gone to lengths to ensure that your code still works you will begin to see deprecation notices if you e.g. factor on a variable that is incompatible across layers:

labels <- data.frame(
  label = paste("gear", 3:5),
  gear = as.character(3:5),
  x = 100,
  y = 11
)
ggplot(mtcars) + 
  geom_point(aes(disp, mpg)) + 
  geom_text(aes(x, y, label = label), labels, hjust = "left") + 
  facet_wrap(~gear)
#> Warning: Combining variables of class <numeric> and <character> was deprecated in
#> ggplot2 3.4.0.
#>  Please ensure your variables are compatible before plotting (location:
#>   `combine_vars()`)

While this may seem like an unnecessary annoyance we hope that you’ll learn to appreciate that this strictness can save you from silent bugs where you end up combining variables that are basically incompatible.

New features

While most of the focus has been on internal housekeeping in this release a few new features has also crept in, courtesy of our amazing contributors from the community:

Stacking non-aligned data

position_stack() has always required that groups share a common x-value to be stacked. The nature of most time series data etc. makes it so that this is often the case, but not always. We have now introduced a stat_align() that takes care of interpolating y-values in each group at every unique x-value in the data so that they can be stacked. This stat is now the default for geom_area():

df <- tibble::tribble(
    ~g, ~x, ~y,
    "a", 1, 2,
    "a", 3, 5,
    "a", 5, 1,
    "b", 2, 0,
    "b", 4, 6,
    "b", 6, 7
)
p1 <- ggplot(df, aes(x, y, fill = g)) + 
  geom_area(stat = "identity", alpha = 0.5) + 
  ggtitle("stat_identity()")
p2 <- ggplot(df, aes(x, y, fill = g)) + 
  geom_area(alpha = 0.5) + 
  ggtitle("stat_align()")

(p1 | p2) & theme(legend.position = "none")

Bounded density estimation

geom_density() have gained a bounds argument allowing you to perform density estimation with bound correction. This can leads to might better edge estimates when bounds are known for a sample:

data <- data.frame(x = rexp(100))
ggplot(data, aes(x)) +
  geom_density(aes(colour = "unbounded"), key_glyph = "path") +
  geom_density(aes(colour = "bounded"), bounds = c(0, Inf), key_glyph = "path") +
  stat_function(aes(colour = "true distribution"), fun = dexp) + 
  scale_colour_manual(
    name = NULL, 
    values = c("black", "firebrick", "forestgreen"),
    breaks = c("true distribution", "unbounded", "bounded")
  )

No clipping in facet strips

It is now possible to turn clipping in the facet strips off. For the most part the default works fine but in certain situations you’d like the strip text or the border to be seen in full. The new feature is a theme setting:

p <- ggplot(diamonds) + 
  geom_bar(aes(y = color)) + 
  facet_wrap(~ cut) + 
  theme_minimal() + 
  theme(
    strip.background = element_rect("grey90", colour = "grey90", linewidth = 1),
    axis.line.y = element_line(linewidth = 1)
  )
p

In the (a bit contrived) theme above we see a jarring step between the strip background and the axis line because the border of the strip is clipped to the extent of the strip. We can fix this by turning off clipping:

p + theme(strip.clip = "off")

Justification in geom_bar()/geom_col()

You can now specify how the bars in geom_bar() should be justified with respect to the position on the axis they are tied to:

mtcars_centered <- mutate(mtcars, justification = "centered")
mtcars_left <- mutate(mtcars, justification = "left aligned")
ggplot(mapping = aes(x = gear)) + 
  geom_bar(data = mtcars_centered) + 
  geom_bar(data = mtcars_left, just = 0) + 
  facet_wrap(~justification, ncol = 1)

It goes without saying that you should only do this for good reasons because it goes against how people in general expect bar plots to behave, but for certain layout needs it can be a boon.

Acknowledgements

As always, this release could not be possible without contributions from our amazing community. A huge thanks goes out to everyone who has helped made ggplot2 3.4.0 a reality:

@92amartins, @acircleda, @AlgaeKat, @andreaskuepfer, @angleik, @aphalo, @artuurC, @asolisc, @baderstine, @basille, @bergsmat, @bersbersbers, @billdenney, @brianmsm, @brunomioto, @bwiernik, @capnrefsmmat, @clauswilke, @cmartin, @ConchuirohAodha, @corybrunson, @DanChaltiel, @DarioS, @Darxor, @davidchall, @davidhodge931, @dhrhzz, @DiegoJArg, @DISOhda, @drtoche, @Enterprise-J, @ewallace, @gbrlrgrs, @ggrothendieck, @GregorDall, @hadley, @henningpohl, @Hugh-Mungo, @IndrajeetPatil, @JacobElder, @jarauh, @javlon, @jdonland, @jessexknight, @jfunction, @JobNmadu, @JoFAM, @jooyoungseo, @jpquast, @jtlandis, @junjunlab, @jwhendy, @kapsner, @KasperThystrup, @kongdd, @LarryVincent, @leonjessen, @Lisamrshhsr, @llrs, @LuisLauM, @lynn242, @makrez, @MichaelChirico, @michaelgrund, @mikeroswell, @mjsmith037, @moodymudskipper, @mvanaman, @netique, @nfancy, @ngreifer, @nkehrein, @olobiolo, @orgadish, @pachadotdev, @padpadpadpad, @paupaiz, @ProfessorPeregrine, @PursuitOfDataScience, @rikudoukarthik, @rjake, @rressler, @SarenT, @Sebas256, @shenzhenzth, @skyroam, @stargorg, @stefanoborini, @steveharoz, @stragu, @szimmer, @tamas-ferenci, @teunbrand, @tfjaeger, @thomasp85, @thoolihan, @tjebo, @topepo, @trevorld, @tungttnguyen, @twest820, @waynerroper, @willgearty, @wmacnair, @wurli, @yutannihilation, and @zeehio.