Mapping in ggplot2 and R - bivariate maps

mapping two categorical variables

There is a need to represent global data concisely in many scientific fields, not in the least climate science. Commonly one plots a single continuous variable on a map, potentially with some annotations added for clarity. However, there is an intermediate form, mainly the bivariate map. In bivariate maps one plots two continuous variables mapped onto a cubic colour field.

Often these continuous fields are reclassified, using thresholds, into 3 or 5 ranges. This allows for accents to be created across the map, and reduces complexity of the map and the mapping process.

Here I show how to create such a bivariate map using ggplot in R, sourcing from ECMWF climate data (provided through the Climate Data Store and accessed with our ecmwfr package - data download code is provided below). The mock example has little scientific value, but shows the process of making and annotating a bivariate map in ggplot. Although this plot works for any projection we’ll use a Robinson projection as slightly more representative of the true area of landmasses.

So, let’s get started!

First we setup our environment ! You will need the tidyverse for processing and plotting, raster to deal with gridded climate data and sf + rnaturalearth for vector features, finally patchwork to compose a plot. The use of showtext is optional as this is required to sideload a non-standard font for show.

# load tidyverse
library(tidyverse)
library(raster)
library(rnaturalearth)
library(sf)
library(patchwork)

# custom fonts
library(showtext)
font_add_google("Prata", regular.wt = 400)
showtext_auto()

With all this loaded we can start composing our figure. For convenience we set the Robinson projection as a variable and load both layers of our ECMWF data (as a netcdf file) into our workspace. UV values are roughly scaled relative to the maximum value as recorded in the EU, setting this as 10 (the maximum on a normal scale). We do warn, this has no scientific basis as this is a mock example! We do the same for the temperature data, where we calculate the yearly monthly maximum.

# set robinson projection
robinson <- CRS("+proj=robin +over")

# read in raster data

# max monthly UV data (scaled by seconds in a day
# and referenced by a location in Belgium
# assuming UV max values of around 10 in
# this region)
u <- brick("data/uv_temp.nc", varname = "uvb")
u <- (max(u/(24*60*60), na.rm = TRUE)/ 32) * 10
u <- rotate(u)

# max yearly monthly temperature converted to C
# from K
t <- brick("data/uv_temp.nc", varname = "t2m")
t <- max(t) - 273.15
t <- rotate(t)

To provide context to the map we download a simple features (sf) vector of country outlines using the rnaturalearth package. In addition, we define a bounding box outlining the total extent of the map. Both are reprojected to the Robinson projection.

# download countries 
countries <- ne_countries(scale = 50, returnclass = c("sf"))

# create a bounding box for the robinson projection
# we'll use this as "trim" to remove jagged edges at
# end of the map (due to the curved nature of the
# robinson projection)
bb <- sf::st_union(sf::st_make_grid(
  st_bbox(c(xmin = -180,
            xmax = 180,
            ymax = 90,
            ymin = -90), crs = st_crs(4326)),
  n = 100))
bb_robinson <- st_transform(bb, as.character(robinson))

# transform the coastline to robinson
countries_robinson <- st_transform(countries, robinson)

For rapid rendering we reproject and resample the original map. Using pipes, and in one movement, we convert this reprojected raster file to a dataframe. Both UV and temperature data are treated this way. Note that you can change the resolution of the output by adjusting the resampling value. For a full resolution map you can disable resampling.

# convert gridded raster dato dataframe
u_df <- u %>%
  projectRaster(., res=50000, crs = robinson) %>%
  rasterToPoints %>%
  as.data.frame() %>%
  `colnames<-`(c("x", "y", "uv"))

t_df <- t %>%
  projectRaster(., res=50000, crs = robinson) %>%
  rasterToPoints %>%
  as.data.frame() %>%
  `colnames<-`(c("x", "y", "temp"))

We’ll use a bivariate map with three classes per layer / input. As such we reclassify the UV and temperature data in three classes each, from low to high (values 1 to 3). We bind these two dataframes together by location and combine the three classes in a descriptive factor which provides a coordinate on our colour cube (e.g. 1 - 1, 1 - 3).

# reclassify data using threshold values
# so we get 3 classes for each layer
u_df <- u_df %>%
  mutate(
    val = ifelse(uv <= 10, "1",
                 ifelse(uv >= 12, "3",
                        "2"))
  )

t_df <- t_df %>%
  mutate(
    val = ifelse(temp <= 10, "1",
                 ifelse(temp >= 28, "3",
                        "2"))
  )

# bind data by location and group
# by the value (index) created above
df <- left_join(t_df, u_df, by = c("x","y")) %>%
  mutate(
    group = paste(val.y, val.x, sep = " - ")
  ) %>%
  dplyr::select(-c(val.x, val.y))

Next we define a colour cube, where link descriptive factors equal to those used above to a particular colour. This colour description is then joined with our main dataframe for plotting. At this point all the information we need is contained within our main dataframe, and some ancillary elements (e.g. country outlines, bounding box).

# create a bivariate legend with indices
# matching those created above
legend_3 <- tibble(
  "3 - 3" = "#3F2949", # high UV, high temp
  "2 - 3" = "#435786",
  "1 - 3" = "#4885C1", # low UV, high temp
  "3 - 2" = "#77324C",
  "2 - 2" = "#806A8A", # medium UV, medium temp
  "1 - 2" = "#89A1C8",
  "3 - 1" = "#AE3A4E", # high UV, low temp
  "2 - 1" = "#BC7C8F",
  "1 - 1" = "#CABED0" # low UV, low temp
) %>%
  gather("group", "fill")

# match the group constructed
# above with the colour scheme
# with 3 categories this is the
# plotting dataframe
df <- left_join(df, legend_3)

Now we can create our main plot, which will be a layered cake of data. First we add the map data using geom_raster(), which is defined by an x and y coordinated, with colours defined by a fill value. We’ve defined the colours on a pixel by pixel basis. In order to use this information we use scale_fill_identity() to evaluate the colour in the fill column as the one to use in plotting (not as a factor).

With geom_sf() we add the countries. We’ll use a second geom_sf() to add the bounding box. This bounding box has no other purpose than to trim the jagged edges of low resolution projeced data around round edges (similar tricks with resolution and outlines have been used in previous tutorials). Finally we add some annotation elements. Given the global nature of the map we do not include any axis, as such th main theme is empty (void) to start with.

# create the main map
p1 <- ggplot()+
  geom_raster(
    data = df,
    aes(
      x=x,
      y=y,
      fill=fill
    ),
    interpolate = TRUE
    ) +
  scale_fill_identity() +
  geom_sf(data=countries_robinson,
          colour='grey25',
          linetype='solid',
          fill= NA,
          size=0.3) +
  geom_sf(data=bb_robinson,
          colour='white',
          linetype='solid',
          fill = NA,
          size=0.7) +
  labs(
    title = "Liquids: drink it, or rub it!",
    subtitle = "Do you always need water & sunscreen?",
    caption = "
Data: Copernicus Climate Data Store - 2020
Temperature thresholds: low < 10C < mid < 28C < high
Max seasonal UV Thresholds: normal < 10 < mid < 12 < high
"
  ) +
  theme_void() +
  theme(legend.position = 'none',
        plot.subtitle = element_text(
          color = "grey30",
          size = 40,
          hjust = 0.1),
        plot.title = element_text(
          family = "Prata",
          color = "grey30",
          size = 70,
          hjust = 0.1),
        plot.caption = element_text(
          color = "grey30",
          size = 25,
          lineheight = 0.3),
        plot.margin = margin(r = 10)
        )

Ggplot is not aware that this is a bivariate plot and a normal legend will not reflect this. To fix this we have to draw our own legend, a second plot!

We use the legend_3 dataframe to do this, splitting the labels up into their constituting coordinates. These coordinates will be used in combination with the linked colour (as used above) to plot a map of coloured boxes. Again, we are only interested in the colour and the axis along which these colours change. So, we set the theme to empty (void) and provide the necessary annotations for the axis.

# create the legend
p2 <- legend_3 %>%
  separate(group,
           into = c("uv", "temp"),
           sep = " - ") %>%
  mutate(temp = as.integer(temp),
         uv = as.integer(uv)) %>%
  ggplot() +
  geom_tile(mapping = aes(
    x = uv,
    y = temp,
    fill = fill)) +
  scale_fill_identity() +
  labs(x = "sunscreen →",
       y = "hydrate →") +
  theme_void() +
  theme(
    axis.title = element_text(
      size = 25,
    ),
    axis.title.y = element_text(angle = 90)) +
  coord_fixed()

Finally, we need to compose these two elements relative to eachother. Here we’ll use the patchwork framework to define the relatie location of each plot. Here the main plot takes up a space covering 1 - 7 columns along the X and Y-axis. The legend is located in the top right corner, taking up 1/7th of the area of the main plot.

The layout information is encoded as nested list, and provided to patchwork for rendering (we’ll save the figure as a png using ggsave).

# create a layout to plot both
# the main figure and the legend
layout <- c(
  area(t = 1, l = 1, b = 7, r = 7),
  area(t = 1, l = 6, b = 1, r = 7)
)

# create final layout
p <- p1 + p2 + 
  plot_layout(design = layout)

# save the plot
ggsave("drink_or_rub.png", width = 11)

And that’s it! This will give you the figure at the top of this post!

As always, keep an eye on our blog for new entries in this series on map building, and tips and tricks to improve your R mapping skills. The code for this project can be found on our github R tutorials page. Also check out the great bivariate vector map by Timo Grossenbacher from which I borrowed the colour scheme.

Avatar
Koen Hufkens, PhD
Partner, Researcher

As an earth system scientist and ecologist I model ecosystem processes.

Related

Next
Previous