There is a need to represent global data concisely in many scientific fields, not in the least climate science. Commonly one plots a single continuous variable on a map, potentially with some annotations added for clarity. However, there is an intermediate form, mainly the bivariate map. In bivariate maps one plots two continuous variables mapped onto a cubic colour field.
Often these continuous fields are reclassified, using thresholds, into 3 or 5 ranges. This allows for accents to be created across the map, and reduces complexity of the map and the mapping process.
Here I show how to create such a bivariate map using
R, sourcing from ECMWF climate data (provided through the Climate Data Store and accessed with our
ecmwfr package - data download code is provided below). The mock example has little scientific value, but shows the process of making and annotating a bivariate map in
ggplot. Although this plot works for any projection we’ll use a Robinson projection as slightly more representative of the true area of landmasses.
So, let’s get started!
First we setup our environment ! You will need the
tidyverse for processing and plotting,
raster to deal with gridded climate data and
rnaturalearth for vector features, finally
patchwork to compose a plot. The use of
showtext is optional as this is required to sideload a non-standard font for show.
# load tidyverse library(tidyverse) library(raster) library(rnaturalearth) library(sf) library(patchwork) # custom fonts library(showtext) font_add_google("Prata", regular.wt = 400) showtext_auto()
With all this loaded we can start composing our figure. For convenience we set the Robinson projection as a variable and load both layers of our ECMWF data (as a netcdf file) into our workspace. UV values are roughly scaled relative to the maximum value as recorded in the EU, setting this as 10 (the maximum on a normal scale). We do warn, this has no scientific basis as this is a mock example! We do the same for the temperature data, where we calculate the yearly monthly maximum.
# set robinson projection robinson <- CRS("+proj=robin +over") # read in raster data # max monthly UV data (scaled by seconds in a day # and referenced by a location in Belgium # assuming UV max values of around 10 in # this region) u <- brick("data/uv_temp.nc", varname = "uvb") u <- (max(u/(24*60*60), na.rm = TRUE)/ 32) * 10 u <- rotate(u) # max yearly monthly temperature converted to C # from K t <- brick("data/uv_temp.nc", varname = "t2m") t <- max(t) - 273.15 t <- rotate(t)
To provide context to the map we download a simple features (sf) vector of country outlines using the
rnaturalearth package. In addition, we define a bounding box outlining the total extent of the map. Both are reprojected to the Robinson projection.
# download countries countries <- ne_countries(scale = 50, returnclass = c("sf")) # create a bounding box for the robinson projection # we'll use this as "trim" to remove jagged edges at # end of the map (due to the curved nature of the # robinson projection) bb <- sf::st_union(sf::st_make_grid( st_bbox(c(xmin = -180, xmax = 180, ymax = 90, ymin = -90), crs = st_crs(4326)), n = 100)) bb_robinson <- st_transform(bb, as.character(robinson)) # transform the coastline to robinson countries_robinson <- st_transform(countries, robinson)
For rapid rendering we reproject and resample the original map. Using pipes, and in one movement, we convert this reprojected raster file to a dataframe. Both UV and temperature data are treated this way. Note that you can change the resolution of the output by adjusting the resampling value. For a full resolution map you can disable resampling.
# convert gridded raster dato dataframe u_df <- u %>% projectRaster(., res=50000, crs = robinson) %>% rasterToPoints %>% as.data.frame() %>% `colnames<-`(c("x", "y", "uv")) t_df <- t %>% projectRaster(., res=50000, crs = robinson) %>% rasterToPoints %>% as.data.frame() %>% `colnames<-`(c("x", "y", "temp"))
We’ll use a bivariate map with three classes per layer / input. As such we reclassify the UV and temperature data in three classes each, from low to high (values 1 to 3). We bind these two dataframes together by location and combine the three classes in a descriptive factor which provides a coordinate on our colour cube (e.g. 1 - 1, 1 - 3).
# reclassify data using threshold values # so we get 3 classes for each layer u_df <- u_df %>% mutate( val = ifelse(uv <= 10, "1", ifelse(uv >= 12, "3", "2")) ) t_df <- t_df %>% mutate( val = ifelse(temp <= 10, "1", ifelse(temp >= 28, "3", "2")) ) # bind data by location and group # by the value (index) created above df <- left_join(t_df, u_df, by = c("x","y")) %>% mutate( group = paste(val.y, val.x, sep = " - ") ) %>% dplyr::select(-c(val.x, val.y))
Next we define a colour cube, where link descriptive factors equal to those used above to a particular colour. This colour description is then joined with our main dataframe for plotting. At this point all the information we need is contained within our main dataframe, and some ancillary elements (e.g. country outlines, bounding box).
# create a bivariate legend with indices # matching those created above legend_3 <- tibble( "3 - 3" = "#3F2949", # high UV, high temp "2 - 3" = "#435786", "1 - 3" = "#4885C1", # low UV, high temp "3 - 2" = "#77324C", "2 - 2" = "#806A8A", # medium UV, medium temp "1 - 2" = "#89A1C8", "3 - 1" = "#AE3A4E", # high UV, low temp "2 - 1" = "#BC7C8F", "1 - 1" = "#CABED0" # low UV, low temp ) %>% gather("group", "fill") # match the group constructed # above with the colour scheme # with 3 categories this is the # plotting dataframe df <- left_join(df, legend_3)
Now we can create our main plot, which will be a layered cake of data. First we add the map data using
geom_raster(), which is defined by an x and y coordinated, with colours defined by a fill value. We’ve defined the colours on a pixel by pixel basis. In order to use this information we use
scale_fill_identity() to evaluate the colour in the
fill column as the one to use in plotting (not as a factor).
geom_sf() we add the countries. We’ll use a second
geom_sf() to add the bounding box. This bounding box has no other purpose than to trim the jagged edges of low resolution projeced data around round edges (similar tricks with resolution and outlines have been used in previous tutorials). Finally we add some annotation elements. Given the global nature of the map we do not include any axis, as such th main theme is empty (void) to start with.
# create the main map p1 <- ggplot()+ geom_raster( data = df, aes( x=x, y=y, fill=fill ), interpolate = TRUE ) + scale_fill_identity() + geom_sf(data=countries_robinson, colour='grey25', linetype='solid', fill= NA, size=0.3) + geom_sf(data=bb_robinson, colour='white', linetype='solid', fill = NA, size=0.7) + labs( title = "Liquids: drink it, or rub it!", subtitle = "Do you always need water & sunscreen?", caption = " Data: Copernicus Climate Data Store - 2020 Temperature thresholds: low < 10C < mid < 28C < high Max seasonal UV Thresholds: normal < 10 < mid < 12 < high " ) + theme_void() + theme(legend.position = 'none', plot.subtitle = element_text( color = "grey30", size = 40, hjust = 0.1), plot.title = element_text( family = "Prata", color = "grey30", size = 70, hjust = 0.1), plot.caption = element_text( color = "grey30", size = 25, lineheight = 0.3), plot.margin = margin(r = 10) )
Ggplot is not aware that this is a bivariate plot and a normal legend will not reflect this. To fix this we have to draw our own legend, a second plot!
We use the
legend_3 dataframe to do this, splitting the labels up into their constituting coordinates. These coordinates will be used in combination with the linked colour (as used above) to plot a map of coloured boxes. Again, we are only interested in the colour and the axis along which these colours change. So, we set the theme to empty (void) and provide the necessary annotations for the axis.
# create the legend p2 <- legend_3 %>% separate(group, into = c("uv", "temp"), sep = " - ") %>% mutate(temp = as.integer(temp), uv = as.integer(uv)) %>% ggplot() + geom_tile(mapping = aes( x = uv, y = temp, fill = fill)) + scale_fill_identity() + labs(x = "sunscreen →", y = "hydrate →") + theme_void() + theme( axis.title = element_text( size = 25, ), axis.title.y = element_text(angle = 90)) + coord_fixed()
Finally, we need to compose these two elements relative to eachother. Here we’ll use the
patchwork framework to define the relatie location of each plot. Here the main plot takes up a space covering 1 - 7 columns along the X and Y-axis. The legend is located in the top right corner, taking up 1/7th of the area of the main plot.
The layout information is encoded as nested list, and provided to
patchwork for rendering (we’ll save the figure as a png using
# create a layout to plot both # the main figure and the legend layout <- c( area(t = 1, l = 1, b = 7, r = 7), area(t = 1, l = 6, b = 1, r = 7) ) # create final layout p <- p1 + p2 + plot_layout(design = layout) # save the plot ggsave("drink_or_rub.png", width = 11)
And that’s it! This will give you the figure at the top of this post!
As always, keep an eye on our blog for new entries in this series on map building, and tips and tricks to improve your R mapping skills. The code for this project can be found on our github R tutorials page. Also check out the great bivariate vector map by Timo Grossenbacher from which I borrowed the colour scheme.