Acknowledgement

Dr. Hua Zhou’s slides

rm(list = ls()) # clean-up workspace
library("tidyverse")
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.1     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Big Sur 11.5.2
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.7     purrr_0.3.4    
## [5] readr_2.0.1     tidyr_1.1.4     tibble_3.1.5    ggplot2_3.3.5  
## [9] tidyverse_1.3.1
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.1.1  xfun_0.25         bslib_0.2.5.1     haven_2.4.3      
##  [5] colorspace_2.0-2  vctrs_0.3.8       generics_0.1.0    htmltools_0.5.1.1
##  [9] yaml_2.2.1        utf8_1.2.2        rlang_0.4.11      jquerylib_0.1.4  
## [13] pillar_1.6.3      glue_1.4.2        withr_2.4.2       DBI_1.1.1        
## [17] dbplyr_2.1.1      modelr_0.1.8      readxl_1.3.1      lifecycle_1.0.1  
## [21] munsell_0.5.0     gtable_0.3.0      cellranger_1.1.0  rvest_1.0.1      
## [25] evaluate_0.14     knitr_1.33        tzdb_0.1.2        fansi_0.5.0      
## [29] broom_0.7.9       Rcpp_1.0.7        backports_1.2.1   scales_1.1.1     
## [33] jsonlite_1.7.2    fs_1.5.0          hms_1.1.0         digest_0.6.28    
## [37] stringi_1.7.3     grid_4.1.1        cli_3.0.1         tools_4.1.1      
## [41] magrittr_2.0.1    sass_0.4.0        crayon_1.4.1      pkgconfig_2.0.3  
## [45] ellipsis_0.3.2    xml2_1.3.2        reprex_2.0.1      lubridate_1.7.10 
## [49] rstudioapi_0.13   assertthat_0.2.1  rmarkdown_2.10    httr_1.4.2       
## [53] R6_2.5.1          compiler_4.1.1

Coordinate systems | r4ds chapter 3.9

nz <- map_data("nz") # New Zealand outline; requires the maps package

ggplot(nz, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", colour = "black")
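
Chapter 3.9 of r4ds also covers other coordinate helpers. As a rough sketch (the choice of the mpg data and of these two plots is an assumption made for illustration, not something taken from the slides), coord_flip() swaps the axes and coord_polar() switches to polar coordinates:

```r
library(ggplot2)

# Horizontal boxplots: coord_flip() swaps the x and y axes
p_flip <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot() +
  coord_flip()

# A bar chart drawn in polar coordinates (a Coxcomb-style chart)
p_polar <- ggplot(mpg, aes(x = class, fill = class)) +
  geom_bar(show.legend = FALSE, width = 1) +
  labs(x = NULL, y = NULL) +
  coord_polar()

p_flip
p_polar
```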


Graphics for communications | r4ds chapter 28

Label

labs()

Title

  • Figure title should be descriptive:

    ggplot(mpg, aes(x = displ, y = hwy)) +
      geom_point(aes(color = class)) +
      geom_smooth(se = FALSE) +
      labs(title = "Fuel efficiency generally decreases with engine size")

Subtitle and caption

  • subtitle adds additional detail in a smaller font beneath the title.

  • caption adds text at the bottom right of the plot, often used to describe the source of the data.

    ggplot(mpg, aes(displ, hwy)) +
      geom_point(aes(color = class)) +
      geom_smooth(se = FALSE) + 
      labs(
        title = "Fuel efficiency generally decreases with engine size",
        subtitle = "Two seaters (sports cars) are an exception because of their light weight",
        caption = "Data from fueleconomy.gov"
      )

Axis labels

  • ggplot(mpg, aes(displ, hwy)) +
    geom_point(aes(colour = class)) +
    geom_smooth(se = FALSE) +
    labs(
      x = "Engine displacement (L)",
      y = "Highway fuel economy (mpg)"
    )

Math equations

  • read about available options in ?plotmath

    df <- tibble(x = runif(10), y = runif(10))
    ggplot(df, aes(x, y)) + geom_point() +
      labs(
        x = quote(sum(x[i] ^ 2, i == 1, n)),
        y = quote(alpha + beta + frac(delta, theta))
      )

  • The R package latex2exp converts TeX math expressions into plotmath expressions

    library(latex2exp)
    df <- tibble(x = runif(10), y = runif(10))
    ggplot(df, aes(x, y)) + geom_point() +
      labs(
        y = TeX("Example: $\\alpha + \\beta + \\frac{\\delta}{\\theta}$"),
        x = TeX("$\\sum_{i = 1}^{n} x_i^2$")
      )
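
When a label should mix math notation with a computed number, base R's bquote() is one option. The sketch below is an illustration rather than part of the original slides; the seed and the variable names x_mean and title_expr are arbitrary choices.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the example is reproducible
df <- data.frame(x = runif(10), y = runif(10))

# bquote() builds a plotmath expression; .() splices in a computed value
x_mean <- round(mean(df$x), 2)
title_expr <- bquote(bar(x) == .(x_mean))

ggplot(df, aes(x, y)) +
  geom_point() +
  labs(title = title_expr)
```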

Annotations

  • Find the most fuel efficient car in each car class:

    best_in_class <- mpg %>%
      group_by(class) %>%
      filter(row_number(desc(hwy)) == 1)
    
    # equivalent to
    # best_in_class <- filter(group_by(mpg, class), row_number(desc(hwy)) == 1)
    best_in_class
    ## # A tibble: 7 × 11
    ## # Groups:   class [7]
    ##   manufacturer model     displ  year   cyl trans  drv     cty   hwy fl    class 
    ##   <chr>        <chr>     <dbl> <int> <int> <chr>  <chr> <int> <int> <chr> <chr> 
    ## 1 chevrolet    corvette    5.7  1999     8 manua… r        16    26 p     2seat…
    ## 2 dodge        caravan …   2.4  1999     4 auto(… f        18    24 r     miniv…
    ## 3 nissan       altima      2.5  2008     4 manua… f        23    32 r     midsi…
    ## 4 subaru       forester…   2.5  2008     4 manua… 4        20    27 r     suv   
    ## 5 toyota       toyota t…   2.7  2008     4 manua… 4        17    22 r     pickup
    ## 6 volkswagen   jetta       1.9  1999     4 manua… f        33    44 d     compa…
    ## 7 volkswagen   new beet…   1.9  1999     4 manua… f        35    44 d     subco…
  • dplyr::desc function transforms a vector into a format that will be sorted in descending order

  • dplyr::filter function subsets a data frame, retaining all rows that satisfy your conditions
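
To see how desc(), row_number() and filter() combine, here is a minimal sketch on a made-up toy table (the data and the names toy, top_per_group and top_alt are hypothetical). It also shows slice_max() as one possible alternative, which is not used in the slides.

```r
library(dplyr)

# Toy data: two groups with known maxima (values are made up for illustration)
toy <- tibble(
  g = c("a", "a", "b", "b", "b"),
  v = c(1, 3, 5, 2, 4)
)

# desc() negates numeric values, so ranking the result in ascending
# order is the same as ranking v in descending order
desc(toy$v)

# row_number() breaks ties by position, so exactly one row per group is kept
top_per_group <- toy %>%
  group_by(g) %>%
  filter(row_number(desc(v)) == 1) %>%
  ungroup()
top_per_group

# Alternative: slice_max() with with_ties = FALSE also keeps one row per group
top_alt <- toy %>%
  group_by(g) %>%
  slice_max(v, n = 1, with_ties = FALSE) %>%
  ungroup()
top_alt
```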


  • Annotate points
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = class)) +
  geom_text(aes(label = model), data = best_in_class)


  • geom_label() draws a rectangle behind the text
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_label(aes(label = model), data = best_in_class, nudge_y = 2, alpha = 0.5)


  • ggrepel package automatically adjusts labels so that they don’t overlap:

    library("ggrepel")
    ggplot(mpg, aes(displ, hwy)) +
      geom_point(aes(colour = class)) +
      geom_point(size = 3, shape = 1, data = best_in_class) +
      ggrepel::geom_label_repel(aes(label = model), data = best_in_class)
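
The examples above label individual data points. For a single free-standing note, annotate() may be simpler. The following is a sketch only: the note text and its placement in the top-right corner are assumptions chosen for illustration.

```r
library(ggplot2)

# Inf places the text at the panel edge; hjust/vjust = 1 anchor it
# by its top-right corner so it stays inside the plotting area
p_note <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  annotate(
    "text",
    x = Inf, y = Inf,
    label = "Larger engines tend to have\nlower highway mileage",
    hjust = 1, vjust = 1
  )
p_note
```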

Scales

  • ggplot(mpg, aes(displ, hwy)) +
      geom_point(aes(colour = class))

    ggplot2 automatically adds default scales, so the plot above is equivalent to:

    ggplot(mpg, aes(displ, hwy)) +
      geom_point(aes(colour = class)) +
      scale_x_continuous() +
      scale_y_continuous() +
      scale_colour_discrete()


  • breaks

    ggplot(mpg, aes(displ, hwy)) +
      geom_point() +
      scale_y_continuous(breaks = seq(15, 40, by = 5))

Setting breaks explicitly is also useful when you have relatively few data points and want to highlight exactly where the observations occur. For example, this plot shows when each US president started and ended their term.

presidential %>%
  mutate(id = 33 + row_number()) %>%
  ggplot(aes(start, id)) +
    geom_point() +
    geom_segment(aes(xend = end, yend = id)) +
    scale_x_date(NULL, breaks = presidential$start, date_labels = "'%y")


  • labels

    ggplot(mpg, aes(displ, hwy)) +
      geom_point() +
      scale_x_continuous(labels = NULL) +
      scale_y_continuous(labels = NULL)
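
Besides NULL, the labels argument also accepts a function that turns break values into text. The helper fmt_mpg below is hypothetical and only meant as a sketch of that idea:

```r
library(ggplot2)

# Hypothetical helper (not from the slides): append a unit to each tick value
fmt_mpg <- function(x) paste0(x, " mpg")

p_lab <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_y_continuous(
    breaks = seq(15, 40, by = 5),
    labels = fmt_mpg  # called on the breaks to produce the tick labels
  )
p_lab
```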


  • Plot y-axis at log scale:

    ggplot(mpg, aes(x = displ, y = hwy)) +
      geom_point() +
      scale_y_log10()


  • Plot x-axis in reverse order:

    ggplot(mpg, aes(x = displ, y = hwy)) +
      geom_point() +
      scale_x_reverse()


#install.packages("wesanderson")
library(wesanderson)
for (name in names(wes_palettes)) {
  print(wes_palette(name))
}

  • use scale_colour_manual() to define a mapping between values and colours
presidential %>%
  mutate(id = 33 + row_number()) %>%
  ggplot(aes(start, id, colour = party)) +
    geom_point() +
    geom_segment(aes(xend = end, yend = id)) +
    scale_colour_manual(values = c(Republican = "red", Democratic = "blue"))

  • the above plot can be improved by placing the x-axis breaks at the start of each term
presidential %>%
  mutate(id = 33 + row_number()) %>%
  ggplot(aes(start, id, colour = party)) +
    geom_point() +
    geom_segment(aes(xend = end, yend = id)) +
    scale_colour_manual(values = c(Republican = "red", Democratic = "blue")) +
    scale_x_date(NULL, breaks = presidential$start, date_labels = "'%y")


  • use scale_colour_gradient() or scale_fill_gradient() for continuous colour

  • viridis::scale_colour_viridis()

df <- tibble(
  x = rnorm(10000),
  y = rnorm(10000)
)
ggplot(df, aes(x, y)) +
  geom_hex() +
  coord_fixed()

ggplot(df, aes(x, y)) +
  geom_hex() +
  viridis::scale_fill_viridis() +
  coord_fixed()


All colour scales come in two varieties:

  • scale_colour_x() for colour aesthetics

  • scale_fill_x() for fill aesthetics
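
As a small sketch of the colour/fill pairing for a continuous variable (the variable cty and the colour names are arbitrary choices for illustration), the same gradient can be applied through either scale:

```r
library(ggplot2)

# Colour aesthetic: points shaded by city fuel economy (a continuous variable)
p_col <- ggplot(mpg, aes(displ, hwy, colour = cty)) +
  geom_point() +
  scale_colour_gradient(low = "lightblue", high = "darkblue")

# Fill aesthetic: 2D bin counts shaded with the matching fill scale
p_fill <- ggplot(mpg, aes(displ, hwy)) +
  geom_bin2d() +
  scale_fill_gradient(low = "lightblue", high = "darkblue")

p_col
p_fill
```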

Legends

  • Set legend position: "left", "right", "top", "bottom", or "none":

    ggplot(mpg, aes(displ, hwy)) +
      geom_point(aes(colour = class)) + 
      theme(legend.position = "left")
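
Individual legends can be tuned further with guides(). The sketch below follows the approach described in r4ds chapter 28; the specific values (one row, key size 4) are illustrative choices.

```r
library(ggplot2)

# Legend at the bottom, laid out in one row, with enlarged legend keys
p_leg <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  theme(legend.position = "bottom") +
  guides(colour = guide_legend(nrow = 1, override.aes = list(size = 4)))

# legend.position = "none" suppresses the legend entirely
p_none <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  theme(legend.position = "none")

p_leg
p_none
```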


Zooming

  • Without clipping (keeps all data points; only the view is zoomed, so the smooth is fit to the full data)

    ggplot(mpg, mapping = aes(displ, hwy)) +
      geom_point(aes(color = class)) +
      geom_smooth() +
      coord_cartesian(xlim = c(5, 7), ylim = c(10, 30))


  • With clipping (removes unseen data points)

    ggplot(mpg, mapping = aes(displ, hwy)) +
      geom_point(aes(color = class)) +
      geom_smooth() +
      xlim(5, 7) + ylim(10, 30)

    which plots the same points as filtering the data first (only the axis ranges may differ):

    mpg %>%
      filter(displ >= 5, displ <= 7, hwy >= 10, hwy <= 30) %>%
      ggplot(aes(displ, hwy)) +
      geom_point(aes(color = class)) +
      geom_smooth()


  • Setting limits on the individual scales clips in the same way:

    ggplot(mpg, mapping = aes(displ, hwy)) +
      geom_point(aes(color = class)) +
      geom_smooth() +
      scale_x_continuous(limits = c(5, 7)) +
      scale_y_continuous(limits = c(10, 30))
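
The practical difference is that coord_cartesian() only zooms the view, whereas xlim()/ylim() and scale limits drop out-of-range observations before any statistics are computed. The sketch below makes this concrete without drawing anything. Note the assumption: a straight-line lm() fit stands in for the default smoother, purely for illustration.

```r
library(ggplot2)  # provides the mpg data
library(dplyr)

# Rows that fall inside the zoom window used above
in_window <- mpg %>%
  filter(displ >= 5, displ <= 7, hwy >= 10, hwy <= 30)

nrow(mpg)        # all observations, which coord_cartesian() keeps
nrow(in_window)  # observations left after xlim()/ylim() or filter()

# Assumption for illustration: a linear fit replaces the default smoother
slope_full   <- coef(lm(hwy ~ displ, data = mpg))[["displ"]]
slope_window <- coef(lm(hwy ~ displ, data = in_window))[["displ"]]

# The two slopes differ because they are fit to different subsets
c(full = slope_full, window = slope_window)
```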

Themes

  • ggplot(mpg, aes(displ, hwy)) +
      geom_point(aes(color = class)) +
      geom_smooth(se = FALSE) +
      theme_bw()
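
A complete theme can be combined with theme() to adjust individual elements. This is a minimal sketch; the base font size, title text and bold face are arbitrary choices, not settings from the slides.

```r
library(ggplot2)

# Start from a built-in theme, then override single elements with theme()
p_theme <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  labs(title = "Highway mileage vs. engine size") +
  theme_minimal(base_size = 14) +
  theme(
    legend.position = "bottom",
    plot.title = element_text(face = "bold")
  )
p_theme
```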

Saving plots

ggplot(mpg, aes(displ, hwy)) + geom_point()

ggsave("my-plot.pdf")
## Saving 7 x 5 in image
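
By default ggsave() saves the last plot displayed, at the size of the current graphics device. For reproducible output it may help to pass the plot and dimensions explicitly. In this sketch the temporary output path and the 6 x 4 inch size are assumptions chosen for illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Write to a temporary directory so the working directory is left untouched
out_file <- file.path(tempdir(), "my-plot.pdf")

# Passing plot, width and height makes the result independent of
# the last displayed plot and of the current device size
ggsave(out_file, plot = p, width = 6, height = 4, units = "in")

file.exists(out_file)
```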

Cheat sheet

The RStudio cheat sheet is an extremely helpful reference.