Announcement
Acknowledgement
Coordinate systems | r4ds chapter 3.9
Graphics for communications | r4ds chapter 28
- Label
- Annotations
- Scales
- Legends
- Zooming
- Themes
- Saving plots
- Cheat sheet

Announcement

Mid-term evaluation (voluntary, anonymous, ~ 10 min)
- 3 / 19
HW1 due today
HW2 will be posted on Friday
HW1 and project description will be graded this weekend
- Check if you forgot to merge from develop branch into master branch
- git push them into your repository (even if you have emailed me, just in case…)
- HW1 will be graded on two questions (selected non-randomly)
Lab keys are for you to check your results.
- If your function behaves differently from what you expect, time to DEBUG
Try using your local git application. Try not to upload files via GitHub webpage.
- At the minimum, avoid “Add files via upload” as commit message (Howto).

Acknowledgement

Dr. Hua Zhou’s slides

rm(list = ls()) # clean-up workspace
library("tidyverse")

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.1     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

sessionInfo()

## R version 4.1.1 (2021-08-10)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Big Sur 11.5.2
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.7     purrr_0.3.4    
## [5] readr_2.0.1     tidyr_1.1.4     tibble_3.1.5    ggplot2_3.3.5  
## [9] tidyverse_1.3.1
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.1.1  xfun_0.25         bslib_0.2.5.1     haven_2.4.3      
##  [5] colorspace_2.0-2  vctrs_0.3.8       generics_0.1.0    htmltools_0.5.1.1
##  [9] yaml_2.2.1        utf8_1.2.2        rlang_0.4.11      jquerylib_0.1.4  
## [13] pillar_1.6.3      glue_1.4.2        withr_2.4.2       DBI_1.1.1        
## [17] dbplyr_2.1.1      modelr_0.1.8      readxl_1.3.1      lifecycle_1.0.1  
## [21] munsell_0.5.0     gtable_0.3.0      cellranger_1.1.0  rvest_1.0.1      
## [25] evaluate_0.14     knitr_1.33        tzdb_0.1.2        fansi_0.5.0      
## [29] broom_0.7.9       Rcpp_1.0.7        backports_1.2.1   scales_1.1.1     
## [33] jsonlite_1.7.2    fs_1.5.0          hms_1.1.0         digest_0.6.28    
## [37] stringi_1.7.3     grid_4.1.1        cli_3.0.1         tools_4.1.1      
## [41] magrittr_2.0.1    sass_0.4.0        crayon_1.4.1      pkgconfig_2.0.3  
## [45] ellipsis_0.3.2    xml2_1.3.2        reprex_2.0.1      lubridate_1.7.10 
## [49] rstudioapi_0.13   assertthat_0.2.1  rmarkdown_2.10    httr_1.4.2       
## [53] R6_2.5.1          compiler_4.1.1

Coordinate systems | r4ds chapter 3.9

Recall the mpg data:

mpg

## # A tibble: 234 × 11
##    manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
##  2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…
##  3 audi         a4           2    2008     4 manu… f        20    31 p     comp…
##  4 audi         a4           2    2008     4 auto… f        21    30 p     comp…
##  5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…
##  6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…
##  7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…
##  8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
##  9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
## 10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…
## # … with 224 more rows

Boxplots (grouped by class):

ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + 
  geom_boxplot()

coord_cartesian() is the default cartesian coordinate system:

ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + 
  geom_boxplot() + 
  coord_cartesian(xlim = c(0, 5))

coord_fixed() specifies the aspect ratio (y / x):

ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + 
  geom_boxplot() + 
  coord_fixed(ratio = 1/2)
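The ratio in coord_fixed() is expressed as y / x: with ratio = 1, one unit on the y-axis is drawn as long as one unit on the x-axis. A minimal sketch, assuming we compare two variables measured in the same units (cty and hwy are an illustrative choice, not from the slides above):

```r
library(ggplot2)

# Illustrative choice: city and highway mileage share the same units (mpg),
# so a 1:1 aspect ratio makes the 45-degree reference line meaningful.
p_fixed <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0) +  # the line hwy = cty
  coord_fixed(ratio = 1)                   # one y unit as long as one x unit

p_fixed
```

Points above the line are cars whose highway mileage exceeds their city mileage.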

coord_flip() swaps the x- and y-axes:

ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + 
  geom_boxplot() + 
  coord_flip()

Pie chart (with coord_polar(), the bar chart becomes a Coxcomb chart):

bar <- ggplot(data = diamonds) + 
  geom_bar(
    mapping = aes(x = cut, fill = cut), 
    show.legend = FALSE,
    width = 1
  ) + 
  theme(aspect.ratio = 1) +
  labs(x = NULL, y = NULL)

bar + coord_flip()

bar + coord_polar()
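In bar + coord_polar(), each cut gets a wedge of equal angle and the radius reflects the count. A conventional pie chart, where the angle reflects the count, can be sketched by stacking everything into a single bar and mapping its height to the angle. This is one possible approach, not the only one; the object name pie is chosen here for illustration:

```r
library(ggplot2)

pie <- ggplot(diamonds, aes(x = factor(1), fill = cut)) +
  geom_bar(width = 1) +         # a single stacked bar; segment height = count
  coord_polar(theta = "y") +    # map the stacked height to the angle
  labs(x = NULL, y = NULL)

pie
```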

A map:

library("maps")

## 
## Attaching package: 'maps'

## The following object is masked from 'package:purrr':
## 
##     map

nz <- map_data("nz")
head(nz, 20)

##        long       lat group order        region subregion
## 1  172.7433 -34.44215     1     1 North.Island       <NA>
## 2  172.7983 -34.45562     1     2 North.Island       <NA>
## 3  172.8528 -34.44846     1     3 North.Island       <NA>
## 4  172.8986 -34.41786     1     4 North.Island       <NA>
## 5  172.9593 -34.42503     1     5 North.Island       <NA>
## 6  173.0184 -34.39895     1     6 North.Island       <NA>
## 7  173.0229 -34.44662     1     7 North.Island       <NA>
## 8  173.0184 -34.49343     1     8 North.Island       <NA>
## 9  172.9616 -34.50426     1     9 North.Island       <NA>
## 10 172.9181 -34.47367     1    10 North.Island       <NA>
## 11 172.9353 -34.52225     1    11 North.Island       <NA>
## 12 172.8808 -34.51504     1    12 North.Island       <NA>
## 13 172.9049 -34.55646     1    13 North.Island       <NA>
## 14 172.9553 -34.53303     1    14 North.Island       <NA>
## 15 172.9376 -34.57806     1    15 North.Island       <NA>
## 16 172.9760 -34.61227     1    16 North.Island       <NA>
## 17 172.9926 -34.56723     1    17 North.Island       <NA>
## 18 173.0218 -34.61404     1    18 North.Island       <NA>
## 19 173.0396 -34.65902     1    19 North.Island       <NA>
## 20 173.0676 -34.70044     1    20 North.Island       <NA>

ggplot(nz, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", colour = "black")

coord_quickmap() sets the aspect ratio so that the map is drawn to scale:

ggplot(nz, aes(long, lat, group = group)) +
  geom_polygon(fill = "white", colour = "black") +
  coord_quickmap()

Graphics for communications | r4ds chapter 28

Label

labs()

Title

Figure title should be descriptive:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE) +
  labs(title = "Fuel efficiency generally decreases with engine size")

Subtitle and caption

subtitle adds additional detail in a smaller font beneath the title.

caption adds text at the bottom right of the plot, often used to describe the source of the data.

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE) + 
  labs(
    title = "Fuel efficiency generally decreases with engine size",
    subtitle = "Two seaters (sports cars) are an exception because of their light weight",
    caption = "Data from fueleconomy.gov"
  )

Axis labels

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(se = FALSE) +
  labs(
    x = "Engine displacement (L)",
    y = "Highway fuel economy (mpg)"
  )

Math equations

Read about the available options in ?plotmath.

df <- tibble(x = runif(10), y = runif(10))
ggplot(df, aes(x, y)) + geom_point() +
  labs(
    x = quote(sum(x[i] ^ 2, i == 1, n)),
    y = quote(alpha + beta + frac(delta, theta))
  )

The R package latex2exp can convert LaTeX math expressions into plotmath expressions (Ref)

library(latex2exp)
df <- tibble(x = runif(10), y = runif(10))
ggplot(df, aes(x, y)) + geom_point() +
  labs(
    y = TeX("Example: $\\alpha + \\beta + \\frac{\\delta}{\\theta}$"),
    x = TeX("$\\sum_{i = 1}^{n} x_i^2$")
  )

Annotations

Find the most fuel efficient car in each car class:

best_in_class <- mpg %>%
  group_by(class) %>%
  filter(row_number(desc(hwy)) == 1)

# equivalent to
# best_in_class <- filter(group_by(mpg, class), row_number(desc(hwy)) == 1)
best_in_class

## # A tibble: 7 × 11
## # Groups:   class [7]
##   manufacturer model     displ  year   cyl trans  drv     cty   hwy fl    class 
##   <chr>        <chr>     <dbl> <int> <int> <chr>  <chr> <int> <int> <chr> <chr> 
## 1 chevrolet    corvette    5.7  1999     8 manua… r        16    26 p     2seat…
## 2 dodge        caravan …   2.4  1999     4 auto(… f        18    24 r     miniv…
## 3 nissan       altima      2.5  2008     4 manua… f        23    32 r     midsi…
## 4 subaru       forester…   2.5  2008     4 manua… 4        20    27 r     suv   
## 5 toyota       toyota t…   2.7  2008     4 manua… 4        17    22 r     pickup
## 6 volkswagen   jetta       1.9  1999     4 manua… f        33    44 d     compa…
## 7 volkswagen   new beet…   1.9  1999     4 manua… f        35    44 d     subco…

The dplyr::desc() function transforms a vector into a format that will be sorted in descending order.
The dplyr::filter() function subsets a data frame, retaining all rows that satisfy your conditions.
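To see why row_number(desc(hwy)) == 1 picks the largest value, it helps to rank a small vector by hand. The sketch below uses a toy vector x (made up for illustration) and then shows dplyr::slice_max() as one possible alternative for the same task; with_ties = FALSE keeps a single row per group when several cars tie for the best mileage.

```r
library(dplyr)
library(ggplot2)  # provides the mpg data

x <- c(10, 30, 20)       # toy vector for illustration
row_number(desc(x))      # 3 1 2: the largest value (30) gets rank 1

# One possible alternative to filter(row_number(desc(hwy)) == 1)
best_alt <- mpg %>%
  group_by(class) %>%
  slice_max(hwy, n = 1, with_ties = FALSE) %>%
  ungroup()

best_alt
```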

Annotate points

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = class)) +
  geom_text(aes(label = model), data = best_in_class)

geom_label() draws a rectangle behind the text

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_label(aes(label = model), data = best_in_class, nudge_y = 2, alpha = 0.5)

The ggrepel package automatically adjusts labels so that they don’t overlap:

library("ggrepel")
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_point(size = 3, shape = 1, data = best_in_class) +
  ggrepel::geom_label_repel(aes(label = model), data = best_in_class)

Scales

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class))

ggplot2 automatically adds default scales, so the plot above is equivalent to:

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  scale_x_continuous() +
  scale_y_continuous() +
  scale_colour_discrete()

breaks

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_y_continuous(breaks = seq(15, 40, by = 5))

Breaks are also useful when you have relatively few data points and want to highlight exactly where the observations occur. This plot shows when each US president started and ended their term.

presidential %>%
  mutate(id = 33 + row_number()) %>%
  ggplot(aes(start, id)) +
    geom_point() +
    geom_segment(aes(xend = end, yend = id)) +
    scale_x_date(NULL, breaks = presidential$start, date_labels = "'%y")

labels

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_x_continuous(labels = NULL) +
  scale_y_continuous(labels = NULL)
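Besides NULL (which suppresses the labels), the labels argument also accepts a function that turns break values into text. A minimal sketch, where add_mpg is a hypothetical helper written for this example:

```r
library(ggplot2)

# Hypothetical helper for illustration: append the unit to each break value.
add_mpg <- function(x) paste0(x, " mpg")

p_labels <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_y_continuous(breaks = seq(15, 40, by = 5), labels = add_mpg)

add_mpg(c(15, 20))  # "15 mpg" "20 mpg"
p_labels
```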

Plot the y-axis on a log scale:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_y_log10()

Plot the x-axis in reverse order:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_x_reverse()

ColorBrewer scales are documented online at http://colorbrewer2.org/
They are available via the RColorBrewer package.
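A minimal sketch of using a ColorBrewer palette in a plot. The palette name "Set1" and the choice of drv as the colour variable are illustrative assumptions, not taken from the slides:

```r
library(ggplot2)
library(RColorBrewer)

# Inspect the hex codes of the first three colours in a qualitative palette
pal <- brewer.pal(3, "Set1")
pal

# Apply the same palette to a discrete colour aesthetic
p_brewer <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = drv)) +
  scale_colour_brewer(palette = "Set1")

p_brewer
```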

A current favorite: the R package wesanderson, which provides Wes Anderson colour palettes.

#install.packages("wesanderson")
library(wesanderson)
for (name in names(wes_palettes)) {
  print(wes_palette(name))
}

Use scale_colour_manual() to define your own mapping between values and colours:

presidential %>%
  mutate(id = 33 + row_number()) %>%
  ggplot(aes(start, id, colour = party)) +
    geom_point() +
    geom_segment(aes(xend = end, yend = id)) +
    scale_colour_manual(values = c(Republican = "red", Democratic = "blue"))

The above plot can be improved by placing the x-axis breaks at the start of each term:

presidential %>%
  mutate(id = 33 + row_number()) %>%
  ggplot(aes(start, id, colour = party)) +
    geom_point() +
    geom_segment(aes(xend = end, yend = id)) +
    scale_colour_manual(values = c(Republican = "red", Democratic = "blue")) +
    scale_x_date(NULL, breaks = presidential$start, date_labels = "'%y")

Use scale_colour_gradient() or scale_fill_gradient() for continuous colour.
The viridis package offers viridis::scale_colour_viridis() and viridis::scale_fill_viridis() as alternatives.

df <- tibble(
  x = rnorm(10000),
  y = rnorm(10000)
)
ggplot(df, aes(x, y)) +
  geom_hex() +
  coord_fixed()

ggplot(df, aes(x, y)) +
  geom_hex() +
  viridis::scale_fill_viridis() +
  coord_fixed()

All color scales come in two varieties:

scale_colour_x() for colour aesthetics
scale_fill_x() for fill aesthetics
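The naming convention can be illustrated with the viridis scales that ship with ggplot2 itself (scale_colour_viridis_d() and scale_fill_viridis_d() for discrete variables). The variables chosen here are for illustration only:

```r
library(ggplot2)

# Points are drawn with the colour aesthetic -> scale_colour_*()
p_colour <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  scale_colour_viridis_d()

# Bars are filled with the fill aesthetic -> scale_fill_*()
p_fill <- ggplot(mpg, aes(class, fill = drv)) +
  geom_bar() +
  scale_fill_viridis_d()

p_colour
p_fill
```

Using a colour scale on a plot that only maps fill (or the reverse) has no visible effect, which is a common source of confusion.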

Legends

Set the legend position to "left", "right", "top", "bottom", or "none":

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) + 
  theme(legend.position = "left")

See the following link for more details on how to change the title, labels, … of a legend.

http://www.sthda.com/english/wiki/ggplot2-legend-easy-steps-to-change-the-position-and-the-appearance-of-a-graph-legend-in-r-software
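As a short sketch of the most common legend tweaks: labs() sets the legend title, theme() sets its position, and guides() with guide_legend() controls its layout. The title text "Vehicle class" and the specific settings below are illustrative assumptions:

```r
library(ggplot2)

p_legend <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  labs(colour = "Vehicle class") +        # legend title
  theme(legend.position = "bottom") +     # legend placement
  guides(
    colour = guide_legend(
      nrow = 1,                           # lay the keys out in one row
      override.aes = list(size = 4)       # larger points in the legend only
    )
  )

p_legend
```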

Zooming

Without clipping (keeps unseen data points, so geom_smooth() is fitted to all the data)

ggplot(mpg, mapping = aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  coord_cartesian(xlim = c(5, 7), ylim = c(10, 30))

With clipping (removes unseen data points)

ggplot(mpg, mapping = aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  xlim(5, 7) + ylim(10, 30)

This is the same as filtering the data first, or setting limits on the scales:

mpg %>%
  filter(displ >= 5, displ <= 7, hwy >= 10, hwy <= 30) %>%
  ggplot(aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth()

ggplot(mpg, mapping = aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  scale_x_continuous(limits = c(5, 7)) +
  scale_y_continuous(limits = c(10, 30))
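The practical difference between the two approaches is how much data reaches geom_smooth(). A minimal sketch that counts the observations inside the zoom window (the object name inside is chosen for illustration):

```r
library(dplyr)
library(ggplot2)  # provides the mpg data

# Observations that fall inside the zoom window used above
inside <- mpg %>%
  filter(between(displ, 5, 7), between(hwy, 10, 30))

nrow(mpg)     # rows used for the smooth with coord_cartesian()
nrow(inside)  # rows left after xlim()/ylim(), filter(), or scale limits
```

With coord_cartesian() the smooth is fitted to all rows and the view is then zoomed, while the clipping versions fit it only to the rows inside the window, so the curves can differ.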

Themes

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE) +
  theme_bw()

Saving plots

ggplot(mpg, aes(displ, hwy)) + geom_point()

ggsave("my-plot.pdf")
## Saving 7 x 5 in image
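By default ggsave() saves the last plot displayed, using the size of the current graphics device. For reproducible output it is usually safer to pass the plot and the dimensions explicitly. A minimal sketch that writes to a temporary file (an assumption made here so the example leaves no files behind; replace it with a real path such as "my-plot.pdf"):

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Temporary file used for illustration only
out_file <- tempfile(fileext = ".pdf")

ggsave(out_file, plot = p, width = 6, height = 4, units = "in")

file.exists(out_file)
```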

Cheat sheet

RStudio cheat sheet is extremely helpful.

Data visualization with ggplot2 (cont.)

MATH-7360 Data Analysis

Dr. Xiang Ji @ Tulane University

Oct 6, 2021