Mid-term evaluation (voluntary, anonymous, ~ 10 min)
HW1 due today
HW2 will be posted on Friday
HW1 and project description will be graded this weekend
Check whether you forgot to merge from the develop branch into the master branch,
and to git push them into your repository (even if you have emailed me, just in case…)
HW1 will be graded on two questions (selected non-randomly)
Lab keys are for you to check your results.
Try using your local git application. Try not to upload files via the GitHub webpage.
Dr. Hua Zhou’s slides
rm(list = ls()) # clean-up workspace
library("tidyverse")
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.5 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Big Sur 11.5.2
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
## [5] readr_2.0.1 tidyr_1.1.4 tibble_3.1.5 ggplot2_3.3.5
## [9] tidyverse_1.3.1
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.1.1 xfun_0.25 bslib_0.2.5.1 haven_2.4.3
## [5] colorspace_2.0-2 vctrs_0.3.8 generics_0.1.0 htmltools_0.5.1.1
## [9] yaml_2.2.1 utf8_1.2.2 rlang_0.4.11 jquerylib_0.1.4
## [13] pillar_1.6.3 glue_1.4.2 withr_2.4.2 DBI_1.1.1
## [17] dbplyr_2.1.1 modelr_0.1.8 readxl_1.3.1 lifecycle_1.0.1
## [21] munsell_0.5.0 gtable_0.3.0 cellranger_1.1.0 rvest_1.0.1
## [25] evaluate_0.14 knitr_1.33 tzdb_0.1.2 fansi_0.5.0
## [29] broom_0.7.9 Rcpp_1.0.7 backports_1.2.1 scales_1.1.1
## [33] jsonlite_1.7.2 fs_1.5.0 hms_1.1.0 digest_0.6.28
## [37] stringi_1.7.3 grid_4.1.1 cli_3.0.1 tools_4.1.1
## [41] magrittr_2.0.1 sass_0.4.0 crayon_1.4.1 pkgconfig_2.0.3
## [45] ellipsis_0.3.2 xml2_1.3.2 reprex_2.0.1 lubridate_1.7.10
## [49] rstudioapi_0.13 assertthat_0.2.1 rmarkdown_2.10 httr_1.4.2
## [53] R6_2.5.1 compiler_4.1.1
Recall the mpg data:
mpg
## # A tibble: 234 × 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
## 2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
## 3 audi a4 2 2008 4 manu… f 20 31 p comp…
## 4 audi a4 2 2008 4 auto… f 21 30 p comp…
## 5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
## 6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
## 7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
## 8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
## 9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
## 10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
## # … with 224 more rows
Boxplots (grouped by class):
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot()
coord_cartesian() is the default Cartesian coordinate system:
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot() +
coord_cartesian(xlim = c(0, 5))
coord_fixed() specifies the aspect ratio (y / x):
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot() +
coord_fixed(ratio = 1/2)
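To make the meaning of `ratio` concrete, here is a minimal sketch. It relies on ggplot2's documented convention that `ratio` is expressed as y / x, so values below 1 draw one unit on the y-axis shorter than one unit on the x-axis. Plotting `cty` against `hwy` is a choice made for this illustration (both are in miles per gallon), not something taken from the slides.

```r
library(ggplot2)

# cty and hwy share the same units, so a 1:1 aspect ratio makes the
# dashed 45-degree reference line visually meaningful.
base <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed")

# ratio = 1: one unit on y is drawn as long as one unit on x.
p_equal <- base + coord_fixed(ratio = 1)

# ratio = 1/2: one unit on y is drawn half as long as one unit on x.
p_half <- base + coord_fixed(ratio = 1/2)

p_equal
p_half
```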
coord_flip() flips the x- and y-axes:
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot() +
coord_flip()
Coxcomb chart (a bar chart in polar coordinates, a relative of the pie chart):
bar <- ggplot(data = diamonds) +
geom_bar(
mapping = aes(x = cut, fill = cut),
show.legend = FALSE,
width = 1
) +
theme(aspect.ratio = 1) +
labs(x = NULL, y = NULL)
bar + coord_flip()
bar + coord_polar()
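`bar + coord_polar()` draws one wedge per cut, with the radius of each wedge encoding the count. A conventional pie chart, where the angle encodes the count, can be sketched by stacking all cuts in a single bar and mapping the y-axis to the angle. The names `stacked` and `pie` are introduced here for illustration only.

```r
library(ggplot2)

# One stacked bar: every cut shares the same x position, factor(1).
stacked <- ggplot(diamonds, aes(x = factor(1), fill = cut)) +
  geom_bar(width = 1) +
  labs(x = NULL, y = NULL)

# Mapping the count axis (y) to the angle turns the stacked bar into a pie.
pie <- stacked + coord_polar(theta = "y")

pie
```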
A map:
library("maps")
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
nz <- map_data("nz")
head(nz, 20)
## long lat group order region subregion
## 1 172.7433 -34.44215 1 1 North.Island <NA>
## 2 172.7983 -34.45562 1 2 North.Island <NA>
## 3 172.8528 -34.44846 1 3 North.Island <NA>
## 4 172.8986 -34.41786 1 4 North.Island <NA>
## 5 172.9593 -34.42503 1 5 North.Island <NA>
## 6 173.0184 -34.39895 1 6 North.Island <NA>
## 7 173.0229 -34.44662 1 7 North.Island <NA>
## 8 173.0184 -34.49343 1 8 North.Island <NA>
## 9 172.9616 -34.50426 1 9 North.Island <NA>
## 10 172.9181 -34.47367 1 10 North.Island <NA>
## 11 172.9353 -34.52225 1 11 North.Island <NA>
## 12 172.8808 -34.51504 1 12 North.Island <NA>
## 13 172.9049 -34.55646 1 13 North.Island <NA>
## 14 172.9553 -34.53303 1 14 North.Island <NA>
## 15 172.9376 -34.57806 1 15 North.Island <NA>
## 16 172.9760 -34.61227 1 16 North.Island <NA>
## 17 172.9926 -34.56723 1 17 North.Island <NA>
## 18 173.0218 -34.61404 1 18 North.Island <NA>
## 19 173.0396 -34.65902 1 19 North.Island <NA>
## 20 173.0676 -34.70044 1 20 North.Island <NA>
ggplot(nz, aes(x = long, y = lat, group = group)) +
geom_polygon(fill = "white", colour = "black")
coord_quickmap() sets the aspect ratio correctly for maps:
ggplot(nz, aes(long, lat, group = group)) +
geom_polygon(fill = "white", colour = "black") +
coord_quickmap()
labs() adds labels. The figure title should be descriptive:
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(title = "Fuel efficiency generally decreases with engine size")
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_smooth(se = FALSE) +
labs(
x = "Engine displacement (L)",
y = "Highway fuel economy (mpg)"
)
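The title and axis labels shown above can be set in a single `labs()` call, together with a subtitle, a caption, and the legend title. The sketch below is illustrative: the subtitle, caption, and legend title texts are placeholders written for this example.

```r
library(ggplot2)

p_labelled <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(se = FALSE) +
  labs(
    title    = "Fuel efficiency generally decreases with engine size",
    subtitle = "Points are coloured by car class",    # placeholder text
    caption  = "Source: mpg data set in ggplot2",     # placeholder text
    x        = "Engine displacement (L)",
    y        = "Highway fuel economy (mpg)",
    colour   = "Car class"                            # legend title
  )

p_labelled
```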
Math expressions can be used in labels; read about the available options in ?plotmath:
df <- tibble(x = runif(10), y = runif(10))
ggplot(df, aes(x, y)) + geom_point() +
labs(
x = quote(sum(x[i] ^ 2, i == 1, n)),
y = quote(alpha + beta + frac(delta, theta))
)
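`quote()` produces a fixed expression. When a label should include a computed value, base R's `bquote()` can substitute it into the expression via `.()`. This is a sketch under that assumption; the correlation shown in the title and the names `r` and `p_math` are chosen only for illustration.

```r
library(ggplot2)
library(tibble)

set.seed(1)  # make the random example reproducible
df <- tibble(x = runif(10), y = runif(10))

# Sample correlation to be embedded in the title.
r <- cor(df$x, df$y)

p_math <- ggplot(df, aes(x, y)) +
  geom_point() +
  # .() evaluates its argument and inserts the value into the plotmath expression.
  labs(title = bquote(hat(rho) == .(round(r, 2))))

p_math
```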
The R package latex2exp can convert TeX math expressions to plotmath expressions:
library(latex2exp)
df <- tibble(x = runif(10), y = runif(10))
ggplot(df, aes(x, y)) + geom_point() +
labs(
y = TeX("Example: $\\alpha + \\beta + \\frac{\\delta}{\\theta}$"),
x = TeX("$\\sum_{i = 1}^{n} x_i^2$")
)
Find the most fuel efficient car in each car class:
best_in_class <- mpg %>%
group_by(class) %>%
filter(row_number(desc(hwy)) == 1)
# equivalent to
# best_in_class <- filter(group_by(mpg, class), row_number(desc(hwy)) == 1)
best_in_class
## # A tibble: 7 × 11
## # Groups: class [7]
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 chevrolet corvette 5.7 1999 8 manua… r 16 26 p 2seat…
## 2 dodge caravan … 2.4 1999 4 auto(… f 18 24 r miniv…
## 3 nissan altima 2.5 2008 4 manua… f 23 32 r midsi…
## 4 subaru forester… 2.5 2008 4 manua… 4 20 27 r suv
## 5 toyota toyota t… 2.7 2008 4 manua… 4 17 22 r pickup
## 6 volkswagen jetta 1.9 1999 4 manua… f 33 44 d compa…
## 7 volkswagen new beet… 1.9 1999 4 manua… f 35 44 d subco…
The dplyr::desc function transforms a vector into a format that will be sorted in descending order
The dplyr::filter function subsets a data frame, retaining all rows that satisfy your conditions
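The `filter(row_number(desc(hwy)) == 1)` idiom above can also be written with `dplyr::slice_max()`, available in dplyr 1.0.0 and later. The sketch below is an alternative formulation, not the one used in these slides. `with_ties = FALSE` keeps exactly one row per class, and `best_alt` is a name introduced for this example.

```r
library(dplyr)
library(ggplot2)  # provides the mpg data

best_alt <- mpg %>%
  group_by(class) %>%
  # Keep the single row with the largest hwy in each class.
  slice_max(hwy, n = 1, with_ties = FALSE) %>%
  ungroup()

best_alt
```

Unlike the `filter()` version, the result is ordered by class rather than by the original row order.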
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(colour = class)) +
geom_text(aes(label = model), data = best_in_class)
geom_label() draws a rectangle behind the text:
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_label(aes(label = model), data = best_in_class, nudge_y = 2, alpha = 0.5)
The ggrepel package automatically adjusts labels so that they don’t overlap:
library("ggrepel")
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_point(size = 3, shape = 1, data = best_in_class) +
ggrepel::geom_label_repel(aes(label = model), data = best_in_class)
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class))
ggplot2 automatically adds default scales; the plot above is equivalent to:
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
scale_x_continuous() +
scale_y_continuous() +
scale_colour_discrete()
breaks controls the positions of the ticks:
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
scale_y_continuous(breaks = seq(15, 40, by = 5))
Setting breaks explicitly is useful when you have relatively few data points and want to highlight exactly where the observations occur. This plot shows when each US president started and ended their term:
presidential %>%
mutate(id = 33 + row_number()) %>%
ggplot(aes(start, id)) +
geom_point() +
geom_segment(aes(xend = end, yend = id)) +
scale_x_date(NULL, breaks = presidential$start, date_labels = "'%y")
labels controls the text shown at each tick; labels = NULL suppresses the tick labels:
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
scale_x_continuous(labels = NULL) +
scale_y_continuous(labels = NULL)
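Besides `NULL`, the `labels` argument also accepts a function that maps break values to strings. The following sketch combines custom breaks with unit-suffixed labels; the suffixes and the name `p_units` are illustrative choices.

```r
library(ggplot2)

p_units <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  # A function passed to labels receives the break values and returns strings.
  scale_x_continuous(labels = function(x) paste0(x, " L")) +
  scale_y_continuous(
    breaks = seq(15, 40, by = 5),
    labels = function(y) paste0(y, " mpg")
  )

p_units
```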
Plot the y-axis on a log scale:
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
scale_y_log10()
Plot x-axis in reverse order:
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
scale_x_reverse()
ColorBrewer scales are documented online at http://colorbrewer2.org/ and are available in R via the RColorBrewer package.
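A minimal sketch of using a ColorBrewer palette in a plot: ggplot2's `scale_colour_brewer()` takes the palette name directly. The palette `"Set1"` and the mapping of `drv` to colour are choices made for this example.

```r
library(ggplot2)

p_brewer <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = drv)) +
  # "Set1" is one of the qualitative ColorBrewer palettes.
  scale_colour_brewer(palette = "Set1")

p_brewer

# To browse all palettes interactively (requires the RColorBrewer package):
# RColorBrewer::display.brewer.all()
```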
# install.packages("wesanderson")  # Wes Anderson colour palettes; install once
library(wesanderson)
for (name in names(wes_palettes)) {
print(wes_palette(name))
}
scale_colour_manual() uses a predefined mapping between values and colours:
presidential %>%
mutate(id = 33 + row_number()) %>%
ggplot(aes(start, id, colour = party)) +
geom_point() +
geom_segment(aes(xend = end, yend = id)) +
scale_colour_manual(values = c(Republican = "red", Democratic = "blue"))
presidential %>%
mutate(id = 33 + row_number()) %>%
ggplot(aes(start, id, colour = party)) +
geom_point() +
geom_segment(aes(xend = end, yend = id)) +
scale_colour_manual(values = c(Republican = "red", Democratic = "blue")) +
scale_x_date(NULL, breaks = presidential$start, date_labels = "'%y")
Use scale_colour_gradient() or scale_fill_gradient() for continuous colour.
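As a small sketch of a continuous colour scale, the plot below maps the numeric variable `cty` to colour and sets the two ends of the gradient explicitly. The colours and the choice of `cty` are illustrative.

```r
library(ggplot2)

p_gradient <- ggplot(mpg, aes(displ, hwy, colour = cty)) +
  geom_point() +
  # low/high give the colours at the two ends of the continuous scale.
  scale_colour_gradient(low = "lightblue", high = "darkblue")

p_gradient
```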
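The viridis package provides viridis::scale_colour_viridis() and viridis::scale_fill_viridis(). Compare the default fill scale with the viridis fill scale: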
df <- tibble(
x = rnorm(10000),
y = rnorm(10000)
)
ggplot(df, aes(x, y)) +
geom_hex() +
coord_fixed()
ggplot(df, aes(x, y)) +
geom_hex() +
viridis::scale_fill_viridis() +
coord_fixed()
All colour scales come in two varieties:
scale_colour_x() for colour aesthetics
scale_fill_x() for fill aesthetics
Set legend position: "left", "right", "top", "bottom", "none":
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
theme(legend.position = "left")
See the following link for more details on how to change the title, labels, … of a legend.
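Legend position is often combined with `guides()` to control how individual legends are drawn. The sketch below places the legend at the bottom, lays the keys out in one row, and enlarges the key symbols; the specific values are illustrative.

```r
library(ggplot2)

p_legend <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  theme(legend.position = "bottom") +
  # nrow lays the keys out in a single row; override.aes changes only
  # how the keys are drawn, not the points in the plot.
  guides(colour = guide_legend(nrow = 1, override.aes = list(size = 4)))

p_legend
```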
Without clipping (zooms in; unseen data points are kept and still used by geom_smooth())
ggplot(mpg, mapping = aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth() +
coord_cartesian(xlim = c(5, 7), ylim = c(10, 30))
With clipping (removes unseen data points)
ggplot(mpg, mapping = aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth() +
xlim(5, 7) + ylim(10, 30)
which is essentially the same as filtering the data first:
mpg %>%
filter(displ >= 5, displ <= 7, hwy >= 10, hwy <= 30) %>%
ggplot(aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth()
and the same as setting limits on the position scales:
ggplot(mpg, mapping = aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth() +
scale_x_continuous(limits = c(5, 7)) +
scale_y_continuous(limits = c(10, 30))
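To see how much data the clipping versions discard, one can count the rows that fall inside the limits. This is a sketch that mirrors the `filter()` call above; the names `inside`, `n_total`, `n_kept`, and `n_dropped` are introduced here.

```r
library(dplyr)
library(ggplot2)  # provides the mpg data

# Rows that survive xlim(5, 7) + ylim(10, 30); the limits are inclusive.
inside <- mpg %>%
  filter(displ >= 5, displ <= 7, hwy >= 10, hwy <= 30)

n_total   <- nrow(mpg)
n_kept    <- nrow(inside)
n_dropped <- n_total - n_kept  # rows removed before geom_smooth() is fitted

c(total = n_total, kept = n_kept, dropped = n_dropped)
```

With `coord_cartesian()` none of these rows are dropped, which is why the smoother can look different between the two approaches.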
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
theme_bw()
ggplot(mpg, aes(displ, hwy)) + geom_point()
ggsave("my-plot.pdf")
## Saving 7 x 5 in image
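By default `ggsave()` saves the last plot displayed, at the size of the current graphics device. For reproducible output it can help to pass the plot and the dimensions explicitly. In this sketch the file is written to a temporary path so the working directory is left untouched; the dimensions are arbitrary.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# tempfile() avoids writing into the working directory.
out_file <- tempfile(fileext = ".pdf")

# Passing plot, width, height and units makes the result independent
# of whatever graphics device happens to be open.
ggsave(out_file, plot = p, width = 6, height = 4, units = "in")

file.exists(out_file)
```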
The RStudio cheat sheet is extremely helpful.