rm(list = ls()) # clean up the workspace
library(tidyverse) # provides ggplot2, dplyr::filter(), and the %>% pipe
“The simple graph has brought more information to the data analyst’s mind than any other device.”
John Tukey
The mpg data is available from the ggplot2 package:
mpg %>% print(width = Inf)
## # A tibble: 234 × 11
## manufacturer model displ year cyl trans drv cty hwy fl
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr>
## 1 audi a4 1.8 1999 4 auto(l5) f 18 29 p
## 2 audi a4 1.8 1999 4 manual(m5) f 21 29 p
## 3 audi a4 2 2008 4 manual(m6) f 20 31 p
## 4 audi a4 2 2008 4 auto(av) f 21 30 p
## 5 audi a4 2.8 1999 6 auto(l5) f 16 26 p
## 6 audi a4 2.8 1999 6 manual(m5) f 18 26 p
## 7 audi a4 3.1 2008 6 auto(av) f 18 27 p
## 8 audi a4 quattro 1.8 1999 4 manual(m5) 4 18 26 p
## 9 audi a4 quattro 1.8 1999 4 auto(l5) 4 16 25 p
## 10 audi a4 quattro 2 2008 4 manual(m6) 4 20 28 p
## class
## <chr>
## 1 compact
## 2 compact
## 3 compact
## 4 compact
## 5 compact
## 6 compact
## 7 compact
## 8 compact
## 9 compact
## 10 compact
## # … with 224 more rows
Tibbles are a generalized form of data frames and are used extensively in the tidyverse.
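As a small illustration of that relationship, the sketch below converts a base data frame to a tibble and checks that it is still a data frame. It assumes the tibble package is installed and uses the built-in iris data; the object name iris_tbl is only for illustration.

```r
library(tibble)

# Convert a base R data frame to a tibble
iris_tbl <- as_tibble(iris)

# A tibble keeps the data.frame class and adds tbl_df / tbl on top
class(iris_tbl)
is.data.frame(iris_tbl)

# Printing shows only the first 10 rows, plus the type of each column
print(iris_tbl)
```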
displ: engine displacement, in litres.
hwy: highway fuel efficiency, in miles per gallon (mpg).
ggplot(data = <DATA>) +
<GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
hwy vs displ
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
An aesthetic maps data to a specific visual feature of the plot, such as position, color, size, or shape.
Check the available aesthetics for a geometric object via its help page, e.g. ?geom_point
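Besides the help page, the aesthetics can also be inspected from the R console. The sketch below is one possible way to do so; it relies on the GeomPoint object that ggplot2 exports, and it lists only the required aesthetics and those with default values, so the help page remains the fuller reference.

```r
library(ggplot2)

# Aesthetics that geom_point() cannot be drawn without
GeomPoint$required_aes

# Optional aesthetics for which geom_point() has default values
names(GeomPoint$default_aes)
```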
Color points according to class
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
Assign different sizes to points according to class
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size = class))
## Warning: Using size for a discrete variable is not advised.
Assign different transparency levels to points according to class
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, alpha = class))
## Warning: Using alpha for a discrete variable is not advised.
Assign different shapes to points according to class
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, shape = class))
By default ggplot2 uses at most 6 shapes at a time, so additional groups go unplotted. Here class has 7 values, and the suv points are dropped.
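If every class should be drawn, one option is to supply the shapes manually. The sketch below assumes scale_shape_manual() with the shape codes 0 to 6; the particular codes and the object name p_shapes are arbitrary choices for illustration.

```r
library(ggplot2)

# class has 7 distinct values, one more than the default shape palette
n_class <- length(unique(mpg$class))
n_class

# Supply one shape code per class so that no group is dropped
p_shapes <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class)) +
  scale_shape_manual(values = 0:6)
p_shapes
```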
Set the color of all points to be blue:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
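Note that color sits outside aes() in the call above. The sketch below contrasts setting an aesthetic with mapping it, using layer_data() to look at the colors ggplot2 actually assigns; the object names p_set and p_map are only for illustration.

```r
library(ggplot2)

# Setting: color is a fixed value outside aes(), so every point is blue
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# Mapping: inside aes(), "blue" is treated as a variable with one level,
# so the points get a default palette color and a legend appears
p_map <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

unique(layer_data(p_set)$colour)
unique(layer_data(p_map)$colour)
```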
Facets divide a plot into subplots based on the values of one or more discrete variables.
A subplot for each car type:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ class, nrow = 2)
A subplot for each car type and drive:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ class)
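facet_grid() can also facet along a single dimension. As a sketch, the formula below uses . as a placeholder for the rows, so the panels are laid out in columns only; the choice of cyl as the facetting variable is just an example.

```r
library(ggplot2)

# Facet in the columns dimension only: one panel per value of cyl
p_cols <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)
p_cols
```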
geom_smooth(): smooth line
How are these two plots similar?
hwy vs displ
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy))
Different line types according to drv
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))
Different line colors according to drv
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy, color = drv))
Lines overlaid over scatter plot:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
geom_smooth(mapping = aes(x = displ, y = hwy))
Same as
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() + geom_smooth()
Different aesthetics in different layers:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth(data = filter(mpg, class == "subcompact"), se = FALSE)
The diamonds data is also available from the ggplot2 package:
## # A tibble: 53,940 × 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
## 10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39
## # … with 53,930 more rows
geom_bar() creates a bar chart:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))
Bar charts, like histograms, frequency polygons, smoothers, and boxplots, plot some computed variables instead of raw data.
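To see the computed variables behind a bar chart, the built data can be inspected directly. The sketch below uses layer_data() to pull out what the stat computed and compares it with counts tabulated by hand; the object names are only for illustration.

```r
library(ggplot2)

p_bar <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

# Data after the statistical transformation: one row per bar
computed <- layer_data(p_bar)
computed[, c("x", "count", "prop")]

# The same counts tabulated by hand
manual <- table(diamonds$cut)
manual
```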
Check the available computed variables for a geometric object via its help page, e.g. ?geom_bar
The same chart can be drawn with stat_count() directly:
ggplot(data = diamonds) +
stat_count(mapping = aes(x = cut))
stat_count() has a default geom, geom_bar().
Display relative frequencies (proportions) instead of counts:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1)) # after_stat() replaces the deprecated stat()
Note that the aesthetic mapping group = 1 overrides the default grouping (by cut) by treating all observations as a single group. Without it, each proportion is computed within its own cut group, so every bar has height 1:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = after_stat(prop))) # after_stat() replaces the deprecated stat()
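The effect of the grouping can be checked numerically. The sketch below extracts the computed prop values with layer_data(), once with group = 1 and once without; it is written with after_stat(), which assumes ggplot2 3.3.0 or later, and the object names are only for illustration.

```r
library(ggplot2)

# With group = 1, prop is each bar's share of all observations
with_group <- layer_data(
  ggplot(data = diamonds) +
    geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
)

# Without it, each cut forms its own group, so every prop equals 1
no_group <- layer_data(
  ggplot(data = diamonds) +
    geom_bar(mapping = aes(x = cut, y = after_stat(prop)))
)

with_group$prop
no_group$prop
```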
geom_bar() vs geom_col()
geom_bar() makes the height of each bar proportional to the number of cases in each group (or, if the weight aesthetic is supplied, to the sum of the weights).
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))
The height of each bar is the number of diamonds in that cut category.
geom_col() makes the heights of the bars represent values in the data.
ggplot(data = diamonds) +
geom_col(mapping = aes(x = cut, y = carat))
The height of each bar is the total carat in that cut category. The same chart can be drawn with geom_bar() and the weight aesthetic:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, weight = carat))
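As a check on that claim, the sketch below sums carat within each cut by hand and compares the result with the bar heights that geom_bar() computes when weight = carat is supplied. The object names total_carat and p_weight are only for illustration.

```r
library(ggplot2)

# Total carat per cut, computed by hand
total_carat <- tapply(diamonds$carat, diamonds$cut, sum)
total_carat

# Bar heights computed by geom_bar() with the weight aesthetic
p_weight <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, weight = carat))
layer_data(p_weight)$count
```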
Color the bar outlines:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, colour = cut))
Fill color:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut))
Fill color according to another variable:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity))
The stacking is performed automatically by the position adjustment specified by the position argument.
If you don’t want a stacked bar chart, you can use one of three other options:
position = "identity" places each object exactly where it falls in the context of the graph.
ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
geom_bar(alpha = 1/5, position = "identity")
ggplot(data = diamonds, mapping = aes(x = cut, colour = clarity)) +
geom_bar(fill = NA, position = "identity")
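Setting alpha to a small value makes the bars slightly transparent, and fill = NA makes them completely transparent, so the overlapping bars stay visible.
The identity position adjustment is more useful for 2d geoms, like points, where it is the default.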
position = "fill" works like stacking, but makes each set of stacked bars the same height.
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")
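The "same height" property can be verified from the built plot data. In the sketch below, each row returned by layer_data() is one clarity segment, and the top of the highest segment in every cut should be 1; the object names are only for illustration.

```r
library(ggplot2)

p_fill <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")

# Each row is one clarity segment; ymin and ymax are its rescaled bounds
segments <- layer_data(p_fill)

# Top of the highest segment within each cut
top_of_stack <- tapply(segments$ymax, as.numeric(segments$x), max)
top_of_stack
```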
position = "dodge" places overlapping objects directly beside one another.
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
position = "jitter" adds random noise to the x and y position of each element to avoid overplotting:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")
geom_jitter() is similar:
ggplot(data = mpg) +
geom_jitter(mapping = aes(x = displ, y = hwy))
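Because the noise is random, a jittered plot changes slightly each time it is drawn. The sketch below shows one way to control this with position_jitter(), which takes the amount of noise and a seed; the values 0.1, 0.5, and 123 are arbitrary choices for illustration.

```r
library(ggplot2)

# width and height set the amount of horizontal and vertical noise;
# seed makes the jitter reproducible across redraws
p_jitter <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    position = position_jitter(width = 0.1, height = 0.5, seed = 123)
  )
p_jitter
```

Leaving out the seed gives the default behaviour, where the points land in slightly different places on every draw.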