rm(list = ls()) # clean-up workspace
There is no due date for Lab submissions, but please finish and submit them before the semester ends.
Please use R Markdown files for Labs, not R script files.
I’d love to learn it for you, but it doesn’t work that way. Learning seems more similar to eating…
Lab 2 solutions posted
A 1-page report on the course project is due this week.
Dimension | Homogeneous | Heterogeneous |
---|---|---|
1d | Atomic vector | List |
2d | Matrix | Data frame |
nd | Array | |
Homogeneous: all contents must be of the same type
Heterogeneous: the contents can be of different types
Vectors are the basic data structure in R.
Two flavors: atomic vectors and lists
Three common properties:
Type, typeof(), what it is.
Length, length(), how many elements it contains.
Attributes, attributes(), additional arbitrary metadata.
There are no scalars in R; they are length-1 vectors.
Note: is.vector() does not test whether an object is a vector; it returns TRUE only for vectors that have no attributes other than names. Use is.atomic() or is.list() instead.
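A short illustration of the three properties and of the is.vector() pitfall; the attribute name "note" is an arbitrary choice for this sketch.

```r
# A named numeric vector: names are its only attribute
x <- c(a = 1, b = 2, c = 3)
typeof(x)       # "double"
length(x)       # 3
attributes(x)   # $names: "a" "b" "c"

is.vector(x)    # TRUE: names alone are allowed
attr(x, "note") <- "extra metadata"   # "note" is an illustrative attribute name
is.vector(x)    # FALSE: any attribute other than names makes this FALSE
is.atomic(x)    # TRUE: it is still an atomic vector
```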
There are four common types of atomic vectors (remember Lab 2?)
logical
integer
numeric (actually double)
character
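typeof() reports which of the four types a literal produces; a quick check:

```r
typeof(TRUE)    # "logical"
typeof(1L)      # "integer": the L suffix makes an integer
typeof(1)       # "double": plain numbers are doubles
typeof("a")     # "character"
typeof(1:3)     # "integer": the colon operator gives integers here
```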
Many commands in R generate a vector of output, rather than a single number.
The c() command creates a vector containing a list of specific elements.
Example 1
c(7, 3, 6, 0)
## [1] 7 3 6 0
c(73:60)
## [1] 73 72 71 70 69 68 67 66 65 64 63 62 61 60
c(7:3, 6:0)
## [1] 7 6 5 4 3 6 5 4 3 2 1 0
c(rep(7:3, 6), 0)
## [1] 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 0
Example 2: The command seq() creates a sequence of numbers.
seq(7)
## [1] 1 2 3 4 5 6 7
seq(3, 70, by = 6)
## [1] 3 9 15 21 27 33 39 45 51 57 63 69
seq(3, 70, length.out = 6)
## [1] 3.0 16.4 29.8 43.2 56.6 70.0
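A few more seq() variants, sketched to show a negative step and why seq_len() is safer than 1:n when n may be 0:

```r
seq(10, 1, by = -3)            # 10 7 4 1: a negative step counts down
n <- 0
1:n                            # 1 0: usually not what a loop over n elements wants
seq_len(n)                     # integer(0): safe even when n is 0
seq_along(c("x", "y", "z"))    # 1 2 3: one index per element
```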
Atomic vectors are always flat, even if you nest c()'s. Example 3
c(1, c(2, c(3, 4)))
## [1] 1 2 3 4
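Because atomic vectors are homogeneous, c() coerces mixed inputs to a single common type (logical, then integer, then double, then character):

```r
c(1, "a", TRUE)        # "1" "a" "TRUE": everything becomes character
c(TRUE, 2L)            # 1 2: logical is coerced to integer
c(1L, 2.5)             # 1.0 2.5: integer is coerced to double
typeof(c(TRUE, 2L))    # "integer"
```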
Lists: elements can be of any type, including lists.
Construct a list by using list() instead of c().
x <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9))
str(x)
## List of 4
## $ : int [1:3] 1 2 3
## $ : chr "a"
## $ : logi [1:3] TRUE FALSE TRUE
## $ : num [1:2] 2.3 5.9
Elements of a named list can be accessed with $.
x.named <- list(vector = 1:3, name = "a", logical = c(TRUE, FALSE, TRUE), range = c(2.3, 5.9))
str(x.named)
## List of 4
## $ vector : int [1:3] 1 2 3
## $ name : chr "a"
## $ logical: logi [1:3] TRUE FALSE TRUE
## $ range : num [1:2] 2.3 5.9
x.named$vector
## [1] 1 2 3
x.named$range
## [1] 2.3 5.9
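Besides $, lists support [ and [[, which behave differently: [ returns a sub-list, while [[ returns the element itself. The nested list below is a made-up example.

```r
x.named <- list(vector = 1:3, name = "a", logical = c(TRUE, FALSE, TRUE), range = c(2.3, 5.9))
x.named["name"]       # a list of length 1 that contains "a"
x.named[["name"]]     # "a": the element itself
x.named[[1]]          # 1 2 3: [[ also accepts a position

# Lists can contain lists
nested <- list(inner = list(value = 42))
nested$inner$value    # 42
```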
Lists are used to build up many of the more complicated data structures in R.
For example, both data frames (another data structure in R) and linear model objects (as produced by lm()) are lists.
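One way to check this claim, using the mtcars data set that ships with R (the formula mpg ~ wt is just an illustrative choice):

```r
df <- data.frame(a = 1:3, b = c("x", "y", "z"))
typeof(df)                      # "list"
is.list(df)                     # TRUE

fit <- lm(mpg ~ wt, data = mtcars)
typeof(fit)                     # "list"
class(fit)                      # "lm"
"coefficients" %in% names(fit)  # TRUE: components are list elements
```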
All objects can have arbitrary additional attributes to store metadata about the object.
Attributes can be thought of as a named list.
Use attr() to access an individual attribute or attributes() to access all attributes as a list.
By default, most attributes are lost when modifying a vector. Only the most important ones stay:
Names, a character vector giving each element a name.
Dimensions, used to turn vectors into matrices and arrays.
Class, used to implement the S3 object system.
y <- 1:10
attr(y, "my_attribute") <- "This is a vector"
attr(y, "my_attribute")
## [1] "This is a vector"
str(y)
## int [1:10] 1 2 3 4 5 6 7 8 9 10
## - attr(*, "my_attribute")= chr "This is a vector"
str(attributes(y))
## List of 1
## $ my_attribute: chr "This is a vector"
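A sketch of which attributes survive: subsetting and summary functions such as sum() drop the custom attribute, while names and dim are kept by [.

```r
y <- 1:10
attr(y, "my_attribute") <- "This is a vector"
attributes(y[1:3])    # NULL: the custom attribute is lost
attributes(sum(y))    # NULL

z <- c(a = 1, b = 2, c = 3)
attributes(z[2:3])    # $names: "b" "c" are kept

m <- matrix(1:6, nrow = 2)
dim(m[, 1:2])         # 2 2: dim is kept for a matrix subset
```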
A factor is a vector that can contain only predefined values and is used to store categorical data.
Factors are built upon integer vectors using two attributes:
the class, "factor", which makes them behave differently from regular integer vectors
the levels, which define the set of allowed values
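A small example of the integer codes behind a factor; the values "low", "medium" and "high" are made up for illustration.

```r
f <- factor(c("low", "high", "high", "low"))
typeof(f)        # "integer": stored as integer codes
class(f)         # "factor"
levels(f)        # "high" "low": sorted alphabetically by default
as.integer(f)    # 2 1 1 2: the underlying codes

# Supplying levels explicitly fixes their order and allows unused levels
f2 <- factor(c("low", "high"), levels = c("low", "medium", "high"))
levels(f2)       # "low" "medium" "high"
as.integer(f2)   # 1 3
```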
Sometimes when a data frame is read directly from a file, you get a factor (or, since R 4.0.0, a character) column instead of a numeric one because of a non-numeric value in the column (e.g. a specially encoded missing value).
Possible remedy: coerce the vector from a factor to a character vector, and then from a character to a double vector.
Better: use the na.strings argument of the read.csv() function.
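Both remedies, sketched with a made-up file in which "." marks a missing value (read.csv(text = ...) stands in for reading a real file):

```r
f <- factor(c("3.1", "2.7", "10"))
as.numeric(f)                 # level codes, not the original values
as.numeric(as.character(f))   # 3.1 2.7 10

csv_text <- "x\n3.1\n.\n10"   # hypothetical file contents
raw <- read.csv(text = csv_text)
is.numeric(raw$x)             # FALSE: "." prevents numeric conversion
fixed <- read.csv(text = csv_text, na.strings = ".")
fixed$x                       # 3.1 NA 10
```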
Use brackets to select elements of a vector.
x <- 73:60
x[2]
## [1] 72
x[2:5]
## [1] 72 71 70 69
x[-(2:5)]
## [1] 73 68 67 66 65 64 63 62 61 60
Elements can also be accessed by name (robust to changes in element order):
y <- 1:3
names(y) <- c("do", "re", "mi")
y[3]
## mi
## 3
y["mi"]
## mi
## 3
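Brackets also accept a logical vector, which keeps the elements where the condition is TRUE; which() converts that to positions:

```r
x <- 73:60
x[x > 70]            # 73 72 71
x[x %% 5 == 0]       # 70 65 60
which(x %% 5 == 0)   # 4 9 14: positions rather than values
```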
Adding a dim attribute to an atomic vector allows it to behave like a multi-dimensional array.
A matrix is a special case of an array, with exactly two dimensions.
The matrix() command creates a matrix from the given set of values.
# Two scalar arguments to specify rows and columns
a <- matrix(1:6, ncol = 3, nrow = 2)
# One vector argument to describe all dimensions
b <- array(1:12, c(2, 3, 2))
# You can also modify an object in place by setting dim()
c <- 1:6
dim(c) <- c(3, 2)
c
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
dim(c) <- c(2, 3)
c
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
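Indexing uses one subscript per dimension, and matrix() fills column by column unless byrow = TRUE:

```r
a <- matrix(1:6, nrow = 2)             # filled column by column
dim(a)                                 # 2 3
a[2, 3]                                # 6: row 2, column 3
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
by_row[1, ]                            # 1 2 3

b <- array(1:12, c(2, 3, 2))
is.matrix(a)                           # TRUE
is.matrix(b)                           # FALSE: three dimensions
b[1, 2, 2]                             # 9
```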
Exercise: Write a command to generate a random permutation of the numbers between 1 and 5 and save it to an object.
set.seed(7360) # the course seed number
order(runif(5))
## [1] 3 5 2 4 1
sample(1:5, 5)
## [1] 2 1 5 3 4
Data frames are the most common way of storing data in R.
A data frame is a list of equal-length vectors.
It is a 2-dimensional structure that shares properties of both a matrix and a list.
It has names(), colnames() and rownames() (for a data frame, names() and colnames() are the same).
length() of a data frame is the length of the underlying list, the same as ncol().
We will focus more on tibble, which is a data frame, but more.
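A quick check of these properties on a small, made-up data frame:

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
typeof(df)      # "list"
length(df)      # 2, the same as ncol(df)
nrow(df)        # 3
names(df)       # "x" "y", identical to colnames(df)
rownames(df)    # "1" "2" "3"
df$y            # "a" "b" "c"
```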
Functions are a fundamental building block of R
Functions are objects in their own right (so, for example, they can have attributes()).
All R functions, except primitive functions such as sum(), have three parts:
the formals(), the list of arguments, which controls how you can call the function
the body(), the code inside the function
the environment(), the "map" of the location of the function's variables
f <- function(x) x^2
f
## function(x) x^2
formals(f)
## $x
body(f)
## x^2
environment(f)
## <environment: R_GlobalEnv>
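A sketch showing a function carrying an attribute and the primitive exception; the attribute name "description" is an arbitrary choice.

```r
f <- function(x) x^2
attr(f, "description") <- "squares its input"   # functions can carry attributes
attr(f, "description")                          # "squares its input"
f(4)                                            # 16
names(formals(function(a, b = 2) a + b))        # "a" "b"

is.primitive(sum)   # TRUE
formals(sum)        # NULL: primitives have no formals, body or environment
```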
There is no special syntax for defining and naming a function: simply create a function object (with function) and bind it to a name with <-
DoNothing <- function() {
return(invisible(NULL))
}
DoNothing()
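Because a function is just an object, it can be used without ever being named, or bound to several names; square and also_square are hypothetical names for this sketch.

```r
sapply(1:3, function(x) x^2)   # 1 4 9: an anonymous function
square <- function(x) x^2
also_square <- square          # the same function object under a second name
also_square(5)                 # 25
```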
mean(1:10, na.rm = TRUE)
## [1] 5.5
args <- list(1:10, na.rm = TRUE)
do.call(mean, args)
## [1] 5.5
do.call() calls a function with its arguments supplied as a list.
Now let's discuss scoping.
R uses lexical scoping that follows four primary rules:
Name masking
Functions versus variables
A fresh start
Dynamic lookup
x <- 10
y <- 20
g02 <- function(){
x <- 1 # a local variable to the function
y <- 2
c(x, y)
}
g02()
## [1] 1 2
x <- 2
g03 <- function() {
y <- 1
c(x, y)
}
g03()
## [1] 2 1
y
## [1] 20
If a name isn't defined inside the current function, R looks where the function was defined, then one level up from there, and so on, all the way up to the global environment.
Finally, R looks in the other loaded packages.
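The lookup order with nested functions, sketched with hypothetical function names outer_fn and inner_fn:

```r
x <- 1
outer_fn <- function() {
  y <- 2
  inner_fn <- function() {
    z <- 3
    c(x, y, z)   # z is local, y comes from outer_fn, x from the global environment
  }
  inner_fn()
}
outer_fn()   # 1 2 3
```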
y <- 10
f <- function(x) {
y <- 2
y^2 + g(x)
}
g <- function(x) {
x * y
}
What is the value of f(3)?
In R, functions are ordinary objects. This means the scoping rules described above also apply to functions.
However, the rules get more complicated when functions and non-functions share the same name.
It is better to avoid giving functions and other objects the same name.
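An example of the complication: when a name is used in a function call, R skips non-function objects while searching. The names g09 and g10 are illustrative.

```r
g09 <- function(x) x + 100
g10 <- function() {
  g09 <- 10    # a local non-function object with the same name
  g09(g09)     # the call finds the global function; the argument is the local 10
}
g10()   # 110
```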
g11 <- function() {
if (!exists("a")) {
a <- 1
} else {
a <- a + 1
}
a
}
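g11()  # a global a (the matrix created earlier) already exists, so this returns a + 1
## [,1] [,2] [,3]
## [1,] 2 4 6
## [2,] 3 5 7
g11()  # same result: the assignment inside g11() only changed a local copy of a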
## [,1] [,2] [,3]
## [1,] 2 4 6
## [2,] 3 5 7
What happens if we run the following?
a <- 1:5
g11()
g11()
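If a function should ignore any global a, exists() can be restricted to the function's own environment with inherits = FALSE. This variant, g11_local, is a hypothetical sketch, not part of the original example.

```r
a <- 1:5   # a global a that should NOT influence the result
g11_local <- function() {
  if (!exists("a", inherits = FALSE)) {   # look only in the local environment
    a <- 1
  } else {
    a <- a + 1
  }
  a
}
g11_local()   # 1
g11_local()   # 1: each call starts with a fresh environment
```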
Lexical scoping determines where to look for values, not when.
R looks for values when the function is run, not when the function is created.
g12 <- function() x + 1
x <- 15
g12()
## [1] 16
x <- 20
g12()
## [1] 21
Depending on variables defined in the global environment can be bad!
codetools::findGlobals() can help by listing a function's external dependencies.
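Applied to g12, findGlobals() reports both the function + and the variable x that the body uses but does not define; merge = FALSE splits them into two components.

```r
g12 <- function() x + 1
codetools::findGlobals(g12)                           # "+" and "x"
codetools::findGlobals(g12, merge = FALSE)$variables  # "x"
```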
You can define default values for arguments
Default values can be in terms of other arguments, or even in terms of variables defined later in the function
This works because R uses lazy evaluation: function arguments are only evaluated when they are accessed.
h04 <- function(x = 1, y = x * 2, z = a + b) {
a <- 10
b <- 100
c(x, y, z)
}
h04()
## [1] 1 2 110
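Lazy evaluation also means an argument that is never used is never evaluated, so even an error in it goes unnoticed; h01 is an illustrative name.

```r
h01 <- function(x) {
  10   # x is never used, so it is never evaluated
}
h01(stop("This is an error!"))   # 10: no error is raised
```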
... (dot-dot-dot)
Functions can have a special argument, ...
With ..., a function can take any number of additional arguments.
You can use ... to pass those additional arguments on to another function.
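A minimal wrapper that forwards ... to mean(), plus a helper that counts what lands in ...; both names (my_mean, count_args) are hypothetical.

```r
my_mean <- function(x, ...) {
  mean(x, ...)   # forward any extra arguments to mean()
}
my_mean(c(1, NA, 3), na.rm = TRUE)   # 2

count_args <- function(...) {
  length(list(...))   # number of arguments captured by ...
}
count_args(1, "a", TRUE)   # 3
```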
Pro: extra arguments such as na.rm = TRUE can be passed through lapply() on to mean()
x <- list(c(1, 3, NA), c(4, NA, 6))
str(lapply(x, mean, na.rm = TRUE))
## List of 2
## $ : num 2
## $ : num 5
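Con: a misspelled argument name (na_rm instead of na.rm) is silently absorbed by ... instead of raising an error, so the NA is not removed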
sum(1, 2, NA, na_rm = TRUE)
## [1] NA
These are the basic control-flow constructs of the R language. They function in much the same way as control statements in any Algol-like (Algol short for “Algorithmic Language”) language. They are all reserved words.
keyword | usage |
---|---|
if | if(cond) expr |
if-else | if(cond) cons.expr else alt.expr |
for | for(var in seq) expr |
while | while(cond) expr |
break | breaks out of a for, while or repeat loop |
next | halts the processing of the current iteration and advances the looping index |
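A small loop combining for, if, next and break, followed by a while loop:

```r
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers
  if (i > 7) break        # stop once i exceeds 7
  total <- total + i
}
total   # 16 (1 + 3 + 5 + 7)

n <- 1
while (n < 100) {
  n <- n * 2              # keep doubling until n reaches at least 100
}
n       # 128
```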
Most functions exit in one of two ways:
return a value, indicating success
throw an error, indicating failure
There are two ways that a function can return a value:
Implicitly: the value of the last evaluated expression is returned.
j01 <- function(x) {
if (x < 10) {
0
} else {
10
}
}
j01(5)
## [1] 0
j01(15)
## [1] 10
Explicitly, by calling return():
j02 <- function(x) {
if (x < 10) {
return(0)
} else {
return(10)
}
}
Most functions return visibly. You can prevent automatic printing of the result by applying invisible() to the last value:
j04 <- function() invisible(1)
j04()
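An invisible value is still returned; it just isn't printed automatically. Parentheses or print() force printing, and withVisible() exposes the flag:

```r
j04 <- function() invisible(1)
j04()                        # prints nothing
(j04())                      # parentheses force printing: 1
res <- j04()                 # the value can still be assigned
withVisible(j04())$visible   # FALSE
```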
If a function cannot complete its assigned task, it should throw an error with stop(), which immediately terminates the execution of the function.
j05 <- function() {
stop("I'm an error")
return(10)
}
j05()
## Error in j05(): I'm an error
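The caller can handle such an error instead of letting it propagate. A sketch using tryCatch() with an error handler:

```r
j05 <- function() {
  stop("I'm an error")
  return(10)   # never reached
}
msg <- tryCatch(
  j05(),
  error = function(e) conditionMessage(e)   # runs instead of propagating the error
)
msg   # "I'm an error"
```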
Use on.exit() to set up an exit handler that is run regardless of whether the function exits normally or with an error.
Always set add = TRUE when using on.exit(). Otherwise, each call will overwrite the previous exit handler.
j06 <- function(x) {
cat("Hello\n")
on.exit(cat("Goodbye!\n"), add = TRUE)
if (x) {
return(10)
} else {
stop("Error")
}
}
j06(TRUE)
## Hello
## Goodbye!
## [1] 10
j06(FALSE)
## Hello
## Error in j06(FALSE): Error
## Goodbye!
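The effect of add = TRUE, sketched with two hypothetical functions: with it, handlers accumulate and run in order; without it, the second call replaces the first.

```r
j07 <- function() {
  on.exit(cat("first\n"), add = TRUE)
  on.exit(cat("second\n"), add = TRUE)
  invisible(NULL)
}
j07()   # prints "first" then "second"

j08 <- function() {
  on.exit(cat("first\n"))
  on.exit(cat("second\n"))   # without add = TRUE this replaces the first handler
  invisible(NULL)
}
j08()   # prints only "second"
```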
with_dir <- function(dir, code) {
old <- setwd(dir)
on.exit(setwd(old), add = TRUE)
code
}
getwd()
## [1] "/Users/xji3/Dropbox/My_Files/Tulane/Teaching/tulane-math-7360-2021.github.io/lectures/06-Data_structure"
with_dir("~", getwd())
## [1] "/Users/xji3"
getwd()
## [1] "/Users/xji3/Dropbox/My_Files/Tulane/Teaching/tulane-math-7360-2021.github.io/lectures/06-Data_structure"
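The exit handler in with_dir() also restores the directory when code fails. Because code is evaluated lazily, the error happens after setwd(dir), inside with_dir(). This sketch uses tempdir() as an arbitrary target directory:

```r
with_dir <- function(dir, code) {
  old <- setwd(dir)
  on.exit(setwd(old), add = TRUE)
  code
}
start <- getwd()
result <- tryCatch(
  with_dir(tempdir(), stop("something went wrong")),
  error = function(e) "caught"
)
result                      # "caught"
identical(getwd(), start)   # TRUE: the exit handler restored the directory
```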