If you are not familiar with shell environment, you could start with finding where you are by type command pwd
in the terminal window. You can also open the graphic file system by type open .
(for mac) and xdg-open .
(for linux) in the terminal. If you want to change to another location, try to play with terminal using commands from the following list. Tip: cd
Some useful commands to get you started.
pwd
print absolute path to the current working directory (where you are right now)
ls
list contents of a directory
ls -l
list detailed contents of a directory
ls -al
list all contents of a directory, including those start with .
(hidden files/folders)
cd
change directory
cd ..
go to the parent directory of the current working directory
Manipulate files and directories
cp
copies file to a new location.
mv
moves file to a new location.
touch
creates a text file; if file already exists, it’s left unchanged.
rm
deletes a file.
mkdir
creates a new directory.
rmdir
deletes an empty directory.
rm -rf
deletes a directory and all contents in that directory (be cautious using the -f
option …).
rm(list = ls()) # clean up workspace first
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Big Sur 11.5.2
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.27 R6_2.5.0 jsonlite_1.7.2 magrittr_2.0.1
## [5] evaluate_0.14 rlang_0.4.11 stringi_1.7.3 jquerylib_0.1.4
## [9] bslib_0.2.5.1 rmarkdown_2.10 tools_4.1.1 stringr_1.4.0
## [13] xfun_0.25 yaml_2.2.1 compiler_4.1.1 htmltools_0.5.1.1
## [17] knitr_1.33 sass_0.4.0
We will implement two simplest approaches for numerical integration and apply them to integrate sin(x)
from \(0\) to \(\pi\).
midpoint
and trapezoid
values.\(\mbox{midpoint} = (b - a) \times f(\frac{a + b}{2})\)
\(\mbox{trapezoid} = \frac{b - a}{2} \times (f(a) + f(b))\)
# if you are modifying this source file directly, remember to change the above flag to eval=TRUE
# finish the code below
midpoint <- function(f, a, b) {
result <-
return(result)
}
trapezoid <- function(f, a, b) {
result <-
return(result)
}
# what do you get?
midpoint(sin, 0, pi)
trapezoid(sin, 0, pi)
midpoint.composite <- function(f, a, b, n = 10) {
points <- seq(a, b, length = n + 1)
area <- 0
for (i in seq_len(n)) {
area <- area + midpoint()
}
return(area)
}
trapezoid.composite <- function(f, a, b, n = 10) {
points <- seq(a, b, length = n + 1)
area <- 0
for (i in seq_len(n)) {
area <- area + trapezoid()
}
return(area)
}
midpoint.composite(sin, 0, pi, n = 10)
midpoint.composite(sin, 0, pi, n = 100)
midpoint.composite(sin, 0, pi, n = 1000)
trapezoid.composite(sin, 0, pi, n = 10)
trapezoid.composite(sin, 0, pi, n = 100)
trapezoid.composite(sin, 0, pi, n = 1000)
Now try the above functions with n=10, 100, 1000
. Explain your findings.
midpoint.composite.vectorize <- function(f, a, b, n = 10) {
points <- seq(a, b, length = n + 1)
areas <- midpoint(f, points[], points[]) # Tip: the first points[] should be the list of all a's and the second points[] should be the list of all b's
return(sum(areas))
}
trapezoid.composite.vectorize <- function(f, a, b, n = 10) {
points <- seq(a, b, length = n + 1)
areas <- trapezoid(f, points[], points[])
return(sum(areas))
}
midpoint.composite.vectorize(sin, 0, pi, n = 10)
midpoint.composite.vectorize(sin, 0, pi, n = 100)
midpoint.composite.vectorize(sin, 0, pi, n = 1000)
trapezoid.composite.vectorize(sin, 0, pi, n = 10)
trapezoid.composite.vectorize(sin, 0, pi, n = 100)
trapezoid.composite.vectorize(sin, 0, pi, n = 1000)
Now try the above vectorized functions with n=10, 100, 1000
. Explain your findings.
From William Dunlap:
“User CPU time” gives the CPU time spent by the current process (i.e., the current R session) and “system CPU time” gives the CPU time spent by the kernel (the operating system) on behalf of the current process. The operating system is used for things like opening files, doing input or output, starting other processes, and looking at the system clock: operations that involve resources that many processes must share. Different operating systems will have different things done by the operating system.
system.time(midpoint.composite(sin, 0, pi, n = 10000))
system.time(trapezoid.composite(sin, 0, pi, n = 10000))
system.time(midpoint.composite.vectorize(sin, 0, pi, n = 10000))
system.time(trapezoid.composite.vectorize(sin, 0, pi, n = 10000))
Now let’s implement the Normal equations from scratch. \(\hat{\beta} = (X^{\top}X)^{-1}X^{\top}Y\).
my.normal.equations <- function(X, Y) {
if (!is.vector(Y)) {
stop("Y is not a vector!")
}
if (!is.matrix(X)) { # force X to be a matrix for now
stop("X is not a matrix!")
}
if (dim(X)[1] != length(Y)) {
stop("Dimension mismatch between X and Y!")
}
return() # finish the calculation for beta
}
set.seed(7360)
sample.size <- 100
num.col <- 2
X <- matrix(rnorm(sample.size * num.col), nrow = sample.size, ncol = num.col)
X <- cbind(1, X)
Y <- rnorm(sample.size)
system.time(result.lm <- lm(Y ~ X[, 2] + X[, 3]))
summary(result.lm)
system.time(result.my.normal.equations <- my.normal.equations(X, Y))
result.my.normal.equations
Does your result match the estimated coefficients from the lm()
function?