rm(list = ls()) # clean-up workspace
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.5 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.0.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(faraway)
The question concerns data from a case-control study of esophageal cancer in Ille-et-Vilaine, France. The data is distributed with R and may be obtained, along with a description of the variables, by:
data(esoph)
help(esoph)
Comment on the relationships seen in the plots.
# Binned residual plot for the wcgs heart disease model, grouped by cigs
lmod <- glm(chd ~ height + cigs, family = binomial, wcgs)
gdf <- wcgs %>%
  mutate(residuals = residuals(lmod), linpred = predict(lmod)) %>%
  group_by(cigs) %>%
  summarise(residuals = mean(residuals), count = n())
# Point size reflects the number of subjects at each value of cigs
gdf %>%
  ggplot(mapping = aes(x = cigs, y = residuals, size = sqrt(count))) +
  geom_point() +
  theme_bw()
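The same point-size idea can be carried over to the esoph data. The following is only a sketch: the helper `case_prop` is a hypothetical name introduced here, and it assumes that `ncontrols` counts controls only, so that the group total is `ncases + ncontrols`.

```r
library(dplyr)
library(ggplot2)

data(esoph)

# Hypothetical helper: proportion of cases at each level of one predictor.
# Assumes ncontrols counts controls only, so total = ncases + ncontrols.
case_prop <- function(var) {
  esoph %>%
    group_by(across(all_of(var))) %>%
    summarise(cases = sum(ncases),
              total = sum(ncases + ncontrols),
              .groups = "drop") %>%
    mutate(prop = cases / total)
}

alc_prop <- case_prop("alcgp")

# Point size reflects the number of subjects at each alcohol level
p <- ggplot(alc_prop, aes(x = alcgp, y = prop, size = sqrt(total))) +
  geom_point() +
  theme_bw()
p
```

Calling `case_prop("agegp")` and `case_prop("tobgp")` and plotting in the same way would give the other two panels.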
Use AIC as a criterion to select a model using the step function. Which model is selected?
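A possible starting point is sketched below. It assumes the initial model is a binomial GLM with all interactions between the three predictors; that choice, and the object names `full` and `sel`, are assumptions made for illustration. Fitting the full model is likely to produce warnings about fitted probabilities of 0 or 1, because it is close to saturated.

```r
data(esoph)

# Assumed starting model: all interactions among the three factors
full <- glm(cbind(ncases, ncontrols) ~ agegp * alcgp * tobgp,
            family = binomial, data = esoph)

# Stepwise selection by AIC (step's default penalty is k = 2);
# trace = 0 suppresses the step-by-step log
sel <- step(full, trace = 0)

formula(sel)  # the selected model
AIC(full)
AIC(sel)
```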
All three factors are ordered and so special contrasts have been used appropriate for ordered factors involving linear, quadratic and cubic terms. Further simplification of the model may be possible by eliminating some of these terms. Use the unclass function to convert the factors to a numerical representation and check whether the model may be simplified.
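One way to run this check is sketched here. It assumes the main-effects model as the reference; substitute whatever model was actually selected. The column names `age`, `alc` and `tob` are hypothetical. A model that is linear in the integer codes is nested within the factor model, so the two can be compared with a deviance test.

```r
data(esoph)

# Replace each ordered factor by its integer codes 1, 2, 3, ...
# as.numeric() drops the levels attribute that unclass() leaves behind
esoph_num <- transform(esoph,
                       age = as.numeric(unclass(agegp)),
                       alc = as.numeric(unclass(alcgp)),
                       tob = as.numeric(unclass(tobgp)))

# Reference model with ordered factors (assumed to be the main-effects model)
fac_mod <- glm(cbind(ncases, ncontrols) ~ agegp + alcgp + tobgp,
               family = binomial, data = esoph_num)

# Simplified model: one linear term per predictor
num_mod <- glm(cbind(ncases, ncontrols) ~ age + alc + tob,
               family = binomial, data = esoph_num)

# Deviance test of the simplification
anova(num_mod, fac_mod, test = "Chisq")
```

A small p-value would suggest that at least one predictor needs more than a linear term. Each predictor can then be converted one at a time to see where the extra terms are needed.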
Does your final model fit the data? Is the test you make accurate for this data?
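A deviance-based goodness-of-fit check might look like the sketch below. The main-effects model stands in for the final model, which is an assumption. The chi-squared approximation to the deviance relies on the binomial group sizes not being too small, so the sketch also counts the small groups. The cut-off of 5 is a common rule of thumb, not a fixed threshold.

```r
data(esoph)

# Stand-in for the final model (assumption: main effects only)
mod <- glm(cbind(ncases, ncontrols) ~ agegp + alcgp + tobgp,
           family = binomial, data = esoph)

# Compare the residual deviance with a chi-squared distribution
# on the residual degrees of freedom
gof_p <- pchisq(deviance(mod), df.residual(mod), lower.tail = FALSE)
gof_p

# Group sizes: many small groups make the approximation less reliable
group_size <- esoph$ncases + esoph$ncontrols
sum(group_size < 5)
```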
Check for outliers in your final model.
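A simple way to look for outliers is to inspect the largest standardized residuals. The sketch below again uses the main-effects model as a stand-in for the final model, which is an assumption.

```r
data(esoph)

# Stand-in for the final model (assumption: main effects only)
mod <- glm(cbind(ncases, ncontrols) ~ agegp + alcgp + tobgp,
           family = binomial, data = esoph)

# Standardized deviance residuals
res <- rstandard(mod)

# The three largest residuals in absolute value, named by row
top <- head(sort(abs(res), decreasing = TRUE), 3)
top

# Show the corresponding rows of the data
esoph[names(top), ]
```

A half-normal plot of the residuals, for example with `halfnorm(res)` from the faraway package, gives a graphical version of the same check.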
What is the predicted effect of moving one category higher in alcohol consumption?
Compute a 95% confidence interval for this predicted effect.
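The effect and its interval can be sketched as follows. This assumes alcohol enters the model as a numeric score, under the hypothetical name `alc`, with the other two predictors kept as factors; the actual final model may differ. On the log-odds scale the effect of moving up one category is the coefficient of `alc`, and exponentiating gives the multiplicative change in the odds. The interval is a Wald interval based on the normal approximation.

```r
data(esoph)

# Alcohol as a numeric score 1, 2, 3, 4 (assumed form of the final model)
esoph_num <- transform(esoph, alc = as.numeric(unclass(alcgp)))
mod <- glm(cbind(ncases, ncontrols) ~ agegp + alc + tobgp,
           family = binomial, data = esoph_num)

# Estimated change in log-odds per category and its standard error
beta <- unname(coef(mod)["alc"])
se <- unname(sqrt(diag(vcov(mod)))["alc"])

# Multiplicative effect on the odds of moving one category higher
odds_ratio <- exp(beta)
odds_ratio

# Wald 95% confidence interval on the odds scale
ci <- exp(beta + c(-1, 1) * qnorm(0.975) * se)
ci
```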