rm(list = ls()) # clean up workspace

Acknowledgement

This exercise is adapted from Dr. Hua Zhou’s homework assignment.

CFR of COVID-19

Of primary interest to the public is the risk of dying from COVID-19. A commonly used measure is the case fatality rate/ratio/risk (CFR), defined as \[ \frac{\text{number of deaths from disease}}{\text{number of diagnosed cases of disease}}. \] CFR is not a fixed constant; it changes with time, location, and other factors. CFR also differs from the infection fatality rate (IFR), the probability that someone infected with COVID-19 dies from it: the IFR denominator includes infections that were never diagnosed, whereas the CFR denominator counts only diagnosed cases.
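As a quick check of the definition, the sketch below recomputes CFR as a percentage for four Alabama counties and compares it with the Case_Fatality_Ratio column of the 2021-11-02 report. The counts are copied from the output printed further down; the data frame name alabama is only for illustration.

```r
# Counts for four Alabama counties, copied from the 2021-11-02 report shown below
alabama <- data.frame(
  county    = c("Autauga", "Baldwin", "Barbour", "Bibb"),
  confirmed = c(10271, 37445, 3605, 4283),
  deaths    = c(148, 558, 76, 89),
  reported  = c(1.44, 1.49, 2.11, 2.08)  # Case_Fatality_Ratio column, in percent
)

# CFR = deaths / diagnosed cases, expressed as a percentage
alabama$cfr <- 100 * alabama$deaths / alabama$confirmed
print(round(alabama$cfr, 2))
```

Rounded to two decimals, the recomputed values match the reported column, which suggests the report's Case_Fatality_Ratio is exactly this ratio of cumulative counts.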

In this exercise, we use logistic regression to study how US county-level CFR varies with demographic characteristics and selected health, education, and economic indicators.
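A minimal sketch of such a model, assuming each diagnosed case in a county is treated as a Bernoulli trial whose death probability depends on county covariates: glm() with family = binomial accepts a two-column response of deaths and survivors. The ten Alabama counties and the percent_smokers predictor are copied (as rounded, printed values) from outputs shown below purely for illustration; ten rows are far too few for real inference.

```r
# Ten Alabama counties copied from the joined data printed below (illustration only)
toy <- data.frame(
  confirmed       = c(10271, 37445, 3605, 4283, 10423, 1526, 3365, 22341, 5787, 3071),
  deaths          = c(148, 558, 76, 89, 179, 44, 96, 497, 144, 61),
  percent_smokers = c(18.1, 17.5, 22.0, 19.1, 19.2, 22.9, 21.8, 20.6, 19.4, 17.5)
)

# Binomial logistic regression: response is cbind(deaths, cases that did not die)
fit <- glm(cbind(deaths, confirmed - deaths) ~ percent_smokers,
           family = binomial(link = "logit"), data = toy)
print(summary(fit))

# Fitted CFR for each county, on the probability scale
cfr_hat <- fitted(fit)
```

Here exp(coef(fit)) gives the multiplicative change in the odds of death among diagnosed cases per one-point increase in the predictor. The two-column response also means counties with more cases carry more weight in the likelihood.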

Data sources

Sample code for data preparation

Load the tidyverse package for data manipulation and visualization.

# tidyverse for data manipulation and visualization
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Read in the county-level COVID-19 counts reported on 2021-11-02.

county_count <- read_csv("./11-02-2021.csv") %>%
  # cast fips into dbl for use as a key for joining tables
  mutate(FIPS = as.numeric(FIPS)) %>%
  filter(Country_Region == "US") %>%
  print(width = Inf)
## Rows: 4006 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): Admin2, Province_State, Country_Region, Combined_Key
## dbl  (7): FIPS, Lat, Long_, Confirmed, Deaths, Incident_Rate, Case_Fatality_...
## lgl  (2): Recovered, Active
## dttm (1): Last_Update
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 3,279 × 14
##     FIPS Admin2   Province_State Country_Region Last_Update           Lat Long_
##    <dbl> <chr>    <chr>          <chr>          <dttm>              <dbl> <dbl>
##  1  1001 Autauga  Alabama        US             2021-11-03 06:22:09  32.5 -86.6
##  2  1003 Baldwin  Alabama        US             2021-11-03 06:22:09  30.7 -87.7
##  3  1005 Barbour  Alabama        US             2021-11-03 06:22:09  31.9 -85.4
##  4  1007 Bibb     Alabama        US             2021-11-03 06:22:09  33.0 -87.1
##  5  1009 Blount   Alabama        US             2021-11-03 06:22:09  34.0 -86.6
##  6  1011 Bullock  Alabama        US             2021-11-03 06:22:09  32.1 -85.7
##  7  1013 Butler   Alabama        US             2021-11-03 06:22:09  31.8 -86.7
##  8  1015 Calhoun  Alabama        US             2021-11-03 06:22:09  33.8 -85.8
##  9  1017 Chambers Alabama        US             2021-11-03 06:22:09  32.9 -85.4
## 10  1019 Cherokee Alabama        US             2021-11-03 06:22:09  34.2 -85.6
##    Confirmed Deaths Recovered Active Combined_Key          Incident_Rate
##        <dbl>  <dbl> <lgl>     <lgl>  <chr>                         <dbl>
##  1     10271    148 NA        NA     Autauga, Alabama, US         18384.
##  2     37445    558 NA        NA     Baldwin, Alabama, US         16774.
##  3      3605     76 NA        NA     Barbour, Alabama, US         14603.
##  4      4283     89 NA        NA     Bibb, Alabama, US            19126.
##  5     10423    179 NA        NA     Blount, Alabama, US          18025.
##  6      1526     44 NA        NA     Bullock, Alabama, US         15107.
##  7      3365     96 NA        NA     Butler, Alabama, US          17303.
##  8     22341    497 NA        NA     Calhoun, Alabama, US         19666.
##  9      5787    144 NA        NA     Chambers, Alabama, US        17402.
## 10      3071     61 NA        NA     Cherokee, Alabama, US        11723.
##    Case_Fatality_Ratio
##                  <dbl>
##  1                1.44
##  2                1.49
##  3                2.11
##  4                2.08
##  5                1.72
##  6                2.88
##  7                2.85
##  8                2.22
##  9                2.49
## 10                1.99
## # … with 3,269 more rows

Standardize the variable names by changing them to lower case.

names(county_count) <- str_to_lower(names(county_count))
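The same step in base R uses tolower(); the one-row data frame below is a hypothetical stand-in carrying a few of the report's original column names.

```r
# Hypothetical stand-in with a few of the daily report's original column names
report <- data.frame(FIPS = 1001, Province_State = "Alabama", Confirmed = 10271, Deaths = 148)

# Base R equivalent of names(report) <- str_to_lower(names(report))
names(report) <- tolower(names(report))
print(names(report))
```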

As a sanity check, display the distinct states, territories, and other entries in province_state:

county_count %>%
  select(province_state) %>%
  distinct() %>%
  arrange(province_state) %>%
  print(n = Inf)
## # A tibble: 59 × 1
##    province_state          
##    <chr>                   
##  1 Alabama                 
##  2 Alaska                  
##  3 American Samoa          
##  4 Arizona                 
##  5 Arkansas                
##  6 California              
##  7 Colorado                
##  8 Connecticut             
##  9 Delaware                
## 10 Diamond Princess        
## 11 District of Columbia    
## 12 Florida                 
## 13 Georgia                 
## 14 Grand Princess          
## 15 Guam                    
## 16 Hawaii                  
## 17 Idaho                   
## 18 Illinois                
## 19 Indiana                 
## 20 Iowa                    
## 21 Kansas                  
## 22 Kentucky                
## 23 Louisiana               
## 24 Maine                   
## 25 Maryland                
## 26 Massachusetts           
## 27 Michigan                
## 28 Minnesota               
## 29 Mississippi             
## 30 Missouri                
## 31 Montana                 
## 32 Nebraska                
## 33 Nevada                  
## 34 New Hampshire           
## 35 New Jersey              
## 36 New Mexico              
## 37 New York                
## 38 North Carolina          
## 39 North Dakota            
## 40 Northern Mariana Islands
## 41 Ohio                    
## 42 Oklahoma                
## 43 Oregon                  
## 44 Pennsylvania            
## 45 Puerto Rico             
## 46 Recovered               
## 47 Rhode Island            
## 48 South Carolina          
## 49 South Dakota            
## 50 Tennessee               
## 51 Texas                   
## 52 Utah                    
## 53 Vermont                 
## 54 Virgin Islands          
## 55 Virginia                
## 56 Washington              
## 57 West Virginia           
## 58 Wisconsin               
## 59 Wyoming
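A complementary base-R check counts rows per entry with table(), which also makes non-geographic entries such as Recovered easy to spot. The vector below is a hypothetical miniature of province_state, not the real column.

```r
# Hypothetical miniature of the province_state column
province_state <- c("Alabama", "Alabama", "Alaska", "Diamond Princess",
                    "Recovered", "Alaska", "Alabama")

# Distinct values, sorted (like distinct() followed by arrange())
print(sort(unique(province_state)))

# Number of rows (counties or other records) per entry
counts <- table(province_state)
print(counts)
```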

We exclude entries for American Samoa, Diamond Princess, Grand Princess, Guam, Northern Mariana Islands, Puerto Rico, Recovered, and Virgin Islands, keeping only counties in the 50 states and DC.

county_count <- county_count %>%
  filter(!(province_state %in% c("American Samoa", "Diamond Princess", "Grand Princess", 
                                 "Recovered", "Guam", "Northern Mariana Islands", 
                                 "Puerto Rico", "Virgin Islands"))) %>%
  print(width = Inf)
## # A tibble: 3,192 × 14
##     fips admin2   province_state country_region last_update           lat long_
##    <dbl> <chr>    <chr>          <chr>          <dttm>              <dbl> <dbl>
##  1  1001 Autauga  Alabama        US             2021-11-03 06:22:09  32.5 -86.6
##  2  1003 Baldwin  Alabama        US             2021-11-03 06:22:09  30.7 -87.7
##  3  1005 Barbour  Alabama        US             2021-11-03 06:22:09  31.9 -85.4
##  4  1007 Bibb     Alabama        US             2021-11-03 06:22:09  33.0 -87.1
##  5  1009 Blount   Alabama        US             2021-11-03 06:22:09  34.0 -86.6
##  6  1011 Bullock  Alabama        US             2021-11-03 06:22:09  32.1 -85.7
##  7  1013 Butler   Alabama        US             2021-11-03 06:22:09  31.8 -86.7
##  8  1015 Calhoun  Alabama        US             2021-11-03 06:22:09  33.8 -85.8
##  9  1017 Chambers Alabama        US             2021-11-03 06:22:09  32.9 -85.4
## 10  1019 Cherokee Alabama        US             2021-11-03 06:22:09  34.2 -85.6
##    confirmed deaths recovered active combined_key          incident_rate
##        <dbl>  <dbl> <lgl>     <lgl>  <chr>                         <dbl>
##  1     10271    148 NA        NA     Autauga, Alabama, US         18384.
##  2     37445    558 NA        NA     Baldwin, Alabama, US         16774.
##  3      3605     76 NA        NA     Barbour, Alabama, US         14603.
##  4      4283     89 NA        NA     Bibb, Alabama, US            19126.
##  5     10423    179 NA        NA     Blount, Alabama, US          18025.
##  6      1526     44 NA        NA     Bullock, Alabama, US         15107.
##  7      3365     96 NA        NA     Butler, Alabama, US          17303.
##  8     22341    497 NA        NA     Calhoun, Alabama, US         19666.
##  9      5787    144 NA        NA     Chambers, Alabama, US        17402.
## 10      3071     61 NA        NA     Cherokee, Alabama, US        11723.
##    case_fatality_ratio
##                  <dbl>
##  1                1.44
##  2                1.49
##  3                2.11
##  4                2.08
##  5                1.72
##  6                2.88
##  7                2.85
##  8                2.22
##  9                2.49
## 10                1.99
## # … with 3,182 more rows
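Dropping these 8 entries should leave 59 - 8 = 51 distinct values (50 states plus DC), while the row count falls from 3,279 to 3,192. A sketch that checks the 51 figure, using R's built-in state.name vector (the 50 state names, from the datasets package) to stand in for the real column:

```r
# Entries to drop: territories, cruise ships, and the non-geographic "Recovered" row
excluded <- c("American Samoa", "Diamond Princess", "Grand Princess", "Recovered",
              "Guam", "Northern Mariana Islands", "Puerto Rico", "Virgin Islands")

# Stand-in for the 59 distinct province_state values listed above
all_entries <- c(state.name, "District of Columbia", excluded)
print(length(all_entries))  # 59 entries

# Same blacklist logic as the filter() call above
kept <- all_entries[!(all_entries %in% excluded)]
print(length(kept))         # 51 entries = 50 states + DC
```

An alternative design is a whitelist, keeping rows with province_state %in% c(state.name, "District of Columbia"); that would also drop any new non-state entries that might appear in later reports.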

Graphically summarize the COVID-19 confirmed cases and deaths on 2021-11-02 by state.

county_count %>%
  # turn confirmed and deaths into long format for easy plotting;
  # recovered is entirely NA in this report, so it is left out
  pivot_longer(c(confirmed, deaths), 
               names_to = "case", 
               values_to = "count") %>%
  # total counts per state
  group_by(province_state, case) %>%
  summarise(count = sum(count), .groups = "drop") %>%
  ggplot() + 
  # side-by-side bars: deaths are a subset of confirmed cases, so stacking would double count
  geom_col(mapping = aes(x = province_state, y = count, fill = case),
           position = "dodge") + 
  # scale_y_log10() + 
  labs(title = "US COVID-19 Situation on 2021-11-02", x = "State", y = "Count") + 
  theme(axis.text.x = element_text(angle = 90))
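A numerical companion to this plot is a per-state CFR. A state's CFR is the ratio of summed deaths to summed cases, not the average of its county CFRs; the sketch below uses the ten Alabama counties printed earlier to show that the two can differ noticeably.

```r
# Ten Alabama counties copied from the 2021-11-02 report printed earlier
al <- data.frame(
  confirmed = c(10271, 37445, 3605, 4283, 10423, 1526, 3365, 22341, 5787, 3071),
  deaths    = c(148, 558, 76, 89, 179, 44, 96, 497, 144, 61)
)

# Pooled CFR: total deaths over total cases (in percent), about 1.85 here
pooled <- 100 * sum(al$deaths) / sum(al$confirmed)

# Unweighted mean of county CFRs, about 2.13 here; small counties count as much as large ones
mean_of_ratios <- mean(100 * al$deaths / al$confirmed)

print(c(pooled = pooled, mean_of_ratios = mean_of_ratios))
```

The pooled figure weights each county by its number of cases, which is the same weighting a binomial logistic model applies implicitly.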

Read in the 2020 county-level health ranking data.

county_info <- read_csv("./us-county-health-rankings-2020.csv") %>%
  filter(!is.na(county)) %>%
  # cast fips into dbl for use as a key for joining tables
  mutate(fips = as.numeric(fips)) %>%
  select(fips, 
         state,
         county,
         percent_fair_or_poor_health, 
         percent_smokers, 
         percent_adults_with_obesity, 
         # food_environment_index,
         percent_with_access_to_exercise_opportunities, 
         percent_excessive_drinking,
         # teen_birth_rate, 
         percent_uninsured,
         # primary_care_physicians_rate,
         # preventable_hospitalization_rate,
         # high_school_graduation_rate,
         percent_some_college,
         percent_unemployed,
         percent_children_in_poverty,
         # `80th_percentile_income`,
         # `20th_percentile_income`,
         percent_single_parent_households,
         # violent_crime_rate,
         percent_severe_housing_problems,
         overcrowding,
         # life_expectancy,
         # age_adjusted_death_rate,
         percent_adults_with_diabetes,
         # hiv_prevalence_rate,
         percent_food_insecure,
         # percent_limited_access_to_healthy_foods,
         percent_insufficient_sleep,
         percent_uninsured_2,
         median_household_income,
         average_traffic_volume_per_meter_of_major_roadways,
         percent_homeowners,
         # percent_severe_housing_cost_burden,
         population_2,
         percent_less_than_18_years_of_age,
         percent_65_and_over,
         percent_black,
         percent_asian,
         percent_hispanic,
         percent_female,
         percent_rural) %>%
  print(width = Inf)
## Rows: 3193 Columns: 507
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (7): state, county, unreliable, primary_care_physicians_ratio, dentist...
## dbl (497): fips, num_deaths, years_of_potential_life_lost_rate, 95percent_ci...
## lgl   (3): presence_of_water_violation, non_petitioned_cases, petitioned_cases
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 3,142 × 30
##     fips state   county   percent_fair_or_poor_health percent_smokers
##    <dbl> <chr>   <chr>                          <dbl>           <dbl>
##  1  1001 Alabama Autauga                         20.9            18.1
##  2  1003 Alabama Baldwin                         17.5            17.5
##  3  1005 Alabama Barbour                         29.6            22.0
##  4  1007 Alabama Bibb                            19.4            19.1
##  5  1009 Alabama Blount                          21.7            19.2
##  6  1011 Alabama Bullock                         31.0            22.9
##  7  1013 Alabama Butler                          27.9            21.8
##  8  1015 Alabama Calhoun                         23.1            20.6
##  9  1017 Alabama Chambers                        24.0            19.4
## 10  1019 Alabama Cherokee                        20.7            17.5
##    percent_adults_with_obesity percent_with_access_to_exercise_opportunities
##                          <dbl>                                         <dbl>
##  1                        33.3                                         69.1 
##  2                        31                                           73.7 
##  3                        41.7                                         53.2 
##  4                        37.6                                         16.3 
##  5                        33.8                                         15.6 
##  6                        37.2                                          2.50
##  7                        43.3                                         48.6 
##  8                        38.5                                         47.7 
##  9                        40.1                                         61.9 
## 10                        35                                           33.4 
##    percent_excessive_drinking percent_uninsured percent_some_college
##                         <dbl>             <dbl>                <dbl>
##  1                       15.0              8.72                 62.0
##  2                       18.0             11.3                  67.4
##  3                       12.8             12.2                  34.9
##  4                       15.6             10.2                  44.1
##  5                       14.2             13.4                  53.4
##  6                       12.1             11.4                  35.0
##  7                       11.9             11.2                  41.7
##  8                       13.8             11.9                  59.2
##  9                       12.7             11.9                  48.5
## 10                       14.1             11.2                  51.8
##    percent_unemployed percent_children_in_poverty
##                 <dbl>                       <dbl>
##  1               3.63                        19.3
##  2               3.62                        13.9
##  3               5.17                        43.9
##  4               3.97                        27.8
##  5               3.51                        18  
##  6               4.69                        68.3
##  7               4.79                        36.3
##  8               4.65                        26.5
##  9               3.91                        30.7
## 10               3.57                        24.7
##    percent_single_parent_households percent_severe_housing_problems overcrowding
##                               <dbl>                           <dbl>        <dbl>
##  1                             26.2                            14.7        1.20 
##  2                             24.1                            13.6        1.27 
##  3                             56.6                            14.6        1.69 
##  4                             28.7                            10.5        0.255
##  5                             28.6                            10.5        1.89 
##  6                             74.8                            18.1        0.113
##  7                             52.7                            13.2        1.69 
##  8                             40.2                            13.7        1.54 
##  9                             46.6                            16.0        4.04 
## 10                             23.8                            13          1.5  
##    percent_adults_with_diabetes percent_food_insecure percent_insufficient_sleep
##                           <dbl>                 <dbl>                      <dbl>
##  1                         11.1                  13.2                       35.9
##  2                         10.7                  11.6                       33.3
##  3                         17.6                  22                         38.6
##  4                         14.5                  14.3                       38.1
##  5                         17                    10.7                       35.9
##  6                         23.7                  24.8                       45.0
##  7                         19.2                  20.6                       41.9
##  8                         17.5                  15.7                       41.3
##  9                         19.9                  17.9                       37.3
## 10                         15.2                  12.5                       35.4
##    percent_uninsured_2 median_household_income
##                  <dbl>                   <dbl>
##  1                11.1                   59338
##  2                14.3                   57588
##  3                16.1                   34382
##  4                13                     46064
##  5                17.1                   50412
##  6                15.2                   29267
##  7                14.5                   37365
##  8                15.4                   45400
##  9                15.2                   39917
## 10                13.9                   42132
##    average_traffic_volume_per_meter_of_major_roadways percent_homeowners
##                                                 <dbl>              <dbl>
##  1                                              88.5                74.9
##  2                                              87.0                73.6
##  3                                             102.                 61.4
##  4                                              29.3                75.1
##  5                                              33.4                78.6
##  6                                               4.07               75.5
##  7                                              19.3                69.9
##  8                                             110.                 69.5
##  9                                              20.3                67.8
## 10                                              25.9                79.0
##    population_2 percent_less_than_18_years_of_age percent_65_and_over
##           <dbl>                             <dbl>               <dbl>
##  1        55601                              23.7                15.6
##  2       218022                              21.6                20.4
##  3        24881                              20.9                19.4
##  4        22400                              20.5                16.5
##  5        57840                              23.2                18.2
##  6        10138                              21.1                16.4
##  7        19680                              22.2                20.3
##  8       114277                              21.6                17.7
##  9        33615                              20.8                19.5
## 10        26032                              19.2                23.0
##    percent_black percent_asian percent_hispanic percent_female percent_rural
##            <dbl>         <dbl>            <dbl>          <dbl>         <dbl>
##  1         19.3          1.22              2.97           51.4          42.0
##  2          8.78         1.15              4.65           51.5          42.3
##  3         48.0          0.454             4.28           47.2          67.8
##  4         21.1          0.237             2.62           46.8          68.4
##  5          1.46         0.320             9.57           50.7          90.0
##  6         69.5          0.187             7.96           45.5          51.4
##  7         44.6          1.32              1.51           53.4          71.2
##  8         20.9          0.964             3.91           51.9          33.7
##  9         39.6          1.33              2.56           52.1          49.1
## 10          4.24         0.338             1.62           50.5          85.7
## # … with 3,132 more rows

For stability in estimating CFR, we restrict to counties with \(\ge 5\) confirmed cases.

county_count <- county_count %>%
  filter(confirmed >= 5)
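The instability comes from sampling noise: an estimated CFR \(\hat p = d/n\) has binomial standard error \(\sqrt{\hat p(1-\hat p)/n}\), which is large when the number of cases \(n\) is small. A quick illustration, assuming a hypothetical true CFR of 2%:

```r
# Binomial standard error of an estimated CFR p_hat = deaths / confirmed
cfr_se <- function(p, n) sqrt(p * (1 - p) / n)

p <- 0.02                      # hypothetical CFR of 2%
n <- c(5, 50, 500, 5000)       # number of confirmed cases
se <- cfr_se(p, n)
print(data.frame(n = n, se_percent = round(100 * se, 2)))

# With only 5 cases, the estimated CFR can only take the values 0%, 20%, 40%, ...
print(100 * (0:5) / 5)
```

The cutoff of 5 removes only the most extreme cases; counties just above it still have noisy CFRs, which the binomial likelihood handles by giving them little weight.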

We join the COVID-19 count data and the county-level information using the FIPS (Federal Information Processing Standards) county code as the key.

county_data <- county_count %>%
  left_join(county_info, by = "fips") %>%
  print(width = Inf)
## # A tibble: 3,157 × 43
##     fips admin2   province_state country_region last_update           lat long_
##    <dbl> <chr>    <chr>          <chr>          <dttm>              <dbl> <dbl>
##  1  1001 Autauga  Alabama        US             2021-11-03 06:22:09  32.5 -86.6
##  2  1003 Baldwin  Alabama        US             2021-11-03 06:22:09  30.7 -87.7
##  3  1005 Barbour  Alabama        US             2021-11-03 06:22:09  31.9 -85.4
##  4  1007 Bibb     Alabama        US             2021-11-03 06:22:09  33.0 -87.1
##  5  1009 Blount   Alabama        US             2021-11-03 06:22:09  34.0 -86.6
##  6  1011 Bullock  Alabama        US             2021-11-03 06:22:09  32.1 -85.7
##  7  1013 Butler   Alabama        US             2021-11-03 06:22:09  31.8 -86.7
##  8  1015 Calhoun  Alabama        US             2021-11-03 06:22:09  33.8 -85.8
##  9  1017 Chambers Alabama        US             2021-11-03 06:22:09  32.9 -85.4
## 10  1019 Cherokee Alabama        US             2021-11-03 06:22:09  34.2 -85.6
##    confirmed deaths recovered active combined_key          incident_rate
##        <dbl>  <dbl> <lgl>     <lgl>  <chr>                         <dbl>
##  1     10271    148 NA        NA     Autauga, Alabama, US         18384.
##  2     37445    558 NA        NA     Baldwin, Alabama, US         16774.
##  3      3605     76 NA        NA     Barbour, Alabama, US         14603.
##  4      4283     89 NA        NA     Bibb, Alabama, US            19126.
##  5     10423    179 NA        NA     Blount, Alabama, US          18025.
##  6      1526     44 NA        NA     Bullock, Alabama, US         15107.
##  7      3365     96 NA        NA     Butler, Alabama, US          17303.
##  8     22341    497 NA        NA     Calhoun, Alabama, US         19666.
##  9      5787    144 NA        NA     Chambers, Alabama, US        17402.
## 10      3071     61 NA        NA     Cherokee, Alabama, US        11723.
##    case_fatality_ratio state   county   percent_fair_or_poor_health
##                  <dbl> <chr>   <chr>                          <dbl>
##  1                1.44 Alabama Autauga                         20.9
##  2                1.49 Alabama Baldwin                         17.5
##  3                2.11 Alabama Barbour                         29.6
##  4                2.08 Alabama Bibb                            19.4
##  5                1.72 Alabama Blount                          21.7
##  6                2.88 Alabama Bullock                         31.0
##  7                2.85 Alabama Butler                          27.9
##  8                2.22 Alabama Calhoun                         23.1
##  9                2.49 Alabama Chambers                        24.0
## 10                1.99 Alabama Cherokee                        20.7
##    percent_smokers percent_adults_with_obesity
##              <dbl>                       <dbl>
##  1            18.1                        33.3
##  2            17.5                        31  
##  3            22.0                        41.7
##  4            19.1                        37.6
##  5            19.2                        33.8
##  6            22.9                        37.2
##  7            21.8                        43.3
##  8            20.6                        38.5
##  9            19.4                        40.1
## 10            17.5                        35  
##    percent_with_access_to_exercise_opportunities percent_excessive_drinking
##                                            <dbl>                      <dbl>
##  1                                         69.1                        15.0
##  2                                         73.7                        18.0
##  3                                         53.2                        12.8
##  4                                         16.3                        15.6
##  5                                         15.6                        14.2
##  6                                          2.50                       12.1
##  7                                         48.6                        11.9
##  8                                         47.7                        13.8
##  9                                         61.9                        12.7
## 10                                         33.4                        14.1
##    percent_uninsured percent_some_college percent_unemployed
##                <dbl>                <dbl>              <dbl>
##  1              8.72                 62.0               3.63
##  2             11.3                  67.4               3.62
##  3             12.2                  34.9               5.17
##  4             10.2                  44.1               3.97
##  5             13.4                  53.4               3.51
##  6             11.4                  35.0               4.69
##  7             11.2                  41.7               4.79
##  8             11.9                  59.2               4.65
##  9             11.9                  48.5               3.91
## 10             11.2                  51.8               3.57
##    percent_children_in_poverty percent_single_parent_households
##                          <dbl>                            <dbl>
##  1                        19.3                             26.2
##  2                        13.9                             24.1
##  3                        43.9                             56.6
##  4                        27.8                             28.7
##  5                        18                               28.6
##  6                        68.3                             74.8
##  7                        36.3                             52.7
##  8                        26.5                             40.2
##  9                        30.7                             46.6
## 10                        24.7                             23.8
##    percent_severe_housing_problems overcrowding percent_adults_with_diabetes
##                              <dbl>        <dbl>                        <dbl>
##  1                            14.7        1.20                          11.1
##  2                            13.6        1.27                          10.7
##  3                            14.6        1.69                          17.6
##  4                            10.5        0.255                         14.5
##  5                            10.5        1.89                          17  
##  6                            18.1        0.113                         23.7
##  7                            13.2        1.69                          19.2
##  8                            13.7        1.54                          17.5
##  9                            16.0        4.04                          19.9
## 10                            13          1.5                           15.2
##    percent_food_insecure percent_insufficient_sleep percent_uninsured_2
##                    <dbl>                      <dbl>               <dbl>
##  1                  13.2                       35.9                11.1
##  2                  11.6                       33.3                14.3
##  3                  22                         38.6                16.1
##  4                  14.3                       38.1                13  
##  5                  10.7                       35.9                17.1
##  6                  24.8                       45.0                15.2
##  7                  20.6                       41.9                14.5
##  8                  15.7                       41.3                15.4
##  9                  17.9                       37.3                15.2
## 10                  12.5                       35.4                13.9
##    median_household_income average_traffic_volume_per_meter_of_major_roadways
##                      <dbl>                                              <dbl>
##  1                   59338                                              88.5 
##  2                   57588                                              87.0 
##  3                   34382                                             102.  
##  4                   46064                                              29.3 
##  5                   50412                                              33.4 
##  6                   29267                                               4.07
##  7                   37365                                              19.3 
##  8                   45400                                             110.  
##  9                   39917                                              20.3 
## 10                   42132                                              25.9 
##    percent_homeowners population_2 percent_less_than_18_years_of_age
##                 <dbl>        <dbl>                             <dbl>
##  1               74.9        55601                              23.7
##  2               73.6       218022                              21.6
##  3               61.4        24881                              20.9
##  4               75.1        22400                              20.5
##  5               78.6        57840                              23.2
##  6               75.5        10138                              21.1
##  7               69.9        19680                              22.2
##  8               69.5       114277                              21.6
##  9               67.8        33615                              20.8
## 10               79.0        26032                              19.2
##    percent_65_and_over percent_black percent_asian percent_hispanic
##                  <dbl>         <dbl>         <dbl>            <dbl>
##  1                15.6         19.3          1.22              2.97
##  2                20.4          8.78         1.15              4.65
##  3                19.4         48.0          0.454             4.28
##  4                16.5         21.1          0.237             2.62
##  5                18.2          1.46         0.320             9.57
##  6                16.4         69.5          0.187             7.96
##  7                20.3         44.6          1.32              1.51
##  8                17.7         20.9          0.964             3.91
##  9                19.5         39.6          1.33              2.56
## 10                23.0          4.24         0.338             1.62
##    percent_female percent_rural
##             <dbl>         <dbl>
##  1           51.4          42.0
##  2           51.5          42.3
##  3           47.2          67.8
##  4           46.8          68.4
##  5           50.7          90.0
##  6           45.5          51.4
##  7           53.4          71.2
##  8           51.9          33.7
##  9           52.1          49.1
## 10           50.5          85.7
## # … with 3,147 more rows

Numerical summaries of each variable:

summary(county_data)
##       fips          admin2          province_state     country_region    
##  Min.   : 1001   Length:3157        Length:3157        Length:3157       
##  1st Qu.:18592   Class :character   Class :character   Class :character  
##  Median :29187   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :30842                                                           
##  3rd Qu.:46006                                                           
##  Max.   :90053                                                           
##  NA's   :10                                                              
##   last_update                       lat            long_        
##  Min.   :2021-11-03 06:22:09   Min.   :19.60   Min.   :-174.16  
##  1st Qu.:2021-11-03 06:22:09   1st Qu.:34.68   1st Qu.: -98.14  
##  Median :2021-11-03 06:22:09   Median :38.37   Median : -90.30  
##  Mean   :2021-11-03 06:22:09   Mean   :38.45   Mean   : -92.17  
##  3rd Qu.:2021-11-03 06:22:09   3rd Qu.:41.83   3rd Qu.: -83.43  
##  Max.   :2021-11-03 06:22:09   Max.   :69.31   Max.   : -67.63  
##                                NA's   :33      NA's   :33       
##    confirmed           deaths        recovered       active       
##  Min.   :      8   Min.   :    0.0   Mode:logical   Mode:logical  
##  1st Qu.:   1620   1st Qu.:   28.0   NA's:3157      NA's:3157     
##  Median :   3921   Median :   66.0                                
##  Mean   :  14557   Mean   :  235.7                                
##  3rd Qu.:  10240   3rd Qu.:  162.0                                
##  Max.   :1495014   Max.   :26661.0                                
##                                                                   
##  combined_key       incident_rate   case_fatality_ratio    state          
##  Length:3157        Min.   : 1962   Min.   :   0.000    Length:3157       
##  Class :character   1st Qu.:12824   1st Qu.:   1.193    Class :character  
##  Mode  :character   Median :15077   Median :   1.632    Mode  :character  
##                     Mean   :14994   Mean   :   2.902                      
##                     3rd Qu.:17153   3rd Qu.:   2.184                      
##                     Max.   :54277   Max.   :2829.070                      
##                     NA's   :33                                            
##     county          percent_fair_or_poor_health percent_smokers 
##  Length:3157        Min.   : 8.121              Min.   : 5.909  
##  Class :character   1st Qu.:14.361              1st Qu.:14.989  
##  Mode  :character   Median :17.261              Median :16.989  
##                     Mean   :17.975              Mean   :17.525  
##                     3rd Qu.:20.953              3rd Qu.:19.766  
##                     Max.   :40.991              Max.   :41.491  
##                     NA's   :43                  NA's   :43      
##  percent_adults_with_obesity percent_with_access_to_exercise_opportunities
##  Min.   :12.4                Min.   :  0.00                               
##  1st Qu.:29.3                1st Qu.: 48.47                               
##  Median :33.1                Median : 65.80                               
##  Mean   :32.9                Mean   : 62.74                               
##  3rd Qu.:36.6                3rd Qu.: 79.99                               
##  Max.   :57.7                Max.   :100.00                               
##  NA's   :43                  NA's   :48                                   
##  percent_excessive_drinking percent_uninsured percent_some_college
##  Min.   : 7.81              Min.   : 2.263    Min.   : 15.18      
##  1st Qu.:15.34              1st Qu.: 7.381    1st Qu.: 49.79      
##  Median :17.58              Median :10.553    Median : 57.93      
##  Mean   :17.55              Mean   :11.471    Mean   : 57.84      
##  3rd Qu.:19.68              3rd Qu.:14.470    3rd Qu.: 66.47      
##  Max.   :28.62              Max.   :33.750    Max.   :100.00      
##  NA's   :43                 NA's   :43        NA's   :43          
##  percent_unemployed percent_children_in_poverty
##  Min.   : 1.302     Min.   : 2.50              
##  1st Qu.: 3.126     1st Qu.:14.60              
##  Median : 3.875     Median :20.10              
##  Mean   : 4.130     Mean   :21.17              
##  3rd Qu.: 4.818     3rd Qu.:26.40              
##  Max.   :19.904     Max.   :68.30              
##  NA's   :43         NA's   :43                 
##  percent_single_parent_households percent_severe_housing_problems
##  Min.   : 0.00                    Min.   : 3.22                  
##  1st Qu.:25.63                    1st Qu.:11.01                  
##  Median :31.71                    Median :13.33                  
##  Mean   :32.46                    Mean   :13.87                  
##  3rd Qu.:37.74                    3rd Qu.:15.93                  
##  Max.   :87.20                    Max.   :70.89                  
##  NA's   :44                       NA's   :43                     
##   overcrowding    percent_adults_with_diabetes percent_food_insecure
##  Min.   : 0.000   Min.   : 1.80                Min.   : 2.90        
##  1st Qu.: 1.231   1st Qu.: 9.30                1st Qu.:10.60        
##  Median : 1.878   Median :11.60                Median :12.75        
##  Mean   : 2.415   Mean   :12.14                Mean   :13.25        
##  3rd Qu.: 2.840   3rd Qu.:14.60                3rd Qu.:15.20        
##  Max.   :51.585   Max.   :34.10                Max.   :36.30        
##  NA's   :43       NA's   :43                   NA's   :43           
##  percent_insufficient_sleep percent_uninsured_2 median_household_income
##  Min.   :23.03              Min.   : 2.683      Min.   : 25385         
##  1st Qu.:30.10              1st Qu.: 8.537      1st Qu.: 43650         
##  Median :33.01              Median :12.481      Median : 50514         
##  Mean   :33.07              Mean   :13.586      Mean   : 52725         
##  3rd Qu.:36.13              3rd Qu.:17.429      3rd Qu.: 58741         
##  Max.   :46.71              Max.   :42.397      Max.   :140382         
##  NA's   :43                 NA's   :43          NA's   :43             
##  average_traffic_volume_per_meter_of_major_roadways percent_homeowners
##  Min.   :   0.00                                    Min.   :19.61     
##  1st Qu.:  26.92                                    1st Qu.:67.54     
##  Median :  57.97                                    Median :72.58     
##  Mean   : 129.63                                    Mean   :71.43     
##  3rd Qu.: 123.28                                    3rd Qu.:77.00     
##  Max.   :4496.41                                    Max.   :92.40     
##  NA's   :43                                         NA's   :43        
##   population_2      percent_less_than_18_years_of_age percent_65_and_over
##  Min.   :     152   Min.   : 7.069                    Min.   : 4.83      
##  1st Qu.:   11043   1st Qu.:20.025                    1st Qu.:16.30      
##  Median :   26096   Median :22.051                    Median :18.93      
##  Mean   :  104770   Mean   :22.038                    Mean   :19.28      
##  3rd Qu.:   68348   3rd Qu.:23.840                    3rd Qu.:21.80      
##  Max.   :10105518   Max.   :41.992                    Max.   :57.59      
##  NA's   :43         NA's   :43                        NA's   :43         
##  percent_black     percent_asian     percent_hispanic  percent_female 
##  Min.   : 0.0000   Min.   : 0.0000   Min.   : 0.6105   Min.   :26.84  
##  1st Qu.: 0.7283   1st Qu.: 0.4651   1st Qu.: 2.3948   1st Qu.:49.43  
##  Median : 2.2841   Median : 0.7381   Median : 4.3525   Median :50.32  
##  Mean   : 9.0652   Mean   : 1.5707   Mean   : 9.6658   Mean   :49.89  
##  3rd Qu.:10.3587   3rd Qu.: 1.4350   3rd Qu.: 9.9949   3rd Qu.:51.03  
##  Max.   :85.4143   Max.   :43.3570   Max.   :96.3595   Max.   :56.87  
##  NA's   :43        NA's   :43        NA's   :43        NA's   :43     
##  percent_rural   
##  Min.   :  0.00  
##  1st Qu.: 33.25  
##  Median : 59.48  
##  Mean   : 58.58  
##  3rd Qu.: 87.67  
##  Max.   :100.00  
##  NA's   :49

List rows of `county_count` that have no match in `county_info`. After the left join, these are the rows of `county_data` whose `state` and `county` are both `NA`:

county_data %>%
  filter(is.na(state) & is.na(county)) %>%
  print(n = Inf)
## # A tibble: 43 × 43
##     fips admin2   province_state country_region last_update           lat  long_
##    <dbl> <chr>    <chr>          <chr>          <dttm>              <dbl>  <dbl>
##  1  2063 Chugach  Alaska         US             2021-11-03 06:22:09  61.2 -150. 
##  2  2066 Copper … Alaska         US             2021-11-03 06:22:09  60.4 -163. 
##  3 90002 Unassig… Alaska         US             2021-11-03 06:22:09  NA     NA  
##  4 90005 Unassig… Arkansas       US             2021-11-03 06:22:09  NA     NA  
##  5 90006 Unassig… California     US             2021-11-03 06:22:09  NA     NA  
##  6 90008 Unassig… Colorado       US             2021-11-03 06:22:09  NA     NA  
##  7 90009 Unassig… Connecticut    US             2021-11-03 06:22:09  NA     NA  
##  8 90010 Unassig… Delaware       US             2021-11-03 06:22:09  NA     NA  
##  9 90012 Unassig… Florida        US             2021-11-03 06:22:09  NA     NA  
## 10 80013 Out of … Georgia        US             2021-11-03 06:22:09  NA     NA  
## 11 90013 Unassig… Georgia        US             2021-11-03 06:22:09  NA     NA  
## 12 80015 Out of … Hawaii         US             2021-11-03 06:22:09  NA     NA  
## 13 80017 Out of … Illinois       US             2021-11-03 06:22:09  NA     NA  
## 14 90017 Unassig… Illinois       US             2021-11-03 06:22:09  NA     NA  
## 15 90019 Unassig… Iowa           US             2021-11-03 06:22:09  NA     NA  
## 16 90021 Unassig… Kentucky       US             2021-11-03 06:22:09  NA     NA  
## 17 90022 Unassig… Louisiana      US             2021-11-03 06:22:09  NA     NA  
## 18    NA Dukes a… Massachusetts  US             2021-11-03 06:22:09  41.4  -70.7
## 19 90025 Unassig… Massachusetts  US             2021-11-03 06:22:09  NA     NA  
## 20    NA Federal… Michigan       US             2021-11-03 06:22:09  NA     NA  
## 21    NA Michiga… Michigan       US             2021-11-03 06:22:09  NA     NA  
## 22 80026 Out of … Michigan       US             2021-11-03 06:22:09  NA     NA  
## 23 90026 Unassig… Michigan       US             2021-11-03 06:22:09  NA     NA  
## 24 90027 Unassig… Minnesota      US             2021-11-03 06:22:09  NA     NA  
## 25    NA Kansas … Missouri       US             2021-11-03 06:22:09  39.1  -94.6
## 26 90031 Unassig… Nebraska       US             2021-11-03 06:22:09  NA     NA  
## 27 90033 Unassig… New Hampshire  US             2021-11-03 06:22:09  NA     NA  
## 28 90034 Unassig… New Jersey     US             2021-11-03 06:22:09  NA     NA  
## 29 90035 Unassig… New Mexico     US             2021-11-03 06:22:09  NA     NA  
## 30 90036 Unassig… New York       US             2021-11-03 06:22:09  NA     NA  
## 31 90040 Unassig… Oklahoma       US             2021-11-03 06:22:09  NA     NA  
## 32 90044 Unassig… Rhode Island   US             2021-11-03 06:22:09  NA     NA  
## 33 80047 Out of … Tennessee      US             2021-11-03 06:22:09  NA     NA  
## 34 90047 Unassig… Tennessee      US             2021-11-03 06:22:09  NA     NA  
## 35    NA Bear Ri… Utah           US             2021-11-03 06:22:09  41.5 -113. 
## 36    NA Central… Utah           US             2021-11-03 06:22:09  39.4 -112. 
## 37    NA Southea… Utah           US             2021-11-03 06:22:09  39.0 -111. 
## 38    NA Southwe… Utah           US             2021-11-03 06:22:09  37.9 -111. 
## 39    NA TriCoun… Utah           US             2021-11-03 06:22:09  40.1 -110. 
## 40 90049 Unassig… Utah           US             2021-11-03 06:22:09  NA     NA  
## 41    NA Weber-M… Utah           US             2021-11-03 06:22:09  41.3 -112. 
## 42 90050 Unassig… Vermont        US             2021-11-03 06:22:09  NA     NA  
## 43 90053 Unassig… Washington     US             2021-11-03 06:22:09  NA     NA  
## # … with 36 more variables: confirmed <dbl>, deaths <dbl>, recovered <lgl>,
## #   active <lgl>, combined_key <chr>, incident_rate <dbl>,
## #   case_fatality_ratio <dbl>, state <chr>, county <chr>,
## #   percent_fair_or_poor_health <dbl>, percent_smokers <dbl>,
## #   percent_adults_with_obesity <dbl>,
## #   percent_with_access_to_exercise_opportunities <dbl>,
## #   percent_excessive_drinking <dbl>, percent_uninsured <dbl>, …
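The same unmatched rows can also be found without joining first, using an anti-join that keeps the rows of `county_count` whose `fips` does not occur in `county_info`. The base-R sketch below runs on small hypothetical stand-ins (`county_count_toy`, `county_info_toy`), not the real tables. With the tidyverse loaded, `anti_join(county_count, county_info, by = "fips")` plays the same role. In this sketch a missing `fips` counts as unmatched, because `NA %in% c(1001, 1003)` is `FALSE`.

```r
# Hypothetical stand-ins for county_count and county_info (not the real data)
county_count_toy <- data.frame(
  fips   = c(1001, 2063, 90002, NA),
  admin2 = c("Autauga", "Chugach", "Unassigned", "Kansas City"),
  stringsAsFactors = FALSE
)
county_info_toy <- data.frame(
  fips   = c(1001, 1003),
  county = c("Autauga", "Baldwin"),
  stringsAsFactors = FALSE
)

# Anti-join: keep rows whose fips has no match in county_info_toy.
# A missing fips is treated as unmatched (NA %in% c(1001, 1003) is FALSE).
unmatched <- county_count_toy[!(county_count_toy$fips %in% county_info_toy$fips), ]
print(unmatched)
```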

Some rows of `county_count` are missing `fips` altogether:

county_count %>%
  filter(is.na(fips)) %>%
  select(fips, admin2, province_state) %>%
  print(n = Inf)
## # A tibble: 10 × 3
##     fips admin2                                    province_state
##    <dbl> <chr>                                     <chr>         
##  1    NA Dukes and Nantucket                       Massachusetts 
##  2    NA Federal Correctional Institution (FCI)    Michigan      
##  3    NA Michigan Department of Corrections (MDOC) Michigan      
##  4    NA Kansas City                               Missouri      
##  5    NA Bear River                                Utah          
##  6    NA Central Utah                              Utah          
##  7    NA Southeast Utah                            Utah          
##  8    NA Southwest Utah                            Utah          
##  9    NA TriCounty                                 Utah          
## 10    NA Weber-Morgan                              Utah
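Tabulating these ten rows by state shows where the gaps concentrate: most are Utah reporting regions, several of which cover more than one county (e.g. Weber-Morgan, TriCounty). The vector below is copied from the `province_state` column printed above; it is a quick illustration, not part of the original pipeline.

```r
# province_state of the ten rows with a missing fips, copied from the listing above
missing_fips_states <- c("Massachusetts", "Michigan", "Michigan", "Missouri",
                         rep("Utah", 6))

# Count rows per state (table() orders the states alphabetically)
missing_by_state <- table(missing_fips_states)
print(missing_by_state)
```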

We need to (1) manually set the FIPS codes for a few counties, (2) discard rows whose `admin2` contains `Unassigned` (in either case) or `Out of`, and (3) join with `county_info` again.

county_data <- county_count %>%
  # manually set FIPS for two counties reported under combined names
  mutate(fips = ifelse(admin2 == "Dukes and Nantucket" & province_state == "Massachusetts", 25019, fips)) %>% 
  mutate(fips = ifelse(admin2 == "Weber-Morgan" & province_state == "Utah", 49057, fips)) %>%
  # remove `recovered` and `active` because they are just columns of NAs
  select(-recovered, -active) %>%
  # discard rows still missing FIPS, plus "Unassigned"/"unassigned" and "Out of" rows
  filter(!(is.na(fips) | str_detect(admin2, regex("unassigned|out of", ignore_case = TRUE)))) %>%
  # attach the county-level predictors by FIPS code
  left_join(county_info, by = "fips") %>%
  # drop counties with a missing value in any column
  drop_na() %>%
  print(width = Inf)
## # A tibble: 3,109 × 41
##     fips admin2   province_state country_region last_update           lat long_
##    <dbl> <chr>    <chr>          <chr>          <dttm>              <dbl> <dbl>
##  1  1001 Autauga  Alabama        US             2021-11-03 06:22:09  32.5 -86.6
##  2  1003 Baldwin  Alabama        US             2021-11-03 06:22:09  30.7 -87.7
##  3  1005 Barbour  Alabama        US             2021-11-03 06:22:09  31.9 -85.4
##  4  1007 Bibb     Alabama        US             2021-11-03 06:22:09  33.0 -87.1
##  5  1009 Blount   Alabama        US             2021-11-03 06:22:09  34.0 -86.6
##  6  1011 Bullock  Alabama        US             2021-11-03 06:22:09  32.1 -85.7
##  7  1013 Butler   Alabama        US             2021-11-03 06:22:09  31.8 -86.7
##  8  1015 Calhoun  Alabama        US             2021-11-03 06:22:09  33.8 -85.8
##  9  1017 Chambers Alabama        US             2021-11-03 06:22:09  32.9 -85.4
## 10  1019 Cherokee Alabama        US             2021-11-03 06:22:09  34.2 -85.6
##    confirmed deaths combined_key          incident_rate case_fatality_ratio
##        <dbl>  <dbl> <chr>                         <dbl>               <dbl>
##  1     10271    148 Autauga, Alabama, US         18384.                1.44
##  2     37445    558 Baldwin, Alabama, US         16774.                1.49
##  3      3605     76 Barbour, Alabama, US         14603.                2.11
##  4      4283     89 Bibb, Alabama, US            19126.                2.08
##  5     10423    179 Blount, Alabama, US          18025.                1.72
##  6      1526     44 Bullock, Alabama, US         15107.                2.88
##  7      3365     96 Butler, Alabama, US          17303.                2.85
##  8     22341    497 Calhoun, Alabama, US         19666.                2.22
##  9      5787    144 Chambers, Alabama, US        17402.                2.49
## 10      3071     61 Cherokee, Alabama, US        11723.                1.99
##    state   county   percent_fair_or_poor_health percent_smokers
##    <chr>   <chr>                          <dbl>           <dbl>
##  1 Alabama Autauga                         20.9            18.1
##  2 Alabama Baldwin                         17.5            17.5
##  3 Alabama Barbour                         29.6            22.0
##  4 Alabama Bibb                            19.4            19.1
##  5 Alabama Blount                          21.7            19.2
##  6 Alabama Bullock                         31.0            22.9
##  7 Alabama Butler                          27.9            21.8
##  8 Alabama Calhoun                         23.1            20.6
##  9 Alabama Chambers                        24.0            19.4
## 10 Alabama Cherokee                        20.7            17.5
##    percent_adults_with_obesity percent_with_access_to_exercise_opportunities
##                          <dbl>                                         <dbl>
##  1                        33.3                                         69.1 
##  2                        31                                           73.7 
##  3                        41.7                                         53.2 
##  4                        37.6                                         16.3 
##  5                        33.8                                         15.6 
##  6                        37.2                                          2.50
##  7                        43.3                                         48.6 
##  8                        38.5                                         47.7 
##  9                        40.1                                         61.9 
## 10                        35                                           33.4 
##    percent_excessive_drinking percent_uninsured percent_some_college
##                         <dbl>             <dbl>                <dbl>
##  1                       15.0              8.72                 62.0
##  2                       18.0             11.3                  67.4
##  3                       12.8             12.2                  34.9
##  4                       15.6             10.2                  44.1
##  5                       14.2             13.4                  53.4
##  6                       12.1             11.4                  35.0
##  7                       11.9             11.2                  41.7
##  8                       13.8             11.9                  59.2
##  9                       12.7             11.9                  48.5
## 10                       14.1             11.2                  51.8
##    percent_unemployed percent_children_in_poverty
##                 <dbl>                       <dbl>
##  1               3.63                        19.3
##  2               3.62                        13.9
##  3               5.17                        43.9
##  4               3.97                        27.8
##  5               3.51                        18  
##  6               4.69                        68.3
##  7               4.79                        36.3
##  8               4.65                        26.5
##  9               3.91                        30.7
## 10               3.57                        24.7
##    percent_single_parent_households percent_severe_housing_problems overcrowding
##                               <dbl>                           <dbl>        <dbl>
##  1                             26.2                            14.7        1.20 
##  2                             24.1                            13.6        1.27 
##  3                             56.6                            14.6        1.69 
##  4                             28.7                            10.5        0.255
##  5                             28.6                            10.5        1.89 
##  6                             74.8                            18.1        0.113
##  7                             52.7                            13.2        1.69 
##  8                             40.2                            13.7        1.54 
##  9                             46.6                            16.0        4.04 
## 10                             23.8                            13          1.5  
##    percent_adults_with_diabetes percent_food_insecure percent_insufficient_sleep
##                           <dbl>                 <dbl>                      <dbl>
##  1                         11.1                  13.2                       35.9
##  2                         10.7                  11.6                       33.3
##  3                         17.6                  22                         38.6
##  4                         14.5                  14.3                       38.1
##  5                         17                    10.7                       35.9
##  6                         23.7                  24.8                       45.0
##  7                         19.2                  20.6                       41.9
##  8                         17.5                  15.7                       41.3
##  9                         19.9                  17.9                       37.3
## 10                         15.2                  12.5                       35.4
##    percent_uninsured_2 median_household_income
##                  <dbl>                   <dbl>
##  1                11.1                   59338
##  2                14.3                   57588
##  3                16.1                   34382
##  4                13                     46064
##  5                17.1                   50412
##  6                15.2                   29267
##  7                14.5                   37365
##  8                15.4                   45400
##  9                15.2                   39917
## 10                13.9                   42132
##    average_traffic_volume_per_meter_of_major_roadways percent_homeowners
##                                                 <dbl>              <dbl>
##  1                                              88.5                74.9
##  2                                              87.0                73.6
##  3                                             102.                 61.4
##  4                                              29.3                75.1
##  5                                              33.4                78.6
##  6                                               4.07               75.5
##  7                                              19.3                69.9
##  8                                             110.                 69.5
##  9                                              20.3                67.8
## 10                                              25.9                79.0
##    population_2 percent_less_than_18_years_of_age percent_65_and_over
##           <dbl>                             <dbl>               <dbl>
##  1        55601                              23.7                15.6
##  2       218022                              21.6                20.4
##  3        24881                              20.9                19.4
##  4        22400                              20.5                16.5
##  5        57840                              23.2                18.2
##  6        10138                              21.1                16.4
##  7        19680                              22.2                20.3
##  8       114277                              21.6                17.7
##  9        33615                              20.8                19.5
## 10        26032                              19.2                23.0
##    percent_black percent_asian percent_hispanic percent_female percent_rural
##            <dbl>         <dbl>            <dbl>          <dbl>         <dbl>
##  1         19.3          1.22              2.97           51.4          42.0
##  2          8.78         1.15              4.65           51.5          42.3
##  3         48.0          0.454             4.28           47.2          67.8
##  4         21.1          0.237             2.62           46.8          68.4
##  5          1.46         0.320             9.57           50.7          90.0
##  6         69.5          0.187             7.96           45.5          51.4
##  7         44.6          1.32              1.51           53.4          71.2
##  8         20.9          0.964             3.91           51.9          33.7
##  9         39.6          1.33              2.56           52.1          49.1
## 10          4.24         0.338             1.62           50.5          85.7
## # … with 3,099 more rows
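The chained `ifelse()` calls work for two counties, but a small lookup table keeps the overrides in one place if the list grows. The base-R sketch below applies the same two overrides used above to a hypothetical toy table, then drops the `Unassigned`/`Out of` rows with a case-insensitive pattern. The helper `override_fips()`, the toy rows, and the placeholder name `Out of XX` are illustrative only.

```r
# Manual FIPS overrides (the same two assignments as in the pipeline above)
fips_overrides <- data.frame(
  admin2         = c("Dukes and Nantucket", "Weber-Morgan"),
  province_state = c("Massachusetts", "Utah"),
  fips_fix       = c(25019, 49057),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace fips wherever (admin2, province_state) has an override
override_fips <- function(df, overrides) {
  key_df  <- paste(df$admin2, df$province_state, sep = "|")
  key_fix <- paste(overrides$admin2, overrides$province_state, sep = "|")
  idx <- match(key_df, key_fix)  # NA where no override applies
  df$fips <- ifelse(is.na(idx), df$fips, overrides$fips_fix[idx])
  df
}

# Hypothetical toy rows (not the real county_count); "Out of XX" is a placeholder
toy <- data.frame(
  fips           = c(1001, NA, NA, 90049, 80013, NA),
  admin2         = c("Autauga", "Dukes and Nantucket", "Weber-Morgan",
                     "Unassigned", "Out of XX", "Bear River"),
  province_state = c("Alabama", "Massachusetts", "Utah",
                     "Utah", "Georgia", "Utah"),
  stringsAsFactors = FALSE
)

fixed <- override_fips(toy, fips_overrides)

# Drop rows still lacking fips, and any admin2 containing "unassigned" or "out of"
drop <- is.na(fixed$fips) |
  grepl("unassigned|out of", fixed$admin2, ignore.case = TRUE)
cleaned <- fixed[!drop, ]
print(cleaned)
```

Using `match()` on a combined key makes the override a single vectorized step. With the tidyverse, the equivalent would be a `left_join()` of the override table on `admin2` and `province_state`, followed by `mutate(fips = coalesce(fips_fix, fips))`.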

Summarize again. After `drop_na()`, no variable has missing values:

summary(county_data)
##       fips          admin2          province_state     country_region    
##  Min.   : 1001   Length:3109        Length:3109        Length:3109       
##  1st Qu.:18179   Class :character   Class :character   Class :character  
##  Median :29163   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :30326                                                           
##  3rd Qu.:45051                                                           
##  Max.   :56045                                                           
##   last_update                       lat            long_        
##  Min.   :2021-11-03 06:22:09   Min.   :19.60   Min.   :-174.16  
##  1st Qu.:2021-11-03 06:22:09   1st Qu.:34.65   1st Qu.: -98.07  
##  Median :2021-11-03 06:22:09   Median :38.35   Median : -90.21  
##  Mean   :2021-11-03 06:22:09   Mean   :38.40   Mean   : -92.01  
##  3rd Qu.:2021-11-03 06:22:09   3rd Qu.:41.80   3rd Qu.: -83.40  
##  Max.   :2021-11-03 06:22:09   Max.   :69.31   Max.   : -67.63  
##    confirmed           deaths        combined_key       incident_rate  
##  Min.   :     19   Min.   :    0.0   Length:3109        Min.   : 1962  
##  1st Qu.:   1644   1st Qu.:   28.0   Class :character   1st Qu.:12829  
##  Median :   3941   Median :   67.0   Mode  :character   Median :15074  
##  Mean   :  14650   Mean   :  229.9                      Mean   :14998  
##  3rd Qu.:  10275   3rd Qu.:  164.0                      3rd Qu.:17168  
##  Max.   :1495014   Max.   :26661.0                      Max.   :54277  
##  case_fatality_ratio    state              county         
##  Min.   :0.000       Length:3109        Length:3109       
##  1st Qu.:1.209       Class :character   Class :character  
##  Median :1.641       Mode  :character   Mode  :character  
##  Mean   :1.768                                            
##  3rd Qu.:2.189                                            
##  Max.   :7.628                                            
##  percent_fair_or_poor_health percent_smokers  percent_adults_with_obesity
##  Min.   : 8.121              Min.   : 5.909   Min.   :12.4               
##  1st Qu.:14.361              1st Qu.:14.987   1st Qu.:29.3               
##  Median :17.260              Median :16.985   Median :33.1               
##  Mean   :17.968              Mean   :17.508   Mean   :32.9               
##  3rd Qu.:20.950              3rd Qu.:19.755   3rd Qu.:36.6               
##  Max.   :40.991              Max.   :41.491   Max.   :57.7               
##  percent_with_access_to_exercise_opportunities percent_excessive_drinking
##  Min.   :  0.00                                Min.   : 7.81             
##  1st Qu.: 48.52                                1st Qu.:15.34             
##  Median : 65.82                                Median :17.58             
##  Mean   : 62.79                                Mean   :17.54             
##  3rd Qu.: 80.09                                3rd Qu.:19.68             
##  Max.   :100.00                                Max.   :28.62             
##  percent_uninsured percent_some_college percent_unemployed
##  Min.   : 2.263    Min.   :15.18        Min.   : 1.302    
##  1st Qu.: 7.376    1st Qu.:49.80        1st Qu.: 3.120    
##  Median :10.528    Median :57.93        Median : 3.873    
##  Mean   :11.452    Mean   :57.85        Mean   : 4.117    
##  3rd Qu.:14.445    3rd Qu.:66.47        3rd Qu.: 4.814    
##  Max.   :33.750    Max.   :90.67        Max.   :18.092    
##  percent_children_in_poverty percent_single_parent_households
##  Min.   : 2.50               Min.   : 0.00                   
##  1st Qu.:14.60               1st Qu.:25.62                   
##  Median :20.10               Median :31.70                   
##  Mean   :21.15               Mean   :32.44                   
##  3rd Qu.:26.40               3rd Qu.:37.70                   
##  Max.   :68.30               Max.   :87.20                   
##  percent_severe_housing_problems  overcrowding    percent_adults_with_diabetes
##  Min.   : 3.22                   Min.   : 0.000   Min.   : 1.80               
##  1st Qu.:11.01                   1st Qu.: 1.230   1st Qu.: 9.30               
##  Median :13.32                   Median : 1.877   Median :11.60               
##  Mean   :13.84                   Mean   : 2.391   Mean   :12.15               
##  3rd Qu.:15.92                   3rd Qu.: 2.840   3rd Qu.:14.60               
##  Max.   :60.26                   Max.   :38.058   Max.   :34.10               
##  percent_food_insecure percent_insufficient_sleep percent_uninsured_2
##  Min.   : 2.90         Min.   :23.03              Min.   : 2.683     
##  1st Qu.:10.60         1st Qu.:30.11              1st Qu.: 8.530     
##  Median :12.70         Median :33.01              Median :12.460     
##  Mean   :13.24         Mean   :33.07              Mean   :13.564     
##  3rd Qu.:15.20         3rd Qu.:36.13              3rd Qu.:17.383     
##  Max.   :36.30         Max.   :46.71              Max.   :42.397     
##  median_household_income average_traffic_volume_per_meter_of_major_roadways
##  Min.   : 25385          Min.   :   0.00                                   
##  1st Qu.: 43650          1st Qu.:  26.94                                   
##  Median : 50525          Median :  58.08                                   
##  Mean   : 52737          Mean   : 129.85                                   
##  3rd Qu.: 58742          3rd Qu.: 123.43                                   
##  Max.   :140382          Max.   :4496.41                                   
##  percent_homeowners  population_2      percent_less_than_18_years_of_age
##  Min.   :19.61      Min.   :     277   Min.   : 7.069                   
##  1st Qu.:67.55      1st Qu.:   11113   1st Qu.:20.027                   
##  Median :72.59      Median :   26158   Median :22.050                   
##  Mean   :71.44      Mean   :  105013   Mean   :22.028                   
##  3rd Qu.:77.01      3rd Qu.:   68557   3rd Qu.:23.838                   
##  Max.   :92.40      Max.   :10105518   Max.   :41.992                   
##  percent_65_and_over percent_black     percent_asian     percent_hispanic 
##  Min.   : 4.83       Min.   : 0.0000   Min.   : 0.0000   Min.   : 0.6105  
##  1st Qu.:16.30       1st Qu.: 0.7307   1st Qu.: 0.4654   1st Qu.: 2.3926  
##  Median :18.93       Median : 2.3228   Median : 0.7383   Median : 4.3532  
##  Mean   :19.29       Mean   : 9.0803   Mean   : 1.5709   Mean   : 9.6791  
##  3rd Qu.:21.81       3rd Qu.:10.3662   3rd Qu.: 1.4353   3rd Qu.:10.0066  
##  Max.   :57.59       Max.   :85.4143   Max.   :43.3570   Max.   :96.3595  
##  percent_female  percent_rural   
##  Min.   :26.84   Min.   :  0.00  
##  1st Qu.:49.43   1st Qu.: 33.15  
##  Median :50.32   Median : 59.45  
##  Mean   :49.90   Mean   : 58.54  
##  3rd Qu.:51.03   3rd Qu.: 87.30  
##  Max.   :56.87   Max.   :100.00

If any variable has missing values for many counties, we go back and remove it from consideration.
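Below is a minimal base-R sketch of that screening step, run on a small toy data frame. The values and the column hypothetical_sparse_var are invented for illustration. The 50% cutoff is an arbitrary assumption, not a rule from the assignment.

```r
# Toy stand-in for county_data; the values and hypothetical_sparse_var are invented
df <- data.frame(
  county = c("A", "B", "C", "D"),
  percent_smokers = c(15.2, NA, 18.1, 20.3),
  hypothetical_sparse_var = c(NA, NA, NA, 4.1)
)

# Fraction of counties with a missing value, computed per column
na_frac <- colMeans(is.na(df))
print(na_frac)

# Drop columns that are missing in more than an assumed 50% of counties
max_na_frac <- 0.5
df_kept <- df[, na_frac <= max_na_frac, drop = FALSE]
print(names(df_kept))
```

In the tidyverse style used above, the same filter can be written as select(where(~ mean(is.na(.x)) <= 0.5)). Some counties may still have a missing value in a retained column. Under R's default na.action (na.omit), glm() drops those rows.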

Let’s create a final data frame for analysis.

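county_data <- county_data %>%
  # treat state as a categorical predictor
  mutate(state = as.factor(state)) %>%
  # keep identifiers, counts, and the predictor block percent_fair_or_poor_health through percent_rural
  select(county, confirmed, deaths, state, percent_fair_or_poor_health:percent_rural)
summary(county_data)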
##     county            confirmed           deaths             state     
##  Length:3109        Min.   :     19   Min.   :    0.0   Texas   : 253  
##  Class :character   1st Qu.:   1644   1st Qu.:   28.0   Georgia : 159  
##  Mode  :character   Median :   3941   Median :   67.0   Virginia: 133  
##                     Mean   :  14650   Mean   :  229.9   Kentucky: 120  
##                     3rd Qu.:  10275   3rd Qu.:  164.0   Missouri: 115  
##                     Max.   :1495014   Max.   :26661.0   Kansas  : 105  
##                                                         (Other) :2224  
##  percent_fair_or_poor_health percent_smokers  percent_adults_with_obesity
##  Min.   : 8.121              Min.   : 5.909   Min.   :12.4               
##  1st Qu.:14.361              1st Qu.:14.987   1st Qu.:29.3               
##  Median :17.260              Median :16.985   Median :33.1               
##  Mean   :17.968              Mean   :17.508   Mean   :32.9               
##  3rd Qu.:20.950              3rd Qu.:19.755   3rd Qu.:36.6               
##  Max.   :40.991              Max.   :41.491   Max.   :57.7               
##                                                                          
##  percent_with_access_to_exercise_opportunities percent_excessive_drinking
##  Min.   :  0.00                                Min.   : 7.81             
##  1st Qu.: 48.52                                1st Qu.:15.34             
##  Median : 65.82                                Median :17.58             
##  Mean   : 62.79                                Mean   :17.54             
##  3rd Qu.: 80.09                                3rd Qu.:19.68             
##  Max.   :100.00                                Max.   :28.62             
##                                                                          
##  percent_uninsured percent_some_college percent_unemployed
##  Min.   : 2.263    Min.   :15.18        Min.   : 1.302    
##  1st Qu.: 7.376    1st Qu.:49.80        1st Qu.: 3.120    
##  Median :10.528    Median :57.93        Median : 3.873    
##  Mean   :11.452    Mean   :57.85        Mean   : 4.117    
##  3rd Qu.:14.445    3rd Qu.:66.47        3rd Qu.: 4.814    
##  Max.   :33.750    Max.   :90.67        Max.   :18.092    
##                                                           
##  percent_children_in_poverty percent_single_parent_households
##  Min.   : 2.50               Min.   : 0.00                   
##  1st Qu.:14.60               1st Qu.:25.62                   
##  Median :20.10               Median :31.70                   
##  Mean   :21.15               Mean   :32.44                   
##  3rd Qu.:26.40               3rd Qu.:37.70                   
##  Max.   :68.30               Max.   :87.20                   
##                                                              
##  percent_severe_housing_problems  overcrowding    percent_adults_with_diabetes
##  Min.   : 3.22                   Min.   : 0.000   Min.   : 1.80               
##  1st Qu.:11.01                   1st Qu.: 1.230   1st Qu.: 9.30               
##  Median :13.32                   Median : 1.877   Median :11.60               
##  Mean   :13.84                   Mean   : 2.391   Mean   :12.15               
##  3rd Qu.:15.92                   3rd Qu.: 2.840   3rd Qu.:14.60               
##  Max.   :60.26                   Max.   :38.058   Max.   :34.10               
##                                                                               
##  percent_food_insecure percent_insufficient_sleep percent_uninsured_2
##  Min.   : 2.90         Min.   :23.03              Min.   : 2.683     
##  1st Qu.:10.60         1st Qu.:30.11              1st Qu.: 8.530     
##  Median :12.70         Median :33.01              Median :12.460     
##  Mean   :13.24         Mean   :33.07              Mean   :13.564     
##  3rd Qu.:15.20         3rd Qu.:36.13              3rd Qu.:17.383     
##  Max.   :36.30         Max.   :46.71              Max.   :42.397     
##                                                                      
##  median_household_income average_traffic_volume_per_meter_of_major_roadways
##  Min.   : 25385          Min.   :   0.00                                   
##  1st Qu.: 43650          1st Qu.:  26.94                                   
##  Median : 50525          Median :  58.08                                   
##  Mean   : 52737          Mean   : 129.85                                   
##  3rd Qu.: 58742          3rd Qu.: 123.43                                   
##  Max.   :140382          Max.   :4496.41                                   
##                                                                            
##  percent_homeowners  population_2      percent_less_than_18_years_of_age
##  Min.   :19.61      Min.   :     277   Min.   : 7.069                   
##  1st Qu.:67.55      1st Qu.:   11113   1st Qu.:20.027                   
##  Median :72.59      Median :   26158   Median :22.050                   
##  Mean   :71.44      Mean   :  105013   Mean   :22.028                   
##  3rd Qu.:77.01      3rd Qu.:   68557   3rd Qu.:23.838                   
##  Max.   :92.40      Max.   :10105518   Max.   :41.992                   
##                                                                         
##  percent_65_and_over percent_black     percent_asian     percent_hispanic 
##  Min.   : 4.83       Min.   : 0.0000   Min.   : 0.0000   Min.   : 0.6105  
##  1st Qu.:16.30       1st Qu.: 0.7307   1st Qu.: 0.4654   1st Qu.: 2.3926  
##  Median :18.93       Median : 2.3228   Median : 0.7383   Median : 4.3532  
##  Mean   :19.29       Mean   : 9.0803   Mean   : 1.5709   Mean   : 9.6791  
##  3rd Qu.:21.81       3rd Qu.:10.3662   3rd Qu.: 1.4353   3rd Qu.:10.0066  
##  Max.   :57.59       Max.   :85.4143   Max.   :43.3570   Max.   :96.3595  
##                                                                           
##  percent_female  percent_rural   
##  Min.   :26.84   Min.   :  0.00  
##  1st Qu.:49.43   1st Qu.: 33.15  
##  Median :50.32   Median : 59.45  
##  Mean   :49.90   Mean   : 58.54  
##  3rd Qu.:51.03   3rd Qu.: 87.30  
##  Max.   :56.87   Max.   :100.00  
## 

Display the 10 counties with the highest CFR.

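county_data %>%
  # crude CFR: deaths per confirmed case
  mutate(cfr = deaths / confirmed) %>%
  select(county, state, confirmed, deaths, cfr) %>%
  arrange(desc(cfr)) %>%
  # with no weight variable given, top_n() ranks by the last column (cfr), hence the message below
  top_n(10)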
## Selecting by cfr
## # A tibble: 10 × 5
##    county   state        confirmed deaths    cfr
##    <chr>    <fct>            <dbl>  <dbl>  <dbl>
##  1 Sabine   Texas              957     73 0.0763
##  2 Hancock  Georgia           1121     81 0.0723
##  3 McMullen Texas              117      8 0.0684
##  4 Harding  New Mexico          44      3 0.0682
##  5 Knox     Texas              351     21 0.0598
##  6 Jerauld  South Dakota       301     17 0.0565
##  7 Motley   Texas              161      9 0.0559
##  8 Candler  Georgia           1560     87 0.0558
##  9 Twiggs   Georgia           1084     60 0.0554
## 10 Foard    Texas              181     10 0.0552

Write the final data to a CSV file for future use. Because the file name ends in .gz, write_csv() compresses it with gzip automatically.

write_csv(county_data, "covid19-county-data-20211102.csv.gz")

1

Read and run the code above to generate a data frame county_data that includes county-level COVID-19 confirmed cases and deaths, plus demographic and health-related information.

2

Which assumptions underlying the CFR might be violated when it is computed as deaths/confirmed from this data set? Acknowledging these severe limitations, we nonetheless use deaths/confirmed as a very rough proxy for CFR.

3

Which assumptions of logistic regression may be violated by this data set?
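One assumption can be checked directly. Given the covariates, the deaths in a county should behave like independent Bernoulli trials sharing one probability, which implies binomial variance. Clustered deaths or unmeasured county-level factors typically push the variance above that level (overdispersion). The sketch below simulates such extra variation and computes the usual Pearson-based dispersion estimate. All data are synthetic, and x1 is a hypothetical predictor.

```r
set.seed(2021)
n <- 300
sim <- data.frame(x1 = rnorm(n))
sim$confirmed <- rpois(n, 2000) + 50
# A county-level random effect on the logit scale creates extra-binomial variation
eta <- -4.5 + 0.3 * sim$x1 + rnorm(n, sd = 0.3)
sim$deaths <- rbinom(n, size = sim$confirmed, prob = plogis(eta))

fit <- glm(cbind(deaths, confirmed - deaths) ~ x1, family = binomial, data = sim)

# Pearson chi-square divided by residual df; values well above 1 suggest overdispersion
disp <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
disp

# The quasibinomial family estimates the same dispersion and rescales standard errors
fit_q <- update(fit, family = quasibinomial)
summary(fit_q)$dispersion
```

Under the binomial model, the dispersion is fixed at 1. A ratio far above 1 on the real data would mean that the binomial standard errors and p-values are too optimistic.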

4

Run a binomial regression of deaths out of confirmed cases, using the variables state, …, percent_rural as predictors.
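Here is a sketch of the model call on simulated data. glm() takes the binomial response as a two-column matrix of successes and failures, here cbind(deaths, confirmed - deaths). The predictors state, x1, and x2 are hypothetical stand-ins for the real ones. For county_data itself, an analogous call (not run here) would be glm(cbind(deaths, confirmed - deaths) ~ . - county - confirmed - deaths, family = binomial, data = county_data). The extra exclusion is needed because county is an identifier, not a predictor.

```r
set.seed(1)
n <- 200
# Simulated counties: a 3-level factor and two numeric predictors (all synthetic)
sim <- data.frame(
  state = factor(sample(c("StateA", "StateB", "StateC"), n, replace = TRUE)),
  x1 = rnorm(n),
  x2 = rnorm(n)
)
sim$confirmed <- rpois(n, 2000) + 50
sim$deaths <- rbinom(n, size = sim$confirmed, prob = plogis(-4.5 + 0.3 * sim$x1))

# Response = (deaths, survivors). The count columns are removed from "." explicitly
# so that they cannot enter the model as predictors
fit <- glm(cbind(deaths, confirmed - deaths) ~ . - confirmed - deaths,
           family = binomial, data = sim)
summary(fit)
```

With treatment contrasts (R's default), the first factor level, StateA, becomes the baseline, and each other state gets its own coefficient.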

5

Interpret the regression coefficients of three significant predictors (p-value < 0.01).
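As a reminder of what the coefficients mean: the binomial model with the logit link relates the death probability \(p_i\) among confirmed cases in county \(i\) to the predictors through

\[ \log \frac{p_i}{1 - p_i} = \beta_0 + \sum_{j} \beta_j x_{ij}. \]

Increasing \(x_{ij}\) by one unit while holding the other predictors fixed therefore multiplies the odds of death by \(e^{\beta_j}\):

\[ \frac{\text{odds}(x_{ij} + 1)}{\text{odds}(x_{ij})} = e^{\beta_j}. \]

Take a hypothetical coefficient \(\beta_j = 0.02\) on a percentage-point predictor. Then \(e^{0.02} \approx 1.020\), meaning roughly 2% higher odds of death per additional percentage point. For a factor such as state, \(e^{\beta_j}\) compares that state's odds with the baseline state's odds, with the other predictors held fixed.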

6

Apply analysis of deviance to (1) evaluate the goodness of fit of the model and (2) compare the model to the intercept-only model.
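The sketch below runs both deviance comparisons on simulated data, where x1 and x2 are hypothetical predictors and only x1 has a true effect.

1. For goodness of fit, the residual deviance is compared with a chi-square distribution on the residual degrees of freedom. That approximation relies on reasonably large counts per county.
2. For the comparison with the intercept-only model, the drop in deviance is compared with a chi-square distribution on the difference in degrees of freedom. This is the likelihood-ratio test that anova() also reports.

```r
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$confirmed <- rpois(n, 2000) + 50
sim$deaths <- rbinom(n, size = sim$confirmed, prob = plogis(-4.5 + 0.3 * sim$x1))

fit  <- glm(cbind(deaths, confirmed - deaths) ~ x1 + x2, family = binomial, data = sim)
fit0 <- glm(cbind(deaths, confirmed - deaths) ~ 1, family = binomial, data = sim)

# (1) Goodness of fit: residual deviance vs chi-square on the residual df
gof_p <- pchisq(deviance(fit), df.residual(fit), lower.tail = FALSE)

# (2) Full vs intercept-only: drop in deviance vs chi-square on the df difference
lrt_p <- pchisq(deviance(fit0) - deviance(fit),
                df.residual(fit0) - df.residual(fit), lower.tail = FALSE)

# The same likelihood-ratio test via anova()
lrt_table <- anova(fit0, fit, test = "Chisq")
lrt_table
```

A small goodness-of-fit p-value signals lack of fit (overdispersion, for example). It does not merely mean that some predictor matters.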

7

Perform analysis of deviance to evaluate the significance of each predictor. Display the 10 most significant predictors.
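One way to do this is with drop1(), sketched here on simulated data with hypothetical predictors x1, x2, and x3. drop1() refits the model without each term in turn and reports a likelihood-ratio (deviance) test. A multi-level factor such as state is therefore tested as a single term rather than one dummy at a time. By contrast, anova(fit, test = "Chisq") gives sequential tests whose results depend on the order of the terms.

```r
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$confirmed <- rpois(n, 2000) + 50
sim$deaths <- rbinom(n, size = sim$confirmed,
                     prob = plogis(-4.5 + 0.3 * sim$x1 - 0.1 * sim$x2))

fit <- glm(cbind(deaths, confirmed - deaths) ~ x1 + x2 + x3,
           family = binomial, data = sim)

# Likelihood-ratio test for dropping each term while keeping all the others
dev_tab <- drop1(fit, test = "Chisq")

# Sort by p-value (most significant first); the "<none>" row has an NA p-value and sorts last
dev_tab <- dev_tab[order(dev_tab[["Pr(>Chi)"]]), ]
head(dev_tab, 10)
```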

8

Construct confidence intervals for the regression coefficients.
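Here is a minimal sketch on simulated data, where x1 and x2 are hypothetical predictors. confint.default() returns Wald intervals, estimate ± z(0.975) × standard error, and the code reproduces them by hand as a check. Exponentiating the endpoints gives intervals for the odds ratios.

```r
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$confirmed <- rpois(n, 2000) + 50
sim$deaths <- rbinom(n, size = sim$confirmed, prob = plogis(-4.5 + 0.3 * sim$x1))
fit <- glm(cbind(deaths, confirmed - deaths) ~ x1 + x2, family = binomial, data = sim)

# 95% Wald intervals on the log-odds scale
wald_ci <- confint.default(fit, level = 0.95)

# The same intervals by hand: estimate +/- qnorm(0.975) * standard error
se <- sqrt(diag(vcov(fit)))
z <- qnorm(0.975)
by_hand <- cbind(coef(fit) - z * se, coef(fit) + z * se)

# Intervals for the odds ratios
exp(wald_ci)
```

Calling confint(fit) gives profile-likelihood intervals instead, which are usually preferred for GLMs. To our knowledge, R versions before 4.4 supply that glm method through the MASS package. If the data are overdispersed, refitting with family = quasibinomial widens the Wald intervals by the square root of the estimated dispersion.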

9

Plot the deviance residuals against the fitted values. Are there potential outliers?
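The sketch below uses base graphics on simulated data. To give the plot something to find, one artificial outlier is injected into county 1; x1 and the injected value are hypothetical. Flagging points with |residual| > 3 is only a rule of thumb, not part of the assignment.

```r
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n))
sim$confirmed <- rpois(n, 2000) + 50
sim$deaths <- rbinom(n, size = sim$confirmed, prob = plogis(-4.5 + 0.3 * sim$x1))
# Inject an artificial outlier: far more deaths than the model would predict
sim$deaths[1] <- 200

fit <- glm(cbind(deaths, confirmed - deaths) ~ x1, family = binomial, data = sim)

dev_res <- residuals(fit, type = "deviance")
plot(fitted(fit), dev_res, xlab = "Fitted probability", ylab = "Deviance residual")
abline(h = 0, lty = 2)

# Flag counties whose deviance residual exceeds the rule-of-thumb cutoff
flagged <- which(abs(dev_res) > 3)
flagged
```

A common alternative is to plot against the linear predictor, predict(fit, type = "link"). This spreads the points out when all fitted probabilities are small, as CFRs are.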

10

Draw a half-normal plot. Are there potential outliers in predictor space?
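Outliers in predictor space show up as counties with large leverage, the diagonal entries of the hat matrix. Below is a base-R sketch of a half-normal plot of the leverages. The plotting positions qnorm((n + i) / (2n + 1)) are, to our understanding, the ones faraway::halfnorm() uses. The data are simulated, with an artificial extreme predictor value injected into county 1 (hypothetical).

```r
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n))
# Inject an artificial point that is extreme in predictor space
sim$x1[1] <- 8
sim$confirmed <- rpois(n, 2000) + 50
sim$deaths <- rbinom(n, size = sim$confirmed, prob = plogis(-4.5 + 0.3 * sim$x1))

fit <- glm(cbind(deaths, confirmed - deaths) ~ x1, family = binomial, data = sim)

# Leverages from the weighted hat matrix of the fitted GLM
lev <- hatvalues(fit)
ord <- order(lev)                              # counties sorted by leverage
hn_q <- qnorm((n + seq_len(n)) / (2 * n + 1))  # half-normal quantiles

plot(hn_q, lev[ord], xlab = "Half-normal quantiles", ylab = "Leverage")
# Label the two largest leverages with their row numbers
top2 <- tail(ord, 2)
text(tail(hn_q, 2), lev[top2], labels = top2, pos = 2)
```

Points that rise well above the line formed by the bulk of the data are candidates. A common reference is leverage above 2p/n, where p is the number of coefficients, though that too is only a rule of thumb.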

11

Find the best sub-model using the AIC criterion.
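Here is a sketch with stats::step() on simulated data, where x1 to x4 are hypothetical and only x1 and x2 have true effects. Backward elimination starts from the full model. At each step it drops the term whose removal lowers AIC the most, and it stops when no removal lowers AIC further. A factor such as state is added or dropped as a whole.

```r
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
sim$confirmed <- rpois(n, 2000) + 50
sim$deaths <- rbinom(n, size = sim$confirmed,
                     prob = plogis(-4.5 + 0.3 * sim$x1 - 0.1 * sim$x2))

fit_full <- glm(cbind(deaths, confirmed - deaths) ~ x1 + x2 + x3 + x4,
                family = binomial, data = sim)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step printout
fit_aic <- step(fit_full, direction = "backward", trace = 0)
formula(fit_aic)
```

Setting k = log(n) in step() uses the BIC penalty instead, which tends to select smaller models.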

12

Find the best sub-model using the lasso with cross validation.
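The sketch below uses the glmnet package, which we assume is installed; it is not part of the tidyverse loaded above. For family = "binomial", glmnet accepts a two-column matrix of counts and treats the second column as the target class, so the deaths go second. The data and the predictors x1 to x5 are simulated and hypothetical. On the real data, the predictor matrix would be built from state, …, percent_rural with model.matrix(), which expands state into dummy columns.

```r
library(glmnet)

set.seed(1)
n <- 300
sim <- as.data.frame(matrix(rnorm(n * 5), n, 5,
                            dimnames = list(NULL, paste0("x", 1:5))))
sim$confirmed <- rpois(n, 2000) + 50
sim$deaths <- rbinom(n, size = sim$confirmed,
                     prob = plogis(-4.5 + 0.3 * sim$x1 - 0.1 * sim$x2))

# Predictor matrix without the intercept column (glmnet fits its own intercept)
X <- model.matrix(~ x1 + x2 + x3 + x4 + x5, data = sim)[, -1]
# Count response: first column survivors, second column deaths (the target class)
Y <- cbind(survived = sim$confirmed - sim$deaths, died = sim$deaths)

# Lasso (alpha = 1) with 10-fold cross-validation over the lambda path
cv_fit <- cv.glmnet(X, Y, family = "binomial", alpha = 1, nfolds = 10)

# Coefficients at the lambda with the smallest cross-validated deviance
b_min <- as.matrix(coef(cv_fit, s = "lambda.min"))
b_min
```

Using s = "lambda.1se" picks the largest lambda whose CV error is within one standard error of the minimum, which gives a sparser model. Because state enters as many separate dummies, the lasso may keep some states and drop others. Selecting state as a whole would require a grouped penalty.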