In this lab we will scrape NFL passing statistics.
We will follow this tutorial on scraping NFL stats (the tutorial uses Python, but we will work in R).
The page we want to scrape is https://www.pro-football-reference.com/years/2019/passing.htm
Now open the HTML page with rvest.
rm(list = ls()) # clean-up workspace
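As a minimal sketch of the loading step, the snippet below parses a page with rvest's read_html(). To keep it runnable offline, it parses a small made-up HTML string (toy_html, with hypothetical player names) instead of the real page. In the lab you would pass the URL above to read_html() instead.

```r
library(rvest)

# Hypothetical stand-in for the real page. In the lab, call
# read_html("https://www.pro-football-reference.com/years/2019/passing.htm")
toy_html <- '<table>
  <thead><tr><th>Player</th><th>Age</th></tr></thead>
  <tbody><tr><td>Player A</td><td>25</td></tr></tbody>
</table>'

# read_html() accepts a URL, a file path, or a literal HTML string
toy_page <- read_html(toy_html)

# The result is a parsed document that CSS selectors can be run against
class(toy_page)
```

The parsed document is the starting point for every selector query in the rest of the lab.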
Inspect the table elements in your browser and look for patterns you can use to build a CSS selector that fetches the rows of the table.
The CSS selector reference could be helpful here.
Find the CSS selector for the title (header) row of the table. Is it an element of thead?
Now get all the column names from the elements inside the thead element.
# complete your code for getting the column names
col_names
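The idea of reading header cells can be sketched on a toy table. This is an illustration under assumptions, not the solution for the real page: the HTML string and the toy_ names are made up, and the real header may be structured differently. It assumes rvest 1.0 or later for html_elements() (older versions use html_nodes()).

```r
library(rvest)

# Hypothetical miniature table; the real page has many more columns
toy_page <- read_html('<table>
  <thead><tr><th>Player</th><th>Age</th><th>Yds</th></tr></thead>
  <tbody><tr><td>Player A</td><td>25</td><td>4000</td></tr></tbody>
</table>')

# "thead th" is a descendant selector: every <th> anywhere inside <thead>
toy_col_names <- toy_page %>%
  html_elements("thead th") %>%
  html_text(trim = TRUE)

toy_col_names
```

On the real page, check in the inspector which tag the header cells use before reusing a selector like this one.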
Now find the CSS selector for each row of the table.
For example, one possible rule your instructor found is to look for all the <tr> tags inside the <tbody> element.
Hint: the CSS selector [target=_blank] selects all elements with target="_blank".
What does the CSS selector [data-stat=age] select?
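To see how an attribute selector behaves, here is a small sketch on a made-up table. The data-stat values and player names are hypothetical, so verify in the inspector how the real page uses this attribute. The example also shows that an unscoped attribute selector matches every element carrying the attribute, while prefixing it with tbody restricts the match to body cells.

```r
library(rvest)

# Hypothetical miniature table; attribute values are made up for illustration
toy_page <- read_html('<table>
  <thead><tr><th data-stat="player">Player</th><th data-stat="age">Age</th></tr></thead>
  <tbody>
    <tr><td data-stat="player">Player A</td><td data-stat="age">25</td></tr>
    <tr><td data-stat="player">Player B</td><td data-stat="age">31</td></tr>
  </tbody>
</table>')

# Unscoped: any element whose data-stat attribute equals "age"
toy_all <- toy_page %>%
  html_elements("[data-stat=age]") %>%
  html_text(trim = TRUE)

# Scoped: only matching elements that sit inside <tbody>
toy_body <- toy_page %>%
  html_elements("tbody [data-stat=age]") %>%
  html_text(trim = TRUE)

toy_all   # header cell plus both body cells
toy_body  # body cells only
```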
# Finish your code for scraping the whole table
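One possible shape for the full scrape, sketched on a toy table: select the rows with "tbody tr", pull the cell text from each row, and bind the rows into a data frame. The HTML, the toy_ names, and the column set are all hypothetical. If the real table contains extra rows inside tbody (for example repeated headers), a stricter row selector would be needed.

```r
library(rvest)

# Hypothetical miniature table standing in for the real one
toy_page <- read_html('<table>
  <thead><tr><th>Rk</th><th>Player</th><th>Age</th></tr></thead>
  <tbody>
    <tr><th>1</th><td>Player A</td><td>25</td></tr>
    <tr><th>2</th><td>Player B</td><td>31</td></tr>
  </tbody>
</table>')

# Column names from the header cells
toy_col_names <- toy_page %>%
  html_elements("thead th") %>%
  html_text(trim = TRUE)

# Every <tr> inside <tbody> is one data row
toy_rows <- toy_page %>% html_elements("tbody tr")

# One character vector per row: the text of each <th> or <td> cell
toy_cells <- lapply(toy_rows, function(row) {
  row %>% html_elements("th, td") %>% html_text(trim = TRUE)
})

# Stack the rows into a character matrix, then convert to a data frame
toy_df <- as.data.frame(do.call(rbind, toy_cells), stringsAsFactors = FALSE)
names(toy_df) <- toy_col_names

# Scraped values arrive as text; convert numeric columns explicitly
toy_df$Age <- as.integer(toy_df$Age)

toy_df
```

Going row by row keeps each step visible, which makes it easier to debug a selector that returns the wrong number of cells.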
You may refer to the “NFL_passing_2019.csv” file, which contains the data scraped by your instructor.