3.2. Web Scraping
Web scraping is a technique for automatically accessing and extracting large amounts of information from a website, which can save considerable time and effort. In this section, we walk through a simple example: automating the download of hundreds of files from the New York MTA. It is a good first exercise for readers who are new to web scraping. We'll be extracting information from this web page: http://web.mta.info/developers/turnstile.html
# Import libraries
import urllib.request
from bs4 import BeautifulSoup
# Specify url
urlpage = 'http://web.mta.info/developers/turnstile.html'
# Query the website and store the HTTP response in the variable 'page'
page = urllib.request.urlopen(urlpage)
# Parse the HTML of the response using Beautiful Soup and store the result in 'soup'
soup = BeautifulSoup(page, 'html.parser')
# Print out the type of the variable 'soup'
print(type(soup))
# Find all the links on the page and keep them in the variable 'links'
links = soup.find_all('a', href=True)
# Let's extract all the text from the page
print(soup.text)
<class 'bs4.BeautifulSoup'>
MTA Subway Hourly Ridership: Beginning February 2022 | State of New York
Skip to Main Content
DATA.NY.GOV
Sign In
EnglishEnglishEspañolItalianoFrançais中文Русский
Search
Search
OPEN NYCATALOGDEVELOPERSHELPVIDEO HELPSUPPORTED BROWSERSCATALOG NAVIGATIONABOUTPRESS RELEASESEXECUTIVE ORDEROPEN DATA PROGRAM OVERVIEWOPEN DATA HANDBOOKDATASET SUBMISSION GUIDEREPORTS
Menu
Menu
Close
OPEN NYCATALOGDEVELOPERSHELPVIDEO HELPSUPPORTED BROWSERSCATALOG NAVIGATIONABOUTPRESS RELEASESEXECUTIVE ORDEROPEN DATA PROGRAM OVERVIEWOPEN DATA HANDBOOKDATASET SUBMISSION GUIDEREPORTS
Sign In
Search
Language
English
Español
Italiano
Français
中文
Русский
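Printing the page text confirms that the request worked, but the goal stated at the start is to download files, which means working with the link targets rather than the text. The following is a minimal sketch of one way to do that, under some assumptions: the data files are linked with `<a href>` tags and end in `.txt`, and relative links should be resolved against the page URL. The sample HTML, its hrefs, and the helper names `extract_file_links`, `filename_from_url`, and `download_all` are hypothetical and used only for illustration; they are not taken from the MTA page.

```python
import shutil
import time
import urllib.request
from pathlib import Path
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup

# Hypothetical stand-in for a fetched page; these hrefs are made up for illustration.
SAMPLE_HTML = """
<html><body>
  <a href="data/nyct/turnstile/turnstile_a.txt">Week A</a>
  <a href="data/nyct/turnstile/turnstile_b.txt">Week B</a>
  <a href="/about.html">About</a>
  <a name="top">Anchor without an href</a>
</body></html>
"""

BASE_URL = "http://web.mta.info/developers/turnstile.html"


def extract_file_links(html, base_url, suffix=".txt"):
    """Return absolute URLs of all <a href> targets that end with `suffix`."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips <a> tags that have no href attribute
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        if href.lower().endswith(suffix):
            # Resolve relative hrefs against the page they came from
            links.append(urljoin(base_url, href))
    return links


def filename_from_url(url):
    """Use the last path segment of the URL as the local file name."""
    return Path(urlparse(url).path).name


def download_all(urls, dest_dir, delay_seconds=1.0):
    """Save each URL into dest_dir, pausing between requests to be polite."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = dest / filename_from_url(url)
        # Stream the response body straight to disk
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        saved.append(target)
        time.sleep(delay_seconds)
    return saved


# Extract the candidate file links from the sample page and list them
file_links = extract_file_links(SAMPLE_HTML, BASE_URL)
for link in file_links:
    print(link)
```

With links extracted from a real page, calling `download_all(file_links, 'turnstile_data')` would fetch each file into a local folder (the folder name is arbitrary). The pause between requests is a design choice to avoid overloading the server; the sample links above are illustrative and are not expected to resolve.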