Web Scraping

3.2. Web Scraping

Web scraping is a technique for automatically accessing and extracting large amounts of information from a website, which can save considerable time and effort compared with collecting the data by hand. In this article, we walk through a simple example: automating the download of hundreds of files from the New York MTA. It is a good exercise for beginners who want to understand how web scraping works. We'll be extracting information from this web page: http://web.mta.info/developers/turnstile.html

# Import libraries
import urllib.request
from bs4 import BeautifulSoup

# Specify url
urlpage = 'http://web.mta.info/developers/turnstile.html'

# Query the website and return the html to the variable 'page'
page = urllib.request.urlopen(urlpage)

# Parse the html using beautiful soup and store in variable 'soup'
soup = BeautifulSoup(page, 'html.parser')

# Print out the type of the variable 'soup'
print(type(soup))

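# Find all the links on the page (anchor tags that have an href attribute)
# and keep them in 'links'; find_all is the current name for findAll
links = soup.find_all('a', href=True)

# Let's extract all the text from the page
print(soup.text)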
<class 'bs4.BeautifulSoup'>

MTA Subway Hourly Ridership: Beginning February 2022 | State of New York
Skip to Main Content
DATA.NY.GOV
Sign In
EnglishEnglishEspañolItalianoFrançais中文Русский
Search
Search
OPEN NYCATALOGDEVELOPERSHELPVIDEO HELPSUPPORTED BROWSERSCATALOG NAVIGATIONABOUTPRESS RELEASESEXECUTIVE ORDEROPEN DATA PROGRAM OVERVIEWOPEN DATA HANDBOOKDATASET SUBMISSION GUIDEREPORTS
Menu
Menu
Close
OPEN NYCATALOGDEVELOPERSHELPVIDEO HELPSUPPORTED BROWSERSCATALOG NAVIGATIONABOUTPRESS RELEASESEXECUTIVE ORDEROPEN DATA PROGRAM OVERVIEWOPEN DATA HANDBOOKDATASET SUBMISSION GUIDEREPORTS
Sign In
Search
Language
English
Español
Italiano
Français
中文
Русский
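
The code above collects every link on the page, but downloading files requires narrowing that list to the file links and turning relative paths into full URLs. The following is a minimal sketch of that step, not the article's own method. It swaps in the standard library's `html.parser` and `urllib.parse` in place of BeautifulSoup so that it runs without extra installs or network access. The `LinkCollector` class, the `file_links` helper, the `.txt` suffix, and the sample HTML with its file names are all hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def file_links(html, base_url, suffix=".txt"):
    """Return absolute URLs of links whose target ends with `suffix`."""
    collector = LinkCollector()
    collector.feed(html)
    return [
        urljoin(base_url, href)  # resolve relative paths against the page URL
        for href in collector.hrefs
        if href.lower().endswith(suffix)
    ]


# Hypothetical page fragment; the file names are made up for illustration.
SAMPLE_HTML = """
<a href="data/nyct/turnstile/turnstile_A.txt">Week A</a>
<a href="data/nyct/turnstile/turnstile_B.txt">Week B</a>
<a href="/about.html">About</a>
<a name="top">No href here</a>
"""

BASE_URL = "http://web.mta.info/developers/turnstile.html"

links = file_links(SAMPLE_HTML, BASE_URL)
for url in links:
    print(url)
```

With BeautifulSoup, the same idea would apply to `[a['href'] for a in soup.find_all('a', href=True)]`, passing each value through `urljoin` before filtering. Each resulting URL could then be saved with `urllib.request.urlretrieve(url, filename)`, ideally with a short pause between requests. Judging by the output shown above, the URL may now serve a data.ny.gov page, so the links found in practice may differ from this sample.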