BeautifulSoup is an excellent Python package that makes web scraping comparatively straightforward.
The basic sequence of steps is as follows:
- Define the URL of the page you want to scrape
- Open the URL
- Store the content of the page as an object that we can search and manipulate
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Set the target URL
url = 'http://www.canlii.ca'

# Open the URL
site = urlopen(url)

# Grab the page contents and store them in an object called soup
# (the "lxml" parser requires the lxml package to be installed)
soup = BeautifulSoup(site, "lxml")

# Find all <table> elements in the page
table = soup.find_all("table")

# Print the table elements
print(table)
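Once the <table> elements have been found, a common next step is to pull the text out of each cell. The following is only a minimal sketch of one way to do that. It assumes an inline HTML snippet in place of the live page, so the sample markup in SAMPLE_HTML and the helper name table_rows are hypothetical and not taken from the site above. It also uses Python's built-in "html.parser" rather than "lxml" so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
<table>
  <tr><th>Court</th><th>Year</th></tr>
  <tr><td>SCC</td><td>2015</td></tr>
</table>
</body></html>
"""


def table_rows(html):
    """Return each table in the HTML as a list of rows of cell text."""
    # html.parser ships with Python, so no additional parser is required.
    soup = BeautifulSoup(html, "html.parser")
    tables = []
    for table in soup.find_all("table"):
        rows = []
        for tr in table.find_all("tr"):
            # Collect header and data cells in document order.
            cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
            rows.append(cells)
        tables.append(rows)
    return tables


if __name__ == "__main__":
    for rows in table_rows(SAMPLE_HTML):
        for row in rows:
            print(row)
```

Each table comes back as a list of rows, and each row as a list of strings, which is usually easier to work with than the raw tag objects. On a real page the same function could be given the contents read from the opened URL, though real-world tables often need extra handling for nested tables or merged cells.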