BeautifulSoup: a very simple example

BeautifulSoup is an excellent Python package that makes web scraping comparatively straightforward.

The basic sequence of steps is:

  1. Define the URL of the page you want to scrape.
  2. Open the URL.
  3. Store the content of the page as an object that can be searched and manipulated.

For example,

from bs4 import BeautifulSoup
# Python 3 location of urlopen (in Python 2 it was `from urllib import urlopen`)
from urllib.request import urlopen

# Set the target URL
url = 'http://www.canlii.ca'

# Open the URL
site = urlopen(url)

# Parse the page contents and store the result in an object called soup
# (the "lxml" parser requires the lxml package to be installed)
soup = BeautifulSoup(site, "lxml")

# Find all <table> elements in the page
tables = soup.find_all("table")

# Print the table elements
print(tables)
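
Printing the raw table elements is rarely the end goal, so here is a minimal sketch of one way to pull the cell text out of a table once it has been found. To keep it runnable without a network connection, it parses a small HTML string instead of a live page. The HTML, its contents, and the helper name table_rows are all made up for illustration, and the built-in "html.parser" is used in place of "lxml" so that no extra package is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page (not real data)
html = """
<html><body>
<table>
  <tr><th>Name</th><th>Count</th></tr>
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>beta</td><td>2</td></tr>
</table>
</body></html>
"""

# Parse the string with Python's built-in parser
soup = BeautifulSoup(html, "html.parser")


def table_rows(table):
    """Return the rows of a table element as lists of stripped cell text."""
    rows = []
    for tr in table.find_all("tr"):
        # Collect both header (<th>) and data (<td>) cells, in document order
        cells = tr.find_all(["th", "td"])
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


# find_all returns a list-like collection, even when there is only one match
tables = soup.find_all("table")
rows = table_rows(tables[0])

for row in rows:
    print(row)
```

The same helper should work on any table returned by soup.find_all("table") for a real page, although tables with nested tables or merged cells would need more careful handling.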