Web-scraping

Find child elements using BeautifulSoup

Suppose you're attempting to scrape a slab of HTML that looks a bit like this:

<tr class="oddRow">
  <td>
    <a href="/ukpga/2018/21/contents/enacted">Domestic Gas and Electricity (Tariff Cap) Act 2018</a>
  </td>
  <td>
    <a href="/ukpga/2018/21/contents/enacted">2018 c. 21</a>
  </td>
  <td>UK Public General Acts</td>
</tr>
<tr>
  <td>
    <a href="/ukpga/2018/20/contents/enacted">Northern Ireland Budget Act 2018</a>
  </td>
  <td>
    <a href="/ukpga/2018/20/contents/enacted">2018 c. 20</a>
  </td>

The bit you're looking to scrape is the text inside an <a> tag that sits as a child of a <td> tag, e.g. Northern Ireland Budget Act 2018.

Now, for all you know, there are going to be <a> elements all over the page, many of which you have no interest in. Because of this, something like stuff = soup.find_all('a') is no good.
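To see the problem in action, here's a minimal sketch. The HTML string is a made-up sample (the "Help" link and its nav wrapper are assumptions, not taken from the real page), and it assumes the bs4 package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: one navigation link plus one table row
sample_html = """
<div class="nav"><a href="/help">Help</a></div>
<table>
  <tr>
    <td><a href="/ukpga/2018/20/contents/enacted">Northern Ireland Budget Act 2018</a></td>
    <td>UK Public General Acts</td>
  </tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# find_all('a') returns every link on the page, wanted or not
all_link_text = [a.text for a in sample_soup.find_all("a")]
print(all_link_text)
```

The navigation link comes back alongside the one you actually care about, which is why the search needs narrowing.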

What you really need to do is limit your scrape to only those <a> tags that have a <td> tag as their parent.

Here's how you do it:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')  # html is assumed to hold the page source as a string

td = soup.find_all('td')  # Find all the td elements on the page

for i in td:
    # Call .findChildren() on each item in the td list
    children = i.findChildren('a', recursive=True)

    # Iterate over the list of children, accessing the .text attribute on each child
    for child in children:
        what_i_want = child.text
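If you'd rather collect the results than overwrite one variable on each pass, here's one possible sketch of the same idea using a CSS selector. The sample HTML is modelled on the snippet at the top of this post (the wrapping <table> and the last cell of the second row are filled in as assumptions), and the names link_text and titles are just illustrative.

```python
from bs4 import BeautifulSoup

# Sample rows modelled on the snippet at the top of this post
sample_html = """
<table>
  <tr class="oddRow">
    <td><a href="/ukpga/2018/21/contents/enacted">Domestic Gas and Electricity (Tariff Cap) Act 2018</a></td>
    <td><a href="/ukpga/2018/21/contents/enacted">2018 c. 21</a></td>
    <td>UK Public General Acts</td>
  </tr>
  <tr>
    <td><a href="/ukpga/2018/20/contents/enacted">Northern Ireland Budget Act 2018</a></td>
    <td><a href="/ukpga/2018/20/contents/enacted">2018 c. 20</a></td>
    <td>UK Public General Acts</td>
  </tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# "td > a" matches only <a> tags whose direct parent is a <td>
link_text = [a.get_text(strip=True) for a in soup.select("td > a")]

# To keep just the Act titles, take the link in the first cell of each row
titles = []
for row in soup.find_all("tr"):
    first_cell = row.find("td")
    link = first_cell.find("a") if first_cell else None
    if link:
        titles.append(link.get_text(strip=True))

print(titles)
```

Note that "td > a" only matches direct children, whereas recursive=True in the loop above also picks up <a> tags nested deeper inside a <td>. For this table the two give the same links.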