Scraping news websites and looking for specific words and phrases

This afternoon, my colleague and fellow Transparency Project member, Paul Magrath, told me he was interested in finding out whether there's a way of systematically watching for a set of pre-defined "trigger words" of interest to the Transparency Project in online articles published by a selection of news organisations with a nasty habit of misreporting family court proceedings.

I thought "that's a perfect job for Python" and sat down to write a basic proof of concept for Paul to take a look at. 

The code, which is here, iterates through an RSS feed from the Daily Mail's website, requests the link for each item in the feed to read the article, and checks the text against a list of pre-defined triggers (currently devised around an article about Myleene Klass, of all people). The results are written to a CSV file for review.
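
To give a feel for the approach, here is a minimal sketch of the same three steps (walk the feed, fetch and check each article, write the matches to CSV). It is not the code from the repo: the function names, the feed URL and the trigger phrases are all placeholders I've made up for illustration, and it sticks to the Python standard library and assumes a plain RSS 2.0 feed.

```python
import csv
import re
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical placeholders, not the real feed or the real trigger list.
FEED_URL = "https://example.com/news/index.rss"
TRIGGERS = ["family court", "social services"]


def fetch(url):
    """Download a URL and return its body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def feed_items(rss_text):
    """Yield (title, link) for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:
            yield title, link


def find_triggers(text, triggers):
    """Return the triggers that occur in text, ignoring case."""
    lowered = text.lower()
    return [t for t in triggers if t.lower() in lowered]


def scan_feed(rss_text, triggers, fetch_article=fetch):
    """Fetch every article in the feed and return one row per trigger hit."""
    rows = []
    for title, link in feed_items(rss_text):
        html = fetch_article(link)
        # Crude clean-up: drop tags, then collapse runs of whitespace.
        text = " ".join(re.sub(r"<[^>]+>", " ", html).split())
        for trigger in find_triggers(text, triggers):
            rows.append({"title": title, "link": link, "trigger": trigger})
    return rows


def write_csv(rows, out):
    """Write the matches to an open text file as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=["title", "link", "trigger"])
    writer.writeheader()
    writer.writerows(rows)
```

To run it for real you would pass `fetch(FEED_URL)` and `TRIGGERS` to `scan_feed` and hand the result to `write_csv` along with a file opened with `newline=""`. The `fetch_article` parameter is only there so the matching logic can be tried out on canned pages without touching the network. Stripping tags with a regular expression is rough and ready; a proper HTML parser would be the safer choice for anything beyond a proof of concept.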

Here's the GitHub repo.