Regular expressions

Use a variable in a regular expression

Sometimes you want to build a regular expression from a variable rather than hard-code the pattern you want to match.

For example, you might have a list of words and want to search a text for each word in that list.

Here's how this is done:

import re

subject = "In the room women come and go, talking of Michelangelo."

words = ['room', 'talking', 'Michelangelo']

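for word in words:
    # re.escape() makes any special characters in the word match literally;
    # the \b anchors restrict the match to whole words.
    my_regex = r"\b(?=\w)" + re.escape(word) + r"\b(?!\w)"
    if re.search(my_regex, subject, re.IGNORECASE):
        print(word, 'found in the subject')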
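
If the list is long, one possible variation is to join all the words into a single alternation and scan the text once instead of once per word. The following is only a sketch of that idea: the function name find_listed_words and the extra word 'piano' are made up for illustration, and it assumes every word in the list starts and ends with a letter or digit, so that a plain \b works as a word boundary.

```python
import re

def find_listed_words(text, words):
    """Return the entries of `words` that occur as whole words in `text`."""
    # Escape each word so characters such as "." or "+" are matched literally.
    alternation = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    # Lowercase the hits so they can be compared with the list case-insensitively.
    hits = {m.group(0).lower() for m in pattern.finditer(text)}
    return [w for w in words if w.lower() in hits]

subject = "In the room women come and go, talking of Michelangelo."
print(find_listed_words(subject, ["room", "talking", "Michelangelo", "piano"]))
# prints ['room', 'talking', 'Michelangelo']
```

The result keeps the order of the original list, and 'piano' is dropped because it does not appear in the subject.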

Regex to match upper case words

Here's a short demonstration of how to use a regular expression to find UPPERCASE words in a set of text files.

The goal of this snippet is to open and read every .rtf file in a given directory and print only the UPPERCASE words (two or more capital letters in a row) that appear in each file.

import os
import re

directory = '/path/to/files'

# Two or more consecutive capital letters, delimited by word boundaries
regex = r"\b[A-Z][A-Z]+\b"

for filename in os.listdir(directory):
    if filename.endswith(".rtf"):
        # os.listdir() returns bare file names, so join each one with the directory
        with open(os.path.join(directory, filename), 'r') as f:
            transcript = f.read()
        for match in re.finditer(regex, transcript):
            print(match[0])
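
To see what the pattern does without touching any files, it can be tried on a plain string first. The sketch below is an illustration only: the helper name count_uppercase_words and the sample sentence are invented here, and counting the matches with collections.Counter is one possible follow-up rather than part of the snippet above.

```python
import re
from collections import Counter

# Same pattern as above: two or more consecutive capital letters.
UPPERCASE_WORD = re.compile(r"\b[A-Z][A-Z]+\b")

def count_uppercase_words(text):
    """Count how often each UPPERCASE word appears in `text`."""
    return Counter(UPPERCASE_WORD.findall(text))

sample = "NASA and the FBI met; NASA sent a memo to I. M. Pei."
counts = count_uppercase_words(sample)
print(counts["NASA"], counts["FBI"], counts["I"])
# prints 2 1 0
```

Single capital letters such as "I" and "M" are not counted, because the pattern requires at least two capitals in a row.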