Authenticating into neo4j in a Jupyter Notebook using py2neo

I recently spent a frustrating few hours trying to replicate these examples in a Jupyter Notebook.

Every time I attempted to run this line...

graph = Graph()

... the trace would go loopy and I'd get a connection refused error. Here's how to get around this problem. 

Start Neo4j

You need a running instance of Neo4j before you even attempt to start running the code. So, either launch the desktop app or, if you prefer, launch an instance from the shell, like so:

$ neo4j start

By default, neo4j will listen on port 7474.

The authentication code

The following code can then be run inside the Notebook (or wherever) and you won't get the error I kept seeing:

from py2neo import authenticate, Graph

# set up authentication parameters
authenticate("localhost:7474", "neo4j", "Nov2015!!")

# connect to authenticated graph database
graph = Graph("http://localhost:7474/db/data/")
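
To check the connection is actually live, you can run a throwaway Cypher query. This is just a quick smoke test, and it assumes py2neo v3's run() method; on older v2 installs the equivalent call is graph.cypher.execute().

# Should print 1 if the connection and credentials are good
print (graph.run("RETURN 1").evaluate())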

Use a variable in a regular expression

Sometimes you want to pass an object into a regular expression rather than explicitly state the pattern you're looking to match against.

An example of when you might want to do this is when you have a list of words and you want to iterate over a text and look for matches against the words in that list. 

Here's how this is done:

import re

subject = "In the room women come and go, talking of Michelangelo."

words = ['room', 'talking', 'Michelangelo']

for word in words:
    my_regex = r"\b(?=\w)" + re.escape(word) + r"\b(?!\w)"
    if re.search(my_regex, subject, re.IGNORECASE):
        print (word, 'found in the subject')
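
As an aside, if the word list is long, it can be cheaper to build a single pattern up front rather than compiling a fresh regex on every pass through the loop. Here's a minimal sketch of that approach, using the same subject and words as above:

import re

subject = "In the room women come and go, talking of Michelangelo."
words = ['room', 'talking', 'Michelangelo']

# Join the escaped words into one alternation pattern: \b(?:room|talking|Michelangelo)\b
pattern = re.compile(r"\b(?:" + "|".join(re.escape(word) for word in words) + r")\b", re.IGNORECASE)

for match in pattern.finditer(subject):
    print (match.group(), 'found in the subject')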

Login and navigate a website automatically with Selenium

Selenium is an incredible Python package for automating tasks in the browser. Essentially, Selenium can be used to script interaction with a website by taking control of the browser using Python.

This example demonstrates how to complete a login form and navigate to various pages behind the login page using just a few of the many techniques available in the Selenium toolbox. 

This example assumes you already have the relevant driver installed and Selenium installed (pip install selenium).

Dependencies

from selenium import webdriver

driver = webdriver.Chrome()

Target the first page 

Give the driver the starting URL and check you've landed where you should by running an assertion on the text in the title of the page:

driver.get("https://ilovefluffycats.com/authentication/signon")

assert "cats" in driver.title

Complete the username and password fields

Find the username field by its id in the HTML markup (e.g. id="uid") and the password field by its name attribute (e.g. name="pwd"):

username = driver.find_element_by_id("uid")
username.clear()
username.send_keys("mrcats")

password = driver.find_element_by_name("pwd")
password.clear()
password.send_keys("catskillz")

Click the login button

Now we need to submit the login credentials by clicking the submit button:

driver.find_element_by_name("submitButton").click()

Click a link on the page based on the link text

This is handy where you know the text of the link you want to target, but there's no unique identifier to reliably grip onto in the markup. Here, we're simply looking for a link with the text: "Grumpy cats".

driver.find_element_by_link_text("Grumpy cats").click()
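
One gotcha worth flagging: if the page behind the login is slow to render, the link may not exist yet when Selenium goes looking for it. Selenium's explicit waits handle this; here's a minimal sketch (the ten-second timeout is an arbitrary choice):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to ten seconds for the link to become clickable, then click it
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.LINK_TEXT, "Grumpy cats"))).click()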

Python Technologies to Try out and Learn (February 2018)

Stuff I'm Currently Learning

NLTK - great library for dealing with text

Applying custom stopwords to a text file

Create inverted list of sentences and files

Splitting a single word into bigrams

Tokenising text files into sentences

Scikit-Learn - heavy duty machine learning library

Scikit-Learn: Cross-validated supervised text classification

Scikit-Learn: Document similarity

Scikit-Learn: Load your own data

Scikit-Learn: Supervised text classification

Django - rapid development web framework (I'm struggling with this one)

Selenium - control a browser with Python

Login and navigate a website with Selenium

Stuff that's on my list of things to learn

  • Pendulum - datetime parsing (the website for this library is gorgeous)
  • AWS Lambda - serverless computation service 

Reverse a string

This is one of those things that sounds quite simple, but seems to generate quite a lot of discussion on the best way to do it. If you're interested in diving into that discussion, take a look at this StackOverflow question and the answers. 

If, however, all you care about is actually reversing a string with Python, here's a couple of ways to do it. 

Let's say we have a string:

mystring = 'Hello my name is Daniel'

Method 1

This method uses reversed() and has the benefit of being rather more readable than Method 2.

print (''.join(reversed(mystring)))

Returns,

leinaD si eman ym olleH

Method 2

This method uses an extended slice, which steps backwards through the string one character at a time. Not as obvious as Method 1, if you ask me.

print (mystring[::-1])

Returns,

leinaD si eman ym olleH


Uniqify a Python List

I came across this really cool blog post outlining various fast ways to remove duplicate values from a Python list. 

Here's the fastest order-preserving example:

def f12(seq):
    return list(dict.fromkeys(seq))

my_list = [1,2,2,2,3,4,5,6,6,6,6]

print (f12(my_list))

Returns,

[1, 2, 3, 4, 5, 6]

This really quick solution appears to have been identified by a chap called Raymond Hettinger, so credit where it's due!
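
If you don't care about preserving order, by the way, the usual one-liner is to round-trip the list through a set:

# Order is NOT guaranteed to be preserved here
print (list(set(my_list)))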

ICLR 2018 Website Redesign: Typography Sketch #1


The Incorporated Council of Law Reporting for England & Wales (who I work for) is about to embark on a total redesign of its corporate website (i.e. all online assets that are not product).

ICLR's visual language has evolved to be incredibly reliant on the use of type (in particular, the Gotham typeface) and as much negative space as possible. 

This is a very quick sketch of a potential design for the new product page (i.e. the page(s) designed to explain ICLR's core product offer). The design focuses on text blocks consisting of three elements set in different weights with varying levels of character tracking.


Write to a file

Here's how you write to a file with Python.

Let's say we have a string, myString, the contents of which we want to write to a file. 

myString = "I am going to write this string to a file with Python"

To write myString to a file, we first need to specify the file we want to write to, like so:

file = open('string.txt', 'w')

Then we use the file object's write() method to write the string to the file:

file.write(myString)

And that's it!
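
One refinement worth making: strictly speaking, the file should be closed once you've finished with it (file.close()). The tidier idiom is a with block, which closes the file for you automatically:

# The file is closed automatically when the block ends
with open('string.txt', 'w') as f:
    f.write(myString)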


Find most common words in a corpus

This little sample demonstrates several basic text processing steps with a corpus of text files stored in a local directory. 

  • First, we read the corpus of text files into a list
  • Second, we knock out unwanted stuff, like things that aren't actually words and words that only consist of a single character
  • Third, we use standard NLTK stopwords and a list of custom stopwords, to strip out noise from the corpus
  • Finally, we use NLTK to calculate the most common words in each file in the corpus

from __future__ import division
import glob
import os
from nltk.corpus import stopwords
from nltk import *  # brings in FreqDist, among other things
import re

# Bring in the default English NLTK stop words
stoplist = stopwords.words('english')

# Define additional stopwords in a string
additional_stopwords = """case law lawful judge judgment court mr justice would evidence mr order 
defendant act decision make two london appeal section lord one applicant mr. may could said also application whether 
made time first r miss give appellant november give fact within point sentence question upon matter 
leave part given must notice public state taken course cannot circumstances j that, offence set 
behalf however plaintiff see set say secretary regard - v claim right appeared second put e way material
view relation effect take might particular however, present court) october b reasons basis far 
referred trial found lord, land consider authority subject necessary considered 0171 see,s 
council think legal shall respect ground three case, crown without 2 relevant and, special business told clear
paragraph person account letter therefore jury th solicitor use years mrs mr provision discretion
matters respondent concerned cases defence reason issue well count argument facts gave proceedings 
position period needs approved used power us limited even either exercise counsel applicants submission
although counsel submitted st need appellants plaintiffs policy thomas making tribunal action entitled affadavit
december strand daniel transcript smith purpose refused offence offences general counts terms grounds conclusion number reasonable 
prosecution home hearing seems defendants educational clarke solicitors criminal following accept place come
already accepted required words local l;ater january provided stage report street september day sought greenwood
rather service accounts page hobhouse courts march third wilcock mind result months came learned appropriate date instructed
form division notes july went bernal official review principle consideration affidavit held lordship another dr different
notes quite royal possible instructed shorthand development amount has months wc respondents took clearly since find
satisfied members later fleet took interest parties name change information co sum ec done provisions party hd paid
"""

# Split the the additional stopwords string on each word and then add
# those words to the NLTK stopwords list
stoplist += additional_stopwords.split()

# Define the files that make up the corpus to be modelled

file_list = glob.glob(os.path.join('/Users/danielhoadley/PycharmProjects/topicvis', '*.txt'))

# Construct an empty list into which the content of each file will be stored as a item

corpus = []

# Read the files

for file_path in file_list:
    with open(file_path) as f_input:
        content = f_input.read()
        only_words = re.sub("[^a-zA-Z]", " ", content) # Remove anything that isn't a 'word'
        no_single = re.sub(r'(?:^| )\w(?:$| )', ' ', only_words).strip() # Remove any words consisting of a single character
        corpus.append(no_single)

# Remove stopwords

texts = [[word for word in document.lower().split() if word not in stoplist] for document in corpus]

# Get the most common words in each text

for text in texts:
    fdist = FreqDist(text)
    print (fdist.most_common(2))

Create a Gensim Corpus for text files in a local directory

This snippet creates a Gensim corpus from text files stored in a local directory:

import os, gensim

def iter_documents(top_directory):
    """Iterate over all documents, yielding a document (=list of utf8 tokens) at a time."""
    for root, dirs, files in os.walk(top_directory):
        for file in filter(lambda file: file.endswith('.txt'), files):
            document = open(os.path.join(root, file)).read() # read the entire document, as one big string
            yield gensim.utils.tokenize(document, lower=True) # or whatever tokenization suits you

class MyCorpus(object):
    def __init__(self, top_dir):
        self.top_dir = top_dir
        self.dictionary = gensim.corpora.Dictionary(iter_documents(top_dir))
        self.dictionary.filter_extremes(no_below=1, keep_n=30000) # check API docs for pruning params

    def __iter__(self):
        for tokens in iter_documents(self.top_dir):
            yield self.dictionary.doc2bow(tokens)

corpus = MyCorpus('/path/to/files') # create the corpus (building the dictionary as it goes)
for vector in corpus: # convert each document to a bag-of-word vector
    print (vector)


Strip XML tags out of file

This is a quick and dirty example of using a regular expression to remove XML tags from an XML file.

Suppose we have the following XML, sample.xml:

<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

What we want to do is use Python to strip out the XML element tags, so that we're left with something like this:

Tove
Jani
Reminder
Don't forget me this weekend!

Here's how to do it:

import re

text = re.sub('<[^<]+>', "", open("sample.xml").read())
with open("output.txt", "w") as f:
    f.write(text)
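
Regex is fine for a quick and dirty job like this, but for anything less predictable an XML parser is the safer bet. Here's a sketch of the same task using the standard library's ElementTree (itertext() yields the text content of every element in document order):

import xml.etree.ElementTree as ET

root = ET.parse("sample.xml").getroot()

# Keep only the non-empty text fragments, one per line
print ("\n".join(t.strip() for t in root.itertext() if t.strip()))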

Using NLTK to remove stopwords from a text file

Text processing invariably requires that some words in the source corpus be removed before moving on to more complex tasks (such as keyword extraction, summarisation and topic modelling).

The sorts of words to be removed will typically include words that do not of themselves confer much semantic value (e.g. the, it, a, etc). The task in hand may also require additional, specialist words to be removed. This example uses NLTK to bring in a list of core English stopwords and then adds additional custom stopwords to the list. 

from nltk.corpus import stopwords

# Bring in the default English NLTK stop words
stoplist = stopwords.words('english')

# Define additional stopwords in a string
additional_stopwords = """case judge judgment court"""

# Split the the additional stopwords string on each word and then add
# those words to the NLTK stopwords list
stoplist += additional_stopwords.split()

# Open a file and read it into memory
file = open('sample.txt')
text = file.read()

# Apply the stoplist to the text
clean = [word for word in text.split() if word not in stoplist]

It's worth looking at a couple of discrete aspects of this code to see what's going on.

The stoplist object is storing the NLTK English stopwords as a list:

stoplist = stopwords.words('english')

print (stoplist)

>>> ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your'...]

Then, we're adding our additional stopwords as individual tokens in a string object, additional_stopwords, and using split() to break that string down into individual tokens in a list object:

stoplist += additional_stopwords.split()

The above line of code updates the original stoplist object with the additional stopwords.

The text being passed in is a simple text file, which reads:

this is a case in which a judge sat down on a chair

When we pass the text through our list comprehension, our output is:

print (clean)

>>> ['sat', 'chair']

Colour Board: ACIER


I don't want no scrub

Colour Board: Delta ∆

Delta ∆ colour palette. Inspired by the logo on a beer can. The palette uses BBD9D1, D0BBB4 and D6993E. 


Very basic CSS bonus animation (the code for which is set out below). 

Animation source code

<style>

#container {
  height: 400px;
}

#line {
  width: 0%;
  height: 40px;
  background: #BBD9D1;
  margin: auto;
  margin-top: 100px;
  -webkit-transform-origin: 0 0;
  -moz-transform-origin: 0 0;
  transform-origin: 0 0;
  -webkit-transform: rotate(-334deg);
  -moz-transform: rotate(-334deg);
  transform: rotate(-334deg);
  animation: line 5s ease-in-out infinite;
}
@keyframes line {
  from { width: 0%; }
  to { width: 80%; }
}

#line2 {
  width: 0%;
  height: 35px;
  background: #D0BBB4;
  margin: auto;
  -webkit-transform-origin: 0 0;
  -moz-transform-origin: 0 0;
  transform-origin: 0 0;
  -webkit-transform: rotate(60deg);
  -moz-transform: rotate(60deg);
  transform: rotate(60deg);
  animation: line2 5s ease infinite;
}
@keyframes line2 {
  from { width: 0%; }
  to { width: 70%; }
}

#line3 {
  width: 0%;
  height: 30px;
  background: #D6993E;
  margin: auto;
  margin-top: 60px;
  -webkit-transform-origin: 0 0;
  -moz-transform-origin: 0 0;
  transform-origin: 0 0;
  -webkit-transform: rotate(20deg);
  -moz-transform: rotate(20deg);
  transform: rotate(20deg);
  animation: line3 5s ease infinite;
}
@keyframes line3 {
  from { width: 0%; }
  to { width: 80%; }
}
</style>

<div id="container">
    <div id="line2"></div>
    <div id="line3"></div>
    <div id="line"></div>
</div>

Read a file

Reading the contents of a file in Python is straightforward and there are a couple of nice methods that cater for different use cases.

OPEN THE FILE

Suppose we want to read a file called my_text.txt. First, we open the file:

f = open('my_text.txt', 'r')

We now have the file as an object, f.

READ THE ENTIRE FILE INTO A STRING

For most use cases, it's enough to simply read the entire contents of the file into a string. We can do this by using Python's read() method. 

content = f.read()
print (content)

READ ALL OF THE LINES IN THE FILE INTO A LIST

Sometimes, you're going to want to deal with the file you're working with at line level. Fortunately, Python's readlines() method is available. readlines() stores each line in the file as an item in a list.

content = f.readlines()
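
Incidentally, if the file is large, you can iterate over the file object itself, which reads one line at a time rather than pulling the whole file into memory:

# Reads lazily, one line per iteration
for line in f:
    print (line.strip())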

READ A SPECIFIC LINE IN THE FILE

There may be times where you want to read a specific line in the file, which is what the readline() method can be used for.

To access the first line in the file:

content = f.readline()

Each call to readline() reads the next line in the file, so to access the second line, simply call it twice:

f.readline()            # reads (and moves past) the first line
content = f.readline()  # reads the second line
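
Finally, whichever method you use, the with idiom saves you having to remember to close the file when you're done:

# The file is closed automatically on exit from the block
with open('my_text.txt', 'r') as f:
    content = f.read()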

Brilliant explanation of Bayesian Probability

Bayes' theorem of probability plays a major role in artificial intelligence and machine learning. The core of the theorem is that we can make predictions about the probability of something happening based on prior knowledge of variables that might be related to that something happening.

The (simple) statement of the equation looks like this (the full proof is here):

P(A|B) = P(B|A) × P(A) / P(B)

The gist behind Bayes' theorem is easy enough to wrap your mind around, but if you want to start applying Bayesian probability in the context of machine learning and, like me, you didn't study statistics, I'd highly recommend this frankly brilliant explanation of the rule.
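
To make that concrete, here's a toy worked example in Python. The numbers are entirely made up for illustration: a test that catches 99% of true cases and wrongly flags 5% of non-cases, for a condition with 1% prevalence.

# P(condition | positive) = P(positive | condition) * P(condition) / P(positive)
p_condition = 0.01              # prior: 1% prevalence
p_pos_given_condition = 0.99    # sensitivity
p_pos_given_no_condition = 0.05 # false positive rate

# Total probability of a positive test (law of total probability)
p_pos = (p_pos_given_condition * p_condition
         + p_pos_given_no_condition * (1 - p_condition))

p_condition_given_pos = p_pos_given_condition * p_condition / p_pos

print (p_condition_given_pos)  # roughly 0.17 - far lower than most people guess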