Development

Query Neo4j graph database with Requests

This snippet provides a quick example of how to send a simple Cypher query to Neo4j from Python using Requests:

import requests

url = "http://localhost:7474/db/data/cypher"

# The Cypher query and its parameters go in the JSON body of the request
payload = {
    "query": "MATCH p=()-[r:CONSIDERED_BY]->() RETURN p LIMIT 25",
    "params": {}
}
headers = {
    'authorization': "Basic bmVvNGo6Zm9vYmFy",
    'cache-control': "no-cache"
}

# Passing the payload via json= serialises it and sets the Content-Type header
response = requests.post(url, json=payload, headers=headers)

print(response.text)

Note that the 'authorization': "Basic bmVvNGo6Zm9vYmFy" entry in the headers dictionary is a Base64-encoded representation of the username and password of my local Neo4j instance: neo4j:foobar

You can encode your own Neo4j username:password combination here: https://www.base64encode.org/
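Alternatively, you can build the header value in Python itself with the standard library's base64 module. A minimal sketch (substitute your own credentials):

import base64

# Replace with your own Neo4j username:password combination
credentials = "neo4j:foobar"

# b64encode works on bytes, so encode to bytes first and decode the result
token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")

headers = {"authorization": "Basic " + token}
print(headers)  # {'authorization': 'Basic bmVvNGo6Zm9vYmFy'}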

Connect to FTP site with Python

I was recently having a complete nightmare connecting to an FTP site using the FileZilla client and wanted to write a quick Python script to test the connection myself.

Thanks to the ftplib module that comes with Python, a simple test was possible in only four lines of code.

Here's an example:

from ftplib import FTP

ftp = FTP('hostname.goes.here')
ftp.login(user='username', passwd='password')

# Get a list of the directories on the FTP site to test that the connection works
ftp.retrlines('LIST') 

The console will then print the structure of the FTP site's root folder.
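If the listing comes back fine, the connection is good. As a follow-up test, ftplib can also download a file. A minimal sketch, reusing the ftp object from above ('report.csv' is a hypothetical filename):

# Download a remote file in binary mode; the filename here is made up
with open('report.csv', 'wb') as local_file:
    ftp.retrbinary('RETR report.csv', local_file.write)

# Politely close the connection
ftp.quit()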


Redis: How to set the path to redis.sock

I was experimenting with a Python program the other day which had me pulling my hair out. The program was simple enough: its purpose was to trace retweets of a given tweet via the Twitter API.

For efficiency, the program connected to a Redis database in which the followers of each user on the retweet path were stored, allowing the program to jump straight into crawling the next set of followers on subsequent executions.

The connection to the Redis instance was managed like this:

red = redis.Redis(unix_socket_path="/tmp/redis.sock")

I installed Redis on my Mac, together with the redis Python module. Everything until this point had gone swimmingly. 

I started an instance of Redis with $ redis-server and ran the Python program to take it for a test drive. The program started to work its magic, but failed halfway through with an error saying that the path to unixsocket was incorrect and that redis.sock could not be found.

After quite a bit of Googling around (and probably only half reading the stuff that seemed remotely relevant to the issue), I resorted to some pretty crappy attempts to overcome the problem, all of which were hopeless. Then I had a rummage in the Redis root directory, stumbled across the redis.conf file, and the solution revealed itself. If you're having a similar problem, I hope this helps:

1. Open redis.conf in a text editor.

2. Scroll down the file until you get to the section dealing with Unix sockets (around line 95 of the file in my version of Redis).

3. At or around lines 101 and 102, you should see the following:

# unixsocket /tmp/redis.sock
# unixsocketperm 700

4. Uncomment both of these lines, so that they look like this:

unixsocket /tmp/redis.sock
unixsocketperm 700

5. Save the changes and exit.

6. Go to the terminal and start a new Redis instance with the following command:

$ redis-server redis.conf

You should see this as the Redis server boots up:

32151:M 25 Jan 11:24:27.268 * The server is now ready to accept connections at /tmp/redis.sock

Success!
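As a final sanity check that the socket is actually in use, you can ping Redis over it from Python. A quick sketch, assuming the redis module is installed and the server was started as above:

import redis

# Connect over the Unix socket rather than TCP and ping the server
red = redis.Redis(unix_socket_path="/tmp/redis.sock")
print(red.ping())  # True if the connection is healthy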


Converting XML to CSV

XMLutils is a neat little Python package for converting XML to various file formats, like CSV and JSON. The particularly useful thing about it is that it can be executed from the command line, which makes it quick and easy to start using.

Installation

I installed XMLutils at the command line using:

sudo easy_install XMLutils
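If easy_install isn't available, the package can presumably be installed with pip as well (I believe the PyPI name is simply xmlutils):

pip install xmlutils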

Using XMLutils at the command line

I had a sample XML file that I wanted to convert to CSV format. There was a lot in the XML file that I didn't really want going into the output CSV file, so I executed the following command:

$ xml2csv --input "/Users/danielhoadley/Library/Mobile Documents/com~apple~CloudDocs/Documents/Development/Stuff/Dockets/10519[DK]Davies_v_Davies_(msb)(sub-bpw).xml" --output "test.csv" --tag "CaseInfo" --ignore "CaseMain" "AllNCit" "AllECLI" "TempIxCardNo" "FullReportName" "AltName" "CaseJoint_IxCardNo_TempIxCardNo_FullReportName_CaseName" "LegalTopics" "Reportability"
  • xml2csv invokes the converter
  • Declare the input XML file with --input followed by the path to the file
  • Declare the output CSV file with --output followed by the path to the output file
  • Declare the XML node that represents a record in the input file with --tag (in my case, the node was CaseInfo)

Running a command like this will be sufficient to do a straight conversion to CSV:

$ xml2csv --input "/Users/danielhoadley/Library/Mobile Documents/com~apple~CloudDocs/Documents/Development/Stuff/Dockets/10519[DK]Davies_v_Davies_(msb)(sub-bpw).xml" --output "test.csv" --tag "CaseInfo"

However, as I've said, there was quite a lot in the input file that I wanted to ignore. Ignoring tags is pretty straightforward: simply declare the tags you want to ignore after the --ignore flag, e.g.:

--ignore "CaseMain" "AllNCit" "AllECLI" "TempIxCardNo" "FullReportName" "AltName" "CaseJoint_IxCardNo_TempIxCardNo_FullReportName_CaseName" "LegalTopics" "Reportability"

Note

Remember to enclose the names of tags in quotes!
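If you'd rather avoid the dependency altogether, a straight XML-to-CSV conversion can also be sketched with Python's standard library. This is a minimal, hypothetical example (the CaseInfo tag mirrors the --tag example above; the input and output filenames are placeholders):

import csv
import xml.etree.ElementTree as ET

tree = ET.parse('input.xml')
root = tree.getroot()

with open('test.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    # One row per record node
    for record in root.iter('CaseInfo'):
        # Flatten each record's direct children into a row of text values
        writer.writerow([(child.text or '').strip() for child in record])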

Convert text files to lower case in Python

When mining text it often helps to convert the entire text to lower case as part of your pre-processing stage. Here's an example of a Python script that does just that for a directory containing one or more text files:

import os

directory = '/path/to/the/directory'

for filename in os.listdir(directory):
    if filename.endswith(".txt"):
        path = os.path.join(directory, filename)

        # Read the file's contents in
        with open(path, 'r') as f:
            text = f.read()

        # Write the lower-cased text back over the original file
        with open(path, 'w') as out:
            out.write(text.lower())

Iterating over files with Python

A short block of code to demonstrate how to iterate over files in a directory and do some action with them.

import os

directory = 'the/directory/you/want/to/use'

for filename in os.listdir(directory):
    if filename.endswith(".txt"):
        # Join the directory to the filename so the script can run from anywhere
        with open(os.path.join(directory, filename)) as f:
            text = f.read()
            # Do some action with the contents, e.g. print the first few characters
            print(text[:10])

For example,

import os
import logging
from gensim.summarization import summarize

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

directory = 'the/directory/you/want/to/use'

for filename in os.listdir(directory):
    if filename.endswith(".txt"):
        # Join the directory to the filename so the script can run from anywhere
        with open(os.path.join(directory, filename), 'r') as f:
            text = f.read()
            # Print a 20-word summary of each file
            print(summarize(text, word_count=20))

Note: as originally written, these snippets opened each file by its bare filename, which meant the program had to be run from inside the directory defined in the directory variable; joining the directory and filename with os.path.join (as above) fixes that.

MongoDB and Node.js example

The following code provides a brief reference example of reading data from a MongoDB database using Node.js. 

Prerequisites

The following is already in place:

  • I've created a database in MongoDB called beetleJuice
  • Within the beetleJuice database, I've created a collection called bugs
  • The mongodb dependency has been saved to the project directory
  • I have an instance of MongoDB running on port 27017

// Bring in the MongoDB dependency

var MongoClient = require('mongodb').MongoClient, assert = require('assert');

// Connect to the database

MongoClient.connect('mongodb://localhost:27017/beetleJuice', function (err, db) {
    
    assert.equal(null, err);
    
    // assign the bugs collection to var col
    
    var col = db.collection('bugs');
    
    // use the findOne method to search for a document where assignee is set to Daniel Hoadley
    
    col.findOne({"assignee" : "Daniel Hoadley"}, function (err, doc) {
        
        assert.equal(null, err);
        
    // Print the resulting document to the console
        
        console.log("Here is my doc: %j", doc);
        
    // Close the connection to the database
        
        db.close();
    })
})
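For what it's worth, the equivalent read from Python is very similar. A minimal sketch using pymongo (assuming the pymongo package is installed):

from pymongo import MongoClient

# Connect to the local MongoDB instance and select the database and collection
client = MongoClient('mongodb://localhost:27017/')
col = client.beetleJuice.bugs

# find_one() mirrors the Node driver's findOne()
doc = col.find_one({"assignee": "Daniel Hoadley"})
print("Here is my doc:", doc)

client.close()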

Express Generator

The ExpressJS generator is really handy for quickly standing up the scaffold of a basic web application. 

To quickly get started, see the Getting Started guide.

Quick launch

1. Make sure the Express generator is installed:

sudo npm install express-generator -g

2. Run the generator (note: the default view engine is Pug, formerly known as Jade):

express --view=jade appname

3. Change into the directory created for the app, e.g:

cd appname

4. Install the dependencies in the app directory

npm install

5. Start the app with npm start (the generator writes a start script that runs node ./bin/www), or point nodemon at ./bin/www
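Putting it all together, the whole quick launch looks something like this (appname is just a placeholder):

express --view=jade appname
cd appname
npm install
npm start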


Convert JSON to CSV with plain Javascript

I've been exploring the excellent API provided by the Canadian case law database, CanLII, and needed to quickly convert the JSON I was pulling back to CSV.

The following code, which has been tailored to suit the structure of the data coming back from the API, got the job done:

// Include dependencies

var json2csv = require('json2csv');
var fs = require('fs');

// Set up columns in the CSV
var fields = ['databaseId', 'caseId.en', 'title', 'citation'];

// Give it the data

var cases = [
    {
        "databaseId": "csc-scc",
        "caseId": {
        "en": "2016scc56"
    },
        "title": "Canada (Attorney General) v. Fairmont Hotels Inc.",
        "citation": "2016 SCC 56 (CanLII)"
}, 
    {
        "databaseId": "csc-scc",
        "caseId": {
        "en": "2016scc55"
    },
        "title": "Jean Coutu Group (PJC) Inc. v. Canada (Attorney General)",
        "citation": "2016 SCC 55 (CanLII)"
    }
];

var csv = json2csv({ data: cases, fields: fields });

fs.writeFile('cases.csv', csv, function (err) {
    if (err) throw err;
    console.log('Case list converted and saved as CSV!!');
});

Prettifying JSON

Useful resources for pretty-printing and uglifying JSON at the command line can be found here.

A particularly useful method is to use Ruby's ppjson.

To use ppjson:

1. Install ppjson at the command line:

gem install ppjson

2. To pretty print the JSON and write the prettified version back to the file (as opposed to having it merely pour into the terminal console), run:

ppjson -fi abc123.json
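If you'd rather not install a Ruby gem, Python's built-in json.tool module is a handy alternative for pretty-printing at the command line, though as far as I know it prints to stdout rather than rewriting the file in place:

python -m json.tool abc123.json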