Case law network graph

I recently posted a few images of a network graph I built with Neo4j depicting the connections between English cases. This article serves as a quick write-up on how the graph database and the visualisations were produced.

graph_cluster_distant.png

Data

The data driving the network graph was derived from a subset of XML versions of cases reported by the Incorporated Council of Law Reporting for England and Wales. I used a simple Python script to iterate over the files and capture (a) the citation (e.g. [2010] 1 WLR 1) associated with the file, which is the source; and (b) every citation to another case within that file, each of which is a target. This was pulled into CSV format, like so:

Source,Target
[2015] 1 WLR 3238,[2015] AC 129
[2015] 1 WLR 3238,[2013] 1 WLR 366
[2015] 1 WLR 3238,[2011] 1 WLR 980

In the snippet of data above, [2015] 1 WLR 3238 can be seen to have CITED three cases, [2015] AC 129, [2013] 1 WLR 366 and [2011] 1 WLR 980. Moreover, [2015] AC 129 can be seen to have been CITED_BY [2015] 1 WLR 3238.
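The extraction step described above can be sketched roughly as follows. This is a minimal illustration rather than the script actually used: the regular expression is an assumed pattern for citations shaped like the examples in this article, the function names are hypothetical, and reading the XML files themselves is left out because their schema is not described here.

```python
import csv
import io
import re

# Assumed citation shape: "[year] (optional volume) SERIES page",
# e.g. "[2015] 1 WLR 3238", "[2015] AC 129", "[2016] Bus LR 1337".
CITATION_RE = re.compile(
    r"\[\d{4}\]\s+(?:\d+\s+)?[A-Z][A-Za-z]*(?:\s+[A-Z][A-Za-z]*)*\s+\d+"
)


def find_citations(text):
    """Return the distinct citations in text, in order of first appearance."""
    # Collapse any run of whitespace so line-wrapped citations compare equal.
    matches = (" ".join(m.split()) for m in CITATION_RE.findall(text))
    return list(dict.fromkeys(matches))


def build_edges(source, text):
    """Pair the source citation with every other citation found in text."""
    return [(source, target) for target in find_citations(text) if target != source]


def write_edges(edges, handle):
    """Write (source, target) pairs as CSV under a Source,Target header."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)


# Hypothetical judgment text standing in for the body of one XML file.
judgment_text = "Applied [2015] AC 129; considered [2013] 1 WLR 366 and [2015] AC 129."
edges = build_edges("[2015] 1 WLR 3238", judgment_text)
buffer = io.StringIO()
write_edges(edges, buffer)
```

Repeated citations within one file collapse to a single row, so each source and target pair appears once per file.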

Importing the data into Neo4j

The data was imported into Neo4j with the following Cypher query:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///citings.csv" AS row
MERGE (c:Case {Name: toString(row.Source)})
MERGE (d:Case {Name: toString(row.Target)})
MERGE (c)-[:CITED]->(d)
MERGE (d)-[:CITED_BY]->(c)

The query above is a standard import query. It creates a node (:Case) for each unique citation in the source data and then constructs two relationships, :CITED and :CITED_BY, between each pair of nodes where one case cites the other.
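To make the effect of the MERGE clauses concrete, here is a small in-memory sketch in Python. It is only an analogy for what the import produces, not how Neo4j works internally, and the function name build_graph is hypothetical. Sets play the role of MERGE: adding the same node or relationship twice leaves a single copy.

```python
import csv
import io


def build_graph(csv_text):
    """Return (nodes, cited, cited_by) built from Source,Target CSV text."""
    nodes = set()
    cited = set()     # (citing case, cited case)
    cited_by = set()  # (cited case, citing case)
    for row in csv.DictReader(io.StringIO(csv_text)):
        source, target = row["Source"], row["Target"]
        nodes.update((source, target))
        cited.add((source, target))
        cited_by.add((target, source))
    return nodes, cited, cited_by


# The sample rows from the Data section, with one row deliberately repeated.
sample = """Source,Target
[2015] 1 WLR 3238,[2015] AC 129
[2015] 1 WLR 3238,[2013] 1 WLR 366
[2015] 1 WLR 3238,[2011] 1 WLR 980
[2015] 1 WLR 3238,[2015] AC 129
"""
nodes, cited, cited_by = build_graph(sample)
```

The repeated row adds nothing: the sample yields four nodes and three relationships of each type.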

View of a small portion of the graph from the Neo4j browser

Calculating the transitive importance of the cases in the graph

With the graph largely built, I wanted to get a sense of which cases were the most important in the graph, so I ran the PageRank algorithm over it:

CALL algo.pageRank('Case', 'CITED_BY',{write: true, writeProperty:'pagerank'})

This stored each case's PageRank as a property, pagerank, on the case node.
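For readers unfamiliar with PageRank, the idea can be sketched in a few lines of Python. This is a simplified power iteration using the unnormalised form of the formula, where every node receives a base score of 1 - damping plus a share of the score of each node linking to it. The damping factor of 0.85 and the 20 iterations are assumptions chosen for illustration, the node names are hypothetical, and the Neo4j procedure may differ in initialisation and other details.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=20):
    """Score nodes by power iteration over directed (src, dst) edges."""
    nodes = {node for edge in edges for node in edge}
    out_links = defaultdict(set)
    for src, dst in edges:
        out_links[src].add(dst)

    rank = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the base score (1 - damping).
        new_rank = {node: 1.0 - damping for node in nodes}
        for src, targets in out_links.items():
            # A node splits its damped score evenly across its outgoing edges.
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Toy graph: A and B both point at C, and C points at D.
toy_edges = [("A", "C"), ("B", "C"), ("C", "D")]
ranks = pagerank(toy_edges)
ordered = sorted(ranks, key=ranks.get, reverse=True)
```

Score flows along the direction of the edges, so which relationship type the algorithm traverses determines which end of a citation accumulates rank.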

It was then possible to identify the ten most important cases in the network by running:

MATCH (c:Case) 
RETURN c.Name, c.pagerank 
ORDER BY c.pagerank DESC LIMIT 10

Which returned:

c.Name,c.pagerank
[2014] 3 WLR 535,15.561027
[2016] Bus LR 1337,13.3335
[2009] 3 WLR 369,11.5683645
[2000] 1 WLR 2068,11.149255000000002
[2009] 3 WLR 351,10.952590499999998
[1996] 1 WLR 1460,10.657869999999999
[2002] 2 WLR 578,9.848398000000001
[2000] 3 WLR 1855,9.2526755
[2005] 1 WLR 2668,8.36525
[2005] 3 WLR 1320,7.990162000000001

Visualising the graph

To render the graph in the browser, I used [neovis.js][1]. The code for the browser render:

<html>
    <head>
        <title>DataViz</title>
        <style type="text/css">
            body {font-family: 'Gotham' !important}
            #viz {
                width: 900px;
                height: 700px;
            }
        </style>
        <script src="https://rawgit.com/neo4j-contrib/neovis.js/master/dist/neovis.js"></script>
    </head>   
    <script>
        function draw() {
            var config = {
                container_id: "viz",
                server_url: "bolt://localhost:7687",
                server_user: "beans",
                server_password: "sausages",
                labels: {
                    "Case": {
                        caption: "Name",
                        size: "pagerank",
                    }
                },
                relationships: {
                    "CITED_BY": {
                        caption: false,
                    }
                },
                initial_cypher: "MATCH p=(:Case)-[:CITED]->(:Case) RETURN p LIMIT 5000"
            }
            var viz = new NeoVis.default(config);
            viz.render();
        }
    </script>
    <body onload="draw()">
        <div id="viz"></div>
    </body>
</html>
Visualisation with neovis.js

To add colour to the various groups of cases in the graph, I used a hacky implementation of the label propagation community detection algorithm (I say hacky because I didn't set any seed labels).

CALL algo.labelPropagation('Case', 'CITED_BY','OUTGOING',
  {iterations:10,partitionProperty:'partition', write:true})
YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, write, partitionProperty;
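The intuition behind label propagation can be sketched in plain Python. Every node starts with its own label and then repeatedly adopts the label most common among its neighbours, so densely connected groups settle on a shared label. This sketch makes simplifying assumptions that the Neo4j procedure does not: it treats edges as undirected, breaks ties by picking the smallest label, and uses hypothetical node names.

```python
import random
from collections import Counter, defaultdict


def label_propagation(edges, iterations=10, seed=0):
    """Assign a community label to each node in an edge list."""
    rng = random.Random(seed)
    neighbours = defaultdict(set)
    for src, dst in edges:
        # Simplification: ignore direction and link both ways.
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    # No seed labels: every node begins in its own community.
    labels = {node: i for i, node in enumerate(sorted(neighbours))}
    for _ in range(iterations):
        order = sorted(neighbours)
        rng.shuffle(order)
        changed = False
        for node in order:
            counts = Counter(labels[n] for n in neighbours[node])
            best = max(counts.values())
            # Break ties deterministically with the smallest label.
            choice = min(label for label, count in counts.items() if count == best)
            if labels[node] != choice:
                labels[node] = choice
                changed = True
        if not changed:
            break
    return labels


# Two separate triangles should end up as two communities.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"),
             ("x", "y"), ("y", "z"), ("z", "x")]
partition = label_propagation(toy_edges)
```

The resulting label per node plays the same role as the partition property written back to each case node.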

The neovis.js configuration could then be updated with a "community" attribute to generate a different colour for each community of cases:

<html>
    <head>
        <title>DataViz</title>
        <style type="text/css">
            body {font-family: 'Gotham' !important}
            #viz {
                width: 900px;
                height: 700px;
            }
        </style>
        <script src="https://rawgit.com/neo4j-contrib/neovis.js/master/dist/neovis.js"></script>
    </head>   
    <script>
        function draw() {
            var config = {
                container_id: "viz",
                server_url: "bolt://localhost:7687",
                server_user: "sausages",
                server_password: "beans",
                labels: {
                    "Case": {
                        caption: "Name",
                        size: "pagerank",
                        community: "partition"
                    }
                },
                relationships: {
                    "CITED_BY": {
                        caption: false,    
                    }
                },
                initial_cypher: "MATCH p=(:Case)-[:CITED]->(:Case) RETURN p LIMIT 5000"
            }
            var viz = new NeoVis.default(config);
            viz.render();
        }
    </script>
    <body onload="draw()">
        <div id="viz"></div>
    </body>
</html>