Legal Information



As someone involved in the ongoing development of an online legal research system (the ICLR's ICLR.3 platform), I spend quite a bit of time thinking about the ways in which unstructured or partially structured legal texts can be enriched and brought to order, either to prepare the text for later processing in a content delivery pipeline or for some other form of data analysis. 

More often than not, rendering a text amenable to content delivery or data analysis involves a fair amount of wrangling with the text itself to mark up entities of interest and to apply an overall schematic structure to the document.

Legal publishers, such as ICLR, Justis, LexisNexis and Thomson Reuters, use industrial-strength proprietary tools and teams of people to wrangle unstructured legal material into a form that can be used in their products and services. However, the pool of individuals and companies interested in leveraging legal texts has exploded well beyond a handful of well-established legal publishers.

In my opinion, the more people playing with legal information and sharing their work the better. So, I've started development on my very first open source project to produce a suite of tools, written in Python, that can be used to perform a wide range of legal text enrichment operations. I call the project Blackstone.


The idea behind Blackstone is relatively simple: it should be easier to perform a standard set of extraction and enrichment tasks without first having to write custom code to get the job done. The objective of the library is to provide a free set of tools that can be used to:

  • Automatically segment the input text into sentences and mark them up

  • Identify and mark up references to primary and secondary legislation

  • Identify and mark up references to case law

  • Identify and mark up axioms (e.g. where the author of the text postulates that such and such is an "established principle of law" etc.)

  • Identify other types of entities peculiar to legal writing, such as courts, indictment numbers

  • Produce document level metrics, providing an overview of the document's structure, characteristics and content

  • Generate visualisations

  • Other stuff I haven't thought of yet

Crucially, Blackstone is not intended to be a standalone service. Rather, the intention is to provide a suite of ready-baked Python tools that can be used out of the box in other development or data science pipelines. 

As an open source library, Blackstone stands on the shoulders of world-class, open Python technologies: spaCy, scikit-learn, BeautifulSoup, pandas, requests and, of course, Python's own standard library. Blackstone couples intuitive high-level abstractions of these underlying technologies with custom-built constructs designed specifically to deal with legal content.

Progress and horizon

The plan is to get an initial beta release out on GitHub and PyPI by the end of September 2018. To date, the following progress has been made:

  • Function to provide high-level abstraction over spaCy sentence segmentation (testing)

  • Function to assemble comprehensive list of UK statutes (complete)

  • Function to detect and mark up primary legislation by reference to short title (complete)

  • Function to detect and mark up primary legislation by reference to abbreviation (e.g. DPA or DPA 1998) (testing)

  • Function to resolve oblique references to primary legislation (e.g. the 1998 Act) (developing).
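To give a flavour of the legislation tasks listed above, here is a minimal sketch using only Python's standard library. It is not Blackstone's code: the function names, the `<legislation>` tag and the three-entry statute list are all hypothetical, chosen purely for illustration, and the approach (plain regular expressions over a fixed list of short titles) is an assumption about how such detection could be done.

```python
import re

# Hypothetical, hand-picked sample; a comprehensive list of UK statutes
# would be far longer.
STATUTES = ["Data Protection Act 1998", "Human Rights Act 1998", "Theft Act 1968"]


def mark_up_short_titles(text, statutes=STATUTES):
    """Wrap each known short title in a <legislation> tag (illustrative tag name)."""
    # Try longer titles first so a longer title wins over a shorter one it contains.
    ordered = sorted(statutes, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(s) for s in ordered) + r")\b")
    return pattern.sub(r"<legislation>\1</legislation>", text)


def abbreviation(short_title):
    """Derive a naive abbreviation from the capitalised words plus the year."""
    *words, year = short_title.split()
    return "".join(w[0] for w in words if w[0].isupper()) + " " + year


def resolve_oblique(text, statutes=STATUTES):
    """Map each 'the YYYY Act' reference to statutes of that year named in the text."""
    mentioned = [s for s in statutes if s in text]
    resolved = {}
    for match in re.finditer(r"\bthe (\d{4}) Act\b", text):
        year = match.group(1)
        resolved[match.group(0)] = [s for s in mentioned if s.endswith(year)]
    return resolved


sample = ("Section 7 of the Data Protection Act 1998 applies. "
          "The claimant relied on the 1998 Act.")
print(mark_up_short_titles(sample))
print(abbreviation("Data Protection Act 1998"))  # DPA 1998
print(resolve_oblique(sample))
```

An oblique reference can resolve to more than one candidate (two Acts from the same year may both be named in a document), which is why the sketch returns a list of candidates rather than a single statute.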

Once I've got a baseline level of functionality completed, I'll release the code on GitHub. More updates to follow.

If you'd like to get involved, share an idea or give me some help, drop me a line on Twitter.

Part 3: Open Access To English Case Law (The Raw Data)

I started writing in the spring of this year about the state of open access to case law in the UK, with a particular focus on judgments given in the courts of England and Wales. 

The gist of my assessment of the state of open access to judgments via the British open law apparatus is set out here, but boils down to:

  • Innovation in the open case law space in the UK is stuck in the mud
  • BAILII is lagging behind comparable projects taking place elsewhere in the common law world: CanLII and CaseText are excellent examples of what's possible.
  • Insufficient focus, if any, is being directed to improving open access to English case law.

In a subsequent article, I explored the value in providing open and free online access to the decisions of judges. I identified four bases upon which open access can be shown to be a worthwhile endeavour: (i) the promotion of the rule of law; (ii) equality of arms, particularly for self-represented litigants; (iii) legal dispute reduction; and (iv) transparency.

In the same article, I developed a rough and ready definition of what "open access to case law" means:

"Open access to case law" isn't a "thing", it's a goal. The goal, at least to my mind, boils down to providing access that is free at the point of delivery to the text of every judgment given in every case by every court of record (i.e. every court with the power to give judgments that have the potential to be binding on lower and co-ordinate courts) in the jurisdiction.

My overriding concern is that a significant number of judgments do not make their way to BAILII and are only accessible to paying subscribers of subscription databases, effectively creating a "haves and have-nots" scenario where comprehensive access to the decisions of judges depends on the ability to pay for it. The gaps in BAILII's coverage were discussed in this article.

In this article I go deeper into exploring how big the gaps are in BAILII's coverage when compared to the coverage of judgments provided by three subscription-based research platforms: JustisOne, LexisLibrary and WestlawUK. 


The aim of the study was to gather data on the coverage provided by BAILII, JustisOne, LexisLibrary and WestlawUK of judgments given in the following courts between 2007 and 2017:

  • Administrative and Divisional Court
  • Chancery Division
  • Court of Appeal (Civil Division)
  • Court of Appeal (Criminal Division)
  • Commercial Court
  • Court of Protection
  • Family Court
  • Family Division
  • Patents Court
  • Queen's Bench Division
  • Technology and Construction Court


Each of the four platforms handles year-on-year counts of judgments for a given court differently. Accordingly, the following method was devised to extract the data from each platform:


BAILII

BAILII provides an interface to browse its various databases. Within each database, it is possible to isolate a court and a year. The page for a given year of a given court sets out a list of the judgments for that year.

Each judgment appears in the underlying HTML as a list element (<li> ... </li>). For example,

<li><a href="/ew/cases/EWCA/Crim/2017/17.html">Abi-Khalil &amp; Anor, R v </a><a title="Link to BAILII version" href="/ew/cases/EWCA/Crim/2017/17.html">[2017] EWCA Crim 17</a> (13 January 2017)</li>

Counting the <li> ... </li> elements on each page yields the total number of judgments for that court and year.
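The counting step can be sketched with Python's built-in `html.parser`. This is a minimal illustration, not the code used in the study: the class and function names are hypothetical, and it assumes the HTML passed in contains only the judgment list, since any other `<li>` on the page (navigation menus, for instance) would inflate the count.

```python
from html.parser import HTMLParser


class JudgmentCounter(HTMLParser):
    """Counts opening <li> tags in the HTML it is fed."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        if tag == "li":
            self.count += 1


def count_judgments(html):
    """Return the number of <li> elements found in the given HTML string."""
    parser = JudgmentCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# The list item quoted above, wrapped in a <ul> for the sake of the example.
page = (
    '<ul><li><a href="/ew/cases/EWCA/Crim/2017/17.html">Abi-Khalil &amp; Anor, R v </a>'
    '<a title="Link to BAILII version" href="/ew/cases/EWCA/Crim/2017/17.html">'
    "[2017] EWCA Crim 17</a> (13 January 2017)</li></ul>"
)
print(count_judgments(page))  # 1
```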

JustisOne, LexisLibrary & WestlawUK

The three subscriber platforms were approached differently. A list of search strategies based on the neutral citation for each court was constructed.

For example, to query judgments given in the Criminal Division of the Court of Appeal in 2017, the following query was constructed:

2017 ewca crim

A query for each court and each year was constructed and then submitted via the platform's "citation" search field. The total number of judgments yielded by the query was extracted by capturing the count of results from the platform's underlying HTML.
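Building one query per court per year is easy to script. The sketch below is illustrative only: the `COURT_CODES` mapping and the `build_queries` function are hypothetical names, and of the court codes shown only "ewca crim" comes from the example above. The other two are assumptions based on the usual neutral citation abbreviations and may not match the exact strings used in the study.

```python
from itertools import product

# Court name -> search string. Only "ewca crim" is taken from the article;
# the remaining entries are assumed for illustration.
COURT_CODES = {
    "Court of Appeal (Criminal Division)": "ewca crim",
    "Court of Appeal (Civil Division)": "ewca civ",
    "Chancery Division": "ewhc ch",
}

# The study period: 2007 to 2017 inclusive.
YEARS = range(2007, 2018)


def build_queries(court_codes=COURT_CODES, years=YEARS):
    """Return one (court, year, query) tuple for every court/year combination."""
    return [
        (court, year, f"{year} {code}")
        for (court, code), year in product(court_codes.items(), years)
    ]


queries = build_queries()
print(len(queries))  # 33 (3 courts x 11 years)
print(queries[10])
```

Each query string would then be submitted to a platform's citation search by hand or by script, depending on what the platform's terms of use allow.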

The Data

The data captured is available here in raw form. The code used to generate the visualisation in this article is available here as a Jupyter Notebook.

Annual coverage by publisher

The following graph provides an overview of the annual coverage for all of the courts studied by publisher. The following points leap out of the graph:

  • BAILII's coverage of judgments is far lower than that provided by the three subscription-based platforms, running at a rough average of between 2,500 and 3,000 judgments per year.
  • Save for a drop in LexisLibrary's favour in 2011, JustisOne consistently provides the most comprehensive coverage of judgments.
  • From 2012, LexisLibrary has closely tracked JustisOne's coverage.
  • There is a sharp and sudden proportional drop in coverage from 2014 across all four platforms.

The key takeaway from this graph is that a significant number of judgments never make it onto BAILII every year.


The following graph provides an alternative view of the same data. 


Total coverage of court by publisher

This graph provides an overview of how each publisher fares in terms of coverage of the courts included in the study. By and large, there is a healthy degree of parity in coverage of the following courts across all four publishers:

  • Chancery Division
  • Commercial Court
  • Court of Protection
  • Family Court
  • Family Division
  • Technology and Construction Court

However, BAILII is struggling to keep up with the levels of comprehensiveness provided by the commercial publishers in the Administrative Court, both divisions of the Court of Appeal and the Queen's Bench Division. 

The dearth in coverage of judgments from the Criminal Division on BAILII is especially startling, particularly given the rising numbers of criminal defendants lacking representation at the sentencing stage. Intuitively (though I have not confirmed this), the deficit in BAILII's coverage of the Criminal Division will almost certainly consist of judgments following an appeal against sentence.


(Interim) Conclusion

The data shows that BAILII is providing partial access to the overall corpus of judgments handed down in the courts studied. This, as I have previously been at pains to stress, is not down to any failing on BAILII's part. Rather, it is a symptom of how hopeless existing systems (such as they are) are at servicing BAILII with a comprehensive flow of cases to publish, particularly judgments given extempore. 

It also bears saying that the commercial publishers do not in any way obstruct BAILII from acquiring the material. A fuller discussion of the mechanics driving the problem will appear here soon.

Open access to English case law (a Primer)


  • Innovation in the open case law space in the UK is stuck in the mud
  • BAILII is lagging behind comparable projects taking place elsewhere in the common law world: CanLII and CaseText are excellent examples of what's possible.
  • Insufficient focus, if any, is being directed to improving open access to English case law

There is a tsunami of innovation happening in the legal space right now. The problem is, so far as I can tell, none of it is being directed towards improving the way the decisions of judges in the English courts are made accessible to the wider public. 

Innovation in the pursuit of achieving broader, more intuitive and freer access to English case law has lain stagnant for at least five years. It is true that the United Kingdom has BAILII and nothing that follows in this series of blog posts is intended to take anything away from how important BAILII is or how successful it has been in opening access to the decisions of judges. However, BAILII (through no fault of its own) has been unable to keep pace with the levels of really positive innovation I've observed in similar projects taking place outside the UK (notably BAILII's Canadian equivalent, CanLII, and the US freemium/premium case law platform, CaseText).

Open access to case law in the United Kingdom suffers from the following weaknesses (this list is by no means exhaustive):

  1. Gaps in coverage: there are too many gaps in the legacy case law archive and there are too many gaps in ongoing coverage of new judgments, especially those that are given extempore. There is still a vast amount of retrospective and prospective material that can only be accessed via paid subscription services.
  2. User-friendliness: BAILII is simple enough to use if you're used to researching the law online, but there is a considerable amount that could be done to improve the service for the benefit of lay users. 
  3. Sustainability: plenty of people use BAILII, but very few of them make donations to help BAILII raise enough financial resource to pursue product development projects.
  4. No platform for experimentation or third-party development: unlike CanLII, BAILII doesn't have a public API. Third-party innovation has stalled because it is incredibly difficult to acquire access to the text of the cases.

The weaknesses I've set out above are a function of the following broader problems (again, this list isn't exhaustive):

  1. The supply chain that takes a judgment (whether handed down or given extempore) to the wider public is messy and poorly understood by the Ministry of Justice (which is worrying, because they control that supply chain).
  2. The position on intellectual property rights over the judgments themselves is needlessly uncertain.
  3. There is no solid model for translating the way the common law works to the sort of open case law system we need.
  4. BAILII, in several key ways, itself acts like a publisher of proprietary content.

This post is a "primer" for a series of blog posts I'm writing on the subject in the run-up to a talk I'll be giving at the British and Irish Association of Law Librarians in June 2018.