CSV

Load a CSV as dataframe with Pandas

In days gone by, when it came to wrangling with tabular data, my first port of call would have been to load the data in Excel and slog it out for as long as it took. Now, I use Pandas to wrangle tabular data.

Having used Pandas for a while now, I've come to appreciate that dealing with larger quantities in data in Excel is made difficult by two things in particular. First, memory is an issue - try using Find and Replace to remove a few thousand commas and you'll notice that your computer begins to run out of puff. Second, Excel formulas aren't a patch on the power of the Python language, the power of which can be brought to bear on a Pandas dataframe.

Load a CSV as a dataframe

df = pd.read_csv('name_of_file.csv')

Load specific columns in the CSV as a dataframe

df = pd.read_csv('name_of_file', usecols=['ColumnName', 'AnotherColumnName', 'Etcetera'])

Drop NaN Values from the Dataframe

df = pd.read_csv('name_of_file', usecols=['ColumnName', 'AnotherColumnName', 'Etcetera']).dropna()

Load a CSV as a dataframe and specify a column as the index 

mydf = pd.read_csv('name_of_file.csv', index_col='ColumnName')

Converting XML to CSV

XMLutils is a neat little Python package for converting XML to various file formats like CSV and JSON. The particularly useful thing about is that it can be executed from the command line, which makes it quick and easy to start using. 

Installation

I installed XMLutils at the command line using:

sudo easy_install XMLutils

Using XMLutils at the command line

I had a sample XML file that I wanted to convert to CSV format. There was a lot in the XML file that I didn't really want going into the output CSV file, so I executed the following command:

$ xml2csv --input "/Users/danielhoadley/Library/Mobile Documents/com~apple~CloudDocs/Documents/Development/Stuff/Dockets/10519[DK]Davies_v_Davies_(msb)(sub-bpw).xml" --output "test.csv" --tag "CaseInfo" --ignore "CaseMain" "AllNCit" "AllECLI" "TempIxCardNo" "FullReportName" "AltName" "CaseJoint_IxCardNo_TempIxCardNo_FullReportName_CaseName" "LegalTopics" "Reportability"
  • xml2csv invokes the converter
  • Declare the input XML file with --input followed by the path to the file
  • Declare the output CSV file with --output followed by the path to output file
  • Declare the XML node that represents a record in the input file (in my case, the node was CaseInfo)

Running a command like this will be sufficient to do a straight conversion to CSV:

$ xml2csv --input "/Users/danielhoadley/Library/Mobile Documents/com~apple~CloudDocs/Documents/Development/Stuff/Dockets/10519[DK]Davies_v_Davies_(msb)(sub-bpw).xml" --output "test.csv" --tag "CaseInfo"

However, as I've said, there was quite a lot in the input file that I wanted to ignore. Ignoring tags is pretty straightforward: simply declare the tags you want to ignore after that --ignore flag, e.g:

--ignore "CaseMain" "AllNCit" "AllECLI" "TempIxCardNo" "FullReportName" "AltName" "CaseJoint_IxCardNo_TempIxCardNo_FullReportName_CaseName" "LegalTopics" "Reportability"

Note

Remember to enclose the names of tags in quotes!

Convert JSON to CSV with plain Javascript

I've been exploring the excellent API provided by the Canadian case law database, CanLII, and needed to quickly convert the JSON I was puling back to CSV. 

The following code, which has been tailored to suit the structure of the data coming back from the API, got the job done:

// Include dependencies

var json2csv = require('json2csv');
var fs = require('fs');

// Set up columns in the CSV
var fields = ['databaseId', 'caseId.en', 'title', 'citation'];

// Give it the data

var cases = [
    {
        "databaseId": "csc-scc",
        "caseId": {
        "en": "2016scc56"
    },
        "title": "Canada (Attorney General) v. Fairmont Hotels Inc.",
        "citation": "2016 SCC 56 (CanLII)"
}, 
    {
        "databaseId": "csc-scc",
        "caseId": {
        "en": "2016scc55"
    },
        "title": "Jean Coutu Group (PJC) Inc. v. Canada (Attorney General)",
        "citation": "2016 SCC 55 (CanLII)"
    }
];

var csv = json2csv({ data: cases, fields: fields });

fs.writeFile('cases.csv', csv, function(err) {
    if (err) throw err;
        console.log('Case list converted and saved as CSV!!');
    }
);