Command Line

Using grep to search in files

I recently needed to find out which files in a directory contained a particular string of text. After fiddling around with a Python script for far longer than I ought to have done, I realised I could achieve my goal using grep at the command line. 

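For example, say I have a folder full of text files and I want to know which of them contain the string 'CRIMINAL DIVISION'. Running the following command from inside that folder prints the names of the matching files at the console:

grep -l "CRIMINAL DIVISION" *.txt

The -l flag prints only the name of each matching file rather than every matching line. To search subfolders as well, pass a directory instead of the *.txt pattern and add -R, e.g. grep -Rl "CRIMINAL DIVISION" .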
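For comparison, here is roughly what the Python route looks like. This is a minimal sketch rather than the script I originally wrote: the function name files_containing and its parameters are made up for illustration, and it only looks at files directly inside the given folder.

```python
from pathlib import Path


def files_containing(directory, needle, pattern="*.txt"):
    """Return the sorted names of files in `directory` matching `pattern`
    whose text contains `needle`."""
    matches = []
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue
        # errors="ignore" drops undecodable bytes instead of raising
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text:
            matches.append(path.name)
    return matches


if __name__ == "__main__":
    # Search the current folder, as the grep command does
    for name in files_containing(".", "CRIMINAL DIVISION"):
        print(name)
```

It works, but for a quick one-off search the grep one-liner is hard to beat.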

Converting XML to CSV

XMLutils is a neat little Python package for converting XML to other file formats such as CSV and JSON. The particularly useful thing about it is that it can be run from the command line, which makes it quick and easy to start using.

Installation

I installed XMLutils at the command line using:

sudo easy_install XMLutils

Using XMLutils at the command line

I had a sample XML file that I wanted to convert to CSV format. There was a lot in the XML file that I didn't really want going into the output CSV file, so I executed the following command:

$ xml2csv --input "/Users/danielhoadley/Library/Mobile Documents/com~apple~CloudDocs/Documents/Development/Stuff/Dockets/10519[DK]Davies_v_Davies_(msb)(sub-bpw).xml" --output "test.csv" --tag "CaseInfo" --ignore "CaseMain" "AllNCit" "AllECLI" "TempIxCardNo" "FullReportName" "AltName" "CaseJoint_IxCardNo_TempIxCardNo_FullReportName_CaseName" "LegalTopics" "Reportability"
  • xml2csv invokes the converter
  • Declare the input XML file with --input followed by the path to the file
  • Declare the output CSV file with --output followed by the path to the output file
  • Declare the XML node that represents a record in the input file with --tag followed by the node name (in my case, the node was CaseInfo)

Running a command like this will be sufficient to do a straight conversion to CSV:

$ xml2csv --input "/Users/danielhoadley/Library/Mobile Documents/com~apple~CloudDocs/Documents/Development/Stuff/Dockets/10519[DK]Davies_v_Davies_(msb)(sub-bpw).xml" --output "test.csv" --tag "CaseInfo"

However, as I've said, there was quite a lot in the input file that I wanted to ignore. Ignoring tags is pretty straightforward: simply list the tags you want to ignore after the --ignore flag, e.g.:

--ignore "CaseMain" "AllNCit" "AllECLI" "TempIxCardNo" "FullReportName" "AltName" "CaseJoint_IxCardNo_TempIxCardNo_FullReportName_CaseName" "LegalTopics" "Reportability"
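To make the idea of a record tag and an ignore list concrete, here is a minimal sketch of the same kind of conversion using only Python's standard library. It is not how XMLutils is implemented, and it is deliberately simpler: the function name xml_to_csv is my own, and it only reads the direct children of each record node, so nested elements are not flattened.

```python
import csv
import xml.etree.ElementTree as ET


def xml_to_csv(xml_path, csv_path, record_tag, ignore=()):
    """Write one CSV row per <record_tag> element, skipping any direct
    child whose tag name is listed in `ignore`. Returns the row count."""
    ignored = set(ignore)
    rows = []
    columns = []  # column order follows first appearance in the file
    for record in ET.parse(xml_path).getroot().iter(record_tag):
        row = {}
        for child in record:
            if child.tag in ignored:
                continue
            row[child.tag] = (child.text or "").strip()
            if child.tag not in columns:
                columns.append(child.tag)
        rows.append(row)

    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling xml_to_csv("input.xml", "test.csv", "CaseInfo", ignore=["CaseMain", "AllNCit"]) would mirror the shape of the command above, with input.xml standing in for the real path.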

Note

Remember to enclose the names of tags in quotes!
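Quoting matters most for the input path, which in my case contained spaces, square brackets and parentheses. If you would rather not quote by hand, one option is to build the command as a list of arguments in Python and let shlex do the quoting. This is only a sketch, assuming Python 3.8 or later (for shlex.join); build_xml2csv_command is a hypothetical helper and the path in the example is made up.

```python
import shlex


def build_xml2csv_command(input_path, output_path, tag, ignore=()):
    """Return the argument list for an xml2csv call using the
    --input, --output, --tag and --ignore flags."""
    argv = ["xml2csv", "--input", input_path, "--output", output_path, "--tag", tag]
    if ignore:
        argv.append("--ignore")
        argv.extend(ignore)
    return argv


if __name__ == "__main__":
    argv = build_xml2csv_command(
        "my docs/sample (1).xml",  # hypothetical path with spaces and parentheses
        "test.csv",
        "CaseInfo",
        ignore=["CaseMain", "AllNCit"],
    )
    # shlex.join adds shell quoting wherever an argument needs it
    print(shlex.join(argv))
```

The same list can be handed straight to subprocess.run(argv), in which case no shell is involved and no quoting is needed at all.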

Prettifying JSON

Useful resources for pretty-printing and uglifying JSON at the command line can be found here

A particularly useful option is ppjson, a Ruby gem.

To use ppjson:

1. Install ppjson at the command line:

gem install ppjson

2. To pretty print the JSON and write the prettified version back to the file (as opposed to having it merely pour into the terminal console), run:

ppjson -fi abc123.json
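If Ruby isn't to hand, much the same result can be had from Python's built-in json module. The sketch below is not ppjson and its output formatting may differ from ppjson's; the function names prettify_json_file and uglify_json_file are my own. Both rewrite the file in place.

```python
import json


def prettify_json_file(path, indent=2):
    """Re-indent the JSON file at `path` in place."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=indent, ensure_ascii=False)
        handle.write("\n")  # finish with a trailing newline


def uglify_json_file(path):
    """Strip all optional whitespace from the JSON file at `path` in place."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        # The compact separators drop the spaces after "," and ":"
        json.dump(data, handle, separators=(",", ":"), ensure_ascii=False)
```

For example, prettify_json_file("abc123.json") rewrites abc123.json with two-space indentation. Because the file is parsed and then written back, a file that isn't valid JSON raises an error before anything is overwritten.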