I recently needed to converse a large collection of fairly complex XML files into JSON format to make them ready for leading into an Elasticsearch cluster.
After a bit of searching around, I found a really nice Python library called
xmltodict, which worked perfectly.
You can download
$ pip3 install xmltodict
xmltodict worked well with all the XML files I threw at it, so I'm not going to talk about the form and structure of the XML I was working with. The code below merely assumes that you have directory consisting of XML files and that you'd like to convert them to JSON format and write them to a new JSON file.
import xmltodict, json import os #path to the folder holding the XML directory = '/path/to/folder' #iterate over the XML files in the folder for filename in os.listdir(directory): if filename.endswith(".xml"): f = open(filename) XML_content = f.read() #parse the content of each file using xmltodict x = xmltodict.parse(XML_content) j = json.dumps(x) print (filename) filename = filename.replace('.xml', '') output_file = open(filename + '.json', 'w') output_file.write(j)