Load JSON files into Elasticsearch

The following example provides a simple guide for loading JSON files into Elasticsearch using the official Elasticsearch client for Python.

Import dependencies

import requests, json, os
from elasticsearch import Elasticsearch

Set the path to the directory containing the JSON files to be loaded

directory = '/path/to/files/'

Connect to the Elasticsearch server

By default, the Elasticsearch instance will listen on port 9200

res = requests.get('http://localhost:9200')
print(res.content)
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
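If nothing is listening on port 9200, the requests.get call above raises an exception rather than telling you what went wrong. Here is a minimal sketch of a gentler check. Note that it swaps requests for the standard library's urllib so it runs with no extra installs, and that get_server_info is a hypothetical helper name of my own, not part of the Elasticsearch client.

```python
import http.client
import json
import urllib.request
from urllib.error import URLError


def get_server_info(url="http://localhost:9200", timeout=2.0):
    """Return the server's JSON banner as a dict, or None if it can't be read.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (URLError, OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a non-JSON reply
        return None


info = get_server_info()
if info is None:
    print("Could not read server info from http://localhost:9200")
else:
    print(info)
```

If this prints the failure message, check that Elasticsearch is running and reachable before creating the client.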

Create an index value object

Because I want Elasticsearch to use a bog-standard integer as the unique _id for each document being loaded, I'm setting up a counter now, outside the for loop I'm going to use to iterate over the JSON files.

i = 1

Iterate over each JSON file and load it into Elasticsearch

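for filename in os.listdir(directory):
    if filename.endswith(".json"):
        # listdir() returns bare filenames, so join them to the directory
        f = open(os.path.join(directory, filename))
        docket_content = f.read()
        f.close()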

Load each file into an Elasticsearch index

The following line, which sits inside the if block above, is the one that actually sends the content to Elasticsearch.

        es.index(index='myindex', ignore=400, doc_type='docket', id=i, body=json.loads(docket_content))

        i = i + 1

There are a few things worth pointing out here:

  • index= is the name of the index we're creating. This can be anything you like, as long as it's lowercase.
  • ignore=400 is flagging that I want the loader to ignore instances in which Elasticsearch complains about the format of any of the fields in the source JSON data (date fields, I get the feeling, are a common offender here). I just want Elasticsearch to receive the data as is without second-guessing it.
  • doc_type is just a label we're assigning to each document being loaded
  • id=i is the unique index value being assigned to each document as it's loaded. You can leave this out, in which case Elasticsearch will generate its own id.
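The file-reading and id-counting parts of the loop can also be pulled into one small generator, which keeps them separate from the indexing call. This is only a sketch: iter_json_documents is a hypothetical helper name of my own, and sorting the filenames is my assumption, added so that ids are assigned in the same order on every run.

```python
import json
import os


def iter_json_documents(directory, start_id=1):
    """Yield (doc_id, document) pairs for every .json file in `directory`.

    Hypothetical helper for illustration; not part of any library.
    """
    # Sort the names so ids are assigned in a repeatable order;
    # os.listdir() makes no promise about ordering.
    filenames = sorted(
        name for name in os.listdir(directory) if name.endswith(".json")
    )
    for doc_id, filename in enumerate(filenames, start=start_id):
        # Join directory and filename so this works from any working directory.
        path = os.path.join(directory, filename)
        with open(path, encoding="utf-8") as f:
            yield doc_id, json.load(f)
```

With this in place, the loop becomes "for i, doc in iter_json_documents(directory):" followed by the same es.index call as above, passing id=i and body=doc.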