./ahmedhashim

Reindexing Elasticsearch with Zero-Downtime

When you need to change a field mapping in Elasticsearch, you’ll quickly discover a limitation: you can’t modify existing mappings in place. But by combining the reindex API with an atomic alias swap, you can make the change with zero downtime.

Say you’ve got a books index and you need to change how the description field works: maybe switch analyzers, or change the field type entirely. Elasticsearch won’t let you modify an existing mapping, so reindexing into a new index is your only option. To avoid downtime, you’ll use an alias to switch between the indices seamlessly.

Step 1: Define the New Index Mapping

First, create a new index with your updated mapping:

curl -X PUT "http://localhost:9200/books_v2" \
  -H "Content-Type: application/json" \
  -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard"
      },
      "author": {
        "type": "keyword"
      },
      "description": {
        "type": "text",
        "analyzer": "english"
      },
      "publication_year": {
        "type": "integer"
      }
    }
  }
}'

Step 2: Start the Reindex

Pause any pipelines writing to the old index before proceeding; otherwise, documents indexed while the copy runs will be missing from the new index. Depending on how long reindexing takes, your corpus may go a little stale during this window.

Now kick off the reindex operation asynchronously:

curl -X POST "http://localhost:9200/_reindex?wait_for_completion=false" \
  -H "Content-Type: application/json" \
  -d '{
  "source": {
    "index": "books"
  },
  "dest": {
    "index": "books_v2"
  }
}'

For very large indices, you can bump the batch size via source.size and throttle with requests_per_second to ease the load:

curl -X POST "http://localhost:9200/_reindex?wait_for_completion=false" \
  -H "Content-Type: application/json" \
  -d '{
  "source": {
    "index": "books",
    "size": 5000
  },
  "dest": {
    "index": "books_v2"
  },
  "requests_per_second": 1000
}'

Because of wait_for_completion=false, the response comes back immediately with a task ID (a TASK_ID of the form node_id:task_number) you can use to monitor progress.

Step 3: Monitor Progress

Check the reindex status using the TASK_ID you got back in the previous step:

curl -X GET "http://localhost:9200/_tasks/TASK_ID"

Or, if you want to watch it in real time (pretty satisfying for a large reindex):

# Watch progress every 5 seconds
watch -n 5 'curl -s -X GET "http://localhost:9200/_tasks/TASK_ID" | jq ".task.status"'

You can also view all running reindex operations:

curl -X GET "http://localhost:9200/_tasks?actions=*reindex&detailed=true"

Step 4: Verify Everything Copied Over

Now verify that all documents made it to the new index:

# Original index count
curl -X GET "http://localhost:9200/books/_count"

# New index count
curl -X GET "http://localhost:9200/books_v2/_count"

If the numbers don’t match, the original index was likely updated during reindexing. Go back to Step 2 and make sure your data pipelines are paused before trying again.
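With pipelines paused, this check is deterministic, so it’s worth scripting. A minimal sketch using a POSIX sed one-liner to pull the count field out of a _count response — demoed below with canned JSON, but in practice you’d feed it the two curl responses above:

```shell
# Extract the integer "count" field from an Elasticsearch _count response.
count_of() {
  printf '%s' "$1" | sed -n 's/.*"count":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Demo with canned responses; against a live cluster, swap in
# "$(curl -s http://localhost:9200/books/_count)" and the books_v2 equivalent.
src=$(count_of '{"count":1000,"_shards":{"total":1,"successful":1,"failed":0}}')
dst=$(count_of '{"count":1000,"_shards":{"total":1,"successful":1,"failed":0}}')

if [ "$src" -eq "$dst" ]; then
  echo "counts match ($src docs)"
else
  echo "mismatch: source=$src dest=$dst"
fi
```

Wire the same comparison into your deployment script and you can gate the alias switch on it automatically.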

Step 5: The Atomic Switch

This is the key step: in a single atomic operation, you’ll delete the old index and point an alias named books at the new one:

curl -X POST "http://localhost:9200/_aliases" \
  -H "Content-Type: application/json" \
  -d '{
  "actions": [
    {
      "remove_index": {
        "index": "books"
      }
    },
    {
      "add": {
        "index": "books_v2",
        "alias": "books"
      }
    }
  ]
}'

Zero downtime achieved! Clients hitting the books index won’t even notice the change; their requests now resolve through the alias to books_v2.

Step 6: Trust but Verify

# Check the alias is pointing to the right index
curl -X GET "http://localhost:9200/_cat/aliases/books?v"

# Test a search on your updated field
curl -X GET "http://localhost:9200/books/_search" \
  -H "Content-Type: application/json" \
  -d '{
  "query": {
    "match": {
      "description": "adventure"
    }
  }
}'

Finally, once you’ve switched to aliases, stick with them! Future migrations become a simple alias swap, with no client changes required.
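For instance, when a (hypothetical) books_v3 comes along, the cutover no longer needs remove_index, since books is already an alias: you swap the alias in one atomic call and delete books_v2 separately once you’ve verified the new index. A sketch that builds the _aliases payload — printed here so it stands alone; POST it exactly as in Step 5:

```shell
# Next migration: repoint the "books" alias from books_v2 to a
# hypothetical books_v3 in one atomic _aliases call. The old index
# survives the swap, so you can verify before deleting it.
payload='{
  "actions": [
    { "remove": { "index": "books_v2", "alias": "books" } },
    { "add":    { "index": "books_v3", "alias": "books" } }
  ]
}'
echo "$payload"

# Against a live cluster:
# curl -X POST "http://localhost:9200/_aliases" \
#   -H "Content-Type: application/json" \
#   -d "$payload"
```

Note that remove/add here only touch alias bindings, never the indices themselves, which is what makes the rollback story so much nicer than the first migration’s remove_index.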