Painless Enrichment

January 4, 2024

Search requirements change faster than index schemas. In a retail product catalog, you may need new derived fields for filtering or reporting after the products index is already live.

Painless is the built-in scripting language for Elasticsearch. If the base fields already exist (price, list_price, cost, inventory_count), /_update_by_query can rewrite each matching document in place. You do not need to create a new index and copy everything over with _reindex.

Add target fields to the mapping

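Before touching documents, add the derived fields to the existing products mapping. Adding new fields to a live index is allowed and does not change existing field mappings or any stored documents. Mapping the fields explicitly up front also stops dynamic mapping from guessing: strings would become text with a keyword subfield, and whole numbers would become long.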

PUT /products/_mapping
{
  "properties": {
    "discount_pct": { "type": "integer" },
    "margin_pct":   { "type": "integer" },
    "price_band":   { "type": "keyword" },
    "stock_state":  { "type": "keyword" }
  }
}
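Before backfilling, it can help to confirm that the mapping update took effect. The sketch below uses only the Python standard library. It checks the parsed JSON of GET /products/_mapping against the four expected types. The names check_mapping and EXPECTED_TYPES are illustrative. The assumed response shape, {index: {"mappings": {"properties": ...}}}, should be checked against your Elasticsearch version.

```python
# Sketch: confirm the derived fields exist in the mapping with the expected types.
# Assumes `mapping_response` is the parsed JSON of GET /products/_mapping,
# shaped like {"products": {"mappings": {"properties": {...}}}}.
# EXPECTED_TYPES and check_mapping are hypothetical names, not a client API.

EXPECTED_TYPES = {
    "discount_pct": "integer",
    "margin_pct": "integer",
    "price_band": "keyword",
    "stock_state": "keyword",
}


def check_mapping(mapping_response: dict, index: str = "products") -> dict:
    """Return {field: problem} for every derived field that is missing or mistyped."""
    properties = (
        mapping_response.get(index, {}).get("mappings", {}).get("properties", {})
    )
    problems = {}
    for field, expected in EXPECTED_TYPES.items():
        actual = properties.get(field, {}).get("type")
        if actual is None:
            problems[field] = "missing"
        elif actual != expected:
            problems[field] = f"expected {expected}, found {actual}"
    return problems


if __name__ == "__main__":
    sample = {
        "products": {
            "mappings": {
                "properties": {
                    "price": {"type": "float"},
                    "discount_pct": {"type": "integer"},
                    "margin_pct": {"type": "integer"},
                    "price_band": {"type": "keyword"},
                }
            }
        }
    }
    print(check_mapping(sample))  # {'stock_state': 'missing'}
```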

Dry run with inline script execution

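Before running updates at scale, test the formula with the Painless execute API (POST /_scripts/painless/_execute) and some sample values. In the default painless_test context, only params is available and the result comes back as a string. That makes it a quick way to catch math and type issues before any document is written.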

POST /_scripts/painless/_execute
{
  "script": {
    "source": """
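      // Convert to double first so whole-number inputs do not trigger integer division.
      double price = ((Number) params.price).doubleValue();
      double listPrice = ((Number) params.list_price).doubleValue();
      double cost = ((Number) params.cost).doubleValue();

      double discount = (listPrice - price) / listPrice * 100.0;
      double margin = (price - cost) / price * 100.0;

      // Math.round returns a long; cast to match the integer mapping.
      return [
        "discount_pct": (int) Math.round(discount),
        "margin_pct": (int) Math.round(margin)
      ];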
    """,
    "params": {
      "price": 79.99,
      "list_price": 99.99,
      "cost": 42.00
    }
  }
}
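To know what the dry run should return, you can compute the expected values offline. This minimal Python sketch assumes Java's Math.round semantics (round half up). It deliberately avoids Python's round(), which rounds half to even. For the sample values above it gives 20 and 47, which are the two values the returned map should contain. The function names are hypothetical.

```python
import math


def java_round(x: float) -> int:
    """Approximate Java's Math.round(double): round half up, i.e. floor(x + 0.5).

    Python's built-in round() uses banker's rounding (round(2.5) == 2),
    so it would disagree with Painless on exact .5 values.
    """
    return math.floor(x + 0.5)


def discount_pct(price: float, list_price: float) -> int:
    # Same formula as the dry-run script: share of list price taken off, in percent.
    return java_round((list_price - price) / list_price * 100.0)


def margin_pct(price: float, cost: float) -> int:
    # Same formula as the dry-run script: share of price left after cost, in percent.
    return java_round((price - cost) / price * 100.0)


if __name__ == "__main__":
    # Same sample values as the _execute request.
    print(discount_pct(79.99, 99.99), margin_pct(79.99, 42.00))  # 20 47
```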

Store the enrichment script

Once the formula looks right, store a reusable script in cluster state so update jobs can call it by ID. This version does several things:

- It skips missing or zero values.
- It converts prices and costs to double before dividing. If they were indexed as whole numbers such as 80, Painless would otherwise perform integer division and typically store 0.
- It computes the two percentages.
- It assigns simple price and stock categories.

PUT /_scripts/products-enrichment-v1
{
  "script": {
    "lang": "painless",
    "source": """
      // Discount: share of list price taken off, in percent.
      if (ctx._source.list_price != null && ctx._source.price != null) {
        double listPrice = ((Number) ctx._source.list_price).doubleValue();
        double price = ((Number) ctx._source.price).doubleValue();
        if (listPrice > 0) {
          double discount = (listPrice - price) / listPrice * 100.0;
          ctx._source.discount_pct = (int) Math.round(discount);
        }
      }

      // Margin: share of price left after cost, in percent.
      if (ctx._source.price != null && ctx._source.cost != null) {
        double price = ((Number) ctx._source.price).doubleValue();
        double cost = ((Number) ctx._source.cost).doubleValue();
        if (price > 0) {
          double margin = (price - cost) / price * 100.0;
          ctx._source.margin_pct = (int) Math.round(margin);
        }
      }

      if (ctx._source.price != null) {
        if (ctx._source.price < 25) {
          ctx._source.price_band = "budget";
        } else if (ctx._source.price < 100) {
          ctx._source.price_band = "mid";
        } else {
          ctx._source.price_band = "premium";
        }
      }

      if (ctx._source.inventory_count != null) {
        if (ctx._source.inventory_count > 0) {
          ctx._source.stock_state = "in_stock";
        } else {
          ctx._source.stock_state = "out_of_stock";
        }
      }
    """
  }
}
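The stored script runs against every product, so it is worth pinning down its band thresholds and null handling in a unit test. The Python function below is a hypothetical offline mirror of the rules in products-enrichment-v1. It is not something Elasticsearch runs. It does all arithmetic in floating point and uses the same round-half-up rule as Math.round.

```python
import math


def java_round(x: float) -> int:
    # Matches Java's Math.round for ordinary values (round half up).
    return math.floor(x + 0.5)


def enrich(source: dict) -> dict:
    """Return the derived fields the script's rules would add for one _source.

    Hypothetical mirror of products-enrichment-v1, rule by rule, so thresholds
    and null handling can be unit-tested without a cluster.
    """
    out = {}
    price = source.get("price")
    list_price = source.get("list_price")
    cost = source.get("cost")
    stock = source.get("inventory_count")

    # Discount only when both prices exist and list_price is positive.
    if list_price is not None and price is not None and list_price > 0:
        out["discount_pct"] = java_round((float(list_price) - price) / list_price * 100.0)
    # Margin only when price and cost exist and price is positive.
    if price is not None and cost is not None and price > 0:
        out["margin_pct"] = java_round((float(price) - cost) / price * 100.0)
    if price is not None:
        if price < 25:
            out["price_band"] = "budget"
        elif price < 100:
            out["price_band"] = "mid"
        else:
            out["price_band"] = "premium"
    if stock is not None:
        out["stock_state"] = "in_stock" if stock > 0 else "out_of_stock"
    return out


if __name__ == "__main__":
    print(enrich({"price": 80, "list_price": 100, "cost": 50, "inventory_count": 0}))
    # {'discount_pct': 20, 'margin_pct': 38, 'price_band': 'mid', 'stock_state': 'out_of_stock'}
```

Whole-number inputs like 80 and 100 are included on purpose. They are the case where integer division would silently produce 0 in Painless.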

Backfill existing products without reindexing

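Now run /_update_by_query asynchronously. With wait_for_completion=false, the request returns immediately with a task ID. The other two parameters control how the job runs:

- conflicts=proceed counts version conflicts instead of aborting.
- slices=auto lets Elasticsearch split the work into parallel slices.

The query targets only documents that are missing at least one derived field, so a rerun skips products that are already enriched. One caveat: a product that lacks a base field (for example, no cost) never receives the matching derived field. It will match again and be rewritten on every rerun.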

POST /products/_update_by_query?conflicts=proceed&wait_for_completion=false&slices=auto
{
  "query": {
    "bool": {
      "should": [
        { "bool": { "must_not": { "exists": { "field": "discount_pct" } } } },
        { "bool": { "must_not": { "exists": { "field": "margin_pct" } } } },
        { "bool": { "must_not": { "exists": { "field": "price_band" } } } },
        { "bool": { "must_not": { "exists": { "field": "stock_state" } } } }
      ],
      "minimum_should_match": 1
    }
  },
  "script": {
    "id": "products-enrichment-v1"
  }
}
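If you later add a fifth derived field, the query and the script need to change together. One way to keep them in sync is to generate the request body from a single field list, as in this Python sketch. The names DERIVED_FIELDS, missing_any_query, and update_by_query_body are hypothetical. Sending the body is left to whatever HTTP client you already use.

```python
import json

# Single source of truth for the fields the enrichment script writes.
DERIVED_FIELDS = ["discount_pct", "margin_pct", "price_band", "stock_state"]


def missing_any_query(fields: list) -> dict:
    """Build a bool query matching documents that lack at least one of `fields`."""
    return {
        "bool": {
            "should": [
                {"bool": {"must_not": {"exists": {"field": f}}}} for f in fields
            ],
            "minimum_should_match": 1,
        }
    }


def update_by_query_body(fields: list, script_id: str) -> dict:
    # Same shape as the /_update_by_query request body above.
    return {"query": missing_any_query(fields), "script": {"id": script_id}}


if __name__ == "__main__":
    body = update_by_query_body(DERIVED_FIELDS, "products-enrichment-v1")
    print(json.dumps(body, indent=2))
```

An equivalent formulation is a single must_not wrapping a filter of all four exists clauses, since "not all present" is the same as "at least one missing". The should form is kept here because it matches the request in this post.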

The response will include a task id, for example:

{
  "task": "r1A2WoRbTwKZ516z6NEs5A:36619"
}

Then monitor progress every 5 seconds until the task reports "completed": true.

watch -n 5 'curl -s "http://your-cluster:9200/_tasks/r1A2WoRbTwKZ516z6NEs5A:36619" | jq "{completed: .completed, updated: .task.status.updated, version_conflicts: .task.status.version_conflicts, failures: .response.failures}"'
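If watch or jq is not available, the same polling loop can be written in Python with the standard library. This sketch assumes the placeholder host from the command above and an unsecured cluster, so add authentication for a real deployment. The function names are hypothetical. fetch and sleep are injectable so the loop can be tested without a cluster.

```python
import json
import time
import urllib.request

# Placeholder host from the article; adjust it and add auth for secured clusters.
BASE_URL = "http://your-cluster:9200"


def fetch_task(task_id: str, base_url: str = BASE_URL) -> dict:
    """GET /_tasks/<task_id> and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/_tasks/{task_id}") as resp:
        return json.load(resp)


def summarize(task: dict) -> dict:
    """Pick out the same fields as the jq filter in the watch command."""
    status = task.get("task", {}).get("status", {})
    return {
        "completed": task.get("completed", False),
        "updated": status.get("updated"),
        "version_conflicts": status.get("version_conflicts"),
        "failures": task.get("response", {}).get("failures"),
    }


def wait_for_task(task_id, fetch=fetch_task, interval=5.0, sleep=time.sleep):
    """Poll until the task reports completed: true, printing a summary each time."""
    while True:
        summary = summarize(fetch(task_id))
        print(summary)
        if summary["completed"]:
            return summary
        sleep(interval)


if __name__ == "__main__":
    wait_for_task("r1A2WoRbTwKZ516z6NEs5A:36619")
```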

Verify

Finally, run a small search and inspect _source to confirm the enriched fields are present. Keep the result size low for a quick sanity check.

GET /products/_search
{
  "size": 3,
  "_source": [
    "name",
    "price",
    "discount_pct",
    "margin_pct",
    "price_band",
    "stock_state"
  ],
  "query": {
    "match_all": {}
  }
}
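Three documents are only a spot check. The helper below (hypothetical name missing_by_doc) takes the parsed search response and lists, for each _id, which derived fields are absent from the returned _source. A product with no cost is expected to lack margin_pct, so a missing field is not automatically a failure. For a total rather than a sample, the missing-at-least-one query from the backfill step can be sent to GET /products/_count.

```python
DERIVED_FIELDS = ("discount_pct", "margin_pct", "price_band", "stock_state")


def missing_by_doc(search_response: dict) -> dict:
    """Map each returned _id to the derived fields absent from its _source."""
    report = {}
    for hit in search_response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        missing = [f for f in DERIVED_FIELDS if f not in source]
        if missing:
            report[hit["_id"]] = missing
    return report


if __name__ == "__main__":
    sample = {"hits": {"hits": [
        {"_id": "a", "_source": {"name": "Mug", "price": 12, "price_band": "budget"}},
    ]}}
    print(missing_by_doc(sample))  # {'a': ['discount_pct', 'margin_pct', 'stock_state']}
```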

This is a simple maintenance loop: add fields, backfill safely, and skip a full reindex as long as existing field mappings do not need to change.