./ahmedhashim

Layered Ranking in OpenSearch

Building a custom ranking function is usually less about inventing a clever formula and more about separating concerns.

To illustrate, let’s say we have a products index consisting of furniture documents like this:

{
  "title": "Vinyl Record Cabinet",
  "description": "Mid-century inspired storage cabinet for vinyl records with sliding doors and adjustable shelves.",
  "date_added": "2024-10-01T00:00:00Z",
  "price": 1249.0,
  "materials": "walnut, iron",
  "stock_available": 12,
  "amount_sold": 184
}
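The queries below assume the fields are mapped sensibly. A mapping along these lines would support them; the types here are assumptions for illustration, not taken from a real cluster:

```json
{
  "mappings": {
    "properties": {
      "title":           { "type": "text" },
      "description":     { "type": "text" },
      "date_added":      { "type": "date" },
      "price":           { "type": "float" },
      "materials":       { "type": "text" },
      "stock_available": { "type": "integer" },
      "amount_sold":     { "type": "integer" }
    }
  }
}
```

materials is mapped as text so match queries analyze it; if you also facet or filter on it, add a keyword sub-field.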

Text relevance answers one question: does this product match the words the shopper typed? Business signals answer a different one: which relevant product should rank a little higher? When those get mixed together too early, search gets weird fast. Cheap products float to the top for the wrong reasons. Popular products bury exact matches. New arrivals disappear because old bestsellers have too much accumulated weight.

In OpenSearch, I like a simple shape for this. Let BM25 do the first pass over text fields. Use a boosting query to softly demote results you still want to keep. Then apply function_score to add bounded business signals like popularity or recency, with inventory layered on top.

Lexical Relevance

The base query should still look like search, not merchandising.

For most catalogs, title deserves a stronger field boost than description. Titles are short and curated. Descriptions are noisy, often padded with marketing copy. If a shopper searches for record cabinet, a title hit should beat a product that only mentions it halfway down the description.

A good starting point looks like this:

{
  "multi_match": {
    "query": "record cabinet",
    "fields": ["title^4", "description"],
    "type": "best_fields"
  }
}

I usually treat materials as a contextual boost instead of part of the base query. If the query clearly contains a material hint, add a small should clause for it. If not, leave it alone. A global boost on materials can distort results for broad queries.

For a query like walnut record cabinet, extract the material hint and add a should clause. For a broader query like record storage cabinet, leave it out. The resulting bool query looks like this:

{
  "bool": {
    "must": [
      {
        "multi_match": {
          "query": "walnut record cabinet",
          "fields": ["title^4", "description"],
          "type": "best_fields"
        }
      }
    ],
    "should": [
      {
        "match": {
          "materials": {
            "query": "walnut",
            "boost": 2.0
          }
        }
      }
    ]
  }
}

Boosting

boosting is useful when a document should remain eligible but rank lower.

An out-of-stock product is a good example. For a narrow query, showing the product page may still be better than showing nothing. But if there’s an in-stock alternative with similar relevance, that alternative should almost always win.

That’s exactly what boosting does. The document still has to match the positive query. If it also matches the negative query, its score is multiplied by negative_boost.

{
  "boosting": {
    "positive": { ... },
    "negative": {
      "range": {
        "stock_available": {
          "lte": 0
        }
      }
    },
    "negative_boost": 0.2
  }
}

A negative_boost of 0.2 is a strong demotion. I usually start somewhere between 0.1 and 0.3, then tune from real queries. If stock_available is a boolean in your index instead of a quantity, switch the negative query to a term query.
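For the boolean case, assuming the field holds true/false, the negative clause becomes:

```json
{
  "term": {
    "stock_available": false
  }
}
```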

Function Score

Once the candidate set is textually relevant, function_score is where business ranking belongs.

OpenSearch gives you a few knobs here:

  • weight multiplies a score by a fixed value, so it’s the blunt instrument for simple rules.
  • field_value_factor derives score from a numeric field, which makes it a good fit for signals like amount_sold.
  • Decay functions like gauss and exp reduce score as a value moves away from an origin, which is useful for dates or numeric distance.
  • random_score assigns a pseudo-random score, which is better for discovery surfaces than primary search.
  • script_score calculates score from a custom script, so it’s the escape hatch when the built-ins aren’t enough, though that flexibility comes with extra computation cost and usually more query latency.
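As a sketch of that escape hatch, a script_score function could recompute popularity in Painless; the formula here is illustrative, not a recommendation:

```json
{
  "script_score": {
    "script": {
      "source": "Math.log(2 + doc['amount_sold'].value)"
    }
  }
}
```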

In retail search, I can usually go a long way with three signals.

  • Popularity matters, but it needs diminishing returns. Raw sales counts are too steep, so log1p or sqrt are better modifiers.
  • Freshness matters too, especially for catalogs with seasonal inventory.
  • Inventory status should matter a lot, but I prefer to model it as a strong demotion plus a small positive boost for in-stock products rather than a hard filter in the default search path.
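To see why raw counts are too steep, you can evaluate the log1p modifier by hand. This assumes the documented behavior of field_value_factor: the factor is applied first, then the modifier, where log1p is the common (base-10) logarithm of 1 + factor * value:

```python
import math

# Factor matching the amount_sold function used later in this post.
FACTOR = 0.15


def popularity_score(amount_sold: int) -> float:
    # log1p modifier: common log of (1 + factor * value).
    return math.log10(1 + FACTOR * amount_sold)


for sold in (10, 1_000, 100_000):
    print(sold, round(popularity_score(sold), 2))
```

A product with 10,000× the sales of another ends up only a few points ahead instead of 10,000× the score, which is exactly the diminishing-returns shape you want.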

Price Signals

price is one of the most tempting fields to rank on and one of the easiest to misuse.

A blanket price boost often teaches the ranker that cheaper is always better. That might work for a bargain page. It usually fails for the general search box. Someone searching for a solid walnut cabinet probably doesn’t want the cheapest cabinet first just because it’s cheap.

I find price works better in two cases. The first is explicit user intent, like a selected budget or price filter. The second is a known merchandising target, like a category page where products near a target price point convert better.

When you have that context, a decay function around a price anchor works well:

{
  "_name": "price_fit",
  "gauss": {
    "price": {
      "origin": 1200,
      "scale": 300,
      "decay": 0.5
    }
  },
  "weight": 0.75
}

That says “closer to the target is a bit better” instead of “cheaper always wins”.
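To build intuition for the shape, the decay can be computed by hand. This follows the documented gauss formula with no offset; it's a sanity-check sketch, not how OpenSearch should be called:

```python
import math


def gauss(value: float, origin: float, scale: float, decay: float = 0.5) -> float:
    """Gauss decay: exp(-d^2 / (2*sigma^2)), with sigma chosen so the
    score equals `decay` at exactly `scale` away from the origin."""
    sigma_sq = -(scale ** 2) / (2 * math.log(decay))
    d = abs(value - origin)
    return math.exp(-(d ** 2) / (2 * sigma_sq))


# With origin=1200 and scale=300: a 1200 product scores 1.0,
# a 900 or 1500 product scores 0.5, and a 300 product scores
# close to zero -- but never exactly zero.
```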

Usage

Putting it all together, we can query using our custom ranking function like this:

from opensearchpy import OpenSearch


client = OpenSearch(
    hosts=[{"host": "localhost", "port": 9200}],
    http_compress=True,
)


def build_product_query(
    user_query: str,
    material_hint: str | None = None,
    price_anchor: float | None = None,
) -> dict:
    positive_query = {
        "bool": {
            "must": [
                {
                    "multi_match": {
                        "query": user_query,
                        "fields": ["title^4", "description"],
                        "type": "best_fields",
                    }
                }
            ]
        }
    }

    if material_hint:
        positive_query["bool"]["should"] = [
            {
                "match": {
                    "materials": {
                        "query": material_hint,
                        "boost": 2.0,
                    }
                }
            }
        ]

    functions = [
        {
            "_name": "in_stock_bonus",
            "filter": {"range": {"stock_available": {"gt": 0}}},
            "weight": 2.0,
        },
        {
            "_name": "popular_products",
            "field_value_factor": {
                "field": "amount_sold",
                "factor": 0.15,
                "modifier": "log1p",
                "missing": 0,
            },
        },
        {
            "_name": "fresh_products",
            "gauss": {
                "date_added": {
                    "origin": "now",
                    "offset": "7d",
                    "scale": "30d",
                    "decay": 0.5,
                }
            },
            "weight": 1.25,
        },
    ]

    if price_anchor is not None:
        functions.append(
            {
                "_name": "price_fit",
                "gauss": {
                    "price": {
                        "origin": price_anchor,
                        "scale": max(price_anchor * 0.25, 10),
                        "decay": 0.5,
                    }
                },
                "weight": 0.75,
            }
        )

    return {
        "query": {
            "function_score": {
                "query": {
                    "boosting": {
                        "positive": positive_query,
                        "negative": {
                            "range": {
                                "stock_available": {
                                    "lte": 0,
                                }
                            }
                        },
                        "negative_boost": 0.2,
                    }
                },
                "functions": functions,
                "score_mode": "sum",
                "boost_mode": "multiply",
                "max_boost": 8,
                "min_score": 0.1,
            }
        }
    }


response = client.search(
    index="products",
    body=build_product_query(
        user_query="walnut record cabinet",
        material_hint="walnut",
        price_anchor=1200,
    ),
)

for hit in response["hits"]["hits"]:
    print(f"{hit['_score']:.3f} - {hit['_source']['title']}")

A few choices in this function are doing most of the work:

  • field_value_factor on amount_sold rewards proven products, but log1p keeps a bestseller from crushing the rest of the catalog.
  • The gauss decay on date_added gives newly added products a window to surface, then lets the boost fade naturally.
  • score_mode: "sum" makes each business signal additive, which is easier to reason about than multiplying every function together.
  • boost_mode: "multiply" keeps lexical relevance in charge because the business signals still operate on top of the query score instead of replacing it.
  • max_boost acts as a safety rail so merchandising signals can’t completely overpower the text match.

OpenSearch gives you more modes than you’ll usually need. sum and max are the easiest to reason about for score_mode. I almost never use boost_mode: "replace" in a text search system because it throws away the query score entirely, which is usually the wrong trade.

Explaining Scores

When tuning ranking, guesswork gets expensive fast. Name the functions with _name and ask OpenSearch to explain the score for a sample hit. That turns the query into something you can actually debug.

debug_query = build_product_query(
    user_query="walnut record cabinet",
    material_hint="walnut",
)
debug_query["explain"] = True

response = client.search(index="products", body=debug_query)
top_hit = response["hits"]["hits"][0]

print(top_hit["_source"]["title"])
print(top_hit["_explanation"])

That explanation will include the function names, so you can see whether popular_products is dominating too much or whether fresh_products is barely moving the score at all. A simplified excerpt looks like this:

{
  "value": 12.84,
  "description": "function score, product of:",
  "details": [
    {
      "value": 3.91,
      "description": "query score from multi_match(title^4, description)"
    },
    {
      "value": 3.28,
      "description": "function score, score mode [sum]",
      "details": [
        {
          "value": 1.57,
          "description": "field value function(_name: popular_products): log1p(doc['amount_sold'].value * factor=0.15)"
        },
        {
          "value": 0.46,
          "description": "Function for field date_added, _name: fresh_products"
        },
        {
          "value": 2.0,
          "description": "weight(_name: in_stock_bonus)"
        }
      ]
    }
  ]
}
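Raw explanations get deep quickly. A small helper that walks the value/description/details shape shown above makes them easier to scan:

```python
def flatten_explanation(node: dict, depth: int = 0) -> list[tuple[int, float, str]]:
    """Walk the nested explain tree, collecting (depth, value, description)."""
    rows = [(depth, node["value"], node["description"])]
    for child in node.get("details", []):
        rows.extend(flatten_explanation(child, depth + 1))
    return rows


# Usage against a hit from an explain=True search:
# for depth, value, desc in flatten_explanation(top_hit["_explanation"]):
#     print("  " * depth + f"{value:8.3f}  {desc}")
```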

Tuning

The biggest mistake I see in ranking work is tuning against vibes.

Keep a small evaluation set of real queries. Include a few head terms, some long-tail queries, and a couple of ambiguous ones. Then adjust one signal at a time. If popularity helps broad queries but hurts exact matches, the factor is too strong. If new products never surface, the freshness window is too short. If out-of-stock items still win too often, the demotion isn’t strong enough.
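Even a toy metric beats eyeballing. The helper below is not an OpenSearch API, just a sketch: map each evaluation query to the titles your search returned, compare the first result against what you expected, and watch the number as you tune:

```python
def top1_accuracy(results: dict[str, list[str]], expected: dict[str, str]) -> float:
    """Fraction of queries whose first returned title matches the expected one."""
    hits = sum(
        1
        for query, want in expected.items()
        if results.get(query) and results[query][0] == want
    )
    return hits / len(expected)


# Hypothetical run: one query ranks the expected product first, one doesn't.
results = {
    "walnut record cabinet": ["Vinyl Record Cabinet", "Oak Media Stand"],
    "record storage cabinet": ["Oak Media Stand"],
}
expected = {
    "walnut record cabinet": "Vinyl Record Cabinet",
    "record storage cabinet": "Vinyl Record Cabinet",
}
print(top1_accuracy(results, expected))  # 0.5
```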

The main idea is simple. Use boosting to apply soft penalties. Use function_score to layer bounded business signals on top of lexical relevance. Keep each signal interpretable. If you can still explain why the first result won, you’re probably on the right track.