Jitterbug
I was once debugging a production API that returned periodic bursts of 500s. The pattern was odd: traffic was steady most of the day, then errors spiked on a regular cadence.
The service itself was simple. It read from a database, returned JSON, and cached full responses in Redis using the request URL as the cache key with a fixed TTL. Clients tended to ramp traffic around the same time, so a large batch of keys was created in the same window. When those TTLs expired, they expired together.
That synchronized expiry caused a thundering herd. Cache misses surged, the web server had to refill many keys at once, database connections got saturated, and some requests timed out as 500s.
Rather than overprovisioning database connections for a predictable burst, I added jitter to cache TTLs so keys would expire over a range instead of all at once. That change worked immediately.
This post is a deeper look at that fix using a reproducible load test I built called Jitterbug, where I compare static TTL and jittered TTL side by side on Grafana at high request volume.
Experiment setup
The service is intentionally simple. It has two endpoints backed by Redis:
- `/static/data?id=N`
- `/jitter/data?id=N`
Both endpoints cache the full response payload in Redis using the full URL as the cache key, including query parameters. The only difference is TTL behavior.
The static endpoint uses a fixed TTL of 10 seconds for every key, while the jitter endpoint uses a TTL between 10 and 15 seconds:
```python
from typing import Literal
import random

Mode = Literal["static", "jitter"]

STATIC_TTL = 10  # seconds
MAX_JITTER = 5   # seconds


def _ttl_ms(mode: Mode) -> int:
    if mode == "static":
        return STATIC_TTL * 1000
    jittered_ttl = STATIC_TTL + random.uniform(0, MAX_JITTER)
    return int(jittered_ttl * 1000)
```
The ID space is finite at 0..999, so each endpoint has 1000 possible keys. On a cache miss, the app simulates backend work with a 20 to 50ms delay before writing to cache.
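A minimal sketch of that miss path, using an in-memory dict in place of Redis (the `_store`, `fetch_from_backend`, and `get_cached` names are illustrative, not the service's actual code):

```python
import random
import time

STATIC_TTL = 10  # seconds
MAX_JITTER = 5   # seconds

# Stand-in for Redis: key -> (payload, monotonic expiry timestamp)
_store: dict[str, tuple[str, float]] = {}


def fetch_from_backend(key: str) -> str:
    # Simulate 20-50ms of backend work, as in the load test.
    time.sleep(random.uniform(0.020, 0.050))
    return f"payload-for-{key}"


def get_cached(key: str, mode: str) -> tuple[str, bool]:
    """Return (payload, was_hit) for a request keyed by its full URL."""
    now = time.monotonic()
    entry = _store.get(key)
    if entry is not None and entry[1] > now:
        return entry[0], True
    # Miss: do the simulated backend work, then cache with the mode's TTL.
    payload = fetch_from_backend(key)
    ttl = STATIC_TTL if mode == "static" else STATIC_TTL + random.uniform(0, MAX_JITTER)
    _store[key] = (payload, now + ttl)
    return payload, False
```

The first request for a key pays the simulated backend delay; repeat requests inside the TTL window are served from the store.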
Load and observability
The full stack runs in Docker and consists of:
- FastAPI: web server for the `/static/data` and `/jitter/data` endpoints
- Redis: cache store for full response payloads keyed by request URL
- Locust: load generator that drives traffic at a fixed target rate
- VictoriaMetrics: metrics backend that scrapes `/metrics` every second
- Grafana: dashboard used to visualize misses, latency, and miss ratio
```mermaid
graph LR
    L["Locust<br/>1500 users, 4 processes"]
    W["FastAPI<br/>/static/data?id=N<br/>/jitter/data?id=N"]
    R[("Redis<br/>Cache")]
    VM["VictoriaMetrics<br/>scrape every 1s"]
    G["Grafana"]
    L -->|"HTTP load"| W
    W -->|"cache read/write by full URL key"| R
    W -->|"/metrics"| VM
    G -->|"PromQL queries"| VM
```
Traffic was split evenly between static and jitter endpoints. Locust ran with 1500 users across 4 processes and targeted about 10.5k requests per second total. In steady state, each endpoint sat around 5.2k req/s.
The app exported Prometheus metrics for cache hits and misses per mode, plus request duration histograms per mode. The scrape interval was set to 1 second so short TTL cycles are visible.
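To make the counters concrete, here is a hedged, stdlib-only sketch of per-mode hit/miss counters rendered in the Prometheus text exposition format that a `/metrics` endpoint serves (metric and function names are illustrative, not the app's actual instrumentation):

```python
from collections import Counter

# Per-mode counters, incremented on every cache lookup.
hits: Counter[str] = Counter()
misses: Counter[str] = Counter()


def record(mode: str, was_hit: bool) -> None:
    (hits if was_hit else misses)[mode] += 1


def render_metrics() -> str:
    # Prometheus text exposition format: one sample per mode label.
    lines = ["# TYPE cache_hits_total counter"]
    lines += [f'cache_hits_total{{mode="{m}"}} {v}' for m, v in sorted(hits.items())]
    lines += ["# TYPE cache_misses_total counter"]
    lines += [f'cache_misses_total{{mode="{m}"}} {v}' for m, v in sorted(misses.items())]
    return "\n".join(lines)


record("static", False)
record("static", True)
record("jitter", True)
```

In practice a library like `prometheus_client` handles this, but the output shape is the same: labeled counters that PromQL can rate and compare per mode.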
The dashboard

The first row in the dashboard tracks cache misses per second for each endpoint. Static TTL shows recurring spikes because many keys expire around the same time. Jitter TTL stays in a narrower band because expirations are spread out.
The second row tracks latency (p50, p95, p99) for each endpoint. On static TTL, latency peaks line up with miss spikes. On jitter TTL, latency stays steadier with smaller peaks.
The third row provides direct comparisons. The misses overlay makes the spike contrast obvious, and the miss ratio panel normalizes by request volume so the difference is easier to compare.
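The normalization in that panel amounts to misses divided by total lookups. A tiny helper shows the arithmetic, using roughly the per-endpoint numbers reported in the results below (this is illustrative Python, not the dashboard's actual PromQL):

```python
def miss_ratio(misses: int, hits: int) -> float:
    """Fraction of cache lookups that missed."""
    total = misses + hits
    return misses / total if total else 0.0


# ~73k misses out of ~3.125M static requests,
# ~59k misses out of ~3.125M jitter requests.
static = miss_ratio(73_000, 3_125_000 - 73_000)
jitter = miss_ratio(59_000, 3_125_000 - 59_000)
```

Dividing by request volume is what lets two endpoints at slightly different throughput be compared on one panel.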

Results
In a representative 10 minute run on my machine, the system handled about 6.25 million requests with zero failures. Total throughput was about 10.4k req/s, split evenly at about 5.2k req/s per endpoint.
Latency stayed low at this load level, with p50 around 16ms, p95 around 50ms, and p99 around 74ms.
Cache behavior is where the difference stood out. Static recorded about 73k misses while jitter recorded about 59k misses. That puts miss ratio at about 2.35% for static versus 1.88% for jitter, or roughly 20% lower miss ratio with jitter.
At the same traffic level, jitter produced fewer misses and a less bursty profile, so the system could sustain the same load more predictably.
Why jitter works
With fixed TTLs, keys that are created around the same time also expire around the same time. That synchronization creates short miss storms.
With jittered TTLs, expiration times are intentionally spread out. The backend still handles misses, but misses arrive in a smoother stream.
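The spreading effect shows up even in a toy simulation: create 1000 keys in the same second, then count how many expire in the worst single second under each policy (a sketch under the post's TTL parameters, not the Jitterbug load test itself):

```python
import random
from collections import Counter

random.seed(42)  # deterministic for illustration

STATIC_TTL = 10  # seconds
MAX_JITTER = 5   # seconds
N_KEYS = 1000

# All keys created at t=0; bucket each key by the second it expires in.
static_expiry = Counter(STATIC_TTL for _ in range(N_KEYS))
jitter_expiry = Counter(
    int(STATIC_TTL + random.uniform(0, MAX_JITTER)) for _ in range(N_KEYS)
)

static_peak = max(static_expiry.values())  # every key expires in the same second
jitter_peak = max(jitter_expiry.values())  # spread across a ~5 second window
```

With a fixed TTL the backend absorbs all 1000 refills in one second; with jitter the worst second carries only about a fifth of that.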
At high request rates, peak load matters more than average load. Jitter helps by cutting those peaks.
Takeaways
If you cache hot API responses with Redis and a fixed TTL, adding a small random offset to TTL is a cheap win. It smooths cache miss pressure, helps stabilize latency, and reduces burst contention in downstream systems.
If you want to reproduce this yourself, clone the repo, run the stack, and watch the miss and latency panels side by side. The difference shows up quickly.