Rate Limiting with Redis
Having a good rate limiting strategy is a crucial part of managing an API. When developing one, two important questions to keep in mind are:
- What algorithm suits your traffic patterns?
- How do you store client activity?
Naive approach
Use a fixed window algorithm that stores client information in memory. You can keep track of clients with a map of IP address to the number of requests in the current window. This is sufficient for most low traffic applications, and you can likely stop here.
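As a rough sketch of that idea in Go (all type and function names here are illustrative, not from any particular library), a fixed window limiter can be little more than a mutex-guarded map:

package ratelimit

import (
    "sync"
    "time"
)

// FixedWindowLimiter counts requests per client IP in the current window.
type FixedWindowLimiter struct {
    mu       sync.Mutex
    counts   map[string]int // IP address -> requests seen this window
    windowAt time.Time      // start of the current window
    window   time.Duration
    limit    int
}

func NewFixedWindowLimiter(limit int, window time.Duration) *FixedWindowLimiter {
    return &FixedWindowLimiter{
        counts:   make(map[string]int),
        windowAt: time.Now(),
        window:   window,
        limit:    limit,
    }
}

// Allow reports whether the client may make another request in this window.
func (f *FixedWindowLimiter) Allow(ip string) bool {
    f.mu.Lock()
    defer f.mu.Unlock()

    // When the window rolls over, reset every counter; this also prunes the map.
    if time.Since(f.windowAt) >= f.window {
        f.counts = make(map[string]int)
        f.windowAt = time.Now()
    }

    if f.counts[ip] >= f.limit {
        return false
    }
    f.counts[ip]++
    return true
}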
However, a flaw with the algorithm is that it doesn't account for bursts at the edges of the window, allowing clients to exceed the limit if they time their requests correctly. For instance, with a limit of 100 requests per minute, a client could make 100 requests at 11:59:59 and another 100 at 12:00:01, effectively getting 200 requests through in 2 seconds despite the limit.
In addition, if you're not periodically pruning client data from memory, the map may grow indefinitely and eat into your application's resources.
Smoothing the curve
An improved rate limiting algorithm is the sliding window, which takes into account the last time a request from a client was counted and continuously moves the window with each incoming request.
This mitigates bursty traffic by smoothing the rate of requests over time. It models request patterns for higher traffic services more accurately at the expense of using more resources to decide whether to drop a request.
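To make the idea concrete, here is a minimal in-memory sketch of a sliding window log (illustrative names again; it stores one timestamp per request, which is where the extra resource cost comes from):

package ratelimit

import (
    "sync"
    "time"
)

// SlidingWindowLimiter keeps a log of request timestamps per client.
type SlidingWindowLimiter struct {
    mu     sync.Mutex
    log    map[string][]time.Time // client -> timestamps inside the window
    window time.Duration
    limit  int
}

func NewSlidingWindowLimiter(limit int, window time.Duration) *SlidingWindowLimiter {
    return &SlidingWindowLimiter{
        log:    make(map[string][]time.Time),
        window: window,
        limit:  limit,
    }
}

// Allow drops timestamps that fell out of the window, then checks the count.
func (s *SlidingWindowLimiter) Allow(client string) bool {
    s.mu.Lock()
    defer s.mu.Unlock()

    cutoff := time.Now().Add(-s.window)
    kept := s.log[client][:0]
    for _, t := range s.log[client] {
        if t.After(cutoff) {
            kept = append(kept, t)
        }
    }

    if len(kept) >= s.limit {
        s.log[client] = kept
        return false
    }
    s.log[client] = append(kept, time.Now())
    return true
}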
Offloading to Redis
To reduce resource consumption in the service itself, you can move the logic into a fast, ephemeral data store such as Redis. It can both store the client data and handle the rate limiting logic via Lua scripts.
Here’s an implementation using Go:
package ratelimit

import (
    "context"
    "fmt"
    "time"

    "github.com/redis/go-redis/v9"
)

type Limiter struct {
    client *redis.Client
    burst  int
    window time.Duration
}
// Lua script for an atomic sliding window check
const limiterScript = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window_start = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local window_ms = tonumber(ARGV[4])
-- Round up so sub-second windows don't truncate to a zero TTL
local ttl_seconds = math.ceil(window_ms / 1000)

-- Remove entries that have fallen out of the window
redis.call('ZREMRANGEBYSCORE', key, 0, window_start)

-- Count requests remaining in the current window
local current = redis.call('ZCOUNT', key, window_start, now)

-- Refresh the key's TTL even if this request ends up being dropped
redis.call('EXPIRE', key, ttl_seconds)

if current < limit then
    -- Record the request: timestamp as score, unique member so that
    -- concurrent requests with the same timestamp aren't collapsed
    local request_id = now .. ':' .. redis.call('INCR', key .. ':seq')
    redis.call('ZADD', key, now, request_id)
    redis.call('EXPIRE', key, ttl_seconds)
    redis.call('EXPIRE', key .. ':seq', ttl_seconds)
    return 1
end
return 0
`
// NewLimiter creates a new Redis-backed sliding window rate limiter.
// requestsPerSecond: sustained rate of allowed requests (e.g. 10 req/s)
// burst: maximum requests allowed within one window
func NewLimiter(redisAddr string, requestsPerSecond float64, burst int) *Limiter {
    return &Limiter{
        client: redis.NewClient(&redis.Options{
            Addr: redisAddr,
        }),
        burst: burst,
        // burst / requestsPerSecond seconds, computed in float so that
        // fractional windows (e.g. 1.5s) aren't truncated
        window: time.Duration(float64(burst) / requestsPerSecond * float64(time.Second)),
    }
}
// Allow checks if a request from the given identifier should be allowed
func (l *Limiter) Allow(identifier string) (bool, error) {
    ctx := context.Background()
    now := time.Now().UnixMilli()
    windowStart := now - l.window.Milliseconds()

    result, err := l.client.Eval(ctx, limiterScript,
        []string{identifier},
        now,
        windowStart,
        l.burst,
        l.window.Milliseconds(),
    ).Int()
    if err != nil {
        return false, fmt.Errorf("rate limiter error: %w", err)
    }
    return result == 1, nil
}
The window duration is calculated from your rate and burst parameters. For example, with requestsPerSecond=10 and burst=50, you get a 5-second sliding window allowing up to 50 requests.
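As a usage sketch, here is how the limiter could be wired into an HTTP handler; the module import path and the choice of the client IP as the identifier are assumptions for illustration:

package main

import (
    "net"
    "net/http"

    "example.com/yourapp/ratelimit" // hypothetical module path
)

func main() {
    // 10 req/s sustained with a burst of 50, i.e. a 5-second sliding window.
    limiter := ratelimit.NewLimiter("localhost:6379", 10, 50)

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        ip, _, _ := net.SplitHostPort(r.RemoteAddr)
        allowed, err := limiter.Allow(ip)
        if err != nil {
            // Decide whether to fail open or closed when Redis is unreachable.
            http.Error(w, "internal error", http.StatusInternalServerError)
            return
        }
        if !allowed {
            http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
            return
        }
        w.Write([]byte("ok"))
    })

    http.ListenAndServe(":8080", nil)
}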
This implementation tracks requests per client and sets an explicit TTL for all keys regardless of whether the request was dropped, ensuring Redis memory doesn’t grow unbounded.
You've now got a fast rate limiting strategy that keeps your application servers stateless: client state lives in Redis, so the approach scales horizontally across multiple instances, and because each decision is a single atomic Lua script call, the per-request overhead is roughly one Redis round trip.