Design a Rate Limiter



Clients are identified by user_id, IP address, or other attributes. Server-side rate limiters offer more control and are less prone to manipulation than their client-side counterparts.

Rate limiters have jurisdiction over the whole system and are usually placed in a centralized location such as an API gateway. They can filter by URL (Layer 7) or by IP via iptables (Layer 3/4).
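As a sketch of the server-side identification described above, a gateway might key each request by user_id when authenticated, falling back to IP. The function name and request shape here are illustrative assumptions, not an API from the post:

```python
def client_key(request: dict) -> str:
    """Identify the client by user_id when authenticated, else by IP address."""
    if request.get("user_id"):
        return f"user:{request['user_id']}"
    return f"ip:{request.get('remote_addr', 'unknown')}"

print(client_key({"user_id": "42"}))           # user:42
print(client_key({"remote_addr": "1.2.3.4"}))  # ip:1.2.3.4
```

This key is what the rate limiter counts against, so manipulation-resistant server-side attributes (authenticated user ids, connection IPs) are preferred over anything the client self-reports.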

Algorithms:

  1. Token bucket: each request consumes a token, and tokens are refilled automatically at a predetermined rate up to a maximum bucket size. Requires a bucket per limit and may be hard to tune with only two parameters (refill rate and bucket size), but it is memory efficient and allows burst traffic.

  2. Leaking bucket: implemented as a fixed-length queue that drops excess requests; queued requests are processed at a fixed rate. Good for a stable processing rate, but burst traffic can fill the queue and delay recent requests. Only two parameters are available for tuning (queue size and outflow rate, i.e., the number of workers processing variable-length requests).

  3. Fixed window counter: the timeline is split into fixed-size windows, each with a predefined threshold; a request that would push the window's counter past the threshold is dropped. Burst traffic straddling the boundary of two consecutive windows can exceed the maximum threshold.

  4. Sliding window log: the timestamps of requests are logged. If the log size within the current time window exceeds a predetermined threshold, the request is dropped. Accurate, at the expense of the memory needed to store timestamps of all requests, even rejected ones.

  5. Sliding window counter: a predefined threshold is configured. The current request count is approximated as the current window's count plus the previous window's count weighted by the fraction of the previous window that still overlaps the sliding window. Since this is an approximation, the threshold could be exceeded under bursty traffic.
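The token bucket from the list above can be sketched in a few lines. The class name, parameters, and injectable clock are illustrative choices of mine, not from the original chapter:

```python
import time

class TokenBucket:
    """Minimal token bucket: refills `rate` tokens/sec, capped at `capacity`."""

    def __init__(self, rate: float, capacity: int, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full, allowing an initial burst
        self.now = now
        self.last = now()

    def allow(self) -> bool:
        t = self.now()
        # Refill proportionally to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# A fake clock makes the behaviour deterministic for this example.
clock = [0.0]
bucket = TokenBucket(rate=1, capacity=2, now=lambda: clock[0])
print([bucket.allow() for _ in range(3)])  # [True, True, False]: burst of 2, then empty
clock[0] += 1.0                            # one second passes -> one token refilled
print(bucket.allow())                      # True
```

Note how the two tuning knobs the text mentions show up directly: `capacity` bounds the burst size and `rate` bounds the sustained throughput.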

Rules are stored on disk as configuration that can be periodically synced, or pushed on update, to a centralized cache used by the rate limiter middleware. Use a single distributed in-memory cache like Redis to avoid synchronization issues across rate limiter instances. Batch operations with pipelining, and leverage sorted sets so that increment, decrement, expire, and query operations avoid race conditions.
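The sorted-set approach to the sliding window log can be illustrated with an in-memory stand-in: in Redis the same steps would be ZREMRANGEBYSCORE to evict old timestamps, ZADD to record the request, and ZCARD to count, issued together in one pipeline. The function name and parameters below are my own illustrative assumptions:

```python
def allow_request(log: list, limit: int, window: float, now: float) -> bool:
    """Sliding-window-log check; `log` is a sorted list of request timestamps,
    mimicking a Redis sorted set keyed per client."""
    cutoff = now - window
    # Evict timestamps that fell out of the window (ZREMRANGEBYSCORE in Redis).
    log[:] = [t for t in log if t > cutoff]
    # Log the request even if it ends up rejected, as the article notes (ZADD).
    log.append(now)
    return len(log) <= limit  # ZCARD vs. threshold

log = []
print(allow_request(log, limit=2, window=60, now=0))   # True
print(allow_request(log, limit=2, window=60, now=1))   # True
print(allow_request(log, limit=2, window=60, now=2))   # False: 3 requests in window
print(allow_request(log, limit=2, window=60, now=61))  # True: the t=0, t=1 entries expired
```

Running this against a shared Redis rather than local memory is what keeps multiple rate limiter instances consistent with each other.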

Use X-RateLimit-[ Remaining | Limit | Retry-After ] headers to inform clients of their limits, or of when they can retry after an HTTP 429 "Too Many Requests" response.
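Building those response headers might look like the following sketch; the helper name and the choice to attach Retry-After only when the client is throttled are my own assumptions:

```python
def rate_limit_headers(limit: int, remaining: int, retry_after: int) -> dict:
    """Headers for a rate-limited API response, per the X-RateLimit-* scheme."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
    }
    if remaining <= 0:
        # Only a throttled (429) response needs to say when to come back.
        headers["X-RateLimit-Retry-After"] = str(retry_after)
    return headers

print(rate_limit_headers(limit=100, remaining=0, retry_after=30))
```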

To avoid being rate limited, use a client-side cache, honor the RateLimit headers, and add sufficient backoff time to any retry logic.
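The retry-side backoff can be sketched as capped exponential backoff with full jitter that defers to the server's Retry-After when one is given. The function name and default constants are illustrative assumptions:

```python
import random
from typing import Optional

def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before retry `attempt` (0-based): honor the server's
    Retry-After when present, else capped exponential backoff with full jitter."""
    if retry_after is not None:
        return retry_after
    return random.uniform(0, min(cap, base * 2 ** attempt))

print(backoff_delay(0, retry_after=10))  # 10: the server said so
for attempt in range(4):
    print(backoff_delay(attempt))        # random, bounded by 0.5 * 2**attempt
```

The jitter spreads retries from many clients over time, so a throttled burst does not come back as a synchronized second burst.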


This article is part of the system design series in which I summarize chapters from The System Design Interview: Volume 1 / Volume 2, amongst other related content.