Rate Limiting APIs in PHP: Protecting Your Application

20 Jun 2026
Darwinbark Team
Web Development

An API with no rate limiting is one slow afternoon away from an outage — a buggy client retrying too aggressively, a scraper hammering an endpoint, or a single misbehaving integration partner can each consume enough capacity to degrade service for every other legitimate user. Rate limiting is not primarily about stopping malicious actors, though it does that too; it is about protecting shared capacity from being monopolized by any single consumer, intentional or not.

The Token Bucket Pattern

The token bucket is a common, well-understood rate-limiting approach: each client gets a "bucket" that holds a maximum number of tokens, each request consumes one token, and tokens refill at a steady rate over time. This naturally allows brief bursts (as long as the bucket has tokens available) while still enforcing a steady long-run average rate.

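use Illuminate\Support\Facades\Cache; // assumes Laravel's Cache facade

class TokenBucketLimiter // illustrative class name
{
    // $refillSeconds is how long an empty bucket takes to refill completely.
    // Note: this get-then-put is not atomic, so concurrent requests for one key can race.
    public function allow(string $key, int $maxTokens, int $refillSeconds): bool
    {
        $now = time();
        $bucket = Cache::get($key, ['tokens' => $maxTokens, 'updated' => $now]);

        // Add back the tokens earned since the last update, capped at the bucket size.
        $elapsed = $now - $bucket['updated'];
        $refilled = min($maxTokens, $bucket['tokens'] + ($elapsed / $refillSeconds) * $maxTokens);

        // Spend one token if a whole one is available; otherwise reject and keep the balance.
        $allowed = $refilled >= 1;
        Cache::put($key, ['tokens' => $allowed ? $refilled - 1 : $refilled, 'updated' => $now], 3600);

        return $allowed;
    }
}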

Rate-Limit by the Right Key

Rate limiting purely by IP address breaks down quickly — many legitimate users can share one IP behind corporate NAT or a mobile carrier's gateway, and a determined abuser can rotate IPs trivially. For an authenticated API, rate-limiting by API key or user id is almost always more accurate. For unauthenticated endpoints, combining IP with other signals (a fixed per-endpoint global ceiling, alongside per-IP limits) gives better protection than either alone.
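As a rough sketch of that key choice, the helper below prefers the API key when one is present and falls back to the IP address otherwise. The function name, the "rl:" prefix, and the key layout are all assumptions made for illustration, not part of any particular framework.

```php
<?php
declare(strict_types=1);

/**
 * Builds the key a limiter would count against.
 * Hypothetical helper: the "rl:" prefix and key layout are illustrative only.
 */
function rateLimitKey(?string $apiKey, string $ip, string $endpoint): string
{
    // Prefer the authenticated identity; fall back to IP for anonymous traffic.
    $who = ($apiKey !== null && $apiKey !== '')
        ? 'key:' . hash('sha256', $apiKey) // hash so raw credentials never appear in key names
        : 'ip:' . $ip;

    return "rl:{$endpoint}:{$who}";
}

// Two authenticated clients behind the same NAT address get separate buckets.
$first  = rateLimitKey('partner-a-secret', '203.0.113.7', 'orders');
$second = rateLimitKey('partner-b-secret', '203.0.113.7', 'orders');
echo $first === $second ? 'shared bucket' : 'separate buckets', PHP_EOL;
```

Including the endpoint in the key keeps one noisy endpoint from consuming a client's allowance everywhere else.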

Respond With the Right Headers and Status Code

A rate-limited response should return HTTP 429 (Too Many Requests), not a generic 400 or 500, so well-behaved clients can distinguish "you are being throttled, slow down" from "your request was malformed" or "the server is broken." Including rate-limit headers lets clients self-regulate proactively instead of discovering the limit only by hitting it repeatedly. Retry-After is a standard HTTP header; the X-RateLimit-* names are a widely used convention rather than a formal standard, so document whichever names you choose.

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1750000000
Retry-After: 30
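One way to assemble those headers is sketched below. The function name and its parameters are hypothetical; the header names follow the example above, and Retry-After is only added on the rejected response.

```php
<?php
declare(strict_types=1);

/**
 * Returns [statusCode, headers] for a request, given the limiter's state.
 * Hypothetical helper; the X-RateLimit-* names follow the example above.
 */
function rateLimitResponse(bool $allowed, int $limit, int $remaining, int $resetAt, int $now): array
{
    $headers = [
        'X-RateLimit-Limit'     => (string) $limit,
        'X-RateLimit-Remaining' => (string) max(0, $remaining),
        'X-RateLimit-Reset'     => (string) $resetAt, // Unix timestamp of the next reset
    ];

    if ($allowed) {
        return [200, $headers];
    }

    // Retry-After is given in seconds from now; ask for at least one second.
    $headers['Retry-After'] = (string) max(1, $resetAt - $now);

    return [429, $headers];
}

[$status, $headers] = rateLimitResponse(false, 100, 0, 1750000000, 1749999970);
foreach ($headers as $name => $value) {
    echo "{$name}: {$value}", PHP_EOL;
}
```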

Different Endpoints Need Different Limits

A cheap, cacheable read endpoint can usually tolerate a far higher rate limit than an expensive write endpoint or one that triggers a costly downstream operation (sending an SMS, calling a paid third-party API). Applying one blanket rate limit across an entire API either over-restricts the cheap endpoints or under-protects the expensive ones; per-endpoint limits, set according to each endpoint's actual cost, are worth the extra configuration.
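A per-endpoint limit table might look something like the sketch below. The endpoint names and every number in it are invented for illustration; real values should come from each endpoint's measured cost.

```php
<?php
declare(strict_types=1);

// Illustrative endpoints and numbers only.
const ENDPOINT_LIMITS = [
    'GET /products'  => ['max' => 600, 'per_seconds' => 60], // cheap, cacheable read
    'POST /orders'   => ['max' => 30,  'per_seconds' => 60], // expensive write
    'POST /sms/send' => ['max' => 5,   'per_seconds' => 60], // paid downstream call
];

// Applied to any endpoint that has no explicit entry yet.
const DEFAULT_LIMIT = ['max' => 60, 'per_seconds' => 60];

/** Looks up the limit for a route, falling back to the default. */
function limitFor(string $method, string $path): array
{
    return ENDPOINT_LIMITS["{$method} {$path}"] ?? DEFAULT_LIMIT;
}

echo limitFor('POST', '/sms/send')['max'], ' per minute for SMS', PHP_EOL;
```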

Rate Limiting Is Not a Substitute for Authentication

Rate limiting slows down abuse; it does not prevent unauthorized access on its own. An endpoint with no authentication but a generous rate limit is still an endpoint anyone can call — just more slowly. Treat rate limiting as one layer in a broader security posture, not a replacement for actually verifying who is making a request in the first place.

Sliding Window Versus Fixed Window

A fixed window (reset the counter every 60 seconds on the clock) allows a burst of nearly double the intended limit right at the window boundary — the full limit just before reset, and the full limit again just after. A sliding window, tracking requests over a continuously moving time range rather than a fixed clock boundary, avoids this boundary-burst problem at the cost of slightly more bookkeeping per request.
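The sliding-window idea can be sketched as a log of accepted request timestamps. This version keeps state in memory for a single client and takes the current time as an argument so it is easy to follow; the class name is an assumption, and real storage would live in a shared cache.

```php
<?php
declare(strict_types=1);

/** In-memory sliding-window log for one client; a sketch, not production storage. */
final class SlidingWindowLog
{
    /** @var int[] timestamps of accepted requests, oldest first */
    private array $hits = [];
    private int $limit;
    private int $windowSeconds;

    public function __construct(int $limit, int $windowSeconds)
    {
        $this->limit = $limit;
        $this->windowSeconds = $windowSeconds;
    }

    public function allow(int $now): bool
    {
        // Keep only hits inside the moving range (now - window, now].
        $cutoff = $now - $this->windowSeconds;
        $this->hits = array_values(array_filter(
            $this->hits,
            static function (int $t) use ($cutoff): bool {
                return $t > $cutoff;
            }
        ));

        if (count($this->hits) >= $this->limit) {
            return false;
        }

        $this->hits[] = $now;
        return true;
    }
}

// Limit of 2 per 60 seconds: two hits at t=59 use up the allowance.
$window = new SlidingWindowLog(2, 60);
$atBoundary = [$window->allow(59), $window->allow(59), $window->allow(61)];
var_dump($atBoundary);
```

A fixed window resetting at t=60 would have accepted the request at t=61; here it is rejected because both earlier hits are still inside the last 60 seconds.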

Distributed Rate Limiting Across Multiple Servers

Rate limiting using each server's own local memory works fine for a single server but silently fails once an application runs behind a load balancer across multiple servers — each server enforces its own separate limit, so the effective combined limit becomes (limit × number of servers). A shared store (Redis) that every server checks against is necessary once an application runs on more than one instance.
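To make the shared-store point concrete, here is a sketch in which two simulated servers count against one store. The CounterStore interface and the array-backed stand-in are hypothetical, included only so the example runs on its own; a Redis-backed implementation would typically rely on the INCR and EXPIRE commands to get the same atomic increment-with-expiry behavior.

```php
<?php
declare(strict_types=1);

/** Minimal contract a shared counter store would need (hypothetical interface). */
interface CounterStore
{
    /** Increments the counter and returns the new value; the TTL applies on first use. */
    public function increment(string $key, int $ttlSeconds): int;
}

/** Stand-in for the shared store so the sketch is runnable; it ignores TTLs. */
final class ArrayCounterStore implements CounterStore
{
    private array $counts = [];

    public function increment(string $key, int $ttlSeconds): int
    {
        return $this->counts[$key] = ($this->counts[$key] ?? 0) + 1;
    }
}

function allowRequest(CounterStore $store, string $client, int $limit, int $windowSeconds, int $now): bool
{
    // Every server derives the same key for the same client and window.
    $window = intdiv($now, $windowSeconds);

    return $store->increment("rl:{$client}:{$window}", $windowSeconds) <= $limit;
}

// Four requests spread across two "servers" that share one store, limit of 3.
$shared = new ArrayCounterStore();
$results = [];
for ($request = 1; $request <= 4; $request++) {
    $results[] = allowRequest($shared, 'client-42', 3, 60, 1000);
}
var_dump($results);
```

With a separate store per server, each would have counted only its own share of the traffic and the fourth request would have been accepted.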

Graceful Degradation Instead of Hard Rejection

Returning a flat 429 for every request over the limit is simple but can be unnecessarily harsh for some use cases. For some endpoints, queuing the excess request briefly and processing it after a short delay, rather than rejecting it outright, gives a better experience for a client that briefly bursts over the limit by a small margin without meaningfully compromising the protection rate limiting is meant to provide.
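A minimal sketch of that three-way decision follows. The function name and the 10 percent soft margin are assumptions chosen for illustration, not a recommended value; the actual queuing and delayed processing are left out.

```php
<?php
declare(strict_types=1);

/**
 * Decides what to do with the Nth request a client makes in the current window.
 * Returns 'allow', 'delay', or 'reject'. The default 10% margin is illustrative.
 */
function throttleDecision(int $requestNumber, int $limit, int $softMarginPercent = 10): string
{
    if ($requestNumber <= $limit) {
        return 'allow';
    }

    // Requests slightly over the limit are queued briefly instead of rejected.
    $softCeiling = $limit + intdiv($limit * $softMarginPercent, 100);

    return $requestNumber <= $softCeiling ? 'delay' : 'reject';
}

foreach ([100, 105, 111] as $n) {
    echo $n, ' => ', throttleDecision($n, 100), PHP_EOL;
}
```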

Communicating Limits to API Consumers

A rate limit that is undocumented forces every API consumer to discover it empirically, usually at the worst possible time — during a production incident when their own traffic spikes and starts getting throttled with no warning. Documenting limits clearly per endpoint, and including the rate-limit headers on every response (not just rejected ones), lets well-behaved API consumers build their own backoff logic proactively instead of reactively.

Case Study: The Partner Integration That Took the API Down

An integration partner's newly-deployed sync script had a bug causing it to retry every failed request immediately with no backoff, and a separate bug meant nearly every request was failing validation and triggering exactly that retry path. Within twenty minutes the partner's single misbehaving script was generating more traffic than the rest of the API's entire normal customer base combined, degrading response times for everyone. The API had no rate limiting in place at the time. The fix shipped that day was IP-and-API-key-based rate limiting with a 429 response and Retry-After header; the partner's script, once it started receiving 429s, correctly began backing off on its own without any human intervention needed on either side.

A Glossary for This Topic

Rate limiting: restricting how many requests a client can make within a given time period.
Token bucket: a rate-limiting algorithm allowing bursts up to a capped reserve while enforcing a steady average rate.
429: the HTTP status code meaning "too many requests."
Retry-After: a response header telling a client how long to wait before retrying.
Sliding window: a rate-limiting approach tracking requests over a continuously moving time range rather than fixed clock intervals.

Frequently Asked Questions

Should rate limits differ for authenticated versus anonymous users? Generally yes — authenticated users are identifiable and accountable, so they can often be trusted with higher limits than anonymous traffic.

What is a reasonable default rate limit? There is no universal number; it depends entirely on what each endpoint costs to serve and what legitimate usage patterns actually look like for your specific API.

Does rate limiting protect against DDoS attacks? Only partially — application-level rate limiting helps against moderate abuse, but a large-scale distributed attack usually needs protection further upstream, like a CDN or dedicated DDoS mitigation service.

Step-by-Step: Adding Rate Limiting to an Existing API

1. Identify your highest-cost endpoints and your most abused ones, since these need limits most urgently and should not wait for a blanket rollout.
2. Choose a rate-limit key appropriate to each endpoint's authentication model (API key for authenticated, IP for anonymous).
3. Implement limits initially in a logging-only mode, recording what would have been rejected without actually rejecting it, to validate your limits against real traffic before enforcing them.
4. Review that data, adjust limits that would have blocked legitimate usage, and only then switch to enforcing mode.
5. Monitor 429 rates after enforcement begins and keep tuning.
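The logging-only mode in the rollout above can be sketched as a thin wrapper around whatever the limiter decides. The function name, the $enforce flag, and the log message format are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Wraps a limiter decision with a logging-only mode.
 * $enforce and $log are hypothetical names for this sketch.
 */
function applyLimit(bool $withinLimit, bool $enforce, callable $log, string $key): bool
{
    if ($withinLimit) {
        return true;
    }

    // Record what would have been rejected, whether or not we act on it.
    $log(sprintf('rate limit exceeded key=%s enforced=%s', $key, $enforce ? 'yes' : 'no'));

    // In logging-only mode the request is still let through.
    return !$enforce;
}

$print = static function (string $line): void {
    echo $line, PHP_EOL;
};

// Over the limit, but enforcement is still switched off.
var_dump(applyLimit(false, false, $print, 'rl:orders:ip:203.0.113.7'));
```

Switching to enforcing mode is then a matter of flipping one flag once the logged rejections look right.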

A Comparison Table: Rate Limiting Algorithms

Algorithm	Allows Bursts	Complexity
Fixed window	Yes, at window boundaries (a flaw)	Low
Sliding window	Smoothly, no boundary flaw	Medium
Token bucket	Yes, up to bucket capacity	Medium
Leaky bucket	No, strictly steady output rate	Medium

Security Considerations Checklist

Never rely on rate limiting alone as your only defense against credential-stuffing or brute-force login attempts — combine it with account lockout policies and anomaly detection, since a sufficiently patient attacker can stay just under a rate limit indefinitely. Ensure rate-limit storage itself (the Redis keys tracking counts) cannot be manipulated by a client, keeping all rate-limit state server-side and never trusting any client-supplied count or token. Be cautious about leaking sensitive information in rate-limit error responses, like revealing whether a specific username exists based on different limit behavior for valid versus invalid accounts.

Accessibility Considerations

A generic 429 error page with no further explanation can confuse any user who encounters it, and particularly users relying on screen readers. Structure the error page with a clear, announced heading that explains what happened and what to do next, rather than showing a bare status code with no accessible context.

How This Plays Out at Different Scales

A small internal API with a handful of known consumers may not need aggressive rate limiting at all initially. A public-facing API with external consumers needs the per-endpoint, per-key limiting described throughout this guide as a baseline, non-optional protection. A large-scale public API serving many thousands of consumers typically needs the distributed, Redis-backed limiting described earlier as a hard requirement, plus dedicated abuse-detection tooling beyond simple rate limiting alone.

What to Do When You Inherit an API With No Rate Limiting at All

Inheriting a production API that has run for years with no rate limiting, purely on luck and the absence of any sufficiently abusive client so far, is a ticking risk rather than a stable state. Follow the step-by-step rollout described earlier exactly: start in logging-only mode to understand real traffic patterns before enforcing anything, since rolling out enforcement blind against unknown traffic risks blocking legitimate, longstanding integration partners who have never had to respect a limit before and may not handle a sudden 429 gracefully.

Final Checklist Before Trusting an API's Rate Limiting

- Every endpoint has an explicit, deliberately chosen limit, not a single blanket default applied without consideration.
- Limits are enforced server-side using shared, distributed storage if running on more than one server.
- Responses include standard rate-limit headers on every request, not just rejected ones.
- 429 responses are distinguishable from other error types and documented for API consumers.
- Rate limiting has been validated against real traffic in logging-only mode before enforcement began.

Closing Thought, Revisited

Rate limiting is one of the few defensive measures that protects an API from itself as much as from any external bad actor — a legitimate but buggy client can do as much damage as a malicious one, and a well-designed limit treats both the same way: a clear, fast, informative rejection rather than a degraded experience for everyone else sharing the same infrastructure.

Rate Limiting GraphQL Differently Than REST

A single GraphQL endpoint can serve wildly different-cost queries through the same URL, unlike REST where each endpoint has a roughly predictable cost — simple request-counting rate limiting misses this entirely, since one expensive, deeply-nested query can cost far more than ten simple ones combined. Query complexity analysis, assigning a cost score to a query based on its structure before executing it and rate-limiting against that cost rather than raw request count, is the more accurate approach for GraphQL specifically.
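A toy version of that cost scoring is sketched below. It assumes the query has already been parsed into a nested array, where `first` is the requested page size of a list field and `fields` holds its children; that shape and the one-unit-per-field scoring rule are simplifications invented for this example, and a real GraphQL server would walk its own parsed query representation instead.

```php
<?php
declare(strict_types=1);

/**
 * Estimates a query's cost before executing it.
 * Each field costs 1, and a list field multiplies the cost of its children
 * by the requested page size ("first"). Simplified, illustrative scoring.
 */
function queryCost(array $fields): int
{
    $total = 0;
    foreach ($fields as $node) {
        $multiplier = $node['first'] ?? 1;
        $children   = $node['fields'] ?? [];
        $total += 1 + $multiplier * queryCost($children);
    }

    return $total;
}

$simple = ['viewer' => ['fields' => ['name' => []]]];
$nested = [
    'orders' => ['first' => 50, 'fields' => [
        'id'    => [],
        'items' => ['first' => 20, 'fields' => ['sku' => [], 'price' => []]],
    ]],
];

echo 'simple: ', queryCost($simple), PHP_EOL; // 2
echo 'nested: ', queryCost($nested), PHP_EOL; // 2101
```

Both queries are a single HTTP request, yet the nested one scores roughly a thousand times higher, which is the gap that plain request counting cannot see.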

Testing Rate Limits in CI

Rate-limiting logic deserves its own automated tests, not just manual verification. With a limit of 100 requests per window, for example, tests should assert that the 101st request within a window returns 429, that the count resets after the window elapses, and that the correct headers are present on both allowed and rejected responses. This catches regressions in security-relevant logic the same way tests catch them in any other business logic.
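The checks might look something like the sketch below. To stay self-contained it uses a tiny in-memory limiter with an injected clock and a hand-rolled checkThat() helper, both hypothetical; in a real suite the same assertions would live in your test framework and run against your actual limiter.

```php
<?php
declare(strict_types=1);

/** Tiny fixed-window limiter used only as the subject under test in this sketch. */
final class FixedWindowLimiter
{
    private array $counts = [];
    private int $limit;
    private int $windowSeconds;

    public function __construct(int $limit, int $windowSeconds)
    {
        $this->limit = $limit;
        $this->windowSeconds = $windowSeconds;
    }

    public function allow(string $client, int $now): bool
    {
        $key = $client . ':' . intdiv($now, $this->windowSeconds);
        $this->counts[$key] = ($this->counts[$key] ?? 0) + 1;

        return $this->counts[$key] <= $this->limit;
    }
}

function checkThat(bool $condition, string $message): void
{
    if (!$condition) {
        throw new RuntimeException("Failed: {$message}");
    }
}

$limiter = new FixedWindowLimiter(100, 60);
$start = 1200; // start of a window, since 1200 is a multiple of 60

for ($i = 1; $i <= 100; $i++) {
    checkThat($limiter->allow('client-a', $start), "request {$i} should be allowed");
}
checkThat(!$limiter->allow('client-a', $start + 59), 'the 101st request in the window is rejected');
checkThat($limiter->allow('client-a', $start + 60), 'the count resets once the window elapses');
checkThat($limiter->allow('client-b', $start), 'limits are tracked per client');

echo 'all rate-limit checks passed', PHP_EOL;
```

Passing the time in as an argument, rather than calling time() inside the limiter, is what lets the window-reset case run instantly instead of sleeping for a minute.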

Per-User Versus Per-Plan Rate Limits

A SaaS API serving multiple pricing tiers often wants higher rate limits for paying customers on higher tiers than for free-tier users, which means rate-limit configuration needs to be looked up per user rather than applied as one static value across the whole API. Storing the limit as an attribute resolved from the authenticated user's plan, rather than hardcoding a single number into the rate-limiting middleware, keeps this tier-based differentiation maintainable as pricing plans change.
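Resolving the limit from the plan could be as small as the sketch below. The plan names, the numbers, and the optional `rate_limit_override` attribute are all hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical plan names and requests-per-minute values, for illustration only.
const PLAN_LIMITS = ['free' => 60, 'pro' => 600, 'enterprise' => 6000];

/** Resolves requests per minute from the user's plan, with an optional per-user override. */
function limitForUser(array $user): int
{
    if (isset($user['rate_limit_override'])) {
        return (int) $user['rate_limit_override'];
    }

    // Unknown or missing plans fall back to the free tier.
    return PLAN_LIMITS[$user['plan'] ?? 'free'] ?? PLAN_LIMITS['free'];
}

echo limitForUser(['id' => 7, 'plan' => 'pro']), PHP_EOL;
```

The middleware then asks for the limit per request instead of carrying a number of its own, so a pricing change touches only the plan table.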

Burst Allowances for Webhooks

An incoming webhook from a third party can legitimately arrive in a tight burst — a batch of order updates delivered all at once after a brief outage on the sender's side, for instance. Rate limiting webhook endpoints with the same strictness as a typical client-facing endpoint can cause legitimate, expected bursts to be rejected; webhook endpoints often warrant a higher burst allowance specifically to accommodate this normal delivery pattern.

Rate Limiting and Caching Work Together

An endpoint protected by aggressive caching naturally needs a less restrictive rate limit, since most requests are served from cache rather than hitting the expensive underlying logic the limit was designed to protect in the first place. Reviewing rate limits alongside caching strategy, rather than setting them independently, avoids a limit calibrated for an uncached cost profile being needlessly strict once caching is added later.

Rate Limiting Internal Service-to-Service Calls

Internal services calling each other are sometimes treated as exempt from rate limiting since both ends are "trusted," but a bug in one internal service can still generate a damaging flood of requests against another internal service just as easily as an external client could. Applying rate limits internally too, even if more generous than external-facing ones, contains the blast radius of an internal bug to one service rather than letting it cascade into degrading every service it calls.

Rate Limit Configuration Should Live Outside Code

Hardcoding specific limit numbers directly into application code means every adjustment requires a full deploy, which is too slow when an active abuse incident calls for tightening a limit immediately. Keeping limits in configuration that can be changed and reloaded without a full code deploy lets a team respond to an active incident in minutes rather than waiting on a deploy pipeline.
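One possible shape for this is to read limits from a JSON document (a file, an environment variable, or a config service) and merge them over built-in defaults. The function name and the JSON layout below are assumptions; the sketch takes the JSON as a string so it does not depend on where the configuration is stored.

```php
<?php
declare(strict_types=1);

/**
 * Merges limits from a JSON document over the defaults.
 * Invalid JSON or invalid values leave the defaults in place, so a bad
 * config change cannot silently remove a limit.
 */
function loadLimits(string $json, array $defaults): array
{
    $decoded = json_decode($json, true);
    if (!is_array($decoded)) {
        return $defaults;
    }

    $limits = $defaults;
    foreach ($decoded as $endpoint => $max) {
        if (is_int($max) && $max > 0) {
            $limits[$endpoint] = $max;
        }
    }

    return $limits;
}

$defaults = ['GET /products' => 600, 'POST /orders' => 30];

// Tightening one endpoint during an incident, without touching application code.
$limits = loadLimits('{"POST /orders": 10}', $defaults);
echo $limits['POST /orders'], PHP_EOL;
```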
