Rate Limiting Strategies When Scraping with Proxies: Balancing Speed and Safety
Rate limiting is the control system that determines how fast you scrape. Too fast, and you burn proxy IPs, get blocked, and overload target servers. Too slow, and you waste proxy bandwidth and fail to collect data within your time window. The goal is maximum sustained throughput without triggering detection or causing harm.
This guide covers rate limiting algorithms, adaptive rate control, per-domain policies, and monitoring patterns -- with production code you can adapt to any scraping pipeline. For the broader context of proxy-based scraping architecture, see our distributed scraping pipeline guide.
Why Rate Limiting Matters
For Success Rates
We tested the same scraping workload against Cloudflare-protected targets at different request rates (source: Hex Proxies internal testing, April 2026, residential rotating proxies, 10,000 requests per rate configuration):
| Requests per Second (per domain) | Success Rate | Block Rate | Avg Latency |
|---|---|---|---|
| 0.5 (1 every 2s) | 96.8% | 1.2% | 340ms |
| 1.0 (1 per second) | 94.2% | 3.1% | 355ms |
| 2.0 (2 per second) | 89.1% | 7.4% | 420ms |
| 5.0 (5 per second) | 76.3% | 18.2% | 680ms |
| 10.0 (10 per second) | 58.4% | 32.1% | 1,240ms |
For Proxy Cost
Rate limiting directly affects cost efficiency:
Cost per successful request = proxy_cost / successful_requests
At 1 req/s with 94% success rate:
10,000 total requests → 9,400 successful
Cost ratio: 1.06x (6% cost overhead)
At 10 req/s with 58% success rate:
10,000 total requests → 5,840 successful
Cost ratio: 1.71x (71% cost overhead)
Scraping faster can actually cost more per successful data point because the failed requests still consume proxy bandwidth (for residential plans) or contribute to IP reputation degradation (for ISP plans).
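The cost arithmetic above can be sketched as a small helper (a minimal illustration using the table's numbers; the function name is ours):

```python
def cost_ratio(total_requests: int, success_rate: float) -> float:
    """Cost multiplier per successful request vs. a lossless baseline.

    Every request consumes proxy bandwidth, but only successful
    ones yield data, so cost scales with total / successful.
    """
    successful = total_requests * success_rate
    return total_requests / successful

# 1 req/s at 94% success vs. 10 req/s at 58.4% success
print(round(cost_ratio(10_000, 0.94), 2))   # 1.06
print(round(cost_ratio(10_000, 0.584), 2))  # 1.71
```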
Rate Limiting Algorithms
Token Bucket
The token bucket is the standard rate limiting algorithm. A bucket fills with tokens at a fixed rate. Each request consumes one token. If the bucket is empty, the request waits.
import time
import threading
from dataclasses import dataclass
@dataclass(frozen=True)
class TokenBucketConfig:
"""Immutable token bucket configuration."""
rate: float # Tokens per second
burst: int # Maximum bucket size
name: str = "default"
class TokenBucket:
"""Thread-safe token bucket rate limiter.
Allows bursting up to `burst` requests instantly,
then limits to `rate` requests per second.
"""
def __init__(self, config):
self.config = config
self._tokens = float(config.burst)
self._last_refill = time.monotonic()
self._lock = threading.Lock()
def acquire(self, timeout=30.0):
"""Acquire a token, blocking until one is available.
Args:
timeout: Maximum seconds to wait for a token.
Returns:
True if token acquired, False if timed out.
"""
deadline = time.monotonic() + timeout
while True:
with self._lock:
self._refill()
if self._tokens >= 1.0:
self._tokens -= 1.0
return True
# Calculate wait time for next token
wait = (1.0 - self._tokens) / self.config.rate
if time.monotonic() + wait > deadline:
return False # Would exceed timeout
time.sleep(min(wait, 0.1)) # Sleep in small increments
def _refill(self):
"""Add tokens based on elapsed time."""
now = time.monotonic()
elapsed = now - self._last_refill
self._last_refill = now
new_tokens = elapsed * self.config.rate
self._tokens = min(self._tokens + new_tokens, float(self.config.burst))
# Example: 2 requests per second, burst of 5
limiter = TokenBucket(TokenBucketConfig(rate=2.0, burst=5))
# In your scraper loop:
if limiter.acquire():
# Proceed with request
pass
When to use: General-purpose rate limiting. Good for maintaining a consistent request rate with tolerance for short bursts.
Per-Domain Rate Limiting
Different target sites have different tolerances. A global rate limit is too conservative for easy targets and too aggressive for hard ones. Per-domain limiters let you tune rates individually:
from dataclasses import dataclass
from typing import Optional
@dataclass(frozen=True)
class DomainPolicy:
"""Immutable rate policy for a specific domain."""
domain: str
requests_per_second: float
burst: int
min_delay_between_requests_ms: int
max_concurrent: int
notes: str = ""
# Domain-specific policies
DOMAIN_POLICIES = {
"easy-target.com": DomainPolicy(
domain="easy-target.com",
requests_per_second=5.0,
burst=10,
min_delay_between_requests_ms=200,
max_concurrent=10,
notes="No anti-bot, high capacity",
),
"medium-target.com": DomainPolicy(
domain="medium-target.com",
requests_per_second=1.0,
burst=3,
min_delay_between_requests_ms=1000,
max_concurrent=3,
notes="Cloudflare Free, moderate rate limits",
),
"hard-target.com": DomainPolicy(
domain="hard-target.com",
requests_per_second=0.2,
burst=1,
min_delay_between_requests_ms=5000,
max_concurrent=1,
notes="Cloudflare Enterprise, aggressive detection",
),
}
# Default policy for unknown domains
DEFAULT_POLICY = DomainPolicy(
domain="default",
requests_per_second=1.0,
burst=2,
min_delay_between_requests_ms=1000,
max_concurrent=3,
)
class PerDomainRateLimiter:
"""Manages per-domain rate limiters."""
def __init__(self, policies=None, default_policy=None):
self._policies = policies or {}
self._default = default_policy or DEFAULT_POLICY
self._limiters = {}
self._lock = threading.Lock()
def acquire(self, domain, timeout=30.0):
"""Acquire a rate limit token for a specific domain."""
limiter = self._get_limiter(domain)
return limiter.acquire(timeout=timeout)
def _get_limiter(self, domain):
"""Get or create a rate limiter for a domain."""
with self._lock:
if domain not in self._limiters:
policy = self._policies.get(domain, self._default)
config = TokenBucketConfig(
rate=policy.requests_per_second,
burst=policy.burst,
name=domain,
)
self._limiters[domain] = TokenBucket(config)
return self._limiters[domain]
# Usage
rate_limiter = PerDomainRateLimiter(
policies=DOMAIN_POLICIES,
default_policy=DEFAULT_POLICY,
)
# Before each request
domain = "medium-target.com"
if rate_limiter.acquire(domain):
# Proceed with request
pass
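Policies are keyed by hostname, so in practice you normalize each URL before calling `acquire`. A minimal sketch using only the standard library (the `www.`-stripping rule is an assumption; real public-suffix handling may need a library like `tldextract`):

```python
from urllib.parse import urlparse

def policy_key(url: str) -> str:
    """Reduce a URL to the hostname used as the policy-dict key."""
    host = urlparse(url).netloc.lower()
    host = host.split(":")[0]  # drop an explicit port
    # Treat www.example.com and example.com as the same policy
    return host[4:] if host.startswith("www.") else host

print(policy_key("https://www.medium-target.com:443/products?page=2"))
# medium-target.com
```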
Adaptive Rate Control
The most effective approach adjusts the rate dynamically based on response signals from the target:
import time
import threading
from dataclasses import dataclass
@dataclass(frozen=True)
class AdaptiveRateSnapshot:
"""Immutable snapshot of adaptive rate state."""
current_rate: float
min_rate: float
max_rate: float
consecutive_successes: int
consecutive_failures: int
last_adjustment: float
block_count_last_minute: int
class AdaptiveRateLimiter:
"""Rate limiter that adjusts based on success/failure signals.
AIMD (Additive Increase, Multiplicative Decrease):
- On success: increase rate linearly (additive)
- On block/failure: decrease rate by half (multiplicative)
This is the same algorithm TCP uses for congestion control.
"""
def __init__(self, initial_rate=1.0, min_rate=0.1, max_rate=10.0):
self._rate = initial_rate
self._min_rate = min_rate
self._max_rate = max_rate
self._consecutive_successes = 0
self._consecutive_failures = 0
self._last_adjustment = time.monotonic()
self._bucket = TokenBucket(TokenBucketConfig(
rate=initial_rate, burst=max(3, int(initial_rate * 2))
))
self._lock = threading.Lock()
def acquire(self, timeout=30.0):
"""Acquire a token at the current adaptive rate."""
return self._bucket.acquire(timeout=timeout)
def record_success(self):
"""Signal a successful request. May increase rate."""
with self._lock:
self._consecutive_successes += 1
self._consecutive_failures = 0
# Additive increase: add 0.1 req/s after 10 consecutive successes
if self._consecutive_successes >= 10:
new_rate = min(self._rate + 0.1, self._max_rate)
if new_rate != self._rate:
self._rate = new_rate
self._update_bucket()
self._consecutive_successes = 0
def record_failure(self, is_rate_limit=False):
"""Signal a failed request. May decrease rate.
Args:
is_rate_limit: True if the failure was specifically a 429 or
rate-limit-related block. Triggers more aggressive
backoff.
"""
with self._lock:
self._consecutive_failures += 1
self._consecutive_successes = 0
if is_rate_limit:
# Multiplicative decrease: halve the rate immediately
new_rate = max(self._rate * 0.5, self._min_rate)
self._rate = new_rate
self._update_bucket()
elif self._consecutive_failures >= 3:
# Non-rate-limit failures: reduce by 25% after 3 in a row
new_rate = max(self._rate * 0.75, self._min_rate)
self._rate = new_rate
self._update_bucket()
self._consecutive_failures = 0
    def record_retry_after(self, seconds):
        """Handle a Retry-After header from the target.

        Reduces the rate, then pauses for the retry-after
        duration. The lock is released before sleeping so
        other threads are not blocked for the full wait.
        """
        with self._lock:
            # Reduce rate in proportion to how long we must wait
            if seconds > 60:
                self._rate = max(self._rate * 0.25, self._min_rate)
            elif seconds > 10:
                self._rate = max(self._rate * 0.5, self._min_rate)
            else:
                self._rate = max(self._rate * 0.75, self._min_rate)
            self._update_bucket()
        # Sleep for the retry-after duration (capped at 5 min)
        time.sleep(min(seconds, 300))
def _update_bucket(self):
"""Recreate the token bucket with the new rate."""
self._bucket = TokenBucket(TokenBucketConfig(
rate=self._rate,
burst=max(3, int(self._rate * 2)),
))
def get_snapshot(self):
"""Get an immutable snapshot of current state."""
with self._lock:
return AdaptiveRateSnapshot(
current_rate=round(self._rate, 2),
min_rate=self._min_rate,
max_rate=self._max_rate,
consecutive_successes=self._consecutive_successes,
consecutive_failures=self._consecutive_failures,
last_adjustment=self._last_adjustment,
block_count_last_minute=0, # Simplified
)
AIMD in practice:
Time 0: Rate = 1.0 req/s (initial)
Success × 10 → Rate = 1.1 req/s
Success × 10 → Rate = 1.2 req/s
Success × 10 → Rate = 1.3 req/s
429 received → Rate = 0.65 req/s (halved)
Success × 10 → Rate = 0.75 req/s
Success × 10 → Rate = 0.85 req/s
...converges toward the target's tolerance threshold
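The trace above can be reproduced with plain arithmetic. This standalone sketch isolates the AIMD update rule itself (it does not use the `AdaptiveRateLimiter` class; the event names are ours):

```python
def aimd_step(rate: float, event: str,
              min_rate: float = 0.1, max_rate: float = 10.0) -> float:
    """One AIMD adjustment: +0.1 after a success streak, x0.5 on a 429."""
    if event == "success_streak":   # 10 consecutive successes
        return min(rate + 0.1, max_rate)
    if event == "rate_limited":     # 429 response
        return max(rate * 0.5, min_rate)
    return rate

# Replay the trace: three streaks, a 429, two more streaks
rate = 1.0
events = ["success_streak"] * 3 + ["rate_limited"] + ["success_streak"] * 2
for event in events:
    rate = aimd_step(rate, event)
print(round(rate, 2))  # 0.85
```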
Retry-After Header Handling
The Retry-After HTTP header is the target telling you exactly when to try again. Respecting it is both ethical and effective -- it prevents escalating blocks.
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
def parse_retry_after(header_value):
"""Parse a Retry-After header into seconds to wait.
Retry-After can be either:
- An integer (seconds to wait)
- An HTTP-date (absolute time to retry)
Returns seconds to wait, capped at 300 (5 minutes).
"""
if header_value is None:
return None
try:
# Try as integer (seconds)
seconds = int(header_value)
return min(seconds, 300)
except ValueError:
pass
    try:
        # Try as HTTP-date (parsedate_to_datetime returns a
        # timezone-aware datetime; HTTP dates are always GMT)
        retry_time = parsedate_to_datetime(header_value)
        delta = (retry_time - datetime.now(timezone.utc)).total_seconds()
        return min(max(delta, 0), 300)
    except (ValueError, TypeError):
        pass
return None # Unparseable
def handle_rate_limited_response(response, rate_limiter):
"""Handle a rate-limited response with proper backoff.
Returns the number of seconds waited.
"""
if response.status_code != 429:
return 0
retry_after = parse_retry_after(
response.headers.get("Retry-After")
)
if retry_after is not None:
rate_limiter.record_retry_after(retry_after)
return retry_after
    # No Retry-After header: fall back to the limiter's
    # multiplicative (halving) backoff
    rate_limiter.record_failure(is_rate_limit=True)
return 0
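The HTTP-date form is the one most often mishandled, because naive datetime arithmetic breaks on timezone-aware values. A small standalone sketch of the safe pattern (the helper name is ours; the example date is arbitrary):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def seconds_until(http_date: str) -> float:
    """Seconds from now until an HTTP-date, clamped at zero.

    parsedate_to_datetime returns a tz-aware datetime for GMT
    dates, so it must be compared against an aware "now".
    """
    retry_at = parsedate_to_datetime(http_date)
    return max((retry_at - datetime.now(timezone.utc)).total_seconds(), 0.0)

# A date in the past clamps to 0 rather than going negative
print(seconds_until("Wed, 21 Oct 2015 07:28:00 GMT"))  # 0.0
```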
Monitoring Rate Limit Effectiveness
Track these metrics to tune your rate limiting:
import time
import threading
from collections import deque
from dataclasses import dataclass
@dataclass(frozen=True)
class RateLimitMetrics:
"""Immutable rate limiting metrics snapshot."""
domain: str
period_seconds: int
total_requests: int
successful_requests: int
rate_limited_requests: int # 429 responses
blocked_requests: int # 403 and soft blocks
effective_rate: float # Actual req/s achieved
target_rate: float # Configured req/s
utilization: float # effective / target
class RateLimitMonitor:
"""Monitor rate limiting effectiveness per domain."""
def __init__(self, window_seconds=300):
self._window = window_seconds
self._events = {} # domain -> deque of (timestamp, type)
self._lock = threading.Lock()
def record(self, domain, event_type):
"""Record a request event.
event_type: 'success', 'rate_limited', 'blocked', 'error'
"""
now = time.monotonic()
with self._lock:
if domain not in self._events:
self._events[domain] = deque()
self._events[domain].append((now, event_type))
# Prune old events
cutoff = now - self._window
while (
self._events[domain]
and self._events[domain][0][0] < cutoff
):
self._events[domain].popleft()
def get_metrics(self, domain, target_rate):
"""Get metrics for a specific domain."""
with self._lock:
events = list(self._events.get(domain, []))
if not events:
return None
total = len(events)
successes = sum(1 for _, t in events if t == "success")
rate_limited = sum(1 for _, t in events if t == "rate_limited")
blocked = sum(1 for _, t in events if t == "blocked")
time_span = events[-1][0] - events[0][0] if len(events) > 1 else 1
effective_rate = total / time_span if time_span > 0 else 0
return RateLimitMetrics(
domain=domain,
period_seconds=int(time_span),
total_requests=total,
successful_requests=successes,
rate_limited_requests=rate_limited,
blocked_requests=blocked,
effective_rate=round(effective_rate, 2),
target_rate=target_rate,
utilization=round(
effective_rate / target_rate if target_rate > 0 else 0, 2
),
)
Interpreting metrics:
| Metric | Healthy | Warning | Critical |
|---|---|---|---|
| Success rate | > 90% | 80-90% | < 80% |
| Rate limited (429s) | < 2% | 2-5% | > 5% |
| Blocked (403s) | < 5% | 5-10% | > 10% |
| Utilization | 70-90% | < 50% or > 95% | 100% sustained |
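The success-rate bands from the table can be encoded as a simple triage helper for alerting (thresholds copied from the table; the function name is ours):

```python
def triage_success_rate(success_rate: float) -> str:
    """Map a success-rate fraction onto the table's health bands."""
    if success_rate > 0.90:
        return "healthy"
    if success_rate >= 0.80:
        return "warning"
    return "critical"

print(triage_success_rate(0.942))  # healthy
print(triage_success_rate(0.85))   # warning
print(triage_success_rate(0.58))   # critical
```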
For more on rate limiting in context, see our rate limiting glossary entry, rate limit calculator, and performance optimization guide.
Frequently Asked Questions
What is the optimal rate for scraping with rotating proxies?
There is no universal answer. For rotating residential proxies against Cloudflare-protected targets, 1-2 requests per second per domain is a good starting point. For unprotected targets, 5-10 req/s is often sustainable. Use adaptive rate control (AIMD) to find the optimal rate for each target automatically.
Does proxy rotation eliminate the need for rate limiting?
No. Even with a different IP per request, the target site can detect scraping patterns by aggregate request volume, timing patterns, and behavioral signals. Rotating proxies help avoid per-IP rate limits but do not protect against session-level or aggregate detection.
Should I spread requests evenly or use bursts?
Even spacing is generally safer. Burst patterns are easier for detection systems to identify. Use a token bucket with a small burst allowance (2-3x the steady-state rate) for natural variation, but avoid sustained bursts.
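Even spacing with a little randomness can be sketched as follows (the +/-20% jitter factor is our choice for illustration, not a measured value):

```python
import random

def jittered_delay(base_delay_s: float, jitter: float = 0.2) -> float:
    """Inter-request delay with a random +/- jitter fraction.

    Keeps the average rate at 1 / base_delay_s while avoiding a
    perfectly regular cadence that detection systems can flag.
    """
    return base_delay_s * random.uniform(1.0 - jitter, 1.0 + jitter)

# 1 req/s target: each delay lands between 0.8s and 1.2s
print(0.8 <= jittered_delay(1.0) <= 1.2)  # True
```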
How do I handle targets with unknown rate limits?
Start conservatively (0.5 req/s) and use the adaptive rate limiter to ramp up. The AIMD algorithm will find the target's tolerance threshold within a few hundred requests.
Rate limiting is the difference between sustainable data collection and an arms race you lose. Disciplined rates protect your proxy IPs, respect target servers, and maximize your cost efficiency. Hex Proxies provides the proxy layer; you control the rate. Residential proxies from $4.25/GB, ISP proxies from $2.08/IP. Explore plans.