Rate Limiting Strategies When Scraping with Proxies: Balancing Speed and Safety
Rate limiting is the control system that determines how fast you scrape. Too fast, and you burn proxy IPs, get blocked, and overload target servers. Too slow, and you waste proxy bandwidth and fail to collect data within your time window. The goal is maximum sustained throughput without triggering detection or causing harm.
This guide covers rate limiting algorithms, adaptive rate control, per-domain policies, and monitoring patterns -- with production code you can adapt to any scraping pipeline. For the broader context of proxy-based scraping architecture, see our distributed scraping pipeline guide.
Why Rate Limiting Matters
For Success Rates
We tested the same scraping workload against Cloudflare-protected targets at different request rates (source: Hex Proxies internal testing, April 2026, residential rotating proxies, 10,000 requests per rate configuration):
| Requests per Second (per domain) | Success Rate | Block Rate | Avg Latency |
|---|---|---|---|
| 0.5 (1 every 2s) | 96.8% | 1.2% | 340ms |
| 1.0 (1 per second) | 94.2% | 3.1% | 355ms |
| 2.0 (2 per second) | 89.1% | 7.4% | 420ms |
| 5.0 (5 per second) | 76.3% | 18.2% | 680ms |
| 10.0 (10 per second) | 58.4% | 32.1% | 1,240ms |
For Proxy Cost
Rate limiting directly affects cost efficiency:
Cost per successful request = proxy_cost / successful_requests
At 1 req/s with 94% success rate:
10,000 total requests → 9,400 successful
Cost ratio: 1.06x (6% cost overhead)
At 10 req/s with 58% success rate:
10,000 total requests → 5,840 successful
Cost ratio: 1.71x (71% cost overhead)
Scraping faster can actually cost more per successful data point because the failed requests still consume proxy bandwidth (for residential plans) or contribute to IP reputation degradation (for ISP plans).
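The cost arithmetic above can be sketched as a small helper (a minimal illustration using the table's numbers; the function name is ours):

```python
def cost_ratio(total_requests: int, success_rate: float) -> float:
    """Cost multiplier per successful request vs. a lossless baseline.

    Every request consumes proxy bandwidth, but only successful
    ones yield data, so cost scales with total / successful.
    """
    successful = total_requests * success_rate
    return total_requests / successful

# 1 req/s at 94% success vs. 10 req/s at 58.4% success
print(round(cost_ratio(10_000, 0.94), 2))   # 1.06
print(round(cost_ratio(10_000, 0.584), 2))  # 1.71
```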
Rate Limiting Algorithms
Token Bucket
The token bucket is the standard rate limiting algorithm. A bucket fills with tokens at a fixed rate. Each request consumes one token. If the bucket is empty, the request waits.
import time
import threading
from dataclasses import dataclass
@dataclass(frozen=True)
class TokenBucketConfig:
"""Immutable token bucket configuration."""
rate: float # Tokens per second
burst: int # Maximum bucket size
name: str = "default"
class TokenBucket:
"""Thread-safe token bucket rate limiter.
Allows bursting up to `burst` requests instantly,
then limits to `rate` requests per second.
"""
def __init__(self, config):
self.config = config
self._tokens = float(config.burst)
self._last_refill = time.monotonic()
self._lock = threading.Lock()
def acquire(self, timeout=30.0):
"""Acquire a token, blocking until one is available.
Args:
timeout: Maximum seconds to wait for a token.
Returns:
True if token acquired, False if timed out.
"""
deadline = time.monotonic() + timeout
while True:
with self._lock:
self._refill()
if self._tokens >= 1.0:
self._tokens -= 1.0
return True
# Calculate wait time for next token
wait = (1.0 - self._tokens) / self.config.rate
if time.monotonic() + wait > deadline:
return False # Would exceed timeout
time.sleep(min(wait, 0.1)) # Sleep in small increments
def _refill(self):
"""Add tokens based on elapsed time."""
now = time.monotonic()
elapsed = now - self._last_refill
self._last_refill = now
new_tokens = elapsed * self.config.rate
self._tokens = min(self._tokens + new_tokens, float(self.config.burst))
# Example: 2 requests per second, burst of 5
limiter = TokenBucket(TokenBucketConfig(rate=2.0, burst=5))
# In your scraper loop:
if limiter.acquire():
# Proceed with request
pass
When to use: General-purpose rate limiting. Good for maintaining a consistent request rate with tolerance for short bursts.
Per-Domain Rate Limiting
Different target sites have different tolerances. A global rate limit is too conservative for easy targets and too aggressive for hard ones. Per-domain limiters let you tune rates individually:
from dataclasses import dataclass
from typing import Optional
@dataclass(frozen=True)
class DomainPolicy:
"""Immutable rate policy for a specific domain."""
domain: str
requests_per_second: float
burst: int
min_delay_between_requests_ms: int
max_concurrent: int
notes: str = ""
# Domain-specific policies
DOMAIN_POLICIES = {
"easy-target.com": DomainPolicy(
domain="easy-target.com",
requests_per_second=5.0,
burst=10,
min_delay_between_requests_ms=200,
max_concurrent=10,
notes="No anti-bot, high capacity",
),
"medium-target.com": DomainPolicy(
domain="medium-target.com",
requests_per_second=1.0,
burst=3,
min_delay_between_requests_ms=1000,
max_concurrent=3,
notes="Cloudflare Free, moderate rate limits",
),
"hard-target.com": DomainPolicy(
domain="hard-target.com",
requests_per_second=0.2,
burst=1,
min_delay_between_requests_ms=5000,
max_concurrent=1,
notes="Cloudflare Enterprise, aggressive detection",
),
}
# Default policy for unknown domains
DEFAULT_POLICY = DomainPolicy(
domain="default",
requests_per_second=1.0,
burst=2,
min_delay_between_requests_ms=1000,
max_concurrent=3,
)
class PerDomainRateLimiter:
"""Manages per-domain rate limiters."""
def __init__(self, policies=None, default_policy=None):
self._policies = policies or {}
self._default = default_policy or DEFAULT_POLICY
self._limiters = {}
self._lock = threading.Lock()
def acquire(self, domain, timeout=30.0):
"""Acquire a rate limit token for a specific domain."""
limiter = self._get_limiter(domain)
return limiter.acquire(timeout=timeout)
def _get_limiter(self, domain):
"""Get or create a rate limiter for a domain."""
with self._lock:
if domain not in self._limiters:
policy = self._policies.get(domain, self._default)
config = TokenBucketConfig(
rate=policy.requests_per_second,
burst=policy.burst,
name=domain,
)
self._limiters[domain] = TokenBucket(config)
return self._limiters[domain]
# Usage
rate_limiter = PerDomainRateLimiter(
policies=DOMAIN_POLICIES,
default_policy=DEFAULT_POLICY,
)
# Before each request
domain = "medium-target.com"
if rate_limiter.acquire(domain):
# Proceed with request
pass
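Policies are keyed by hostname, so in practice you normalize each URL before calling `acquire`. A minimal sketch using only the standard library (the `www.`-stripping rule is an assumption; real public-suffix handling may need a library like `tldextract`):

```python
from urllib.parse import urlparse

def policy_key(url: str) -> str:
    """Reduce a URL to the hostname used as the policy-dict key."""
    host = urlparse(url).netloc.lower()
    host = host.split(":")[0]  # drop an explicit port
    # Treat www.example.com and example.com as the same policy
    return host[4:] if host.startswith("www.") else host

print(policy_key("https://www.medium-target.com:443/products?page=2"))
# medium-target.com
```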
Adaptive Rate Control
The most effective approach adjusts the rate dynamically based on response signals from the target:
import time
import threading
from dataclasses import dataclass
@dataclass(frozen=True)
class AdaptiveRateSnapshot:
"""Immutable snapshot of adaptive rate state."""
current_rate: float
min_rate: float
max_rate: float
consecutive_successes: int
consecutive_failures: int
last_adjustment: float
block_count_last_minute: int
class AdaptiveRateLimiter:
"""Rate limiter that adjusts based on success/failure signals.
AIMD (Additive Increase, Multiplicative Decrease):
- On success: increase rate linearly (additive)
- On block/failure: decrease rate by half (multiplicative)
This is the same algorithm TCP uses for congestion control.
"""
def __init__(self, initial_rate=1.0, min_rate=0.1, max_rate=10.0):
self._rate = initial_rate
self._min_rate = min_rate
self._max_rate = max_rate
self._consecutive_successes = 0
self._consecutive_failures = 0
self._last_adjustment = time.monotonic()
self._bucket = TokenBucket(TokenBucketConfig(
rate=initial_rate, burst=max(3, int(initial_rate * 2))
))
self._lock = threading.Lock()
def acquire(self, timeout=30.0):
"""Acquire a token at the current adaptive rate."""
return self._bucket.acquire(timeout=timeout)
def record_success(self):
"""Signal a successful request. May increase rate."""
with self._lock:
self._consecutive_successes += 1
self._consecutive_failures = 0
# Additive increase: add 0.1 req/s after 10 consecutive successes
if self._consecutive_successes >= 10:
new_rate = min(self._rate + 0.1, self._max_rate)
if new_rate != self._rate:
self._rate = new_rate
self._update_bucket()
self._consecutive_successes = 0
def record_failure(self, is_rate_limit=False):
"""Signal a failed request. May decrease rate.
Args:
is_rate_limit: True if the failure was specifically a 429 or
rate-limit-related block. Triggers more aggressive
backoff.
"""
with self._lock:
self._consecutive_failures += 1
self._consecutive_successes = 0
if is_rate_limit:
# Multiplicative decrease: halve the rate immediately
new_rate = max(self._rate * 0.5, self._min_rate)
self._rate = new_rate
self._update_bucket()
elif self._consecutive_failures >= 3:
# Non-rate-limit failures: reduce by 25% after 3 in a row
new_rate = max(self._rate * 0.75, self._min_rate)
self._rate = new_rate
self._update_bucket()
self._consecutive_failures = 0
    def record_retry_after(self, seconds):
        """Handle a Retry-After header from the target.

        Reduces the rate, then pauses for the retry-after
        duration. The lock is released before sleeping so
        other threads are not blocked for the full wait.
        """
        with self._lock:
            # Reduce rate in proportion to how long we must wait
            if seconds > 60:
                self._rate = max(self._rate * 0.25, self._min_rate)
            elif seconds > 10:
                self._rate = max(self._rate * 0.5, self._min_rate)
            else:
                self._rate = max(self._rate * 0.75, self._min_rate)
            self._update_bucket()
        # Sleep for the retry-after duration (capped at 5 min)
        time.sleep(min(seconds, 300))
def _update_bucket(self):
"""Recreate the token bucket with the new rate."""
self._bucket = TokenBucket(TokenBucketConfig(
rate=self._rate,
burst=max(3, int(self._rate * 2)),
))
def get_snapshot(self):
"""Get an immutable snapshot of current state."""
with self._lock:
return AdaptiveRateSnapshot(
current_rate=round(self._rate, 2),
min_rate=self._min_rate,
max_rate=self._max_rate,
consecutive_successes=self._consecutive_successes,
consecutive_failures=self._consecutive_failures,
last_adjustment=self._last_adjustment,
block_count_last_minute=0, # Simplified
)
AIMD in practice:
Time 0: Rate = 1.0 req/s (initial)
Success × 10 → Rate = 1.1 req/s
Success × 10 → Rate = 1.2 req/s
Success × 10 → Rate = 1.3 req/s
429 received → Rate = 0.65 req/s (halved)
Success × 10 → Rate = 0.75 req/s
Success × 10 → Rate = 0.85 req/s
...converges toward the target's tolerance threshold
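The trace above can be reproduced with plain arithmetic. This standalone sketch isolates the AIMD update rule itself (it does not use the `AdaptiveRateLimiter` class; the event names are ours):

```python
def aimd_step(rate: float, event: str,
              min_rate: float = 0.1, max_rate: float = 10.0) -> float:
    """One AIMD adjustment: +0.1 after a success streak, x0.5 on a 429."""
    if event == "success_streak":   # 10 consecutive successes
        return min(rate + 0.1, max_rate)
    if event == "rate_limited":     # 429 response
        return max(rate * 0.5, min_rate)
    return rate

# Replay the trace: three streaks, a 429, two more streaks
rate = 1.0
events = ["success_streak"] * 3 + ["rate_limited"] + ["success_streak"] * 2
for event in events:
    rate = aimd_step(rate, event)
print(round(rate, 2))  # 0.85
```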
Retry-After Header Handling
The Retry-After HTTP header is the target telling you exactly when to try again. Respecting it is both ethical and effective -- it prevents escalating blocks.
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
def parse_retry_after(header_value):
"""Parse a Retry-After header into seconds to wait.
Retry-After can be either:
- An integer (seconds to wait)
- An HTTP-date (absolute time to retry)
Returns seconds to wait, capped at 300 (5 minutes).
"""
if header_value is None:
return None
try:
# Try as integer (seconds)
seconds = int(header_value)
return min(seconds, 300)
except ValueError:
pass
    try:
        # Try as HTTP-date (parsedate_to_datetime returns a
        # timezone-aware datetime; HTTP dates are always GMT)
        retry_time = parsedate_to_datetime(header_value)
        delta = (retry_time - datetime.now(timezone.utc)).total_seconds()
        return min(max(delta, 0), 300)
    except (ValueError, TypeError):
        pass
return None # Unparseable
def handle_rate_limited_response(response, rate_limiter):
"""Handle a rate-limited response with proper backoff.
Returns the number of seconds waited.
"""
if response.status_code != 429:
return 0
retry_after = parse_retry_after(
response.headers.get("Retry-After")
)
if retry_after is not None:
rate_limiter.record_retry_after(retry_after)
return retry_after
    # No Retry-After header: fall back to the limiter's
    # multiplicative (halving) backoff
    rate_limiter.record_failure(is_rate_limit=True)
return 0
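The HTTP-date form is the one most often mishandled, because naive datetime arithmetic breaks on timezone-aware values. A small standalone sketch of the safe pattern (the helper name is ours; the example date is arbitrary):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def seconds_until(http_date: str) -> float:
    """Seconds from now until an HTTP-date, clamped at zero.

    parsedate_to_datetime returns a tz-aware datetime for GMT
    dates, so it must be compared against an aware "now".
    """
    retry_at = parsedate_to_datetime(http_date)
    return max((retry_at - datetime.now(timezone.utc)).total_seconds(), 0.0)

# A date in the past clamps to 0 rather than going negative
print(seconds_until("Wed, 21 Oct 2015 07:28:00 GMT"))  # 0.0
```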
Monitoring Rate Limit Effectiveness
Track these metrics to tune your rate limiting:
import time
import threading
from collections import deque
from dataclasses import dataclass
@dataclass(frozen=True)
class RateLimitMetrics:
"""Immutable rate limiting metrics snapshot."""
domain: str
period_seconds: int
total_requests: int
successful_requests: int
rate_limited_requests: int # 429 responses
blocked_requests: int # 403 and soft blocks
effective_rate: float # Actual req/s achieved
target_rate: float # Configured req/s
utilization: float # effective / target
class RateLimitMonitor:
"""Monitor rate limiting effectiveness per domain."""
def __init__(self, window_seconds=300):
self._window = window_seconds
self._events = {} # domain -> deque of (timestamp, type)
self._lock = threading.Lock()
def record(self, domain, event_type):
"""Record a request event.
event_type: 'success', 'rate_limited', 'blocked', 'error'
"""
now = time.monotonic()
with self._lock:
if domain not in self._events:
self._events[domain] = deque()
self._events[domain].append((now, event_type))
# Prune old events
cutoff = now - self._window
while (
self._events[domain]
and self._events[domain][0][0] < cutoff
):
self._events[domain].popleft()
def get_metrics(self, domain, target_rate):
"""Get metrics for a specific domain."""
with self._lock:
events = list(self._events.get(domain, []))
if not events:
return None
total = len(events)
successes = sum(1 for _, t in events if t == "success")
rate_limited = sum(1 for _, t in events if t == "rate_limited")
blocked = sum(1 for _, t in events if t == "blocked")
time_span = events[-1][0] - events[0][0] if len(events) > 1 else 1
effective_rate = total / time_span if time_span > 0 else 0
return RateLimitMetrics(
domain=domain,
period_seconds=int(time_span),
total_requests=total,
successful_requests=successes,
rate_limited_requests=rate_limited,
blocked_requests=blocked,
effective_rate=round(effective_rate, 2),
target_rate=target_rate,
utilization=round(
effective_rate / target_rate if target_rate > 0 else 0, 2
),
)
Interpreting metrics:
| Metric | Healthy | Warning | Critical |
|---|---|---|---|
| Success rate | > 90% | 80-90% | < 80% |
| Rate limited (429s) | < 2% | 2-5% | > 5% |
| Blocked (403s) | < 5% | 5-10% | > 10% |
| Utilization | 70-90% | < 50% or > 95% | 100% sustained |
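The success-rate bands from the table can be encoded as a simple triage helper for alerting (thresholds copied from the table; the function name is ours):

```python
def triage_success_rate(success_rate: float) -> str:
    """Map a success-rate fraction onto the table's health bands."""
    if success_rate > 0.90:
        return "healthy"
    if success_rate >= 0.80:
        return "warning"
    return "critical"

print(triage_success_rate(0.942))  # healthy
print(triage_success_rate(0.85))   # warning
print(triage_success_rate(0.58))   # critical
```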
For more on rate limiting in context, see our rate limiting glossary entry, rate limit calculator, and performance optimization guide.
Frequently Asked Questions
What is the optimal rate for scraping with rotating proxies?
There is no universal answer. For rotating residential proxies against Cloudflare-protected targets, 1-2 requests per second per domain is a good starting point. For unprotected targets, 5-10 req/s is often sustainable. Use adaptive rate control (AIMD) to find the optimal rate for each target automatically.
Does proxy rotation eliminate the need for rate limiting?
No. Even with a different IP per request, the target site can detect scraping patterns by aggregate request volume, timing patterns, and behavioral signals. Rotating proxies help avoid per-IP rate limits but do not protect against session-level or aggregate detection.
Should I spread requests evenly or use bursts?
Even spacing is generally safer. Burst patterns are easier for detection systems to identify. Use a token bucket with a small burst allowance (2-3x the steady-state rate) for natural variation, but avoid sustained bursts.
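Even spacing with a little randomness can be sketched as follows (the +/-20% jitter factor is our choice for illustration, not a measured value):

```python
import random

def jittered_delay(base_delay_s: float, jitter: float = 0.2) -> float:
    """Inter-request delay with a random +/- jitter fraction.

    Keeps the average rate at 1 / base_delay_s while avoiding a
    perfectly regular cadence that detection systems can flag.
    """
    return base_delay_s * random.uniform(1.0 - jitter, 1.0 + jitter)

# 1 req/s target: each delay lands between 0.8s and 1.2s
print(0.8 <= jittered_delay(1.0) <= 1.2)  # True
```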
How do I handle targets with unknown rate limits?
Start conservatively (0.5 req/s) and use the adaptive rate limiter to ramp up. The AIMD algorithm will find the target's tolerance threshold within a few hundred requests.
Rate limiting is the difference between sustainable data collection and an arms race you lose. Disciplined rates protect your proxy IPs, respect target servers, and maximize your cost efficiency. Hex Proxies provides the proxy layer; you control the rate. Residential proxies from $4.25/GB, ISP proxies from $2.08/IP. Explore plans.