How DDoS Mitigation Systems Distinguish Legitimate Proxy Traffic from Attacks
Cloudflare, Akamai, Imperva, and Fastly together process a share of global HTTP traffic that is difficult to estimate precisely but almost certainly exceeds 25%. Nearly every proxy-based workload that reaches a meaningful website hits one of these mitigation layers before it reaches the origin. Understanding how they classify traffic can be the difference between a 95% success rate and a 30% success rate on the same target.
This article walks through the classification pipeline shared by most major edge platforms, the signals that cause proxy traffic to be treated as an attack, and what operators can do to keep their traffic in the "legitimate" bucket.
The Five-Stage Classification Pipeline
The vendors differ in detail, but every modern edge mitigation system processes inbound requests through approximately the same five stages. A request that fails any stage is blocked, challenged, or rate-limited.
1. Network layer: volumetric and L3/L4 filtering
Before HTTP even enters the picture, traffic passes through L3/L4 DDoS protection that drops obvious volumetric attacks (UDP floods, SYN floods, amplification reflections) and applies IP reputation scoring. This stage rarely affects proxy traffic because proxies by their nature generate well-formed TCP connections. A residential proxy at 40 requests per second from a single IP is not remotely close to the thresholds that trigger L3/L4 mitigation.
IP reputation scoring is the exception. Cloudflare's Threat Score, Akamai's Client Reputation, and Imperva's IP Reputation database track IPs that have previously sourced malicious traffic across all protected sites. A residential IP that was used by another operator for credential stuffing last month will carry that reputation when your request arrives.
2. TLS fingerprinting: JA3, JA4, and JA4H
Every TLS client produces a distinctive fingerprint from the ClientHello message: the TLS version, cipher suites offered, extensions, elliptic curves, and EC point formats. The JA3 hashing method, published by Salesforce engineers in 2017, became the industry baseline. JA4, released by FoxIO in 2023, has been widely adopted by modern mitigation systems. It sorts cipher suites and extensions before hashing, which makes it more resistant than JA3 to the ClientHello extension-order randomization that recent browser versions perform.
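To make the fingerprint concrete, here is a minimal sketch of how a JA3 hash is derived from the ClientHello fields listed above: the five fields are written as decimal values, joined with dashes inside each field and commas between fields, and the resulting string is MD5-hashed. The function names and the example field values are illustrative assumptions, not captured from a real browser, and the sketch assumes GREASE values have already been filtered out.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input string: five comma-separated fields,
    each a dash-joined list of decimal values in the order offered."""
    fields = [
        str(version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
        "-".join(str(g) for g in curves),
        "-".join(str(p) for p in point_formats),
    ]
    return ",".join(fields)


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """Return the 32-character MD5 hex digest of the JA3 string."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Illustrative (made-up) ClientHello values; 771 is the decimal form of 0x0303.
example_hello = dict(
    version=771,
    ciphers=[4865, 4866, 4867],
    extensions=[0, 11, 10, 35],
    curves=[29, 23, 24],
    point_formats=[0],
)
print(ja3_string(**example_hello))
print(ja3_hash(**example_hello))
```

Because the hash covers the exact order of ciphers and extensions, two clients that offer the same features in a different order produce different JA3 values.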
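The classification logic is simple: a request claiming to be Chrome 131 on Windows should produce a JA4 fingerprint that matches the actual Chrome 131 ClientHello. A request with a Chrome user-agent and a Python requests library fingerprint is flagged instantly. JA4H applies the same idea to the HTTP request itself: it fingerprints the method, the HTTP version, whether cookies and a Referer are present, the number and order of headers, and the Accept-Language value. A default Python requests call sends a short, distinctive header set over HTTP/1.1, so it stands out at this layer as well.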
This is the stage where most poorly configured scrapers fail. Libraries like requests, httpx, aiohttp, and urllib produce fingerprints that no real browser has ever produced. Impersonation-capable clients (curl_cffi, tls-client, pyCurlImpersonate) can produce real-browser fingerprints, but they must be pinned to a specific browser profile and updated as browsers ship new versions.
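The user-agent versus fingerprint check described in this section can be sketched from the edge's point of view as a lookup against fingerprints previously observed from real browser builds. Everything here is hypothetical: the table contents are placeholder strings rather than real JA4 values, and the user-agent parsing is deliberately simplistic (for example, it does not separate Chromium-based browsers from Chrome).

```python
# Hypothetical allow-list: browser family -> fingerprints seen from real builds.
# The strings are placeholders, not real JA4 values.
KNOWN_FINGERPRINTS = {
    "chrome": {"fp-chrome-a", "fp-chrome-b"},
    "firefox": {"fp-firefox-a"},
}


def claimed_family(user_agent):
    """Very rough user-agent classification, for illustration only."""
    ua = user_agent.lower()
    if "firefox/" in ua:
        return "firefox"
    if "chrome/" in ua:
        return "chrome"
    return None


def is_consistent(user_agent, observed_fingerprint):
    """True when the observed TLS fingerprint is one a real build of the
    claimed browser family is known to produce."""
    family = claimed_family(user_agent)
    if family is None:
        return False  # unknown client: no real-browser profile to match
    return observed_fingerprint in KNOWN_FINGERPRINTS[family]


chrome_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/131.0.0.0 Safari/537.36"
print(is_consistent(chrome_ua, "fp-chrome-a"))         # matching pair
print(is_consistent(chrome_ua, "fp-python-requests"))  # mismatched pair
```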
3. HTTP/2 and HTTP/3 frame analysis
HTTP/2 introduced binary framing, and the order and content of SETTINGS, WINDOW_UPDATE, and HEADERS frames vary across real browsers. Akamai's research on passive HTTP/2 client fingerprinting (presented at Black Hat Europe in 2017) reported that roughly 40 distinct HTTP/2 fingerprint categories cover real-world browsers, while Python libraries generate fingerprints that do not match any of them. Cloudflare publicly documented its HTTP/2 fingerprinting at the DEF CON 29 bot talk in 2021.
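As a rough illustration, the sketch below assembles an HTTP/2 fingerprint string in a layout modeled on the one described in Akamai's research: SETTINGS parameters, the initial WINDOW_UPDATE increment, any PRIORITY frames, and the pseudo-header order, separated by pipes. The function name, the placeholder conventions for missing frames, and the example values are assumptions for illustration and may differ from what any given vendor computes.

```python
def h2_fingerprint(settings, window_update, priorities, pseudo_headers):
    """Assemble 'SETTINGS|WINDOW_UPDATE|PRIORITY|pseudo-header order'.

    settings:       list of (id, value) pairs in the order sent
    window_update:  connection-level increment, or None if not sent
    priorities:     list of (stream, exclusive, depends_on, weight) tuples
    pseudo_headers: e.g. [":method", ":authority", ":scheme", ":path"]
    """
    s = ";".join(f"{ident}:{value}" for ident, value in settings)
    wu = str(window_update) if window_update else "00"
    p = ",".join(":".join(str(x) for x in frame) for frame in priorities) or "0"
    ps = ",".join(name.lstrip(":")[0] for name in pseudo_headers)
    return "|".join([s, wu, p, ps])


# Illustrative browser-like values (not captured from a real session).
print(h2_fingerprint(
    settings=[(1, 65536), (2, 0), (4, 6291456), (6, 262144)],
    window_update=15663105,
    priorities=[],
    pseudo_headers=[":method", ":authority", ":scheme", ":path"],
))
```

A generic HTTP/2 library typically sends different SETTINGS values and a different pseudo-header order, so its string differs even when the TLS layer has been made to look like a browser.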
If a client's TLS fingerprint says "Chrome 131" but its HTTP/2 frame sequence is "generic h2 library," the classification system sees an inconsistency and downgrades the trust score. Trust score adjustments compound across stages.
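One way to picture compounding trust adjustments is a multiplicative score, where each stage contributes a factor between 0 and 1. This is a toy model under stated assumptions: vendors do not publish their scoring functions, and the stage names, factors, and decision thresholds below are all hypothetical.

```python
def combined_trust(stage_scores):
    """Multiply per-stage factors (each in [0, 1]) into one trust value."""
    trust = 1.0
    for factor in stage_scores.values():
        trust *= factor
    return trust


def decide(trust, block_below=0.2, challenge_below=0.6):
    """Map a trust value to an action using hypothetical thresholds."""
    if trust < block_below:
        return "block"
    if trust < challenge_below:
        return "challenge"
    return "allow"


# A clean residential IP cannot offset two mismatched fingerprint stages.
scores = {"ip_reputation": 0.9, "tls_fingerprint": 0.5, "h2_fingerprint": 0.5}
trust = combined_trust(scores)
print(round(trust, 3), decide(trust))
```

Under this model a single weak stage can be survivable, but two moderately suspicious stages together push the request into challenge territory.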
4. Behavioral analysis at the session level
Once a session is established, the edge platform scores behavior: request rate, path traversal patterns, Referer header consistency, cookie retention, and the timing distribution between requests. The signals that indicate automation include:
- Request inter-arrival time: Humans produce lognormal distributions with most page views spaced by 2 to 30 seconds. Scrapers often produce uniform intervals or no delay at all.
- Path patterns: Humans follow links from one page to another. Scrapers often hit URLs by pattern enumeration.
- Cookie and session handling: Browsers retain and send cookies across requests. Clients that drop cookies between requests are classified as automation.
- Missing sub-resources: A page load includes the HTML plus typically 30 to 100 sub-resources (CSS, JS, images, fonts). A client that requests only the HTML and never fetches sub-resources is flagged.
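The inter-arrival signal in the list above can be sketched from both sides. The first function draws lognormally distributed delays clamped to the 2 to 30 second range mentioned above; the second flags a series of gaps whose spread is suspiciously small. The median, sigma, and coefficient-of-variation threshold are assumptions chosen for illustration, not values published by any vendor.

```python
import math
import random
import statistics


def human_like_delay(rng, median_s=8.0, sigma=0.6, lo=2.0, hi=30.0):
    """Sample a lognormal delay (median median_s seconds), clamped to [lo, hi]."""
    delay = rng.lognormvariate(math.log(median_s), sigma)
    return min(max(delay, lo), hi)


def coefficient_of_variation(gaps):
    """Population standard deviation divided by the mean of the gaps."""
    mean = statistics.mean(gaps)
    return statistics.pstdev(gaps) / mean if mean > 0 else 0.0


def looks_uniform(gaps, cv_threshold=0.1):
    """Near-constant gaps (very low variation) suggest a fixed-interval scraper."""
    return coefficient_of_variation(gaps) < cv_threshold


rng = random.Random(42)  # seeded only so the sketch is repeatable
delays = [human_like_delay(rng) for _ in range(10)]
print([round(d, 1) for d in delays])
print(looks_uniform([1.0] * 20))  # fixed one-second timer
print(looks_uniform(delays))      # sampled delays
```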
5. Client-side attestation
The highest-trust tier uses JavaScript or WebAssembly challenges that execute in the browser and submit a signed proof of execution. Cloudflare Turnstile, Akamai Bot Manager Standard, and hCaptcha Enterprise all operate at this level. The challenge typically measures execution fingerprints (Canvas, AudioContext, WebGL parameters) that are difficult to fake outside a real browser engine. Traffic that fails the attestation does not reach the origin regardless of the other signals.
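Vendors do not publish their attestation formats, so the following is only a generic sketch of what a "signed proof" can look like on the server side: a token that binds a session identifier and issue time to an HMAC, which the edge verifies before letting the request through. The token layout, secret, and expiry window are hypothetical and do not describe Turnstile, Bot Manager, or hCaptcha internals.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical key held only by the edge


def sign_token(session_id, issued_at, secret=SECRET):
    """Issue 'session.timestamp.signature' after a challenge is passed."""
    payload = f"{session_id}.{issued_at}"
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def verify_token(token, now, max_age_s=300, secret=SECRET):
    """Accept only tokens with a valid signature that have not expired."""
    try:
        session_id, issued_at, sig = token.rsplit(".", 2)
        issued = int(issued_at)
    except ValueError:
        return False  # malformed token
    payload = f"{session_id}.{issued_at}"
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and 0 <= now - issued <= max_age_s


token = sign_token("sess-123", 1_700_000_000)
print(verify_token(token, now=1_700_000_100))  # fresh and correctly signed
print(verify_token(token, now=1_700_001_000))  # expired
```

A plain HTTP client that never runs the challenge script never obtains such a token, which is why this stage is decided by the automation framework rather than by the proxy.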
Where Proxy Traffic Ranks
A proxy is a network-layer tool. It affects stage 1 (source IP reputation) and nothing else directly. The choice of client library determines stages 2 and 3 and, together with how the operator schedules requests and handles cookies, stage 4. Stage 5 is determined by the automation framework (real browser via Playwright/Puppeteer versus HTTP library).
This is the single most important insight for scraping operators: the proxy does not make your client look legitimate. A premium residential proxy with a consumer ISP trust score routing traffic from a Python requests client still produces a Python requests JA4 fingerprint, which is instantly flagged. The operator who spends $15 per GB on residential while running requests is getting less benefit than an operator who spends $1 per GB on datacenter while running curl_cffi with proper browser impersonation.
Rate-Limiting Thresholds in Practice
Every major edge platform applies both static and adaptive rate limits. Thresholds are ultimately set per customer, so treat the following as approximate starting points rather than guarantees:
- Cloudflare: Rate limiting rules are customer-configurable. The default Managed Ruleset applies a sensitivity level that flags sustained rates above approximately 10-30 requests per minute per IP on standard plans, and lower on protected domains. Enabling Bot Fight Mode tightens this considerably.
- Akamai Bot Manager: Thresholds are policy-driven per customer. The default managed policy for "aggressive bot" classification triggers at approximately 100 requests per minute sustained, but site operators often tighten this to 20 or lower on login and checkout endpoints.
- Imperva (formerly Incapsula): Applies per-IP request-per-second thresholds that vary by URL path. Login, checkout, and API endpoints are typically limited to 1-3 requests per second per IP.
These numbers matter for proxy pool sizing. If a target applies a 30-request-per-minute threshold and you need to issue 10,000 requests per minute, you need at least 334 rotating IPs to stay under the threshold on average, and closer to 500-600 to account for reputation warming and burst tolerance.
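The pool-sizing arithmetic above can be written as a short helper. The function names are illustrative, and the 50% headroom factor is an assumption chosen to land inside the 500-600 range mentioned above; the right margin depends on the target.

```python
import math


def min_pool_size(total_rpm, per_ip_rpm):
    """Smallest number of IPs that keeps the average per-IP rate
    at or below the target's threshold."""
    return math.ceil(total_rpm / per_ip_rpm)


def padded_pool_size(total_rpm, per_ip_rpm, headroom=0.5):
    """Minimum pool size plus a fractional margin for IPs that are
    warming up, cooling down, or absorbing bursts."""
    return math.ceil(min_pool_size(total_rpm, per_ip_rpm) * (1 + headroom))


print(min_pool_size(10_000, 30))     # 334
print(padded_pool_size(10_000, 30))  # 501
```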
Why Whitelisting Matters
Most mid- and large-sized platforms maintain a whitelist of known legitimate automation partners: payment processors, search engine crawlers, affiliate networks, monitoring services, and enterprise data partners. Whitelisted traffic bypasses most of the classification pipeline. A proxy provider that has relationships with major platforms and can route traffic through whitelisted agreements is operationally different from one that does not.
This is rare and expensive. Most operators will not get a whitelist arrangement directly; they rely on sources that are indistinguishable from ordinary consumer traffic, which is why residential trust scores are valuable.
What Operators Can Actually Do
A short list of high-leverage configuration changes for proxy-based scraping against mitigation platforms:
- Use a client with real browser TLS and HTTP/2 fingerprints: curl_cffi or tls-client with a browser impersonation profile, or a real browser driven by Playwright. Drop plain Python requests for any protected target.
- Match TLS and user-agent. A Chrome 131 user-agent with a Chrome 131 JA4 fingerprint is consistent; a mismatch is a trust-score hit.
- Retain cookies across the session. Do not discard the session after each request.
- Fetch sub-resources selectively. If the target is well-protected, use a real browser and fetch CSS/JS to avoid the "no sub-resources" signal.
- Rate limit below the target's threshold. Aggregate rate across a pool is not the same as per-IP rate. Measure and configure per-IP.
- Rotate on signal, not on a timer. Rotate the IP when a trust-score degradation is detected (403, CAPTCHA, or slowdown), not at a fixed interval that may leak a pattern.
- Pair residential trust with residential-quality clients. Paying for residential IPs while running a datacenter-quality client wastes the premium.
- Track the reputation of your IPs over time. A provider that shares IPs across many customers carries the reputation of its worst-behaved user.
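Two items from the list above, per-IP rate limiting and rotating on signal, can be sketched together. The class tracks a sliding 60-second window per proxy address, and the helper decides when a response suggests degraded trust. The class and function names, the example address, and the "three times baseline latency" slowdown rule are assumptions for illustration.

```python
from collections import deque


class ProxySlot:
    """Tracks requests sent through one proxy IP over a sliding 60 s window."""

    def __init__(self, address, max_per_minute):
        self.address = address
        self.max_per_minute = max_per_minute
        self.sent = deque()  # timestamps (seconds) of recent requests

    def can_send(self, now):
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        return len(self.sent) < self.max_per_minute

    def record(self, now):
        self.sent.append(now)


def should_rotate(status_code, latency_s, baseline_latency_s, saw_captcha=False):
    """Rotate on the signals named above: a 403, a CAPTCHA, or a slowdown."""
    if status_code == 403 or saw_captcha:
        return True
    return latency_s > 3 * baseline_latency_s  # assumed slowdown rule


slot = ProxySlot("198.51.100.7:8080", max_per_minute=2)
slot.record(0.0)
slot.record(1.0)
print(slot.can_send(2.0))   # window is full
print(slot.can_send(60.5))  # oldest request has aged out
print(should_rotate(200, latency_s=1.0, baseline_latency_s=0.2))
```

Keeping one slot per IP makes the per-IP rate explicit, so a high aggregate rate across the pool cannot hide a single address that is exceeding the target's threshold.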
Where Hex Proxies Sits in This Stack
Hex ISP IPs are owned, announced from our own ASN, and subleased through ISP arrangements that give them consumer trust scores at the L3/L4 reputation layer. The residential pool (white-label, sourced from compliant SDK networks) carries the trust score advantage of consumer endpoints. Neither product affects your TLS fingerprint, your HTTP/2 framing, or your behavioral signals. The client-side configuration is the operator's responsibility, and we publish integration guides in the docs section for Playwright, curl_cffi, and tls-client.
Further Reading
- John Althouse, "JA3 and JA3S," Salesforce Engineering blog, 2017.
- John Althouse, "JA4+ Network Fingerprinting," FoxIO, 2023.
- Akamai, "Passive Fingerprinting of HTTP/2 Clients," Black Hat Europe, 2017.
- Cloudflare, DEF CON 29 talk on bot detection, 2021.
- Imperva, Threat Research Bot Report, 2024.
- OWASP Automated Threats to Web Applications (OAT), 2023 edition.