Proxy Latency Percentiles: Why p99 Is the Number That Matters
A proxy provider advertising "50 ms average latency" is telling you almost nothing useful about what your scraper will experience in production. Averages hide tail behavior, and tail behavior is where proxy infrastructure fails. A pool that averages 50 ms but has a p99 of 3,200 ms will produce retry storms and timeout cascades that an average-focused buyer would never predict. This post covers how to measure proxy latency properly, what the percentiles actually look like across ISP, residential, and datacenter proxies in real testing, and why engineering teams building production data pipelines should be planning against p99 rather than p50.
Percentiles in One Paragraph
Given a sorted list of measurements, the p50 (median) is the value at the 50th percentile: half of measurements are faster, half are slower. The p95 is the value below which 95 percent of measurements fall. The p99 is the value below which 99 percent fall, and the p99.9 is the value below which 99.9 percent fall. Each additional nine captures an order of magnitude fewer events (1 in 100, then 1 in 1,000) but a disproportionate share of the pain, because those events are the ones that time out, retry, and cascade.
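To make the definition concrete, here is a minimal nearest-rank percentile in plain Python. It is only one of several common definitions. NumPy's `np.percentile`, used in the benchmark later in this post, interpolates linearly between neighboring samples by default, so the two can differ slightly on small samples. The function name `percentile` is illustrative, not a library API.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError('need at least one sample')
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    rank = min(max(rank, 1), len(ordered))    # clamp against p=0 and float rounding
    return ordered[rank - 1]


latencies = list(range(1, 101))  # 100 synthetic samples: 1 ms .. 100 ms
for p in (50, 95, 99, 99.9):
    print(f'p{p} = {percentile(latencies, p)} ms')
```

With only 100 samples, p99.9 collapses to the single slowest sample, which is why tail percentiles need large sample counts.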
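The relationship between percentiles depends on the distribution. In a Gaussian process, p99 sits roughly 2.33 standard deviations above the mean. Real Internet latency is right-skewed and heavy-tailed: the slowest requests sit much farther from the median than a Gaussian with the same spread would predict, and on a poor pool p99 can be tens of times the median. The mean is dragged upward by those outliers yet still tells you nothing about how far out they reach. This is why "average latency" is actively misleading.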
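One way to see the gap is to compare the p99 of a Gaussian with the p99/p50 ratio of a lognormal, a common stand-in for right-skewed latency. The lognormal here is an illustrative assumption, not a fit to any proxy data. For a lognormal whose logarithm has standard deviation `sigma`, the median is `exp(mu)` and p99 is `exp(mu + z99 * sigma)`, so the ratio is `exp(z99 * sigma)` whatever `mu` is.

```python
import math
from statistics import NormalDist

z99 = NormalDist().inv_cdf(0.99)  # about 2.33 standard deviations


def gaussian_p99(mean_ms, sd_ms):
    # For a Gaussian, the mean equals the median and p99 sits z99 SDs above it.
    return mean_ms + z99 * sd_ms


def lognormal_p99_over_p50(sigma):
    # exp(mu + z99*sigma) / exp(mu): the location parameter mu cancels out.
    return math.exp(z99 * sigma)


print(f'z99 = {z99:.3f}')
print(f'Gaussian, mean 100 ms, sd 20 ms: p99 = {gaussian_p99(100, 20):.0f} ms')
for sigma in (0.25, 0.5, 1.0, 1.5):
    print(f'lognormal sigma={sigma}: p99/p50 = {lognormal_p99_over_p50(sigma):.1f}x')
```

Under this model a modest `sigma` of 1 already puts p99 at roughly 10x the median, while the Gaussian example keeps it within 1.5x.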
Where Proxy Latency Comes From
When you send a request through a proxy, the total latency is the sum of several components:
- Client to proxy RTT. Your network to the proxy entry point. Usually 5-50 ms depending on geography and whether the provider operates PoPs near you.
- Proxy authentication and session setup. For sticky sessions, negligible after the first request. For per-request authentication, 1-5 ms of overhead at the proxy.
- Proxy to exit node (for multi-hop architectures). If your residential provider routes through a central gateway before exiting, add 20-80 ms.
- Exit node to target RTT. This is the part you care about. The exit node's network path to the target website.
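- TLS handshake. For a new TLS connection, 1 round trip for TLS 1.3 or 2 round trips for TLS 1.2. When HTTPS is tunneled through the proxy (HTTP CONNECT or SOCKS5), the handshake runs end to end between your client and the target, so each round trip spans the full client-to-target path through the proxy, not just the exit-to-target leg.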
- Target server processing. The target's own response time. Out of your control but part of what you measure.
A well-designed benchmark isolates these components so you can see where time is going. A lazy benchmark reports only the total and leaves you guessing.
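One way to keep the components straight is to write them down as an explicit budget. The sketch below is a toy model under stated assumptions: the class and field names are hypothetical, every input is a guess you would replace with measurements, per-request authentication is charged on every request (set `auth_ms=0` for sticky sessions), the TLS handshake is assumed to be tunneled end to end so each handshake round trip crosses the whole path, and the TCP connect and CONNECT round trips of a brand-new tunnel are ignored.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    """Toy per-request latency model for HTTPS through a proxy (all values in ms)."""
    client_proxy_rtt: float   # client <-> proxy entry point
    gateway_ms: float         # extra round-trip time via a central gateway (0 if none)
    exit_target_rtt: float    # exit node <-> target
    auth_ms: float            # per-request auth overhead (about 0 for sticky sessions)
    server_ms: float          # target processing time
    tls_round_trips: int = 1  # 1 for a full TLS 1.3 handshake, 2 for TLS 1.2

    @property
    def path_rtt(self) -> float:
        # One round trip through the whole tunnel: client -> proxy -> exit -> target and back.
        return self.client_proxy_rtt + self.gateway_ms + self.exit_target_rtt

    def warm(self) -> float:
        # Reused connection: auth, one request/response round trip, server time.
        return self.auth_ms + self.path_rtt + self.server_ms

    def cold(self) -> float:
        # New connection: the TLS handshake adds full-path round trips up front.
        return self.warm() + self.tls_round_trips * self.path_rtt


budget = LatencyBudget(client_proxy_rtt=20, gateway_ms=0, exit_target_rtt=80,
                       auth_ms=2, server_ms=5)
print(f'path RTT {budget.path_rtt} ms, warm {budget.warm()} ms, cold {budget.cold()} ms')
```

Even this simplified model shows why the April 2026 test described next reuses a persistent connection: under these assumptions a cold TLS 1.3 request pays roughly one extra full-path round trip.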
What the Numbers Look Like
Hex Proxies internal testing in April 2026 ran 100,000 requests per configuration against a simple HTTPS endpoint hosted in AWS us-east-1, from client infrastructure in Frankfurt. The endpoint returned a 1 KB JSON response. Requests used TLS 1.3 over HTTP/1.1 on a persistent connection reused across requests, so the results reflect transport latency rather than handshake cost. Results:
| Proxy Type | p50 | p95 | p99 | p99.9 |
|---|---|---|---|---|
| Datacenter (Frankfurt exit) | 92 ms | 108 ms | 141 ms | 290 ms |
| ISP (Ashburn exit) | 98 ms | 115 ms | 168 ms | 412 ms |
| Residential (US rotating) | 186 ms | 480 ms | 1,240 ms | 4,800 ms |
| Residential sticky (US) | 168 ms | 312 ms | 720 ms | 2,100 ms |
Three things jump out. First, datacenter and ISP proxies have tight distributions: the p99 is less than 2x the p50. Second, residential proxies have a long tail: the p99 is 6.7x the p50 for the rotating pool. Third, sticky sessions cut the residential tail roughly in half because they avoid per-request IP selection and the associated peer-availability variance.
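The ratios quoted above can be recomputed directly from the table. The dictionary below only transcribes its four rows; nothing here is new data.

```python
# Values transcribed from the table above (milliseconds).
results = {
    'Datacenter (Frankfurt exit)': {'p50': 92, 'p95': 108, 'p99': 141, 'p99.9': 290},
    'ISP (Ashburn exit)': {'p50': 98, 'p95': 115, 'p99': 168, 'p99.9': 412},
    'Residential (US rotating)': {'p50': 186, 'p95': 480, 'p99': 1240, 'p99.9': 4800},
    'Residential sticky (US)': {'p50': 168, 'p95': 312, 'p99': 720, 'p99.9': 2100},
}


def tail_ratio(row, tail='p99', base='p50'):
    """How many times slower the tail percentile is than the median."""
    return row[tail] / row[base]


for name, row in results.items():
    print(f'{name:28} p99/p50 = {tail_ratio(row):.2f}x  '
          f'p99.9/p50 = {tail_ratio(row, "p99.9"):.1f}x')

rot = results['Residential (US rotating)']
sticky = results['Residential sticky (US)']
for pct in ('p99', 'p99.9'):
    print(f'sticky / rotating at {pct}: {sticky[pct] / rot[pct]:.2f}')
```

The sticky pool keeps about 58 percent of the rotating pool's p99 and about 44 percent of its p99.9, which is where "roughly in half" comes from.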
The Histogram, Not Just the Numbers
A histogram of the residential rotating distribution looks approximately like this: a peak around 170 ms accounting for about 55 percent of requests, a long plateau from roughly 250 to 500 ms accounting for most of the remainder, and a spread-out tail from there out to about 5 seconds accounting for the last 5 percent or so, as the p95 of 480 ms in the table implies. The tail is where real-peer variance lives: some exits are on 4G with poor signal, some are momentarily congested, and some are geographically distant from the target even though they are marketed as "US."
The ISP distribution has a different shape: a sharp peak at 95 to 105 ms accounting for 90 percent of requests, and a small secondary bump around 150 ms corresponding to connections that got a slightly slower transit path, which makes it mildly bimodal. Almost nothing lands past 250 ms.
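To see shapes like these in your own data, bucket the raw samples on a logarithmic scale. Linear buckets fine enough to resolve a 170 ms peak would need dozens of mostly empty bins to reach a 5-second tail, while doubling buckets cover both in a handful of rows. The sketch below uses only the standard library; the function names, the 100 ms base, and the seven doubling buckets are arbitrary illustrative choices.

```python
import bisect


def log_histogram(latencies_ms, base_ms=100, buckets=7):
    """Count samples in doubling buckets: below base_ms, [base, 2*base), ..., plus an overflow row."""
    edges = [base_ms * 2 ** k for k in range(buckets)]  # 100, 200, 400, ..., 6400
    counts = [0] * (len(edges) + 1)
    for x in latencies_ms:
        # bisect_right gives the index of the first edge strictly greater than x.
        counts[bisect.bisect_right(edges, x)] += 1
    labels = ([f'< {edges[0]} ms']
              + [f'{lo}-{hi} ms' for lo, hi in zip(edges, edges[1:])]
              + [f'>= {edges[-1]} ms'])
    return list(zip(labels, counts))


def render(hist, width=40):
    """Print one row per bucket with a bar scaled to the bucket's share of samples."""
    total = sum(count for _, count in hist) or 1
    for label, count in hist:
        print(f'{label:>14} {count:>7} {"#" * round(width * count / total)}')


render(log_histogram([95, 98, 102, 104, 150, 240, 310, 610, 1300, 4800]))
```

Feed it the raw latencies from your own benchmark rather than precomputed percentiles; the point is to see the peak, plateau, and tail that summary numbers flatten.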
Why p99 Breaks Your Pipeline
Consider a scraper that processes 10 requests per second with a 2-second per-request timeout. At 10 rps, one request in a hundred lands at or beyond p99, so you see a p99-tier request about every 10 seconds, or six per minute. With residential rotating at a p99 of 1,240 ms, a request at the p99 mark still completes within the timeout, and the slowest requests, those near p99.9, blow through it only about once every 100 seconds. A retry that infrequent is easy to absorb.
Now scale to 100 rps. You now see a p99-tier request every second and a p99.9-tier request every 10 seconds. At 4.8 seconds for p99.9, those requests exceed the 2-second timeout and trigger retries. The fraction of requests that time out has not changed, but the absolute number of retries has grown tenfold. Retries add load to an already-stressed exit pool, which degrades latency further, pushes more requests into the tail, and triggers more retries. This is retry amplification, and it is how a healthy-looking proxy pool collapses under load.
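The arithmetic in the two paragraphs above is worth writing down, because the absolute rate of tail events scales with throughput even when the percentiles stay put. The sketch below is hypothetical and assumes each attempt times out independently with a fixed probability. That is optimistic: under load the timeout probability itself rises, which is the feedback loop described above.

```python
def tail_events_per_second(rps: float, percentile: float) -> float:
    """Average rate of requests at or beyond the given percentile."""
    return rps * (1 - percentile / 100)


def offered_load(rps: float, p_timeout: float, max_retries: int) -> float:
    """Attempts per second including retries.

    Assumes each attempt times out independently with probability p_timeout, so a
    request makes 1 + p + p**2 + ... + p**max_retries attempts on average.
    """
    return rps * sum(p_timeout ** k for k in range(max_retries + 1))


for rps in (10, 100):
    print(f'{rps} rps: p99-tier every {1 / tail_events_per_second(rps, 99):.0f} s, '
          f'p99.9-tier every {1 / tail_events_per_second(rps, 99.9):.0f} s')
```

With one retry and a 1 percent timeout rate, 100 rps becomes about 101 attempts per second. The danger is not that baseline but the feedback: if the extra load pushes `p_timeout` up, offered load climbs toward `max_retries + 1` times the original rate.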
The correct response is to size timeouts against p99 plus a margin, not against an average. If your p99 is 1,240 ms, your timeout should be at least 2,500 ms, and your retry budget should be small (one retry max) so that a spike cannot cascade.
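A minimal sketch of that policy in plain Python follows. The 2x margin and the rounding up to the next 100 ms are assumptions chosen to reproduce the 1,240 ms to 2,500 ms example. `call_with_retry` is a hypothetical helper that retries only on the built-in `TimeoutError`; an HTTP client such as httpx raises its own timeout exception types, so a real pipeline would catch those instead.

```python
import math
import time


def timeout_from_p99(p99_ms: float, margin: float = 2.0, round_to_ms: int = 100) -> int:
    """Timeout = measured p99 x margin, rounded up to a whole number of round_to_ms."""
    return math.ceil(p99_ms * margin / round_to_ms) * round_to_ms


def call_with_retry(fn, max_retries: int = 1, backoff_s: float = 0.0):
    """Call fn(); on TimeoutError retry at most max_retries times, then re-raise."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_retries:
                raise  # retry budget exhausted: surface the failure
            time.sleep(backoff_s)


print(timeout_from_p99(1240))  # timeout in ms for the residential rotating p99
```

Keeping `max_retries` at 1 caps each logical request at two attempts, so even a burst in which every request times out can at most double the load on the pool.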
How to Measure Your Own
A minimal latency benchmark in Python using httpx:
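```python
import asyncio
import time

import httpx
import numpy as np

PROXY = 'http://user:pass@proxy.example:8080'  # placeholder credentials
URL = 'https://httpbin.org/ip'
N = 1000
CONCURRENCY = 10  # well below httpx's default pool of 100 connections


async def one(client, sem, url):
    # Wait for a slot *before* starting the clock so time spent queued
    # inside this script is not counted as proxy latency.
    async with sem:
        t0 = time.perf_counter()
        try:
            r = await client.get(url)
            status = r.status_code
        except httpx.HTTPError as exc:  # timeouts, connect and proxy errors
            status = type(exc).__name__
        return (time.perf_counter() - t0) * 1000, status


async def main():
    sem = asyncio.Semaphore(CONCURRENCY)
    # httpx >= 0.26 takes `proxy=`; the older `proxies=` argument was removed in 0.28.
    async with httpx.AsyncClient(proxy=PROXY, timeout=10.0) as c:
        results = await asyncio.gather(*[one(c, sem, URL) for _ in range(N)])
    latencies = [ms for ms, status in results if status == 200]
    # Report failures instead of silently dropping them; they are part of the tail.
    print(f'n={len(latencies)} failures={len(results) - len(latencies)}')
    for p in (50, 95, 99, 99.9):
        print(f'p{p}={np.percentile(latencies, p):.0f}ms')


asyncio.run(main())
```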
Run this against your current provider. A run of 1,000 requests is enough for a rough p99 estimate (only about 10 samples fall beyond it) but too few for p99.9, which would rest on a single sample. For p99.9 you need at least 10,000 samples, preferably 100,000. Do not average percentiles from separate runs. Instead, pool the raw measurements and compute percentiles on the combined dataset.
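The difference between averaging and pooling is easy to demonstrate with two synthetic runs, one of which hit a slow patch. This sketch uses the standard library's `statistics.quantiles`, whose default (exclusive) method interpolates between samples; the run data is invented for illustration.

```python
from statistics import quantiles


def p99(samples):
    # quantiles(n=100) returns the 99 cut points p1..p99; index 98 is p99.
    return quantiles(samples, n=100)[98]


run_a = [100] * 90 + [2000] * 10  # synthetic run: 10% of requests hit a slow patch
run_b = [100] * 100               # synthetic run: uniformly fast

averaged = (p99(run_a) + p99(run_b)) / 2
pooled = p99(run_a + run_b)

print(f'average of per-run p99s: {averaged:.0f} ms')
print(f'p99 of pooled samples:   {pooled:.0f} ms')
```

The averaged value of 1,050 ms is a latency no request in either run actually had, while the pooled p99 of 2,000 ms reflects that 5 percent of all requests were slow.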
HdrHistogram and Coordinated Omission
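A subtle but important measurement bug: if your benchmark is meant to fire requests at a fixed rate but a slow request delays the next send, you have "coordinated omission." The benchmark backs off exactly when the system stalls, so it takes few samples during the stall and never records the latency that the requests it should have sent would have experienced. Gil Tene's HdrHistogram library can correct for this after the fact (its Java API provides recordValueWithExpectedInterval, which back-fills the samples a stall hid), and his wrk2 load generator avoids it at the source by measuring each request's latency from its intended send time rather than its actual send time. For serious proxy benchmarking, use wrk2 or an HdrHistogram-based tool rather than a naive loop.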
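The effect is easy to reproduce in a simulation. The sketch below models a single-connection sender that intends to fire every `interval_ms` but must wait for each response before sending the next, the pattern that produces coordinated omission. It records latency two ways: from the actual send time, as a naive loop does, and from the intended send time, the approach wrk2 takes. The function name and the service times are synthetic.

```python
def closed_loop_latencies(service_ms, interval_ms):
    """Simulate a sender that intends to fire every interval_ms but blocks on each response.

    Returns (naive, corrected): latency measured from the actual send time and
    from the intended send time, respectively.
    """
    naive, corrected = [], []
    sender_free_at = 0
    for i, service in enumerate(service_ms):
        intended = i * interval_ms
        actual = max(intended, sender_free_at)  # a slow response delays this send
        done = actual + service
        naive.append(done - actual)         # what a naive loop records
        corrected.append(done - intended)   # what a fixed-rate client experiences
        sender_free_at = done
    return naive, corrected


naive, corrected = closed_loop_latencies([5, 5, 100, 5, 5], interval_ms=10)
print('naive:    ', naive)
print('corrected:', corrected)
```

The naive list shows a single slow request; the corrected list shows that the two requests queued behind it also waited most of that time, which is what a client firing at a fixed rate would actually see.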
What to Ask Providers
When evaluating a proxy provider for a latency-sensitive workload:
- What is the p99 from your PoP nearest to my client to a target in [region X]?
- How is that measured, with what sample size, and over what time window?
- Do you publish real-time latency dashboards or historical SLA data?
- Under load of N rps per session, how does p99 change?
Providers that answer these questions with specific numbers are measuring what matters. Providers that answer with "our average is 50 ms" are telling you what sounds good. For price monitoring pipelines and other latency-sensitive workloads, the difference determines whether your scraper runs clean or drowns in retries.
Conclusion
Averages describe the middle of a distribution. Production infrastructure fails at the tails. If you are buying proxy capacity for any workload with nontrivial throughput, measure p99 and p99.9 against your actual targets before committing, and size your timeouts and retry budgets against those tails rather than the advertised mean. The number that matters is not the one your provider puts on the landing page.