Proxy Benchmark Methodology: How We Test Speed, Uptime, and Success Rate
Most proxy benchmark reports are marketing material disguised as data. They cherry-pick favorable metrics, test under unrealistic conditions, and omit methodology entirely. Without knowing how a benchmark was conducted, the numbers are meaningless.
This post documents the exact methodology Hex Proxies uses for all published speed tests and benchmark reports. Every test on our benchmark methodology page follows this protocol. We publish it so you can evaluate our data critically and replicate the approach when testing any provider.
Why Proxy Benchmarks Fail
Before describing what we do, it helps to understand what most benchmarks get wrong.
Common Methodological Failures
Single-region testing. Many benchmarks test from one location (usually US East) to one target. This tells you nothing about global performance. A provider can have excellent US infrastructure and 500ms+ latency to Asia.
Insufficient sample size. Testing 100 requests and reporting the average is statistically meaningless. Network performance follows heavy-tailed distributions -- you need thousands of samples to capture P95 and P99 latency accurately.
No target diversity. Testing against httpbin.org or a provider's own endpoint does not reflect real-world conditions. Real targets have varying response times, rate limits, and anti-bot systems that affect success rates.
Missing time dimension. A snapshot benchmark taken at 2 AM UTC tells you nothing about performance during peak hours. Network congestion, target server load, and proxy pool utilization all vary by time of day.
Ignoring connection failures. Many benchmarks report "average response time" but exclude failed requests from the calculation. If 15% of requests fail, the average of the successful 85% is misleading.
Our Test Infrastructure
Test Nodes
We run benchmark agents from eight geographic locations:
| Node Location | Cloud Provider | Purpose |
|---|---|---|
| US East (Virginia) | AWS | Primary US target proximity |
| US West (Oregon) | AWS | Cross-region US latency |
| EU Central (Frankfurt) | AWS | European performance |
| EU West (London) | AWS | UK-specific targets |
| Asia East (Tokyo) | AWS | East Asia performance |
| Asia South (Singapore) | AWS | Southeast Asia + Oceania |
| South America (São Paulo) | AWS | LATAM performance |
| Middle East (Bahrain) | AWS | MENA region coverage |
All eight nodes run on c5.xlarge instances (4 vCPU, 8 GB RAM) with enhanced networking enabled. We use dedicated instances -- never burstable -- to eliminate compute variability from the results.
Target Sites
We test against a rotating panel of 50 target URLs across five categories:
| Category | Example Targets | Why Included |
|---|---|---|
| Static content | Major CDN-served pages | Baseline latency (minimal server processing) |
| Dynamic web apps | E-commerce product pages, search results | Realistic web scraping targets |
| API endpoints | Public REST APIs with rate limits | API-scraping use case |
| Anti-bot protected | Sites using Cloudflare, Akamai, PerimeterX | Success rate under protection |
| Regional content | Country-specific news sites, local marketplaces | Geo-targeting accuracy |
Test Parameters
Each benchmark run uses these parameters:
```yaml
# benchmark-config.yaml
test_parameters:
  requests_per_target: 500
  targets_per_category: 10
  total_requests_per_run: 25000   # 500 * 50 targets
  concurrency: 50                 # parallel connections
  timeout: 30s                    # per-request timeout
  retry: 0                        # no retries (measure raw performance)
  protocol: HTTPS                 # CONNECT method
  rotation: per_request           # new IP every request
  session_test_duration: 5min     # for sticky session benchmarks

schedule:
  frequency: every_6_hours        # 4 runs per day
  duration: 7_days                # per benchmark period
  total_runs: 28                  # per provider per period
  total_requests: 700000          # per provider per period
```
This produces 700,000 data points per provider per benchmark period -- enough to calculate statistically significant percentile metrics.
Measurement Methodology
Latency Measurement
We measure four distinct latency components for every request:
```
┌──────────────────────────────────────────────────────────────┐
│                      Total Request Time                      │
│                                                              │
│  ┌─────────┐   ┌─────────┐   ┌──────────┐  ┌──────────────┐  │
│  │ DNS     │   │ TCP     │   │ TLS      │  │ HTTP         │  │
│  │ Resolve │ → │ Connect │ → │ Handshake│→ │ Transfer     │  │
│  │         │   │         │   │          │  │ (TTFB+body)  │  │
│  └─────────┘   └─────────┘   └──────────┘  └──────────────┘  │
│                                                              │
│    t_dns        t_connect      t_tls        t_transfer       │
└──────────────────────────────────────────────────────────────┘
```
DNS resolution time (t_dns): Time to resolve the proxy endpoint hostname. We cache DNS after the first resolution to isolate proxy performance from DNS infrastructure.
TCP connect time (t_connect): Time from SYN to SYN-ACK with the proxy server. This measures network latency to the proxy.
TLS handshake time (t_tls): Time to complete the TLS 1.3 handshake with the proxy (for HTTPS-to-proxy connections) or the CONNECT + target TLS handshake.
HTTP transfer time (t_transfer): Time from sending the HTTP request to receiving the complete response body. This includes the proxy's internal routing to the target, the target's processing time, and the response relay.
We report:
- Time to First Byte (TTFB): t_dns + t_connect + t_tls + time_to_first_response_byte
- Total Response Time: the complete request-response cycle including body download
- Proxy Overhead: measured by comparing direct-to-target requests with proxied requests to the same target from the same node
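The proxy-overhead comparison can be sketched as a median of paired differences. This is an illustrative sketch, not our production tooling; the function name and sample values are made up:

```python
import statistics

def proxy_overhead_ms(direct_ms, proxied_ms):
    """Median per-request overhead added by the proxy, from paired
    latency samples (same target, same node, same request order)."""
    return statistics.median(p - d for d, p in zip(direct_ms, proxied_ms))

# Hypothetical paired samples from one node (milliseconds)
direct = [82.0, 85.5, 80.1, 90.3, 84.2]      # no-proxy baseline
proxied = [130.4, 128.9, 141.0, 139.7, 131.5]
print(round(proxy_overhead_ms(direct, proxied), 1))  # 48.4
```

The median of differences is more robust than the difference of means, since a single slow outlier in either series does not distort it.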
Success Rate Calculation
A request is classified into one of four outcomes:
| Outcome | Definition | Counted As |
|---|---|---|
| Success | HTTP 200 with expected content | Success |
| Soft block | HTTP 200 but CAPTCHA/challenge page | Failure |
| Hard block | HTTP 403, 429, or connection refused | Failure |
| Timeout | No response within 30 seconds | Failure |
```python
def classify_response(response, expected_min_size=1024):
    """Classify a proxy response into success or failure category.

    Args:
        response: HTTP response object with status_code, text, elapsed
        expected_min_size: Minimum expected response body size in bytes

    Returns:
        Tuple of (outcome: str, details: dict)
    """
    if response is None:
        return ("timeout", {"reason": "no_response"})

    status = response.status_code
    body = response.text
    body_size = len(body.encode('utf-8'))

    # Hard blocks
    if status in (403, 429, 503):
        return ("hard_block", {"status": status})
    if status == 407:
        return ("auth_failure", {"status": status})

    # Check for soft blocks in 200 responses
    if status == 200:
        block_indicators = [
            "captcha",
            "cf-challenge",
            "challenge-platform",
            "managed-challenge",
            "px-captcha",
            "distil_r_captcha",
            "arkoselabs.com",
            "recaptcha/api",
            "hcaptcha.com",
        ]
        body_lower = body.lower()
        for indicator in block_indicators:
            if indicator in body_lower:
                return ("soft_block", {
                    "indicator": indicator,
                    "body_size": body_size,
                })

        # Suspiciously small response
        if body_size < expected_min_size:
            return ("suspect_small", {
                "body_size": body_size,
                "expected_min": expected_min_size,
            })

        return ("success", {
            "body_size": body_size,
            "latency_ms": response.elapsed.total_seconds() * 1000,
        })

    return ("other_failure", {"status": status})
```
Success rate formula:
```
success_rate = successful_requests / total_requests * 100
```
We do not exclude timeouts or errors from the denominator. If you sent 10,000 requests and 1,500 timed out, your success rate is 85%, not "100% of completed requests."
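The denominator rule translates directly to code. A minimal sketch (the function name and numbers are illustrative) that also shows how the dishonest "of completed requests" variant inflates the figure:

```python
def success_rate(successes, total_sent):
    """Success rate over ALL sent requests -- timeouts and errors
    stay in the denominator."""
    if total_sent == 0:
        return 0.0
    return successes / total_sent * 100

# 10,000 requests sent, 1,500 timed out, 8,500 succeeded
print(success_rate(8500, 10000))  # 85.0  -- the honest number
print(success_rate(8500, 8500))   # 100.0 -- "of completed requests"
```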
Uptime Measurement
We distinguish between two uptime metrics:
Gateway uptime: Can we establish a TCP connection to the proxy endpoint? We ping the proxy gateway every 60 seconds from all eight test nodes. If any node cannot connect for two consecutive checks (2 minutes), we record a downtime event.
Effective uptime: Of the requests that reached the proxy, what percentage received a response (regardless of success or failure)? This measures whether the proxy is functional, not whether it is unblocked.
```
gateway_uptime   = (1 - total_downtime_minutes / total_monitored_minutes) * 100
effective_uptime = requests_with_any_response / total_requests_sent * 100
```
A provider can have 99.99% gateway uptime but 95% effective uptime if 5% of requests hang indefinitely within the proxy network.
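Both formulas are simple enough to encode directly. A sketch with made-up numbers chosen to reproduce the 99.99% vs 95% scenario (function names are ours):

```python
def gateway_uptime(downtime_minutes, monitored_minutes):
    """Percentage of monitored time the gateway accepted TCP connections."""
    return (1 - downtime_minutes / monitored_minutes) * 100

def effective_uptime(requests_with_any_response, total_requests_sent):
    """Percentage of sent requests that received any response at all."""
    return requests_with_any_response / total_requests_sent * 100

# One 7-day period = 10,080 monitored minutes; 1 minute of recorded downtime
print(round(gateway_uptime(1, 10080), 2))  # 99.99
# ...but 5% of requests hang indefinitely inside the proxy network
print(effective_uptime(95000, 100000))     # 95.0
```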
Statistical Reporting
We report these percentile metrics for latency:
| Metric | What It Tells You |
|---|---|
| P50 (median) | Typical request performance |
| P75 | Performance for most requests |
| P90 | Performance under moderate load |
| P95 | Tail latency -- slow but not extreme |
| P99 | Worst-case scenario (excluding outliers) |
| Mean | Useful only for throughput calculations |
| Std Dev | Consistency -- low std dev means predictable |
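Percentiles have several competing definitions; the nearest-rank method below is one common choice, and the helper is an illustrative sketch rather than our production implementation (Python's `statistics.quantiles` is another option):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value that is
    greater than or equal to p percent of all samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies = list(range(1, 101))   # 1..100 ms, purely for illustration
print(percentile(latencies, 50))  # 50
print(percentile(latencies, 95))  # 95
print(percentile(latencies, 99))  # 99
```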
For success rates, we report the rate with a 95% confidence interval:
```
If observed success rate = 92.3% over 25,000 requests:
  95% CI = 92.3% +/- 0.33%
  Reported as: 92.3% (95% CI: 92.0% - 92.6%)
```
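The interval shown is the standard normal-approximation (Wald) interval, which is adequate at this sample size. A sketch under that assumption:

```python
import math

def wald_ci_95(successes, total):
    """95% confidence interval for a success rate via the normal (Wald)
    approximation -- reasonable for large n and p away from 0 or 1."""
    p = successes / total
    margin = 1.96 * math.sqrt(p * (1 - p) / total)
    return ((p - margin) * 100, (p + margin) * 100)

# 92.3% observed over 25,000 requests (23,075 successes)
lo, hi = wald_ci_95(23075, 25000)
print(f"92.3% (95% CI: {lo:.1f}% - {hi:.1f}%)")  # 92.0% - 92.6%
```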
How to Replicate Our Methodology
You do not need our infrastructure to run meaningful benchmarks. Here is a simplified version you can run from a single machine.
Minimum Viable Benchmark
```python
import time
import statistics
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkConfig:
    """Immutable benchmark configuration."""
    proxy_url: str
    targets: tuple  # tuple of URL strings
    requests_per_target: int = 100
    concurrency: int = 10
    timeout: int = 30


@dataclass(frozen=True)
class RequestResult:
    """Immutable result of a single benchmark request."""
    target: str
    status: str  # "success", "soft_block", "hard_block", "timeout", "error"
    latency_ms: float
    status_code: int
    body_size: int


def run_single_request(proxy_url, target, timeout):
    """Execute a single benchmarked request. Returns an immutable result."""
    proxies = {
        "http": proxy_url,
        "https": proxy_url,
    }
    start = time.monotonic()
    try:
        resp = requests.get(
            target,
            proxies=proxies,
            timeout=timeout,
            allow_redirects=True,
        )
        elapsed_ms = (time.monotonic() - start) * 1000
        body_size = len(resp.content)

        # Soft block detection
        body_lower = resp.text.lower()
        soft_block_markers = ["captcha", "cf-challenge", "px-captcha"]
        is_soft_block = any(m in body_lower for m in soft_block_markers)

        if is_soft_block:
            status = "soft_block"
        elif resp.status_code == 200:
            status = "success"
        elif resp.status_code in (403, 429):
            status = "hard_block"
        else:
            status = "other"

        return RequestResult(
            target=target,
            status=status,
            latency_ms=elapsed_ms,
            status_code=resp.status_code,
            body_size=body_size,
        )
    except requests.Timeout:
        elapsed_ms = (time.monotonic() - start) * 1000
        return RequestResult(
            target=target,
            status="timeout",
            latency_ms=elapsed_ms,
            status_code=0,
            body_size=0,
        )
    except requests.RequestException:
        elapsed_ms = (time.monotonic() - start) * 1000
        return RequestResult(
            target=target,
            status="error",
            latency_ms=elapsed_ms,
            status_code=0,
            body_size=0,
        )


def run_benchmark(config):
    """Run the full benchmark and return all results."""
    results = []
    tasks = []
    with ThreadPoolExecutor(max_workers=config.concurrency) as pool:
        for target in config.targets:
            for _ in range(config.requests_per_target):
                future = pool.submit(
                    run_single_request,
                    config.proxy_url,
                    target,
                    config.timeout,
                )
                tasks.append(future)
        for future in as_completed(tasks):
            results.append(future.result())
    return tuple(results)  # return immutable tuple


def summarize(results):
    """Produce summary statistics from benchmark results."""
    total = len(results)
    successes = [r for r in results if r.status == "success"]
    latencies = [r.latency_ms for r in successes]
    success_rate = len(successes) / total * 100 if total > 0 else 0.0

    if not latencies:
        return {"total": total, "success_rate": 0.0}

    sorted_latencies = sorted(latencies)
    return {
        "total": total,
        "successes": len(successes),
        "success_rate": round(success_rate, 2),
        "latency_p50": round(sorted_latencies[len(sorted_latencies) // 2], 1),
        "latency_p95": round(sorted_latencies[int(len(sorted_latencies) * 0.95)], 1),
        "latency_p99": round(sorted_latencies[int(len(sorted_latencies) * 0.99)], 1),
        "latency_mean": round(statistics.mean(latencies), 1),
        "latency_stdev": round(statistics.stdev(latencies), 1) if len(latencies) > 1 else 0,
        "timeouts": len([r for r in results if r.status == "timeout"]),
        "blocks": len([r for r in results if r.status in ("soft_block", "hard_block")]),
    }


# Example usage
config = BenchmarkConfig(
    proxy_url="http://USER:PASS@gate.hexproxies.com:8080",
    targets=(
        "https://httpbin.org/ip",
        "https://httpbin.org/headers",
        "https://httpbin.org/get",
    ),
    requests_per_target=100,
    concurrency=10,
    timeout=30,
)

results = run_benchmark(config)
summary = summarize(results)
for key, value in summary.items():
    print(f"{key}: {value}")
```
Key Principles for Valid Benchmarks
- Run at least 1,000 requests per test. Below this, percentile metrics are unreliable.
- Test over multiple days. A single run captures a moment; 7 days captures the real distribution.
- Include diverse targets. Do not benchmark against only httpbin.org.
- Test at realistic concurrency. If your production workload runs 50 concurrent connections, benchmark at 50.
- Never exclude failed requests from latency calculations. Report them separately but include them in success rate.
- Control for your own network. Run a direct (no-proxy) baseline against the same targets from the same machine.
How Our Published Benchmarks Use This Methodology
Every benchmark on Hex Proxies follows this methodology:
- Our speed test pages use the multi-region test node setup described above
- Our network uptime page reports gateway and effective uptime
- Our quarterly benchmark reports (like the upcoming Q2 2026 report) test multiple providers using identical methodology
Frequently Asked Questions
How often do you run benchmarks?
Continuous monitoring runs every 6 hours, 7 days a week. Published benchmark reports aggregate data from 28 runs (7 days at 4 runs/day) per provider per report. Internal monitoring for our own infrastructure runs every 60 seconds.
Do you test competitors with their knowledge?
We purchase standard retail plans from each provider we benchmark. We do not use trial accounts, demo environments, or special arrangements. This ensures the results reflect what a paying customer would experience.
Why don't you report average latency as the primary metric?
Average latency is misleading for network performance data. If 99 requests take 100ms and 1 request takes 10,000ms, the average is 199ms -- but no request actually took 199ms. The median (P50) tells you what a typical request looks like; P95 and P99 tell you about the tail. We report averages for completeness but emphasize percentiles.
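The claim is easy to verify with the standard library, using the same numbers as the example above:

```python
import statistics

latencies = [100] * 99 + [10000]     # 99 requests at 100 ms, 1 at 10 s
print(statistics.mean(latencies))    # 199.0 -- no request actually took this
print(statistics.median(latencies))  # 100.0 -- what a typical request took
```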
Can I use your benchmark code for my own testing?
Yes. The code in this post is provided under MIT license. Adapt it to test any provider. We encourage you to benchmark Hex Proxies alongside competitors using your own methodology -- if our infrastructure is as fast as we claim, independent testing will confirm it.
How do you handle providers with different proxy formats?
Each provider gets a thin adapter that normalizes its proxy format (user:pass@host:port) into our standard test interface. The test logic, targets, timing, and classification code are identical across all providers.
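Such an adapter can be as small as one normalizing function. The two non-URL formats handled below are illustrative assumptions, not an exhaustive catalog of real provider formats:

```python
def normalize_proxy(raw, username, password):
    """Normalize common proxy notations to http://user:pass@host:port.

    Handles "host:port", "host:port:user:pass", and already-formed URLs.
    Illustrative only -- real adapters handle provider-specific quirks.
    """
    if raw.startswith("http://") or raw.startswith("https://"):
        return raw  # already a full proxy URL
    parts = raw.split(":")
    if len(parts) == 4:            # host:port:user:pass
        host, port, username, password = parts
    else:                          # host:port, credentials supplied separately
        host, port = parts
    return f"http://{username}:{password}@{host}:{port}"

print(normalize_proxy("gate.example.com:8080", "user", "pass"))
# http://user:pass@gate.example.com:8080
print(normalize_proxy("1.2.3.4:9000:u2:p2", "", ""))
# http://u2:p2@1.2.3.4:9000
```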
Transparent methodology is the foundation of credible benchmarks. For the latest benchmark data produced using this methodology, visit our benchmark results page or explore regional speed tests. ISP proxies start at $2.08/IP and residential proxies at $4.25/GB -- see current pricing.