
Proxy Benchmark Methodology: How We Test Speed, Uptime, and Success Rate

By Hex Proxies Engineering Team

Most proxy benchmark reports are marketing material disguised as data. They cherry-pick favorable metrics, test under unrealistic conditions, and omit methodology entirely. Without knowing how a benchmark was conducted, the numbers are meaningless.

This post documents the exact methodology Hex Proxies uses for all published speed tests and benchmark reports. Every test on our benchmark methodology page follows this protocol. We publish it so you can evaluate our data critically and replicate the approach when testing any provider.

Why Proxy Benchmarks Fail

Before describing what we do, it helps to understand what most benchmarks get wrong.

Common Methodological Failures

Single-region testing. Many benchmarks test from one location (usually US East) to one target. This tells you nothing about global performance. A provider can have excellent US infrastructure and 500ms+ latency to Asia.

Insufficient sample size. Testing 100 requests and reporting the average is statistically meaningless. Network performance follows heavy-tailed distributions -- you need thousands of samples to capture P95 and P99 latency accurately.

No target diversity. Testing against httpbin.org or a provider's own endpoint does not reflect real-world conditions. Real targets have varying response times, rate limits, and anti-bot systems that affect success rates.

Missing time dimension. A snapshot benchmark taken at 2 AM UTC tells you nothing about performance during peak hours. Network congestion, target server load, and proxy pool utilization all vary by time of day.

Ignoring connection failures. Many benchmarks report "average response time" but exclude failed requests from the calculation. If 15% of requests fail, the average of the successful 85% is misleading.
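
To make the last failure concrete, here is a toy calculation (with made-up numbers) showing how far "average of successful requests" can drift from what clients actually experience once failures are priced in at their timeout cost:

```python
import statistics

# Hypothetical sample: 85 successful requests plus 15 timeouts.
# Timeouts cost the full 30 s cap, i.e. 30,000 ms of wasted wall time.
successes = [120.0] * 85      # ms, successful requests
timeouts = [30_000.0] * 15    # ms, requests that hit the timeout

# "Average of successful requests only" -- the misleading headline number
avg_success_only = statistics.mean(successes)      # 120.0 ms

# Average including failed requests at their timeout cost
avg_all = statistics.mean(successes + timeouts)    # 4602.0 ms
```

The success-only mean reports 120 ms while the client-experienced mean is nearly forty times higher.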

Our Test Infrastructure

Test Nodes

We run benchmark agents from eight geographic locations:

| Node Location | Cloud Provider | Purpose |
| --- | --- | --- |
| US East (Virginia) | AWS | Primary US target proximity |
| US West (Oregon) | AWS | Cross-region US latency |
| EU Central (Frankfurt) | AWS | European performance |
| EU West (London) | AWS | UK-specific targets |
| Asia East (Tokyo) | AWS | East Asia performance |
| Asia South (Singapore) | AWS | Southeast Asia + Oceania |
| South America (Sao Paulo) | AWS | LATAM performance |
| Middle East (Bahrain) | AWS | MENA region coverage |

Each node runs identical test software on c5.xlarge instances (4 vCPU, 8 GB RAM) with enhanced networking enabled. We use dedicated instances -- never burstable -- to eliminate compute variability from the results.

Target Sites

We test against a rotating panel of 50 target URLs across five categories:

| Category | Example Targets | Why Included |
| --- | --- | --- |
| Static content | Major CDN-served pages | Baseline latency (minimal server processing) |
| Dynamic web apps | E-commerce product pages, search results | Realistic web scraping targets |
| API endpoints | Public REST APIs with rate limits | API-scraping use case |
| Anti-bot protected | Sites using Cloudflare, Akamai, PerimeterX | Success rate under protection |
| Regional content | Country-specific news sites, local marketplaces | Geo-targeting accuracy |

We do not disclose specific target URLs to prevent target sites from whitelisting our test infrastructure, which would invalidate the results.

Test Parameters

Each benchmark run uses these parameters:

# benchmark-config.yaml
test_parameters:
  requests_per_target: 500
  targets_per_category: 10
  total_requests_per_run: 25000   # 500 * 50 targets
  concurrency: 50                 # parallel connections
  timeout: 30s                    # per-request timeout
  retry: 0                        # no retries (measure raw performance)
  protocol: HTTPS                 # CONNECT method
  rotation: per_request           # new IP every request
  session_test_duration: 5min     # for sticky session benchmarks

schedule:
  frequency: every_6_hours        # 4 runs per day
  duration: 7_days                # per benchmark period
  total_runs: 28                  # per provider per period
  total_requests: 700000          # per provider per period

This produces 700,000 data points per provider per benchmark period -- enough to calculate statistically significant percentile metrics.
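
A quick simulation (illustrative only -- it uses a lognormal distribution as a stand-in for heavy-tailed latency, with arbitrary parameters) shows why small samples cannot pin down tail percentiles:

```python
import random
import statistics

random.seed(42)

def p99(samples):
    """P99 by sorted-index lookup, matching the indexing used later in this post."""
    s = sorted(samples)
    return s[int(len(s) * 0.99)]

def draw(n):
    """Draw n latencies from a heavy-tailed toy distribution (ms)."""
    return [random.lognormvariate(4.6, 0.8) for _ in range(n)]

# Estimate P99 twenty times from small and large samples.
# The small-sample estimates scatter far more widely.
small = [p99(draw(100)) for _ in range(20)]
large = [p99(draw(10_000)) for _ in range(20)]

print("spread of P99 estimates, n=100:  ", round(statistics.stdev(small), 1))
print("spread of P99 estimates, n=10000:", round(statistics.stdev(large), 1))
```

With only 100 samples, the "P99" is effectively the sample maximum, which swings wildly between runs; with 10,000 samples the estimate stabilizes.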

Measurement Methodology

Latency Measurement

We measure four distinct latency components for every request:

┌──────────────────────────────────────────────────────────────┐
│                    Total Request Time                         │
│                                                              │
│  ┌─────────┐  ┌─────────┐  ┌──────────┐  ┌──────────────┐  │
│  │  DNS    │  │  TCP    │  │   TLS    │  │  HTTP        │  │
│  │ Resolve │→ │ Connect │→ │ Handshake│→ │  Transfer    │  │
│  │         │  │         │  │          │  │  (TTFB+body) │  │
│  └─────────┘  └─────────┘  └──────────┘  └──────────────┘  │
│                                                              │
│  t_dns        t_connect    t_tls         t_transfer          │
└──────────────────────────────────────────────────────────────┘

DNS resolution time (t_dns): Time to resolve the proxy endpoint hostname. We cache DNS after the first resolution to isolate proxy performance from DNS infrastructure.

TCP connect time (t_connect): Time from SYN to SYN-ACK with the proxy server. This measures network latency to the proxy.

TLS handshake time (t_tls): Time to complete the TLS 1.3 handshake with the proxy (for HTTPS-to-proxy connections) or the CONNECT + target TLS handshake.

HTTP transfer time (t_transfer): Time from sending the HTTP request to receiving the complete response body. This includes the proxy's internal routing to the target, the target's processing time, and the response relay.

We report:

  • Time to First Byte (TTFB): t_dns + t_connect + t_tls + time_to_first_response_byte
  • Total Response Time: The complete request-response cycle including body download
  • Proxy Overhead: Measured by comparing direct-to-target requests with proxied requests to the same target from the same node
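
The reported quantities compose directly from the four components above. A minimal sketch of that bookkeeping, using hypothetical timings (the class and field names are ours, not part of any test harness):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LatencyBreakdown:
    """Per-request latency components, all in milliseconds."""
    t_dns_ms: float
    t_connect_ms: float
    t_tls_ms: float
    t_first_byte_ms: float  # request sent -> first response byte
    t_body_ms: float        # first byte -> last byte of body

    @property
    def ttfb_ms(self) -> float:
        # TTFB = t_dns + t_connect + t_tls + time to first response byte
        return self.t_dns_ms + self.t_connect_ms + self.t_tls_ms + self.t_first_byte_ms

    @property
    def total_ms(self) -> float:
        # Total response time adds the body download
        return self.ttfb_ms + self.t_body_ms

# Hypothetical measurement:
sample = LatencyBreakdown(12.0, 45.0, 38.0, 180.0, 95.0)
print(sample.ttfb_ms)   # 275.0
print(sample.total_ms)  # 370.0
```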

Success Rate Calculation

A request is classified into one of four outcomes:

| Outcome | Definition | Counted As |
| --- | --- | --- |
| Success | HTTP 200 with expected content | Success |
| Soft block | HTTP 200 but CAPTCHA/challenge page | Failure |
| Hard block | HTTP 403, 429, 503, or connection refused | Failure |
| Timeout | No response within 30 seconds | Failure |

Critical detail: We check response bodies for soft blocks, not just status codes. Many anti-bot systems return HTTP 200 with a CAPTCHA or JavaScript challenge. Counting these as successes inflates the reported success rate. Our test client parses response bodies and checks for known block indicators (CAPTCHA iframes, challenge script patterns, empty bodies under 1 KB for pages expected to be larger).

def classify_response(response, expected_min_size=1024):
    """Classify a proxy response into success or failure category.
    
    Args:
        response: HTTP response object with status_code, text, elapsed
        expected_min_size: Minimum expected response body size in bytes
    
    Returns:
        Tuple of (outcome: str, details: dict)
    """
    if response is None:
        return ("timeout", {"reason": "no_response"})
    
    status = response.status_code
    body = response.text
    body_size = len(body.encode('utf-8'))
    
    # Hard blocks
    if status in (403, 429, 503):
        return ("hard_block", {"status": status})
    
    if status == 407:
        return ("auth_failure", {"status": status})
    
    # Check for soft blocks in 200 responses
    if status == 200:
        block_indicators = [
            "captcha",
            "cf-challenge",
            "challenge-platform",
            "managed-challenge",
            "px-captcha",
            "distil_r_captcha",
            "arkoselabs.com",
            "recaptcha/api",
            "hcaptcha.com",
        ]
        
        body_lower = body.lower()
        for indicator in block_indicators:
            if indicator in body_lower:
                return ("soft_block", {
                    "indicator": indicator,
                    "body_size": body_size,
                })
        
        # Suspiciously small response
        if body_size < expected_min_size:
            return ("suspect_small", {
                "body_size": body_size,
                "expected_min": expected_min_size,
            })
        
        return ("success", {
            "body_size": body_size,
            "latency_ms": response.elapsed.total_seconds() * 1000,
        })
    
    return ("other_failure", {"status": status})

Success rate formula:

success_rate = successful_requests / total_requests * 100

We do not exclude timeouts or errors from the denominator. If you sent 10,000 requests and 1,500 timed out, your success rate is 85%, not "100% of completed requests."
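
Expressed as code, using the 85% example above:

```python
def success_rate(successful: int, total: int) -> float:
    """Success rate with every sent request -- including timeouts
    and errors -- kept in the denominator."""
    if total == 0:
        raise ValueError("no requests sent")
    return successful / total * 100

# 10,000 requests sent; 1,500 timed out; 8,500 succeeded:
print(success_rate(8_500, 10_000))  # 85.0
```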

Uptime Measurement

We distinguish between two uptime metrics:

Gateway uptime: Can we establish a TCP connection to the proxy endpoint? We ping the proxy gateway every 60 seconds from all eight test nodes. If any node cannot connect for two consecutive checks (2 minutes), we record a downtime event.

Effective uptime: Of the requests that reached the proxy, what percentage received a response (regardless of success or failure)? This measures whether the proxy is functional, not whether it is unblocked.

gateway_uptime = (1 - total_downtime_minutes / total_monitored_minutes) * 100

effective_uptime = requests_with_any_response / total_requests_sent * 100

A provider can have 99.99% gateway uptime but 95% effective uptime if 5% of requests hang indefinitely within the proxy network.
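
Both formulas translate directly to code. The numbers below are hypothetical; 10,080 minutes is one 7-day monitoring window:

```python
def gateway_uptime(total_downtime_minutes: float, total_monitored_minutes: float) -> float:
    """Percentage of monitored time the gateway accepted TCP connections."""
    return (1 - total_downtime_minutes / total_monitored_minutes) * 100

def effective_uptime(requests_with_any_response: int, total_requests_sent: int) -> float:
    """Percentage of sent requests that got any response at all."""
    return requests_with_any_response / total_requests_sent * 100

# One 2-minute downtime event in a 7-day window, but 5% of requests hang:
print(round(gateway_uptime(1, 10_080), 2))  # 99.99
print(effective_uptime(9_500, 10_000))      # 95.0
```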

Statistical Reporting

We report these percentile metrics for latency:

| Metric | What It Tells You |
| --- | --- |
| P50 (median) | Typical request performance |
| P75 | Performance for most requests |
| P90 | Performance under moderate load |
| P95 | Tail latency -- slow but not extreme |
| P99 | Worst-case scenario (excluding outliers) |
| Mean | Useful only for throughput calculations |
| Std Dev | Consistency -- low std dev means predictable |

We never report only the mean for latency. Mean latency is distorted by outliers and hides the tail distribution that matters most for user experience and timeout configuration.

For success rates, we report the rate with a 95% confidence interval:

If observed success rate = 92.3% over 25,000 requests:
95% CI = 92.3% +/- 0.33%
Reported as: 92.3% (95% CI: 92.0% - 92.6%)
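
That interval is the standard normal approximation for a proportion. A small helper reproducing the example above (a Wilson interval is more robust for rates near 0% or 100%, but the normal approximation is fine at this sample size):

```python
import math

def success_rate_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return (rate %, margin %) using the normal approximation
    z * sqrt(p * (1 - p) / n) for a 95% confidence interval."""
    p = successes / total
    margin = z * math.sqrt(p * (1 - p) / total)
    return p * 100, margin * 100

rate, margin = success_rate_ci(23_075, 25_000)  # 92.3% observed over 25,000 requests
print(f"{rate:.1f}% +/- {margin:.2f}%")         # 92.3% +/- 0.33%
```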

How to Replicate Our Methodology

You do not need our infrastructure to run meaningful benchmarks. Here is a simplified version you can run from a single machine.

Minimum Viable Benchmark

import time
import statistics
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkConfig:
    """Immutable benchmark configuration."""
    proxy_url: str
    targets: tuple  # tuple of URL strings
    requests_per_target: int = 100
    concurrency: int = 10
    timeout: int = 30


@dataclass(frozen=True)
class RequestResult:
    """Immutable result of a single benchmark request."""
    target: str
    status: str  # "success", "soft_block", "hard_block", "timeout", "error"
    latency_ms: float
    status_code: int
    body_size: int


def run_single_request(proxy_url, target, timeout):
    """Execute a single benchmarked request. Returns an immutable result."""
    proxies = {
        "http": proxy_url,
        "https": proxy_url,
    }
    
    start = time.monotonic()
    try:
        resp = requests.get(
            target,
            proxies=proxies,
            timeout=timeout,
            allow_redirects=True,
        )
        elapsed_ms = (time.monotonic() - start) * 1000
        body_size = len(resp.content)
        
        # Soft block detection
        body_lower = resp.text.lower()
        soft_block_markers = ["captcha", "cf-challenge", "px-captcha"]
        is_soft_block = any(m in body_lower for m in soft_block_markers)
        
        if is_soft_block:
            status = "soft_block"
        elif resp.status_code == 200:
            status = "success"
        elif resp.status_code in (403, 429):
            status = "hard_block"
        else:
            status = "other"
        
        return RequestResult(
            target=target,
            status=status,
            latency_ms=elapsed_ms,
            status_code=resp.status_code,
            body_size=body_size,
        )
    
    except requests.Timeout:
        elapsed_ms = (time.monotonic() - start) * 1000
        return RequestResult(
            target=target,
            status="timeout",
            latency_ms=elapsed_ms,
            status_code=0,
            body_size=0,
        )
    except requests.RequestException:
        elapsed_ms = (time.monotonic() - start) * 1000
        return RequestResult(
            target=target,
            status="error",
            latency_ms=elapsed_ms,
            status_code=0,
            body_size=0,
        )


def run_benchmark(config):
    """Run the full benchmark and return all results."""
    results = []
    tasks = []
    
    with ThreadPoolExecutor(max_workers=config.concurrency) as pool:
        for target in config.targets:
            for _ in range(config.requests_per_target):
                future = pool.submit(
                    run_single_request,
                    config.proxy_url,
                    target,
                    config.timeout,
                )
                tasks.append(future)
        
        for future in as_completed(tasks):
            results.append(future.result())
    
    return tuple(results)  # return immutable tuple


def summarize(results):
    """Produce summary statistics from benchmark results."""
    total = len(results)
    successes = [r for r in results if r.status == "success"]
    latencies = [r.latency_ms for r in successes]
    
    success_rate = len(successes) / total * 100 if total > 0 else 0.0
    
    if not latencies:
        return {"total": total, "success_rate": 0.0}
    
    sorted_latencies = sorted(latencies)
    
    return {
        "total": total,
        "successes": len(successes),
        "success_rate": round(success_rate, 2),
        "latency_p50": round(sorted_latencies[len(sorted_latencies) // 2], 1),
        "latency_p95": round(sorted_latencies[int(len(sorted_latencies) * 0.95)], 1),
        "latency_p99": round(sorted_latencies[int(len(sorted_latencies) * 0.99)], 1),
        "latency_mean": round(statistics.mean(latencies), 1),
        "latency_stdev": round(statistics.stdev(latencies), 1) if len(latencies) > 1 else 0,
        "timeouts": len([r for r in results if r.status == "timeout"]),
        "blocks": len([r for r in results if r.status in ("soft_block", "hard_block")]),
    }


# Example usage
config = BenchmarkConfig(
    proxy_url="http://USER:PASS@gate.hexproxies.com:8080",
    targets=(
        "https://httpbin.org/ip",
        "https://httpbin.org/headers",
        "https://httpbin.org/get",
    ),
    requests_per_target=100,
    concurrency=10,
    timeout=30,
)

results = run_benchmark(config)
summary = summarize(results)

for key, value in summary.items():
    print(f"{key}: {value}")

Key Principles for Valid Benchmarks

  1. Run at least 1,000 requests per test. Below this, percentile metrics are unreliable.
  2. Test over multiple days. A single run captures a moment; 7 days captures the real distribution.
  3. Include diverse targets. Do not benchmark against only httpbin.org.
  4. Test at realistic concurrency. If your production workload runs 50 concurrent connections, benchmark at 50.
  5. Never let failed requests disappear from your results. Keep every sent request in the success-rate denominator, and report failure latencies (timeouts, errors) separately rather than folding them into success-only percentiles.
  6. Control for your own network. Run a direct (no-proxy) baseline against the same targets from the same machine.
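
Principle 6 reduces to a simple comparison: proxy overhead is the proxied latency minus the direct baseline at matching statistics. The latency lists below are made up for illustration:

```python
import statistics

def proxy_overhead(direct_latencies_ms: list[float],
                   proxied_latencies_ms: list[float]) -> dict:
    """Proxy overhead against a direct (no-proxy) baseline collected
    from the same machine against the same targets."""
    return {
        "p50_overhead_ms": round(
            statistics.median(proxied_latencies_ms)
            - statistics.median(direct_latencies_ms), 1),
        "mean_overhead_ms": round(
            statistics.mean(proxied_latencies_ms)
            - statistics.mean(direct_latencies_ms), 1),
    }

# Illustrative numbers only:
direct = [80.0, 85.0, 90.0, 95.0, 400.0]
proxied = [210.0, 220.0, 230.0, 250.0, 900.0]
print(proxy_overhead(direct, proxied))
```

Comparing medians rather than means keeps a single slow outlier in either run from dominating the overhead figure.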

How Our Published Benchmarks Use This Methodology

Every benchmark on Hex Proxies follows this methodology:

  • Our speed test pages use the multi-region test node setup described above
  • Our network uptime page reports gateway and effective uptime
  • Our quarterly benchmark reports (like the upcoming Q2 2026 report) test multiple providers using identical methodology

All historical benchmark data is retained and comparable across periods because the methodology does not change between runs.

Frequently Asked Questions

How often do you run benchmarks?

Continuous monitoring runs every 6 hours, 7 days a week. Published benchmark reports aggregate data from 28 runs (7 days at 4 runs/day) per provider per report. Internal monitoring for our own infrastructure runs every 60 seconds.

Do you test competitors with their knowledge?

We purchase standard retail plans from each provider we benchmark. We do not use trial accounts, demo environments, or special arrangements. This ensures the results reflect what a paying customer would experience.

Why don't you report average latency as the primary metric?

Average latency is misleading for network performance data. If 99 requests take 100ms and 1 request takes 10,000ms, the average is 199ms -- but no request actually took 199ms. The median (P50) tells you what a typical request looks like; P95 and P99 tell you about the tail. We report averages for completeness but emphasize percentiles.
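
The arithmetic in that example checks out in a few lines:

```python
import statistics

# 99 requests at 100 ms, one at 10,000 ms
latencies_ms = [100.0] * 99 + [10_000.0]

print(statistics.mean(latencies_ms))    # 199.0 -- a value no request actually took
print(statistics.median(latencies_ms))  # 100.0 -- what a typical request saw
```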

Can I use your benchmark code for my own testing?

Yes. The code in this post is provided under MIT license. Adapt it to test any provider. We encourage you to benchmark Hex Proxies alongside competitors using your own methodology -- if our infrastructure is as fast as we claim, independent testing will confirm it.

How do you handle providers with different proxy formats?

Each provider gets a thin adapter that normalizes their proxy format (user:pass@host:port) into our standard test interface. The test logic, targets, timing, and classification code are identical across all providers.
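
As an illustration of such an adapter (the formats, class, and function names here are hypothetical examples, not any provider's actual scheme):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProxyEndpoint:
    """Normalized proxy credentials used by the test harness."""
    user: str
    password: str
    host: str
    port: int

    def to_url(self) -> str:
        return f"http://{self.user}:{self.password}@{self.host}:{self.port}"

def from_colon_format(raw: str) -> ProxyEndpoint:
    """Adapt 'host:port:user:pass', a common provider export format."""
    host, port, user, password = raw.split(":")
    return ProxyEndpoint(user, password, host, int(port))

def from_url_format(raw: str) -> ProxyEndpoint:
    """Adapt 'user:pass@host:port'. (Assumes ':' appears in neither
    the username nor the password.)"""
    creds, addr = raw.split("@")
    user, password = creds.split(":")
    host, port = addr.split(":")
    return ProxyEndpoint(user, password, host, int(port))

# Both formats normalize to the same endpoint:
ep = from_colon_format("gate.example.com:8080:alice:s3cret")
print(ep.to_url())  # http://alice:s3cret@gate.example.com:8080
```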


Transparent methodology is the foundation of credible benchmarks. For the latest benchmark data produced using this methodology, visit our benchmark results page or explore regional speed tests. ISP proxies start at $2.08/IP and residential proxies at $4.25/GB -- see current pricing.
