
Proxy Benchmark Methodology: How We Test Speed, Uptime, and Success Rate

By Hex Proxies Engineering Team

Most proxy benchmark reports are marketing material disguised as data. They cherry-pick favorable metrics, test under unrealistic conditions, and omit methodology entirely. Without knowing how a benchmark was conducted, the numbers are meaningless.

This post documents the exact methodology Hex Proxies uses for all published speed tests and benchmark reports. Every test on our benchmark methodology page follows this protocol. We publish it so you can evaluate our data critically and replicate the approach when testing any provider.

Why Proxy Benchmarks Fail

Before describing what we do, it helps to understand what most benchmarks get wrong.

Common Methodological Failures

Single-region testing. Many benchmarks test from one location (usually US East) to one target. This tells you nothing about global performance. A provider can have excellent US infrastructure and 500ms+ latency to Asia.

Insufficient sample size. Testing 100 requests and reporting the average is statistically meaningless. Network performance follows heavy-tailed distributions -- you need thousands of samples to capture P95 and P99 latency accurately.

No target diversity. Testing against httpbin.org or a provider's own endpoint does not reflect real-world conditions. Real targets have varying response times, rate limits, and anti-bot systems that affect success rates.

Missing time dimension. A snapshot benchmark taken at 2 AM UTC tells you nothing about performance during peak hours. Network congestion, target server load, and proxy pool utilization all vary by time of day.

Ignoring connection failures. Many benchmarks report "average response time" but exclude failed requests from the calculation. If 15% of requests fail, the average of the successful 85% is misleading.
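
To make the last failure concrete, here is a toy calculation (with made-up numbers) showing how far "average of successful requests" can drift from what clients actually experience once failures are priced in at their timeout cost:

```python
import statistics

# Hypothetical sample: 85 successful requests plus 15 timeouts.
# Timeouts cost the full 30 s cap, i.e. 30,000 ms of wasted wall time.
successes = [120.0] * 85      # ms, successful requests
timeouts = [30_000.0] * 15    # ms, requests that hit the timeout

# "Average of successful requests only" -- the misleading headline number
avg_success_only = statistics.mean(successes)      # 120.0 ms

# Average including failed requests at their timeout cost
avg_all = statistics.mean(successes + timeouts)    # 4602.0 ms
```

The success-only mean reports 120 ms while the client-experienced mean is nearly forty times higher.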

Our Test Infrastructure

Test Nodes

We run benchmark agents from eight geographic locations:

| Node Location | Cloud Provider | Purpose |
| --- | --- | --- |
| US East (Virginia) | AWS | Primary US target proximity |
| US West (Oregon) | AWS | Cross-region US latency |
| EU Central (Frankfurt) | AWS | European performance |
| EU West (London) | AWS | UK-specific targets |
| Asia East (Tokyo) | AWS | East Asia performance |
| Asia South (Singapore) | AWS | Southeast Asia + Oceania |
| South America (Sao Paulo) | AWS | LATAM performance |
| Middle East (Bahrain) | AWS | MENA region coverage |

Each node runs identical test software on c5.xlarge instances (4 vCPU, 8 GB RAM) with enhanced networking enabled. We use dedicated instances -- never burstable -- to eliminate compute variability from the results.

Target Sites

We test against a rotating panel of 50 target URLs across five categories:

| Category | Example Targets | Why Included |
| --- | --- | --- |
| Static content | Major CDN-served pages | Baseline latency (minimal server processing) |
| Dynamic web apps | E-commerce product pages, search results | Realistic web scraping targets |
| API endpoints | Public REST APIs with rate limits | API-scraping use case |
| Anti-bot protected | Sites using Cloudflare, Akamai, PerimeterX | Success rate under protection |
| Regional content | Country-specific news sites, local marketplaces | Geo-targeting accuracy |

We do not disclose specific target URLs to prevent target sites from whitelisting our test infrastructure, which would invalidate the results.

Test Parameters

Each benchmark run uses these parameters:

# benchmark-config.yaml
test_parameters:
  requests_per_target: 500
  targets_per_category: 10
  total_requests_per_run: 25000   # 500 * 50 targets
  concurrency: 50                 # parallel connections
  timeout: 30s                    # per-request timeout
  retry: 0                        # no retries (measure raw performance)
  protocol: HTTPS                 # CONNECT method
  rotation: per_request           # new IP every request
  session_test_duration: 5min     # for sticky session benchmarks

schedule:
  frequency: every_6_hours        # 4 runs per day
  duration: 7_days                # per benchmark period
  total_runs: 28                  # per provider per period
  total_requests: 700000          # per provider per period

This produces 700,000 data points per provider per benchmark period -- enough to calculate statistically significant percentile metrics.
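
A quick simulation (illustrative only -- it uses a lognormal distribution as a stand-in for heavy-tailed latency, with arbitrary parameters) shows why small samples cannot pin down tail percentiles:

```python
import random
import statistics

random.seed(42)

def p99(samples):
    """P99 by sorted-index lookup, matching the indexing used later in this post."""
    s = sorted(samples)
    return s[int(len(s) * 0.99)]

def draw(n):
    """Draw n latencies from a heavy-tailed toy distribution (ms)."""
    return [random.lognormvariate(4.6, 0.8) for _ in range(n)]

# Estimate P99 twenty times from small and large samples.
# The small-sample estimates scatter far more widely.
small = [p99(draw(100)) for _ in range(20)]
large = [p99(draw(10_000)) for _ in range(20)]

print("spread of P99 estimates, n=100:  ", round(statistics.stdev(small), 1))
print("spread of P99 estimates, n=10000:", round(statistics.stdev(large), 1))
```

With only 100 samples, the "P99" is effectively the sample maximum, which swings wildly between runs; with 10,000 samples the estimate stabilizes.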

Measurement Methodology

Latency Measurement

We measure four distinct latency components for every request:

┌──────────────────────────────────────────────────────────────┐
│                    Total Request Time                         │
│                                                              │
│  ┌─────────┐  ┌─────────┐  ┌──────────┐  ┌──────────────┐  │
│  │  DNS    │  │  TCP    │  │   TLS    │  │  HTTP        │  │
│  │ Resolve │→ │ Connect │→ │ Handshake│→ │  Transfer    │  │
│  │         │  │         │  │          │  │  (TTFB+body) │  │
│  └─────────┘  └─────────┘  └──────────┘  └──────────────┘  │
│                                                              │
│  t_dns        t_connect    t_tls         t_transfer          │
└──────────────────────────────────────────────────────────────┘

DNS resolution time (t_dns): Time to resolve the proxy endpoint hostname. We cache DNS after the first resolution to isolate proxy performance from DNS infrastructure.

TCP connect time (t_connect): Time from SYN to SYN-ACK with the proxy server. This measures network latency to the proxy.

TLS handshake time (t_tls): Time to complete the TLS 1.3 handshake with the proxy (for HTTPS-to-proxy connections) or the CONNECT + target TLS handshake.

HTTP transfer time (t_transfer): Time from sending the HTTP request to receiving the complete response body. This includes the proxy's internal routing to the target, the target's processing time, and the response relay.

We report:

  • Time to First Byte (TTFB): t_dns + t_connect + t_tls + time_to_first_response_byte
  • Total Response Time: The complete request-response cycle including body download
  • Proxy Overhead: Measured by comparing direct-to-target requests with proxied requests to the same target from the same node
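
The reported quantities compose directly from the four components above. A minimal sketch of that bookkeeping, using hypothetical timings (the class and field names are ours, not part of any test harness):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LatencyBreakdown:
    """Per-request latency components, all in milliseconds."""
    t_dns_ms: float
    t_connect_ms: float
    t_tls_ms: float
    t_first_byte_ms: float  # request sent -> first response byte
    t_body_ms: float        # first byte -> last byte of body

    @property
    def ttfb_ms(self) -> float:
        # TTFB = t_dns + t_connect + t_tls + time to first response byte
        return self.t_dns_ms + self.t_connect_ms + self.t_tls_ms + self.t_first_byte_ms

    @property
    def total_ms(self) -> float:
        # Total response time adds the body download
        return self.ttfb_ms + self.t_body_ms

# Hypothetical measurement:
sample = LatencyBreakdown(12.0, 45.0, 38.0, 180.0, 95.0)
print(sample.ttfb_ms)   # 275.0
print(sample.total_ms)  # 370.0
```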

Success Rate Calculation

A request is classified into one of four outcomes:

| Outcome | Definition | Counted As |
| --- | --- | --- |
| Success | HTTP 200 with expected content | Success |
| Soft block | HTTP 200 but CAPTCHA/challenge page | Failure |
| Hard block | HTTP 403, 429, 503, or connection refused | Failure |
| Timeout | No response within 30 seconds | Failure |

Critical detail: We check response bodies for soft blocks, not just status codes. Many anti-bot systems return HTTP 200 with a CAPTCHA or JavaScript challenge. Counting these as successes inflates the reported success rate. Our test client parses response bodies and checks for known block indicators (CAPTCHA iframes, challenge script patterns, empty bodies under 1 KB for pages expected to be larger).

def classify_response(response, expected_min_size=1024):
    """Classify a proxy response into success or failure category.
    
    Args:
        response: HTTP response object with status_code, text, elapsed
        expected_min_size: Minimum expected response body size in bytes
    
    Returns:
        Tuple of (outcome: str, details: dict)
    """
    if response is None:
        return ("timeout", {"reason": "no_response"})
    
    status = response.status_code
    body = response.text
    body_size = len(body.encode('utf-8'))
    
    # Hard blocks
    if status in (403, 429, 503):
        return ("hard_block", {"status": status})
    
    if status == 407:
        return ("auth_failure", {"status": status})
    
    # Check for soft blocks in 200 responses
    if status == 200:
        block_indicators = [
            "captcha",
            "cf-challenge",
            "challenge-platform",
            "managed-challenge",
            "px-captcha",
            "distil_r_captcha",
            "arkoselabs.com",
            "recaptcha/api",
            "hcaptcha.com",
        ]
        
        body_lower = body.lower()
        for indicator in block_indicators:
            if indicator in body_lower:
                return ("soft_block", {
                    "indicator": indicator,
                    "body_size": body_size,
                })
        
        # Suspiciously small response
        if body_size < expected_min_size:
            return ("suspect_small", {
                "body_size": body_size,
                "expected_min": expected_min_size,
            })
        
        return ("success", {
            "body_size": body_size,
            "latency_ms": response.elapsed.total_seconds() * 1000,
        })
    
    return ("other_failure", {"status": status})

Success rate formula:

success_rate = successful_requests / total_requests * 100

We do not exclude timeouts or errors from the denominator. If you sent 10,000 requests and 1,500 timed out, your success rate is 85%, not "100% of completed requests."
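
Expressed as code, using the 85% example above:

```python
def success_rate(successful: int, total: int) -> float:
    """Success rate with every sent request -- including timeouts
    and errors -- kept in the denominator."""
    if total == 0:
        raise ValueError("no requests sent")
    return successful / total * 100

# 10,000 requests sent; 1,500 timed out; 8,500 succeeded:
print(success_rate(8_500, 10_000))  # 85.0
```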

Uptime Measurement

We distinguish between two uptime metrics:

Gateway uptime: Can we establish a TCP connection to the proxy endpoint? We ping the proxy gateway every 60 seconds from all eight test nodes. If any node cannot connect for two consecutive checks (2 minutes), we record a downtime event.

Effective uptime: Of the requests that reached the proxy, what percentage received a response (regardless of success or failure)? This measures whether the proxy is functional, not whether it is unblocked.

gateway_uptime = (1 - total_downtime_minutes / total_monitored_minutes) * 100

effective_uptime = requests_with_any_response / total_requests_sent * 100

A provider can have 99.99% gateway uptime but 95% effective uptime if 5% of requests hang indefinitely within the proxy network.
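
Both formulas translate directly to code. The numbers below are hypothetical; 10,080 minutes is one 7-day monitoring window:

```python
def gateway_uptime(total_downtime_minutes: float, total_monitored_minutes: float) -> float:
    """Percentage of monitored time the gateway accepted TCP connections."""
    return (1 - total_downtime_minutes / total_monitored_minutes) * 100

def effective_uptime(requests_with_any_response: int, total_requests_sent: int) -> float:
    """Percentage of sent requests that got any response at all."""
    return requests_with_any_response / total_requests_sent * 100

# One 2-minute downtime event in a 7-day window, but 5% of requests hang:
print(round(gateway_uptime(1, 10_080), 2))  # 99.99
print(effective_uptime(9_500, 10_000))      # 95.0
```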

Statistical Reporting

We report these percentile metrics for latency:

| Metric | What It Tells You |
| --- | --- |
| P50 (median) | Typical request performance |
| P75 | Performance for most requests |
| P90 | Performance under moderate load |
| P95 | Tail latency -- slow but not extreme |
| P99 | Worst-case scenario (excluding outliers) |
| Mean | Useful only for throughput calculations |
| Std Dev | Consistency -- low std dev means predictable |

We never report only the mean for latency. Mean latency is distorted by outliers and hides the tail distribution that matters most for user experience and timeout configuration.

For success rates, we report the rate with a 95% confidence interval:

If observed success rate = 92.3% over 25,000 requests:
95% CI = 92.3% +/- 0.33%
Reported as: 92.3% (95% CI: 92.0% - 92.6%)
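
That interval is the standard normal approximation for a proportion. A small helper reproducing the example above (a Wilson interval is more robust for rates near 0% or 100%, but the normal approximation is fine at this sample size):

```python
import math

def success_rate_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return (rate %, margin %) using the normal approximation
    z * sqrt(p * (1 - p) / n) for a 95% confidence interval."""
    p = successes / total
    margin = z * math.sqrt(p * (1 - p) / total)
    return p * 100, margin * 100

rate, margin = success_rate_ci(23_075, 25_000)  # 92.3% observed over 25,000 requests
print(f"{rate:.1f}% +/- {margin:.2f}%")         # 92.3% +/- 0.33%
```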

How to Replicate Our Methodology

You do not need our infrastructure to run meaningful benchmarks. Here is a simplified version you can run from a single machine.

Minimum Viable Benchmark

import time
import statistics
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkConfig:
    """Immutable benchmark configuration."""
    proxy_url: str
    targets: tuple  # tuple of URL strings
    requests_per_target: int = 100
    concurrency: int = 10
    timeout: int = 30


@dataclass(frozen=True)
class RequestResult:
    """Immutable result of a single benchmark request."""
    target: str
    status: str  # "success", "soft_block", "hard_block", "timeout", "error"
    latency_ms: float
    status_code: int
    body_size: int


def run_single_request(proxy_url, target, timeout):
    """Execute a single benchmarked request. Returns an immutable result."""
    proxies = {
        "http": proxy_url,
        "https": proxy_url,
    }
    
    start = time.monotonic()
    try:
        resp = requests.get(
            target,
            proxies=proxies,
            timeout=timeout,
            allow_redirects=True,
        )
        elapsed_ms = (time.monotonic() - start) * 1000
        body_size = len(resp.content)
        
        # Soft block detection
        body_lower = resp.text.lower()
        soft_block_markers = ["captcha", "cf-challenge", "px-captcha"]
        is_soft_block = any(m in body_lower for m in soft_block_markers)
        
        if is_soft_block:
            status = "soft_block"
        elif resp.status_code == 200:
            status = "success"
        elif resp.status_code in (403, 429):
            status = "hard_block"
        else:
            status = "other"
        
        return RequestResult(
            target=target,
            status=status,
            latency_ms=elapsed_ms,
            status_code=resp.status_code,
            body_size=body_size,
        )
    
    except requests.Timeout:
        elapsed_ms = (time.monotonic() - start) * 1000
        return RequestResult(
            target=target,
            status="timeout",
            latency_ms=elapsed_ms,
            status_code=0,
            body_size=0,
        )
    except requests.RequestException:
        elapsed_ms = (time.monotonic() - start) * 1000
        return RequestResult(
            target=target,
            status="error",
            latency_ms=elapsed_ms,
            status_code=0,
            body_size=0,
        )


def run_benchmark(config):
    """Run the full benchmark and return all results."""
    results = []
    tasks = []
    
    with ThreadPoolExecutor(max_workers=config.concurrency) as pool:
        for target in config.targets:
            for _ in range(config.requests_per_target):
                future = pool.submit(
                    run_single_request,
                    config.proxy_url,
                    target,
                    config.timeout,
                )
                tasks.append(future)
        
        for future in as_completed(tasks):
            results.append(future.result())
    
    return tuple(results)  # return immutable tuple


def summarize(results):
    """Produce summary statistics from benchmark results."""
    total = len(results)
    successes = [r for r in results if r.status == "success"]
    latencies = [r.latency_ms for r in successes]
    
    success_rate = len(successes) / total * 100 if total > 0 else 0.0
    
    if not latencies:
        return {"total": total, "success_rate": 0.0}
    
    sorted_latencies = sorted(latencies)
    
    return {
        "total": total,
        "successes": len(successes),
        "success_rate": round(success_rate, 2),
        "latency_p50": round(sorted_latencies[len(sorted_latencies) // 2], 1),
        "latency_p95": round(sorted_latencies[int(len(sorted_latencies) * 0.95)], 1),
        "latency_p99": round(sorted_latencies[int(len(sorted_latencies) * 0.99)], 1),
        "latency_mean": round(statistics.mean(latencies), 1),
        "latency_stdev": round(statistics.stdev(latencies), 1) if len(latencies) > 1 else 0,
        "timeouts": len([r for r in results if r.status == "timeout"]),
        "blocks": len([r for r in results if r.status in ("soft_block", "hard_block")]),
    }


# Example usage
config = BenchmarkConfig(
    proxy_url="http://USER:PASS@gate.hexproxies.com:8080",
    targets=(
        "https://httpbin.org/ip",
        "https://httpbin.org/headers",
        "https://httpbin.org/get",
    ),
    requests_per_target=100,
    concurrency=10,
    timeout=30,
)

results = run_benchmark(config)
summary = summarize(results)

for key, value in summary.items():
    print(f"{key}: {value}")

Key Principles for Valid Benchmarks

  1. Run at least 1,000 requests per test. Below this, percentile metrics are unreliable.
  2. Test over multiple days. A single run captures a moment; 7 days captures the real distribution.
  3. Include diverse targets. Do not benchmark against only httpbin.org.
  4. Test at realistic concurrency. If your production workload runs 50 concurrent connections, benchmark at 50.
  5. Never let failed requests disappear from your results. Keep every sent request in the success-rate denominator, and report failure latencies (timeouts, errors) separately rather than folding them into success-only percentiles.
  6. Control for your own network. Run a direct (no-proxy) baseline against the same targets from the same machine.
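
Principle 6 reduces to a simple comparison: proxy overhead is the proxied latency minus the direct baseline at matching statistics. The latency lists below are made up for illustration:

```python
import statistics

def proxy_overhead(direct_latencies_ms: list[float],
                   proxied_latencies_ms: list[float]) -> dict:
    """Proxy overhead against a direct (no-proxy) baseline collected
    from the same machine against the same targets."""
    return {
        "p50_overhead_ms": round(
            statistics.median(proxied_latencies_ms)
            - statistics.median(direct_latencies_ms), 1),
        "mean_overhead_ms": round(
            statistics.mean(proxied_latencies_ms)
            - statistics.mean(direct_latencies_ms), 1),
    }

# Illustrative numbers only:
direct = [80.0, 85.0, 90.0, 95.0, 400.0]
proxied = [210.0, 220.0, 230.0, 250.0, 900.0]
print(proxy_overhead(direct, proxied))
```

Comparing medians rather than means keeps a single slow outlier in either run from dominating the overhead figure.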

How Our Published Benchmarks Use This Methodology

Every benchmark on Hex Proxies follows this methodology:

  • Our speed test pages use the multi-region test node setup described above
  • Our network uptime page reports gateway and effective uptime
  • Our quarterly benchmark reports (like the upcoming Q2 2026 report) test multiple providers using identical methodology

All historical benchmark data is retained and comparable across periods because the methodology does not change between runs.

Frequently Asked Questions

How often do you run benchmarks?

Continuous monitoring runs every 6 hours, 7 days a week. Published benchmark reports aggregate data from 28 runs (7 days at 4 runs/day) per provider per report. Internal monitoring for our own infrastructure runs every 60 seconds.

Do you test competitors with their knowledge?

We purchase standard retail plans from each provider we benchmark. We do not use trial accounts, demo environments, or special arrangements. This ensures the results reflect what a paying customer would experience.

Why don't you report average latency as the primary metric?

Average latency is misleading for network performance data. If 99 requests take 100ms and 1 request takes 10,000ms, the average is 199ms -- but no request actually took 199ms. The median (P50) tells you what a typical request looks like; P95 and P99 tell you about the tail. We report averages for completeness but emphasize percentiles.
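
The arithmetic in that example checks out in a few lines:

```python
import statistics

# 99 requests at 100 ms, one at 10,000 ms
latencies_ms = [100.0] * 99 + [10_000.0]

print(statistics.mean(latencies_ms))    # 199.0 -- a value no request actually took
print(statistics.median(latencies_ms))  # 100.0 -- what a typical request saw
```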

Can I use your benchmark code for my own testing?

Yes. The code in this post is provided under MIT license. Adapt it to test any provider. We encourage you to benchmark Hex Proxies alongside competitors using your own methodology -- if our infrastructure is as fast as we claim, independent testing will confirm it.

How do you handle providers with different proxy formats?

Each provider gets a thin adapter that normalizes their proxy format (user:pass@host:port) into our standard test interface. The test logic, targets, timing, and classification code are identical across all providers.
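
As an illustration of such an adapter (the formats, class, and function names here are hypothetical examples, not any provider's actual scheme):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProxyEndpoint:
    """Normalized proxy credentials used by the test harness."""
    user: str
    password: str
    host: str
    port: int

    def to_url(self) -> str:
        return f"http://{self.user}:{self.password}@{self.host}:{self.port}"

def from_colon_format(raw: str) -> ProxyEndpoint:
    """Adapt 'host:port:user:pass', a common provider export format."""
    host, port, user, password = raw.split(":")
    return ProxyEndpoint(user, password, host, int(port))

def from_url_format(raw: str) -> ProxyEndpoint:
    """Adapt 'user:pass@host:port'. (Assumes ':' appears in neither
    the username nor the password.)"""
    creds, addr = raw.split("@")
    user, password = creds.split(":")
    host, port = addr.split(":")
    return ProxyEndpoint(user, password, host, int(port))

# Both formats normalize to the same endpoint:
ep = from_colon_format("gate.example.com:8080:alice:s3cret")
print(ep.to_url())  # http://alice:s3cret@gate.example.com:8080
```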


Transparent methodology is the foundation of credible benchmarks. For the latest benchmark data produced using this methodology, visit our benchmark results page or explore regional speed tests. ISP proxies start at $2.08/IP and residential proxies at $4.25/GB -- see current pricing.
