
Proxy Failover Patterns: Automatic Provider Switching for High Availability

12 min read

By Hex Proxies Engineering Team

Last updated: April 2026 | Author: Hex Proxies Team

TL;DR: Production scraping and data collection systems need proxy failover to maintain uptime when individual proxies or providers experience issues. This guide covers five failover patterns — from simple retry logic to sophisticated multi-provider mesh architectures — with production code examples. Hex Proxies residential ($1.70/GB via gate.hexproxies.com:8080) and ISP ($0.83/IP) proxies support these patterns through session-based routing and standard HTTP proxy protocols.

Every proxy will fail eventually. Residential IPs get banned by target sites. ISP proxies experience occasional downtime. Entire proxy providers can have outages. The question is not whether your proxy infrastructure will fail — it is whether your system handles the failure gracefully or loses data.

This guide presents five proxy failover patterns, progressing from simple to sophisticated. Each pattern addresses a different scale and reliability requirement. Choose the simplest pattern that meets your availability needs.

Understanding Proxy Failure Modes

Before implementing failover, understand what can go wrong:

| Failure Mode | Detection Signal | Typical Duration | Frequency |
|---|---|---|---|
| IP blocked by target | 403/429 response, CAPTCHA | Minutes to hours | Common |
| Proxy connection timeout | Connection timeout error | Seconds | Occasional |
| Proxy authentication failure | 407 response | Until credentials fixed | Rare |
| Provider gateway down | Connection refused | Minutes to hours | Rare |
| Geographic routing failure | Wrong geo in response | Variable | Occasional |
| Bandwidth exhausted | 402/429 from provider | Until top-up | Depends on plan |
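
The detection signals in the table can be turned into a small classifier so that retry logic can react differently to each failure mode. The sketch below is illustrative only: the `FailureMode` enum and `classify_failure` function are hypothetical names, Python's built-in exception types stand in for whatever errors your HTTP client raises, and geographic routing failures are left out because detecting them requires inspecting the response body.

```python
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    IP_BLOCKED = "ip_blocked"
    PROXY_TIMEOUT = "proxy_timeout"
    AUTH_FAILURE = "auth_failure"
    GATEWAY_DOWN = "gateway_down"
    BANDWIDTH_EXHAUSTED = "bandwidth_exhausted"
    UNKNOWN = "unknown"


def classify_failure(
    status_code: Optional[int] = None,
    error: Optional[BaseException] = None,
) -> FailureMode:
    """Map a status code or a raised exception to a failure mode."""
    if error is not None:
        # Built-in exceptions stand in for the HTTP client's own error types.
        if isinstance(error, ConnectionRefusedError):
            return FailureMode.GATEWAY_DOWN
        if isinstance(error, TimeoutError):
            return FailureMode.PROXY_TIMEOUT
        return FailureMode.UNKNOWN
    if status_code == 407:
        return FailureMode.AUTH_FAILURE
    if status_code == 402:
        return FailureMode.BANDWIDTH_EXHAUSTED
    if status_code in (403, 429):
        # 429 is ambiguous in the table (target block or provider quota);
        # this sketch assumes the more common case, a block by the target.
        return FailureMode.IP_BLOCKED
    return FailureMode.UNKNOWN
```

A block or timeout usually calls for a new session, while an authentication failure or exhausted bandwidth will not be fixed by retrying and is better surfaced as an alert.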

Pattern 1: Simple Retry with Session Rotation

The most basic failover pattern: when a request fails, retry with a new proxy session (which gives a new IP).

import httpx
import uuid
import time
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ProxyResponse:
    success: bool
    status_code: int
    body: Optional[str]
    attempts: int
    final_session: str

def request_with_retry(
    url: str,
    base_user: str,
    password: str,
    country: str = "us",
    max_retries: int = 3,
    retry_delay: float = 2.0
) -> ProxyResponse:
    """Make a request with automatic proxy session rotation on failure."""
    for attempt in range(1, max_retries + 1):
        session_id = uuid.uuid4().hex[:8]
        username = f"{base_user}-country-{country}-sessid-{session_id}"
        proxy_url = (
            f"http://{username}:{password}@gate.hexproxies.com:8080"
        )

        try:
            response = httpx.get(
                url,
                # httpx 0.26+ takes `proxy=`; older releases used `proxies=`
                proxy=proxy_url,
                timeout=15.0,
                follow_redirects=True
            )

            if response.status_code == 200:
                return ProxyResponse(
                    True, 200, response.text, attempt, session_id
                )

            if response.status_code in (403, 429):
                # Blocked: back off exponentially, then retry with a new session
                time.sleep(retry_delay * 2 ** (attempt - 1))
                continue

            # Other status codes: return as-is
            return ProxyResponse(
                False, response.status_code, response.text,
                attempt, session_id
            )

        except httpx.RequestError:
            # Connection-level failure: back off exponentially, then retry
            time.sleep(retry_delay * 2 ** (attempt - 1))
            continue

    return ProxyResponse(False, 0, None, max_retries, "")

Pros: Simple, handles the most common failure (IP block). Cons: Does not handle provider-level outages.
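
One practical detail when building proxy URLs by string formatting: a password containing characters such as `@`, `:` or `/` makes the URL ambiguous. A minimal sketch of a helper that percent-encodes the credentials is shown below. The `build_proxy_url` name is hypothetical, the username format follows the one used in the pattern above, and it assumes your HTTP client decodes percent-encoded credentials in the proxy URL, which is worth verifying for the client you use.

```python
from urllib.parse import quote


def build_proxy_url(
    base_user: str,
    password: str,
    country: str,
    session_id: str,
    host: str = "gate.hexproxies.com",
    port: int = 8080,
) -> str:
    """Build a session-pinned proxy URL with percent-encoded credentials."""
    username = f"{base_user}-country-{country}-sessid-{session_id}"
    # safe="" also encodes "/" so reserved characters cannot break the URL.
    return (
        f"http://{quote(username, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}"
    )
```

Calling `build_proxy_url("alice", "p@ss:word/1", "us", "ab12cd34")` keeps the username unchanged and encodes the password as `p%40ss%3Aword%2F1`.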

Pattern 2: Proxy Type Escalation

Escalate through proxy types when lower-cost options fail. Start with residential (broadest compatibility), fall back to ISP (most reliable).

from dataclasses import dataclass
from typing import List, Optional
import httpx
import uuid

@dataclass(frozen=True)
class ProxyTier:
    name: str
    proxy_url: str
    timeout: float

def create_escalation_chain(
    resi_user: str, resi_pass: str,
    isp_ips: List[str], isp_user: str, isp_pass: str,
    country: str = "us"
) -> List[ProxyTier]:
    """Create a proxy escalation chain."""
    session = uuid.uuid4().hex[:8]
    resi_username = f"{resi_user}-country-{country}-sessid-{session}"
    tiers = [
        ProxyTier(
            "residential-rotating",
            f"http://{resi_user}-country-{country}:{resi_pass}"
            f"@gate.hexproxies.com:8080",
            15.0
        ),
        ProxyTier(
            "residential-sticky",
            f"http://{resi_username}:{resi_pass}"
            f"@gate.hexproxies.com:8080",
            20.0
        ),
    ]
    for ip in isp_ips:
        tiers.append(
            ProxyTier(
                f"isp-{ip}",
                f"http://{isp_user}:{isp_pass}@{ip}:8080",
                25.0
            )
        )
    return tiers

def request_with_escalation(
    url: str, tiers: List[ProxyTier]
) -> Optional[httpx.Response]:
    """Try each proxy tier in order until one succeeds."""
    for tier in tiers:
        try:
            response = httpx.get(
                url,
                # httpx 0.26+ takes `proxy=`; older releases used `proxies=`
                proxy=tier.proxy_url,
                timeout=tier.timeout,
                follow_redirects=True
            )
            if response.status_code == 200:
                return response
        except httpx.RequestError:
            continue
    return None

Pros: Optimizes cost (cheaper options tried first) while maintaining reliability. Cons: Sequential attempts add latency.

Pattern 3: Health-Aware Routing

Track proxy health metrics and route requests to the healthiest available proxy. Unhealthy proxies are temporarily removed from the pool.

import time
from dataclasses import dataclass, field
from typing import Dict, Optional, List
import threading
import httpx

@dataclass
class ProxyHealth:
    proxy_id: str
    proxy_url: str
    success_count: int = 0
    failure_count: int = 0
    last_success: float = 0.0
    last_failure: float = 0.0
    circuit_open: bool = False
    circuit_open_until: float = 0.0
    # Failures since the last success; this is what trips the circuit
    consecutive_failures: int = 0

    @property
    def success_rate(self) -> float:
        total = self.success_count + self.failure_count
        if total == 0:
            return 1.0
        return self.success_count / total

    @property
    def is_available(self) -> bool:
        # Read-only check: an open circuit counts as available again
        # once its timeout has passed, so the proxy can be retested
        return (
            not self.circuit_open
            or time.time() > self.circuit_open_until
        )


class HealthAwareProxyRouter:
    def __init__(
        self,
        proxies: List[str],
        failure_threshold: int = 5,
        circuit_timeout: float = 60.0
    ):
        self._lock = threading.Lock()
        self.failure_threshold = failure_threshold
        self.circuit_timeout = circuit_timeout
        self.health: Dict[str, ProxyHealth] = {}
        for i, proxy_url in enumerate(proxies):
            pid = f"proxy-{i}"
            self.health[pid] = ProxyHealth(pid, proxy_url)

    def get_best_proxy(self) -> Optional[ProxyHealth]:
        """Return the healthiest available proxy."""
        with self._lock:
            if not self.health:
                return None  # Empty pool
            available = [
                h for h in self.health.values() if h.is_available
            ]
            if not available:
                # All circuits open: try the one that closes soonest
                return min(
                    self.health.values(),
                    key=lambda h: h.circuit_open_until
                )
            return max(available, key=lambda h: h.success_rate)

    def record_success(self, proxy_id: str) -> None:
        with self._lock:
            h = self.health[proxy_id]
            # A success closes the circuit and resets the failure streak
            self.health[proxy_id] = ProxyHealth(
                proxy_id, h.proxy_url,
                h.success_count + 1, h.failure_count,
                time.time(), h.last_failure,
                False, 0.0, 0
            )

    def record_failure(self, proxy_id: str) -> None:
        with self._lock:
            h = self.health[proxy_id]
            streak = h.consecutive_failures + 1
            # Open the circuit on consecutive failures, not lifetime totals
            circuit_open = streak >= self.failure_threshold
            circuit_until = (
                time.time() + self.circuit_timeout
                if circuit_open else 0.0
            )
            self.health[proxy_id] = ProxyHealth(
                proxy_id, h.proxy_url,
                h.success_count, h.failure_count + 1,
                h.last_success, time.time(),
                circuit_open, circuit_until, streak
            )

    def request(self, url: str, max_attempts: int = 3):
        """Route request through healthiest proxy."""
        for _ in range(max_attempts):
            proxy = self.get_best_proxy()
            if proxy is None:
                break
            try:
                response = httpx.get(
                    url,
                    # httpx 0.26+ takes `proxy=`; older releases used `proxies=`
                    proxy=proxy.proxy_url,
                    timeout=15.0
                )
                if response.status_code == 200:
                    self.record_success(proxy.proxy_id)
                    return response
                self.record_failure(proxy.proxy_id)
            except httpx.RequestError:
                self.record_failure(proxy.proxy_id)
        return None

The circuit breaker pattern prevents the system from repeatedly hitting a known-bad proxy. After 5 consecutive failures, a proxy is taken out of rotation for 60 seconds, then automatically retested.

Pros: Self-healing, routes around problems automatically. Cons: More complex, requires stateful health tracking.
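
To make the circuit breaker behavior easier to reason about in isolation, here is a minimal standalone sketch, separate from the router above. The `CircuitBreaker` class and its method names are hypothetical, and the injectable `clock` parameter is an assumption added so the timing can be tested without waiting.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Open after N consecutive failures; allow a retest after a cooldown."""

    def __init__(
        self,
        failure_threshold: int = 5,
        cooldown: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self._clock = clock
        self._consecutive_failures = 0
        self._open_until = 0.0

    def allow_request(self) -> bool:
        """True when the circuit is closed or the cooldown has elapsed."""
        return self._clock() >= self._open_until

    def record_success(self) -> None:
        # Any success closes the circuit and resets the failure streak.
        self._consecutive_failures = 0
        self._open_until = 0.0

    def record_failure(self) -> None:
        self._consecutive_failures += 1
        if self._consecutive_failures >= self.failure_threshold:
            # A failed retest lands here too and restarts the cooldown.
            self._open_until = self._clock() + self.cooldown
```

After the cooldown the breaker lets one request through as a retest. A success closes the circuit, while another failure reopens it for a further cooldown period.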

Pattern 4: Multi-Provider Failover

For mission-critical operations, run multiple proxy providers simultaneously and fail over between them when one has issues.

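from dataclasses import dataclass
from typing import List, Optional
import logging
import httpx

logger = logging.getLogger(__name__)

# Placeholder logging helpers; swap in your own metrics or logging backend
def log_provider_usage(name: str, num_bytes: int) -> None:
    logger.info("provider=%s bytes=%d", name, num_bytes)

def log_provider_failure(name: str, url: str) -> None:
    logger.warning("provider=%s failed for %s", name, url)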

@dataclass(frozen=True)
class ProxyProvider:
    name: str
    proxy_url: str
    priority: int  # Lower = higher priority
    cost_per_gb: float

def create_provider_chain() -> List[ProxyProvider]:
    return [
        ProxyProvider(
            name="hex-residential",
            proxy_url=(
                "http://USER-country-us:PASS"
                "@gate.hexproxies.com:8080"
            ),
            priority=1,
            cost_per_gb=1.70
        ),
        ProxyProvider(
            name="hex-isp",
            proxy_url="http://USER:PASS@ISP_IP:8080",
            priority=2,
            cost_per_gb=0.00  # Fixed cost per IP
        ),
        ProxyProvider(
            name="backup-provider",
            proxy_url=(
                "http://BACKUP_USER:BACKUP_PASS"
                "@backup.provider.com:8080"
            ),
            priority=3,
            cost_per_gb=3.50
        ),
    ]

def failover_request(
    url: str, providers: List[ProxyProvider]
) -> Optional[httpx.Response]:
    """Try providers in priority order."""
    sorted_providers = sorted(providers, key=lambda p: p.priority)
    for provider in sorted_providers:
        try:
            response = httpx.get(
                url,
                # httpx 0.26+ takes `proxy=`; older releases used `proxies=`
                proxy=provider.proxy_url,
                timeout=15.0
            )
            if response.status_code == 200:
                log_provider_usage(
                    provider.name, len(response.content)
                )
                return response
        except httpx.RequestError:
            log_provider_failure(provider.name, url)
            continue
    return None

Cost awareness matters in this pattern. Providers are tried in priority order, so giving the cheapest reliable provider (Hex Proxies at $1.70/GB) the lowest priority number means the system falls back to more expensive options only when necessary, which balances reliability against cost.
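
The cost effect of fallback traffic can be estimated with a weighted average. The sketch below is a worked example under assumed numbers: the 95%/5% traffic split is hypothetical, the per-GB prices are the ones used in the provider chain above, and `blended_cost_per_gb` is an illustrative helper name.

```python
from typing import Dict


def blended_cost_per_gb(
    traffic_share: Dict[str, float],
    cost_per_gb: Dict[str, float],
) -> float:
    """Weighted average cost per GB across providers."""
    total_share = sum(traffic_share.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1.0")
    return sum(
        share * cost_per_gb[name]
        for name, share in traffic_share.items()
    )


# Hypothetical split: 95% through the primary, 5% through the backup.
cost = blended_cost_per_gb(
    {"hex-residential": 0.95, "backup-provider": 0.05},
    {"hex-residential": 1.70, "backup-provider": 3.50},
)
# 0.95 * 1.70 + 0.05 * 3.50 = 1.615 + 0.175 = 1.79 per GB
```

Under these assumptions, sending 5% of traffic to the backup raises the effective rate from $1.70/GB to about $1.79/GB.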

Pattern 5: Geographic Failover Mesh

For geo-targeted collection, implement failover across geographic regions. If proxies in one country fail, fall back to neighboring countries that may serve similar content.

from typing import Dict, List
import httpx

# Define geographic fallback chains
GEO_FALLBACKS: Dict[str, List[str]] = {
    "us": ["us", "ca", "mx"],        # North America chain
    "gb": ["gb", "ie", "nl"],        # UK/Europe chain
    "de": ["de", "at", "ch"],        # DACH chain
    "fr": ["fr", "be", "ch"],        # Francophone chain
    "jp": ["jp", "kr", "sg"],        # East Asia chain
    "br": ["br", "ar", "co"],        # South America chain
    "au": ["au", "nz", "sg"],        # Oceania chain
}

def geo_failover_request(
    url: str, target_country: str,
    base_user: str, password: str
):
    """Try target country first, then geographic fallbacks."""
    countries = GEO_FALLBACKS.get(
        target_country, [target_country]
    )
    
    for country in countries:
        proxy_url = (
            f"http://{base_user}-country-{country}:{password}"
            f"@gate.hexproxies.com:8080"
        )
        try:
            response = httpx.get(
                # httpx 0.26+ takes `proxy=`; older releases used `proxies=`
                url, proxy=proxy_url, timeout=15.0
            )
            if response.status_code == 200:
                return {
                    "response": response,
                    "country_used": country,
                    "was_fallback": country != target_country
                }
        except httpx.RequestError:
            continue
    
    return None

Best for: Geo-targeted operations where approximate geography is acceptable when the primary target fails.

Choosing the Right Pattern

| Pattern | Complexity | Reliability | Best For |
|---|---|---|---|
| 1. Simple Retry | Low | Good | Small-scale scraping, development |
| 2. Type Escalation | Low | Better | Cost-conscious operations with mixed proxy types |
| 3. Health-Aware | Medium | High | Production scraping at scale |
| 4. Multi-Provider | Medium | Very High | Mission-critical data pipelines |
| 5. Geo Failover | Medium | High | Multi-market geo-targeted operations |

Most teams should start with Pattern 1 (Simple Retry) and progress to Pattern 3 (Health-Aware) as their operations grow. Patterns 4 and 5 are justified when proxy downtime directly impacts revenue or compliance.

Monitoring Failover Health

Failover systems need their own monitoring to ensure they are working correctly:

  • Failover trigger rate: How often is failover activated? A high rate suggests the primary proxy needs attention
  • Failover success rate: When failover triggers, how often does it succeed? Below 95% means your fallback chain needs expansion
  • Time to recover: How long until the primary proxy is healthy again? Tracks provider reliability over time
  • Cost of failover: Track bandwidth consumed through fallback providers to understand the cost impact of outages

Implementation Tips

Exponential Backoff

Always use exponential backoff between retries. Linear delays waste time on transient failures and do not wait long enough for persistent issues:

import random

def get_backoff_delay(
    attempt: int,
    base_delay: float = 1.0,
    max_delay: float = 30.0
) -> float:
    """Calculate exponential backoff with jitter."""
    delay = min(base_delay * (2 ** attempt), max_delay)
    # Add jitter to prevent thundering herd
    jitter = random.uniform(0, delay * 0.3)
    return delay + jitter
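
A delay function like the one above is typically plugged into a retry loop. The wrapper below is a sketch under stated assumptions: `retry_with_backoff` is a hypothetical helper, the delay function is passed in as `delay_for` so that `get_backoff_delay` can be supplied directly, and the injectable `sleep` exists only to make the loop testable.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    delay_for: Callable[[int], float],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_attempts is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:  # narrow to transport errors in real code
            last_error = exc
            if attempt < max_attempts:
                # No sleep after the final attempt: nothing is left to retry.
                sleep(delay_for(attempt))
    assert last_error is not None
    raise last_error
```

For example, `retry_with_backoff(fetch_page, get_backoff_delay)` would retry a hypothetical zero-argument `fetch_page` callable with the jittered delays computed above.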

Request Deduplication

When a request fails and is retried through a different proxy, ensure you are not creating duplicate data in your pipeline. Tag each request with a unique ID and deduplicate at the storage layer.
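
A minimal sketch of this idea follows. It assumes the ID is derived from the request itself, so every retry of the same logical request maps to the same key; assigning a UUID when the request is first queued would work equally well. `request_key` and `DedupStore` are hypothetical names, and the in-memory dictionary stands in for a unique-key constraint in a real storage layer.

```python
import hashlib
from typing import Dict


def request_key(url: str, method: str = "GET", body: bytes = b"") -> str:
    """Derive a stable ID so every retry of one logical request matches."""
    digest = hashlib.sha256()
    digest.update(method.upper().encode())
    digest.update(b"\n")
    digest.update(url.encode())
    digest.update(b"\n")
    digest.update(body)
    return digest.hexdigest()


class DedupStore:
    """In-memory stand-in for a storage layer with a unique key."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def save_once(self, key: str, payload: str) -> bool:
        """Store payload unless the key exists; return True if stored."""
        if key in self._rows:
            return False
        self._rows[key] = payload
        return True
```

If the same URL is collected repeatedly on purpose, for example in daily runs, include a run identifier in the key so that later runs are not discarded as duplicates.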

Cost Guards

Multi-provider failover can cause unexpected cost spikes if the primary provider has extended downtime. Implement cost guards:

  • Set daily bandwidth limits per provider
  • Alert when fallback provider usage exceeds normal levels
  • Automatically pause non-critical collection when costs exceed thresholds
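
The first guard, a daily bandwidth limit per provider, can be sketched as follows. `CostGuard` is a hypothetical class, usage is held in memory only, and the injectable `today` callable is an assumption added to make the daily reset testable.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, Tuple


class CostGuard:
    """Track bytes per provider per day and enforce daily limits."""

    def __init__(
        self,
        daily_limit_bytes: Dict[str, int],
        today: Callable[[], date] = date.today,
    ) -> None:
        self._limits = dict(daily_limit_bytes)
        self._today = today
        self._used: Dict[Tuple[str, date], int] = defaultdict(int)

    def record(self, provider: str, num_bytes: int) -> None:
        """Add transferred bytes to today's total for a provider."""
        self._used[(provider, self._today())] += num_bytes

    def used_today(self, provider: str) -> int:
        return self._used.get((provider, self._today()), 0)

    def allows(self, provider: str) -> bool:
        """False once a provider has reached its limit for today."""
        limit = self._limits.get(provider)
        if limit is None:
            return True  # No limit configured for this provider
        return self.used_today(provider) < limit
```

Check `allows()` before routing a request to a fallback provider, and call `record()` with the response size afterwards. Usage is keyed by date, so the limit resets automatically the next day.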

Frequently Asked Questions

Do I really need proxy failover for a small scraping operation?

Pattern 1 (Simple Retry with Session Rotation) requires minimal code and handles 90% of failures. Even small operations benefit from basic retry logic. If you are scraping less than 10 GB/month through Hex Proxies residential proxies at $1.70/GB, Pattern 1 is sufficient. As operations grow, upgrade to Pattern 3 for automated health management.

How do I test my failover system?

Inject failures deliberately. Configure one proxy in your chain with invalid credentials to simulate a provider failure. Verify that your system fails over to the next provider and continues collecting data. Run these tests during development, not in production. Monitor the failover metrics to confirm the system behaves as designed.
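
Failure injection can also be done without touching real proxies by substituting fake fetch functions. The sketch below is illustrative: `first_healthy` is a simplified, hypothetical stand-in for your failover function, and the three fake fetchers simulate invalid credentials (407), a refused connection, and a healthy provider.

```python
from typing import Callable, List, Optional, Tuple

Fetcher = Callable[[str], int]  # Takes a URL, returns an HTTP status code


def first_healthy(
    url: str, providers: List[Tuple[str, Fetcher]]
) -> Optional[str]:
    """Return the name of the first provider that answers 200."""
    for name, fetch in providers:
        try:
            if fetch(url) == 200:
                return name
        except ConnectionError:
            continue
    return None


def bad_credentials(url: str) -> int:
    return 407  # Simulates a proxy rejecting invalid credentials


def gateway_down(url: str) -> int:
    raise ConnectionRefusedError("simulated provider outage")


def healthy(url: str) -> int:
    return 200


chain = [("primary", bad_credentials), ("backup", healthy)]
used = first_healthy("https://example.com", chain)
```

Here `used` is `"backup"`, which confirms that the chain skips the misconfigured primary. The same structure works in a unit test suite with assertions on which provider served each request.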

What is the latency impact of failover?

Each failed attempt adds its timeout duration plus the backoff delay before the next attempt starts. With a 15-second timeout and a 2-second initial backoff, the first failover adds approximately 17 seconds. To minimize impact, set aggressive timeouts on the primary proxy (10-15 seconds) and use longer timeouts on fallback proxies (20-30 seconds). Most successful failovers complete within 20 seconds total.
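
The arithmetic can be written out as a small helper for budgeting worst-case latency. `added_latency` is a hypothetical name; the first example reuses the 15-second timeout and 2-second backoff from the answer above, and the second assumes a further 20-second timeout followed by a 4-second backoff.

```python
from typing import Sequence


def added_latency(
    failed_timeouts: Sequence[float], backoffs: Sequence[float]
) -> float:
    """Seconds lost to failed attempts before the next attempt starts."""
    if len(failed_timeouts) != len(backoffs):
        raise ValueError("expected one backoff after each failed attempt")
    return sum(failed_timeouts) + sum(backoffs)


one_failure = added_latency([15.0], [2.0])               # 15 + 2 = 17
two_failures = added_latency([15.0, 20.0], [2.0, 4.0])   # 35 + 6 = 41
```

These are upper bounds that assume each failed attempt runs to its full timeout. A refused connection or a fast 403 fails much sooner.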

Should I use the same provider for primary and failover proxies?

For proxy type failover (residential to ISP within Hex Proxies), using the same provider is fine and simplifies credential management. For provider-level failover, you need at least two providers — if the goal is surviving a provider outage, having all proxies from one provider defeats the purpose. Use Hex Proxies as your primary for cost efficiency and add a backup provider for redundancy. Visit our pricing page for current rates.

How many fallback levels do I need?

For most operations, 2-3 levels are sufficient: primary residential proxy, fallback ISP proxy, and optionally a second provider. Each additional level adds complexity with diminishing reliability returns. The sweet spot is Pattern 3 (Health-Aware Routing) with 3-5 proxy endpoints from mixed types — residential for broad access and ISP for stable fallback.