Proxy Failover Patterns: Automatic Provider Switching for High Availability
Last updated: April 2026 | Author: Hex Proxies Team
Every proxy will fail eventually. Residential IPs get banned by target sites. ISP proxies experience occasional downtime. Entire proxy providers can have outages. The question is not whether your proxy infrastructure will fail — it is whether your system handles the failure gracefully or loses data.
This guide presents five proxy failover patterns, progressing from simple to sophisticated. Each pattern addresses a different scale and reliability requirement. Choose the simplest pattern that meets your availability needs.
Understanding Proxy Failure Modes
Before implementing failover, understand what can go wrong:
| Failure Mode | Detection Signal | Typical Duration | Frequency |
|---|---|---|---|
| IP blocked by target | 403/429 response, CAPTCHA | Minutes to hours | Common |
| Proxy connection timeout | Connection timeout error | Seconds | Occasional |
| Proxy authentication failure | 407 response | Until credentials fixed | Rare |
| Provider gateway down | Connection refused | Minutes to hours | Rare |
| Geographic routing failure | Wrong geo in response | Variable | Occasional |
| Bandwidth exhausted | 402/429 from provider | Until top-up | Depends on plan |
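The detection signals in the table can be turned into a small classifier so that later failover logic can branch on the failure mode. The following is a minimal sketch, not a definitive implementation. The `FailureMode` enum, `classify_failure` function, and `from_provider` flag are hypothetical names introduced for illustration. Standard-library exceptions stand in for the HTTP client's own timeout and connection errors, and how you tell a provider 429 from a target 429 is left to the caller. Geographic routing failures are omitted because they require inspecting the response content.

```python
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    IP_BLOCKED = "ip_blocked"
    TIMEOUT = "timeout"
    AUTH_FAILURE = "auth_failure"
    GATEWAY_DOWN = "gateway_down"
    BANDWIDTH_EXHAUSTED = "bandwidth_exhausted"
    UNKNOWN = "unknown"


def classify_failure(
    status_code: Optional[int] = None,
    error: Optional[BaseException] = None,
    from_provider: bool = False,  # assumption: caller knows who sent the status
) -> FailureMode:
    """Map a status code or exception to a failure mode from the table."""
    if error is not None:
        # Stand-ins: map your HTTP client's exceptions to these in practice
        if isinstance(error, TimeoutError):
            return FailureMode.TIMEOUT
        if isinstance(error, ConnectionRefusedError):
            return FailureMode.GATEWAY_DOWN
        return FailureMode.UNKNOWN
    if status_code == 407:
        return FailureMode.AUTH_FAILURE
    if status_code == 402 or (status_code == 429 and from_provider):
        return FailureMode.BANDWIDTH_EXHAUSTED
    if status_code in (403, 429):
        return FailureMode.IP_BLOCKED
    return FailureMode.UNKNOWN
```

A block from the target (403 or 429) usually calls for a new session, while an authentication or bandwidth failure will not be fixed by rotating IPs, so separating the two avoids wasted retries.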
Pattern 1: Simple Retry with Session Rotation
The most basic failover pattern: when a request fails, retry with a new proxy session (which gives a new IP).
import time
import uuid
from dataclasses import dataclass
from typing import Optional

import httpx


@dataclass(frozen=True)
class ProxyResponse:
    success: bool
    status_code: int
    body: Optional[str]
    attempts: int
    final_session: str


def request_with_retry(
    url: str,
    base_user: str,
    password: str,
    country: str = "us",
    max_retries: int = 3,
    retry_delay: float = 2.0
) -> ProxyResponse:
    """Make a request with automatic proxy session rotation on failure."""
    for attempt in range(1, max_retries + 1):
        # A fresh session ID gives each attempt a new exit IP
        session_id = uuid.uuid4().hex[:8]
        username = f"{base_user}-country-{country}-sessid-{session_id}"
        proxy_url = (
            f"http://{username}:{password}@gate.hexproxies.com:8080"
        )
        try:
            response = httpx.get(
                url,
                proxy=proxy_url,  # httpx 0.26+; older releases used proxies=
                timeout=15.0,
                follow_redirects=True
            )
            if response.status_code == 200:
                return ProxyResponse(
                    True, 200, response.text, attempt, session_id
                )
            if response.status_code in (403, 429):
                # Blocked: back off, then retry with a new session
                time.sleep(retry_delay * attempt)
                continue
            # Other status codes: return as-is
            return ProxyResponse(
                False, response.status_code, response.text,
                attempt, session_id
            )
        except httpx.RequestError:
            # Connection-level failure: back off, then retry
            time.sleep(retry_delay * attempt)
            continue
    # Every attempt was blocked or errored
    return ProxyResponse(False, 0, None, max_retries, "")
Pros: Simple, handles the most common failure (IP block). Cons: Does not handle provider-level outages.
Pattern 2: Proxy Type Escalation
Escalate through proxy types when lower-cost options fail. Start with residential (broadest compatibility), fall back to ISP (most reliable).
import uuid
from dataclasses import dataclass
from typing import List, Optional

import httpx


@dataclass(frozen=True)
class ProxyTier:
    name: str
    proxy_url: str
    timeout: float


def create_escalation_chain(
    resi_user: str, resi_pass: str,
    isp_ips: List[str], isp_user: str, isp_pass: str,
    country: str = "us"
) -> List[ProxyTier]:
    """Create a proxy escalation chain."""
    session = uuid.uuid4().hex[:8]
    resi_username = f"{resi_user}-country-{country}-sessid-{session}"
    tiers = [
        # Tier 1: rotating residential (no session ID in the username)
        ProxyTier(
            "residential-rotating",
            f"http://{resi_user}-country-{country}:{resi_pass}"
            f"@gate.hexproxies.com:8080",
            15.0
        ),
        # Tier 2: sticky residential (session ID pins the exit IP)
        ProxyTier(
            "residential-sticky",
            f"http://{resi_username}:{resi_pass}"
            f"@gate.hexproxies.com:8080",
            20.0
        ),
    ]
    # Remaining tiers: one per dedicated ISP IP
    for ip in isp_ips:
        tiers.append(
            ProxyTier(
                f"isp-{ip}",
                f"http://{isp_user}:{isp_pass}@{ip}:8080",
                25.0
            )
        )
    return tiers


def request_with_escalation(
    url: str, tiers: List[ProxyTier]
) -> Optional[httpx.Response]:
    """Try each proxy tier in order until one succeeds."""
    for tier in tiers:
        try:
            response = httpx.get(
                url,
                proxy=tier.proxy_url,  # httpx 0.26+; older releases used proxies=
                timeout=tier.timeout,
                follow_redirects=True
            )
            if response.status_code == 200:
                return response
            # Non-200 responses fall through to the next tier
        except httpx.RequestError:
            continue
    return None
Pros: Optimizes cost (cheaper options tried first) while maintaining reliability. Cons: Sequential attempts add latency.
Pattern 3: Health-Aware Routing
Track proxy health metrics and route requests to the healthiest available proxy. Unhealthy proxies are temporarily removed from the pool.
import threading
import time
from dataclasses import dataclass
from typing import Dict, List, Optional

import httpx


@dataclass
class ProxyHealth:
    proxy_id: str
    proxy_url: str
    success_count: int = 0
    failure_count: int = 0
    consecutive_failures: int = 0
    last_success: float = 0.0
    last_failure: float = 0.0
    circuit_open: bool = False
    circuit_open_until: float = 0.0

    @property
    def success_rate(self) -> float:
        total = self.success_count + self.failure_count
        if total == 0:
            return 1.0
        return self.success_count / total

    @property
    def is_available(self) -> bool:
        # An open circuit becomes available for a retest once its timeout expires
        if self.circuit_open:
            return time.time() > self.circuit_open_until
        return True


class HealthAwareProxyRouter:
    def __init__(
        self,
        proxies: List[str],
        failure_threshold: int = 5,
        circuit_timeout: float = 60.0
    ):
        self._lock = threading.Lock()
        self.failure_threshold = failure_threshold
        self.circuit_timeout = circuit_timeout
        self.health: Dict[str, ProxyHealth] = {}
        for i, proxy_url in enumerate(proxies):
            pid = f"proxy-{i}"
            self.health[pid] = ProxyHealth(pid, proxy_url)

    def get_best_proxy(self) -> Optional[ProxyHealth]:
        """Return the healthiest available proxy."""
        with self._lock:
            if not self.health:
                return None
            available = [
                h for h in self.health.values() if h.is_available
            ]
            if not available:
                # All circuits open: try the one that closes soonest
                return min(
                    self.health.values(),
                    key=lambda h: h.circuit_open_until
                )
            return max(available, key=lambda h: h.success_rate)

    def record_success(self, proxy_id: str) -> None:
        with self._lock:
            h = self.health[proxy_id]
            h.success_count += 1
            h.consecutive_failures = 0  # a success resets the streak
            h.last_success = time.time()
            h.circuit_open = False
            h.circuit_open_until = 0.0

    def record_failure(self, proxy_id: str) -> None:
        with self._lock:
            h = self.health[proxy_id]
            h.failure_count += 1
            h.consecutive_failures += 1
            h.last_failure = time.time()
            # Open the circuit on consecutive failures, not lifetime failures
            if h.consecutive_failures >= self.failure_threshold:
                h.circuit_open = True
                h.circuit_open_until = h.last_failure + self.circuit_timeout

    def request(self, url: str, max_attempts: int = 3):
        """Route request through healthiest proxy."""
        for _ in range(max_attempts):
            proxy = self.get_best_proxy()
            if proxy is None:
                break
            try:
                response = httpx.get(
                    url,
                    proxy=proxy.proxy_url,  # httpx 0.26+; older releases used proxies=
                    timeout=15.0
                )
                if response.status_code == 200:
                    self.record_success(proxy.proxy_id)
                    return response
                # Any non-200 status counts against the proxy
                self.record_failure(proxy.proxy_id)
            except httpx.RequestError:
                self.record_failure(proxy.proxy_id)
        return None
The circuit breaker pattern prevents the system from repeatedly hitting a known-bad proxy. After 5 consecutive failures, a proxy is taken out of rotation for 60 seconds, then automatically retested.
Pros: Self-healing, routes around problems automatically. Cons: More complex, requires stateful health tracking.
Pattern 4: Multi-Provider Failover
For mission-critical operations, run multiple proxy providers simultaneously and fail over between them when one has issues.
import logging
from dataclasses import dataclass
from typing import List, Optional

import httpx

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class ProxyProvider:
    name: str
    proxy_url: str
    priority: int  # Lower = higher priority
    cost_per_gb: float


# Placeholder logging helpers; replace with your own metrics pipeline
def log_provider_usage(provider_name: str, num_bytes: int) -> None:
    logger.info("provider=%s bytes=%d", provider_name, num_bytes)


def log_provider_failure(provider_name: str, url: str) -> None:
    logger.warning("provider=%s failed for %s", provider_name, url)


def create_provider_chain() -> List[ProxyProvider]:
    return [
        ProxyProvider(
            name="hex-residential",
            proxy_url=(
                "http://USER-country-us:PASS"
                "@gate.hexproxies.com:8080"
            ),
            priority=1,
            cost_per_gb=1.70
        ),
        ProxyProvider(
            name="hex-isp",
            proxy_url="http://USER:PASS@ISP_IP:8080",
            priority=2,
            cost_per_gb=0.00  # Fixed cost per IP
        ),
        ProxyProvider(
            name="backup-provider",
            proxy_url=(
                "http://BACKUP_USER:BACKUP_PASS"
                "@backup.provider.com:8080"
            ),
            priority=3,
            cost_per_gb=3.50
        ),
    ]


def failover_request(
    url: str, providers: List[ProxyProvider]
) -> Optional[httpx.Response]:
    """Try providers in priority order."""
    sorted_providers = sorted(providers, key=lambda p: p.priority)
    for provider in sorted_providers:
        try:
            response = httpx.get(
                url,
                proxy=provider.proxy_url,  # httpx 0.26+; older releases used proxies=
                timeout=15.0
            )
            if response.status_code == 200:
                log_provider_usage(
                    provider.name, len(response.content)
                )
                return response
            # Non-200 responses fall through to the next provider
        except httpx.RequestError:
            log_provider_failure(provider.name, url)
            continue
    return None
The cost-awareness in this pattern is important. Providers are tried in priority order, and the priorities here put the cheapest reliable provider first (Hex Proxies at $1.70/GB), so the system falls back to more expensive options only when necessary. This optimizes for both reliability and cost.
Pattern 5: Geographic Failover Mesh
For geo-targeted collection, implement failover across geographic regions. If proxies in one country fail, fall back to neighboring countries that may serve similar content.
from typing import Dict, List, Optional

import httpx

# Define geographic fallback chains
GEO_FALLBACKS: Dict[str, List[str]] = {
    "us": ["us", "ca", "mx"],  # North America chain
    "gb": ["gb", "ie", "nl"],  # UK/Europe chain
    "de": ["de", "at", "ch"],  # DACH chain
    "fr": ["fr", "be", "ch"],  # Francophone chain
    "jp": ["jp", "kr", "sg"],  # East Asia chain
    "br": ["br", "ar", "co"],  # South America chain
    "au": ["au", "nz", "sg"],  # Oceania chain
}


def geo_failover_request(
    url: str, target_country: str,
    base_user: str, password: str
) -> Optional[dict]:
    """Try target country first, then geographic fallbacks."""
    # Countries without a defined chain are tried on their own
    countries = GEO_FALLBACKS.get(
        target_country, [target_country]
    )
    for country in countries:
        proxy_url = (
            f"http://{base_user}-country-{country}:{password}"
            f"@gate.hexproxies.com:8080"
        )
        try:
            response = httpx.get(
                url,
                proxy=proxy_url,  # httpx 0.26+; older releases used proxies=
                timeout=15.0
            )
            if response.status_code == 200:
                return {
                    "response": response,
                    "country_used": country,
                    "was_fallback": country != target_country
                }
        except httpx.RequestError:
            continue
    return None
Best for: Geo-targeted operations where approximate geography is acceptable when the primary target fails.
Choosing the Right Pattern
| Pattern | Complexity | Reliability | Best For |
|---|---|---|---|
| 1. Simple Retry | Low | Good | Small-scale scraping, development |
| 2. Type Escalation | Low | Better | Cost-conscious operations with mixed proxy types |
| 3. Health-Aware | Medium | High | Production scraping at scale |
| 4. Multi-Provider | Medium | Very High | Mission-critical data pipelines |
| 5. Geo Failover | Medium | High | Multi-market geo-targeted operations |
Most teams should start with Pattern 1 (Simple Retry) and progress to Pattern 3 (Health-Aware) as their operations grow. Patterns 4 and 5 are justified when proxy downtime directly impacts revenue or compliance.
Monitoring Failover Health
Failover systems need their own monitoring to ensure they are working correctly:
- Failover trigger rate: How often is failover activated? A high rate suggests the primary proxy needs attention
- Failover success rate: When failover triggers, how often does it succeed? Below 95% means your fallback chain needs expansion
- Time to recover: How long until the primary proxy is healthy again? Tracks provider reliability over time
- Cost of failover: Track bandwidth consumed through fallback providers to understand the cost impact of outages
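The trigger rate, success rate, and fallback bandwidth above can be tracked with a few counters. The sketch below is one possible shape under stated assumptions: the `FailoverMetrics` class and its `record` method are hypothetical names, not part of any library. Time to recover is left out because it needs timestamps from your health checks.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FailoverMetrics:
    total_requests: int = 0
    failover_triggered: int = 0
    failover_succeeded: int = 0
    # Bytes served per fallback provider, for cost tracking
    fallback_bytes: Dict[str, int] = field(default_factory=dict)

    def record(
        self,
        used_fallback: bool,
        succeeded: bool,
        provider: str = "",
        bytes_used: int = 0,
    ) -> None:
        """Record one request and whether it needed a fallback."""
        self.total_requests += 1
        if not used_fallback:
            return
        self.failover_triggered += 1
        if succeeded:
            self.failover_succeeded += 1
            self.fallback_bytes[provider] = (
                self.fallback_bytes.get(provider, 0) + bytes_used
            )

    @property
    def trigger_rate(self) -> float:
        """Share of all requests that needed a fallback."""
        if self.total_requests == 0:
            return 0.0
        return self.failover_triggered / self.total_requests

    @property
    def success_rate(self) -> float:
        """Share of triggered failovers that ended in success."""
        if self.failover_triggered == 0:
            return 1.0
        return self.failover_succeeded / self.failover_triggered
```

Calling `record` once per request from your failover function is enough to compare `success_rate` against the 95% guideline above.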
Implementation Tips
Exponential Backoff
Use exponential backoff between retries in production. Linear delays, such as the retry_delay * attempt used in Pattern 1 for simplicity, waste time on transient failures and do not wait long enough for persistent issues:
import random


def get_backoff_delay(
    attempt: int,
    base_delay: float = 1.0,
    max_delay: float = 30.0
) -> float:
    """Calculate exponential backoff with jitter (attempt 0 yields base_delay)."""
    delay = min(base_delay * (2 ** attempt), max_delay)
    # Add up to 30% jitter to prevent thundering herd;
    # the result can exceed max_delay by that margin
    jitter = random.uniform(0, delay * 0.3)
    return delay + jitter
Request Deduplication
When a request fails and is retried through a different proxy, ensure you are not creating duplicate data in your pipeline. Tag each request with a unique ID and deduplicate at the storage layer.
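A minimal sketch of this idea follows, assuming an in-memory dictionary stands in for the real storage layer. `DedupStore` and `new_request_id` are hypothetical names. In a database you would typically get the same effect with a unique constraint on the request ID column.

```python
import uuid
from typing import Dict


def new_request_id() -> str:
    """Generate one ID per logical request, reused across all retries."""
    return uuid.uuid4().hex


class DedupStore:
    """In-memory stand-in for a storage layer keyed by request ID."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def save(self, request_id: str, body: str) -> bool:
        """Store the body once; return False if the ID was already saved."""
        if request_id in self._records:
            return False
        self._records[request_id] = body
        return True

    def __len__(self) -> int:
        return len(self._records)
```

The key point is that the ID is created before the first attempt and passed unchanged to every retry, so a late response from an earlier proxy cannot create a second record.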
Cost Guards
Multi-provider failover can cause unexpected cost spikes if the primary provider has extended downtime. Implement cost guards:
- Set daily bandwidth limits per provider
- Alert when fallback provider usage exceeds normal levels
- Automatically pause non-critical collection when costs exceed thresholds
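The guards above can be sketched as a small per-provider byte counter. This is an illustrative outline, not a definitive implementation: `CostGuard`, its method names, and the 80% alert threshold are assumptions made for the example, and the daily reset is expected to be triggered by your own scheduler.

```python
from typing import Dict


class CostGuard:
    """Track per-provider bandwidth against daily limits."""

    def __init__(
        self,
        daily_limit_bytes: Dict[str, int],
        alert_fraction: float = 0.8,  # assumption: alert at 80% of the limit
    ) -> None:
        self.daily_limit_bytes = dict(daily_limit_bytes)
        self.alert_fraction = alert_fraction
        self.used_bytes: Dict[str, int] = {}

    def record_usage(self, provider: str, num_bytes: int) -> None:
        self.used_bytes[provider] = (
            self.used_bytes.get(provider, 0) + num_bytes
        )

    def is_allowed(self, provider: str) -> bool:
        """False once the provider has reached its daily limit."""
        limit = self.daily_limit_bytes.get(provider)
        if limit is None:
            return True  # no limit configured for this provider
        return self.used_bytes.get(provider, 0) < limit

    def should_alert(self, provider: str) -> bool:
        """True once usage crosses the alert fraction of the limit."""
        limit = self.daily_limit_bytes.get(provider)
        if limit is None:
            return False
        return self.used_bytes.get(provider, 0) >= limit * self.alert_fraction

    def reset_daily(self) -> None:
        """Clear counters; call once per day from a scheduler."""
        self.used_bytes.clear()
```

A failover loop can check `is_allowed` before trying a fallback provider and skip non-critical jobs when it returns False.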
Frequently Asked Questions
Do I really need proxy failover for a small scraping operation?
Pattern 1 (Simple Retry with Session Rotation) requires minimal code and handles 90% of failures. Even small operations benefit from basic retry logic. If you are scraping less than 10 GB/month through Hex Proxies residential proxies at $1.70/GB, Pattern 1 is sufficient. As operations grow, upgrade to Pattern 3 for automated health management.
How do I test my failover system?
Inject failures deliberately. Configure one proxy in your chain with invalid credentials to simulate a provider failure. Verify that your system fails over to the next provider and continues collecting data. Run these tests during development, not in production. Monitor the failover metrics to confirm the system behaves as designed.
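Failure injection can also be done at the unit-test level, without real credentials. The sketch below assumes your failover logic accepts fetch functions as parameters, which the patterns in this guide do not do as written. `fetch_with_failover`, `broken_provider`, and `healthy_provider` are hypothetical names, and `ConnectionError` stands in for your HTTP client's error type.

```python
from typing import Callable, List, Optional, Tuple

Fetcher = Callable[[str], str]


def fetch_with_failover(
    url: str, chain: List[Tuple[str, Fetcher]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each (name, fetcher) in order; return (provider_name, body)."""
    for name, fetch in chain:
        try:
            return name, fetch(url)
        except ConnectionError:
            continue
    return None, None


def broken_provider(url: str) -> str:
    # Injected failure: simulates a provider that rejects every request
    raise ConnectionError("simulated provider outage")


def healthy_provider(url: str) -> str:
    return f"content from {url}"


def test_failover_skips_broken_provider() -> None:
    chain = [("primary", broken_provider), ("backup", healthy_provider)]
    name, body = fetch_with_failover("https://example.com", chain)
    assert name == "backup"
    assert body == "content from https://example.com"


def test_all_providers_down_returns_none() -> None:
    chain = [("primary", broken_provider), ("backup", broken_provider)]
    assert fetch_with_failover("https://example.com", chain) == (None, None)
```

Tests like these run in milliseconds and complement, rather than replace, the end-to-end check with invalid credentials described above.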
What is the latency impact of failover?
Each failover attempt adds the timeout duration plus backoff delay. With a 15-second timeout and 2-second initial backoff, the first failover takes approximately 17 seconds. To minimize impact, set aggressive timeouts on the primary proxy (10-15 seconds) and use longer timeouts on fallback proxies (20-30 seconds). Most successful failovers complete within 20 seconds total.
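The arithmetic above generalizes to any chain of timeouts and backoff delays. The helper names below are hypothetical, and the sketch assumes every failed attempt runs to its full timeout, which is the worst case.

```python
from typing import List


def delay_before_attempt(
    timeouts: List[float], backoffs: List[float], index: int
) -> float:
    """Seconds elapsed before attempt `index` (0-based) starts,
    assuming every earlier attempt ran to its full timeout."""
    return sum(timeouts[:index]) + sum(backoffs[:index])


def worst_case_total(timeouts: List[float], backoffs: List[float]) -> float:
    """Seconds elapsed if every attempt in the chain times out.
    A backoff is counted only between attempts, not after the last one."""
    gaps = max(len(timeouts) - 1, 0)
    return sum(timeouts) + sum(backoffs[:gaps])


# Primary with a 15 s timeout, one fallback with a 20 s timeout, 2 s backoff
print(delay_before_attempt([15.0, 20.0], [2.0], 1))  # 17.0
print(worst_case_total([15.0, 20.0], [2.0]))         # 37.0
```

The first figure matches the 17 seconds quoted above. The second shows why a failing fallback is expensive: if it also times out, the request has spent 37 seconds before giving up.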
Should I use the same provider for primary and failover proxies?
For proxy type failover (residential to ISP within Hex Proxies), using the same provider is fine and simplifies credential management. For provider-level failover, you need at least two providers — if the goal is surviving a provider outage, having all proxies from one provider defeats the purpose. Use Hex Proxies as your primary for cost efficiency and add a backup provider for redundancy. Visit our pricing page for current rates.
How many fallback levels do I need?
For most operations, 2-3 levels are sufficient: primary residential proxy, fallback ISP proxy, and optionally a second provider. Each additional level adds complexity with diminishing reliability returns. The sweet spot is Pattern 3 (Health-Aware Routing) with 3-5 proxy endpoints from mixed types — residential for broad access and ISP for stable fallback.