Migrating Proxy Providers Without Downtime: A Step-by-Step Playbook
Switching proxy providers is an operational risk. Your scraping infrastructure depends on the proxy layer for every request -- a misconfigured migration can halt data collection for hours or days. This playbook provides a zero-downtime migration strategy using a dual-provider architecture with progressive traffic shifting.
The approach is the same one engineering teams use when migrating load balancers, CDN providers, or database connections: run both systems in parallel, shift traffic gradually, validate at each step, and maintain instant rollback capability.
For provider-specific migration guides, see our pages for migrating from Bright Data, migrating from Oxylabs, and migrating from Smartproxy.
Pre-Migration Checklist
Before writing any migration code, establish baselines and prepare the new provider.
1. Document Current Performance
Record your current provider's metrics for at least 7 days before starting:
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ProviderBaseline:
    """Immutable baseline metrics for comparison."""
    provider_name: str
    measurement_period_days: int
    total_requests: int
    success_rate: float  # 0.0 - 1.0
    p50_latency_ms: float
    p95_latency_ms: float
    p99_latency_ms: float
    timeout_rate: float  # 0.0 - 1.0
    block_rate: float  # 0.0 - 1.0
    avg_bandwidth_gb_per_day: float
    monthly_cost: float

    @property
    def cost_per_successful_request(self):
        successful = self.total_requests * self.success_rate
        if successful == 0:
            return float('inf')
        # Prorate the monthly bill to the measurement period
        daily_cost = self.monthly_cost / 30
        return (daily_cost * self.measurement_period_days) / successful

# Example baseline
current_baseline = ProviderBaseline(
    provider_name="Current Provider",
    measurement_period_days=7,
    total_requests=350_000,
    success_rate=0.923,
    p50_latency_ms=185,
    p95_latency_ms=420,
    p99_latency_ms=890,
    timeout_rate=0.031,
    block_rate=0.046,
    avg_bandwidth_gb_per_day=12.5,
    monthly_cost=1200.00,
)
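The latency fields in the baseline have to come from somewhere. As a minimal sketch, assuming you can export raw per-request latencies (in milliseconds) from your logs, the percentiles could be derived with the nearest-rank method. The helper names `nearest_rank` and `latency_percentiles` are hypothetical and not part of any library.

```python
import math


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty sequence."""
    if not sorted_values:
        raise ValueError("need at least one latency sample")
    rank = math.ceil(pct * len(sorted_values) / 100)  # 1-based rank
    return sorted_values[max(rank, 1) - 1]


def latency_percentiles(latencies_ms):
    """Return the three latency fields used by ProviderBaseline."""
    ordered = sorted(latencies_ms)
    return {
        "p50_latency_ms": nearest_rank(ordered, 50),
        "p95_latency_ms": nearest_rank(ordered, 95),
        "p99_latency_ms": nearest_rank(ordered, 99),
    }


# Hypothetical sample: 100 requests with latencies of 1..100 ms
print(latency_percentiles(range(1, 101)))
```

Other percentile definitions (for example, interpolating between neighbors) give slightly different numbers, so use the same method for both providers.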
2. Test the New Provider Independently
Before integrating the new provider into production, run an isolated benchmark using the same targets and workload:
import time
import requests
from concurrent.futures import ThreadPoolExecutor

def benchmark_provider(proxy_url, targets, requests_per_target=50, concurrency=5):
    """Run a standalone benchmark against a proxy provider.

    Returns metrics comparable to ProviderBaseline.
    """
    results = []

    def single_request(target):
        start = time.monotonic()
        try:
            resp = requests.get(
                target,
                proxies={"http": proxy_url, "https": proxy_url},
                timeout=30,
            )
            elapsed = (time.monotonic() - start) * 1000
            # Treat 403/429 responses and CAPTCHA pages as blocks
            is_blocked = (
                resp.status_code in (403, 429)
                or "captcha" in resp.text.lower()
            )
            return {
                "success": resp.status_code == 200 and not is_blocked,
                "blocked": is_blocked,
                "timeout": False,
                "latency_ms": elapsed,
            }
        except requests.Timeout:
            elapsed = (time.monotonic() - start) * 1000
            return {
                "success": False,
                "blocked": False,
                "timeout": True,
                "latency_ms": elapsed,
            }
        except requests.RequestException:
            elapsed = (time.monotonic() - start) * 1000
            return {
                "success": False,
                "blocked": False,
                "timeout": False,
                "latency_ms": elapsed,
            }

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = []
        for target in targets:
            for _ in range(requests_per_target):
                futures.append(pool.submit(single_request, target))
        for f in futures:
            results.append(f.result())

    total = len(results)
    successes = [r for r in results if r["success"]]
    # Latency percentiles are computed over successful requests only
    latencies = sorted(r["latency_ms"] for r in successes)
    timeouts = len([r for r in results if r["timeout"]])
    blocks = len([r for r in results if r["blocked"]])
    return {
        "total": total,
        "success_rate": len(successes) / total if total > 0 else 0,
        "p50_latency": latencies[len(latencies) // 2] if latencies else 0,
        "p95_latency": latencies[int(len(latencies) * 0.95)] if latencies else 0,
        # Guard against an empty target list, as success_rate already does
        "timeout_rate": timeouts / total if total > 0 else 0,
        "block_rate": blocks / total if total > 0 else 0,
    }

# Benchmark the new provider
new_provider_results = benchmark_provider(
    proxy_url="http://USER:PASS@gate.hexproxies.com:8080",
    targets=[
        "https://httpbin.org/ip",
        "https://httpbin.org/headers",
        # Add your actual target URLs
    ],
)
Go/No-Go criteria: The new provider should meet or exceed the current provider's baseline on success rate, P95 latency, and block rate. If it falls short, address configuration issues before proceeding with migration.
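The go/no-go criteria above can be turned into an explicit check. This is a minimal sketch, assuming both the baseline and the benchmark results are available as dictionaries that share the keys `success_rate`, `p95_latency`, and `block_rate` (the key names returned by the benchmark). The function name `go_no_go` is hypothetical.

```python
def go_no_go(baseline, candidate):
    """Compare a candidate benchmark against the baseline.

    Returns (go, reasons): go is True only when the candidate meets or
    exceeds the baseline on all three criteria.
    """
    reasons = []
    if candidate["success_rate"] < baseline["success_rate"]:
        reasons.append("success_rate below baseline")
    if candidate["p95_latency"] > baseline["p95_latency"]:
        reasons.append("p95_latency above baseline")
    if candidate["block_rate"] > baseline["block_rate"]:
        reasons.append("block_rate above baseline")
    return (not reasons, reasons)


# Baseline values taken from the example baseline earlier in this playbook
baseline = {"success_rate": 0.923, "p95_latency": 420, "block_rate": 0.046}
# Hypothetical benchmark result for the new provider
candidate = {"success_rate": 0.951, "p95_latency": 390, "block_rate": 0.021}

go, reasons = go_no_go(baseline, candidate)
print("GO" if go else f"NO-GO: {', '.join(reasons)}")
```

Returning the list of failed criteria, rather than a bare boolean, makes it easier to see which configuration issue to address first.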
3. Prepare Configuration
Create a configuration structure that supports multiple providers simultaneously:
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ProxyProviderConfig:
    """Immutable proxy provider configuration."""
    name: str
    proxy_url_template: str  # e.g., "http://{user}:{password}@host:port"
    username: str
    password: str
    supports_sessions: bool
    session_param: str  # How to specify session ID
    max_concurrency: int

    def build_url(self, session_id=None, country=None):
        """Build a proxy URL with optional session and geo targeting."""
        user_parts = [self.username]
        if session_id and self.supports_sessions:
            user_parts.append(f"{self.session_param}{session_id}")
        if country:
            user_parts.append(f"country-{country}")
        user = "-".join(user_parts)
        return self.proxy_url_template.format(
            user=user, password=self.password
        )

# Define both providers
current_provider = ProxyProviderConfig(
    name="current-provider",
    proxy_url_template="http://{user}:{password}@old-provider.com:8080",
    username="old_user",
    password="old_pass",
    supports_sessions=True,
    session_param="session-",
    max_concurrency=100,
)

new_provider = ProxyProviderConfig(
    name="hex-proxies",
    proxy_url_template="http://{user}:{password}@gate.hexproxies.com:8080",
    username="USER",
    password="PASS",
    supports_sessions=True,
    session_param="session-",
    max_concurrency=200,
)
The Migration Architecture
Dual-Provider Router
The core of zero-downtime migration is a proxy router that distributes requests between the old and new provider based on a configurable traffic split:
import random
import time
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)

@dataclass(frozen=True)
class TrafficSplit:
    """Immutable traffic split configuration."""
    new_provider_percentage: int  # 0-100
    updated_at: float
    reason: str

    @property
    def use_new_provider(self):
        """Randomly decide whether to use the new provider."""
        return random.randint(1, 100) <= self.new_provider_percentage

class DualProviderRouter:
    """Routes requests between two proxy providers with configurable split.

    Supports progressive traffic shifting and automatic rollback
    based on error rate thresholds.
    """

    def __init__(
        self,
        old_provider,
        new_provider,
        initial_split_pct=0,
        error_rate_rollback_threshold=0.15,
    ):
        self.old_provider = old_provider
        self.new_provider = new_provider
        self._split = TrafficSplit(
            new_provider_percentage=initial_split_pct,
            updated_at=time.time(),
            reason="initial",
        )
        self.rollback_threshold = error_rate_rollback_threshold
        # Metrics tracking (internal mutable state)
        self._new_requests = 0
        self._new_failures = 0
        self._old_requests = 0
        self._old_failures = 0
        self._metrics_window_start = time.time()

    def get_proxy_url(self, session_id=None, country=None):
        """Get a proxy URL based on the current traffic split.

        Returns tuple of (proxy_url, provider_name).
        """
        if self._split.use_new_provider:
            return (
                self.new_provider.build_url(session_id, country),
                self.new_provider.name,
            )
        return (
            self.old_provider.build_url(session_id, country),
            self.old_provider.name,
        )

    def record_result(self, provider_name, success):
        """Record a request result for monitoring."""
        if provider_name == self.new_provider.name:
            self._new_requests += 1
            if not success:
                self._new_failures += 1
        else:
            self._old_requests += 1
            if not success:
                self._old_failures += 1
        # Check for automatic rollback
        self._check_rollback()

    def set_split(self, percentage, reason="manual"):
        """Update the traffic split percentage."""
        self._split = TrafficSplit(
            new_provider_percentage=percentage,
            updated_at=time.time(),
            reason=reason,
        )
        logger.info(
            "Traffic split updated to %d%% new provider (reason: %s)",
            percentage, reason,
        )
        # Reset metrics window
        self._reset_metrics()

    def _check_rollback(self):
        """Auto-rollback if new provider error rate exceeds threshold."""
        if self._new_requests < 100:
            return  # Not enough data
        new_error_rate = self._new_failures / self._new_requests
        if new_error_rate > self.rollback_threshold:
            logger.warning(
                "AUTO-ROLLBACK: New provider error rate %.1f%% exceeds "
                "threshold %.1f%%. Rolling back to 0%%.",
                new_error_rate * 100,
                self.rollback_threshold * 100,
            )
            self.set_split(0, reason="auto-rollback")

    def _reset_metrics(self):
        """Reset metrics counters for new measurement window."""
        self._new_requests = 0
        self._new_failures = 0
        self._old_requests = 0
        self._old_failures = 0
        self._metrics_window_start = time.time()

    def get_metrics(self):
        """Get current metrics for both providers."""
        elapsed = time.time() - self._metrics_window_start
        new_success = (
            1 - (self._new_failures / self._new_requests)
            if self._new_requests > 0 else None
        )
        old_success = (
            1 - (self._old_failures / self._old_requests)
            if self._old_requests > 0 else None
        )
        return {
            "split_pct": self._split.new_provider_percentage,
            "window_seconds": round(elapsed),
            "new_provider": {
                "requests": self._new_requests,
                # Compare against None so a 0.0 success rate is still reported
                "success_rate": round(new_success, 4) if new_success is not None else None,
            },
            "old_provider": {
                "requests": self._old_requests,
                "success_rate": round(old_success, 4) if old_success is not None else None,
            },
        }
Integration with Existing Code
Replace your existing proxy URL construction with the router:
# Before migration (single provider)
# proxy_url = build_proxy_url(session_id)
# response = requests.get(target, proxies={"https": proxy_url})
# During migration (dual provider)
router = DualProviderRouter(
old_provider=current_provider,
new_provider=new_provider,
initial_split_pct=0, # Start with 0% to new provider
)
proxy_url, provider_name = router.get_proxy_url(session_id=session_id)
try:
    response = requests.get(target, proxies={"https": proxy_url}, timeout=30)
    success = response.status_code == 200
except Exception:
    success = False
router.record_result(provider_name, success)
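One consequence of a per-request random split is that requests sharing a session ID can land on different providers. If your workload depends on sticky sessions, a possible variation (not part of the router above) is to derive the decision from a hash of the session ID, so a given session always maps to the same provider at a given split. The function name `session_uses_new_provider` is hypothetical; this is a sketch of the idea, not a drop-in replacement.

```python
import hashlib


def session_uses_new_provider(session_id, new_provider_percentage):
    """Deterministically map a session ID to a bucket in 0..99.

    Sessions whose bucket is below the split percentage go to the new
    provider. Raising the percentage only ever moves sessions from the
    old provider to the new one, never back.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < new_provider_percentage


# Hypothetical session IDs: count how many would use the new provider at 25%
sessions = [f"session-{i}" for i in range(1000)]
on_new = sum(session_uses_new_provider(s, 25) for s in sessions)
print(f"{on_new} of {len(sessions)} sessions routed to the new provider")
```

The observed share will only approximate the configured percentage, and requests without a session ID would still need the random split.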
The Migration Timeline
Day 1-2: Canary (5% Traffic)
router.set_split(5, reason="canary-start")
Run 5% of traffic through the new provider. Monitor:
- Success rate compared to old provider (should be within 2 percentage points)
- P95 latency compared to old provider (should be within 20%)
- Any new error patterns (auth failures, connection issues)
Validation check:
def validate_canary(router, min_requests=500):
    """Validate canary results before proceeding."""
    metrics = router.get_metrics()
    new = metrics["new_provider"]
    old = metrics["old_provider"]
    if new["requests"] < min_requests:
        return {"status": "insufficient_data", "proceed": False}
    if new["success_rate"] is None or old["success_rate"] is None:
        return {"status": "missing_metrics", "proceed": False}
    success_delta = old["success_rate"] - new["success_rate"]
    if success_delta > 0.02:  # New provider more than 2 percentage points worse
        return {
            "status": "degraded",
            "proceed": False,
            "detail": f"New provider success rate {success_delta*100:.1f} percentage points lower",
        }
    return {
        "status": "healthy",
        "proceed": True,
        "new_success_rate": new["success_rate"],
        "old_success_rate": old["success_rate"],
    }
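The monitoring list for the canary stage also asks for P95 latency within 20% of the old provider, which the success-rate counters alone do not cover. A minimal sketch of one way to track it, assuming you record each request's latency alongside its result; the `LatencyTracker` class and `latency_within_tolerance` function are hypothetical names.

```python
from collections import defaultdict, deque


class LatencyTracker:
    """Keeps the most recent latency samples per provider."""

    def __init__(self, window=1000):
        self._samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, provider_name, latency_ms):
        self._samples[provider_name].append(latency_ms)

    def p95(self, provider_name):
        """P95 over the current window, or None when there are no samples."""
        ordered = sorted(self._samples[provider_name])
        if not ordered:
            return None
        return ordered[int(len(ordered) * 0.95)]


def latency_within_tolerance(tracker, old_name, new_name, tolerance=0.20):
    """True when the new provider's P95 is at most 20% above the old one's."""
    old_p95 = tracker.p95(old_name)
    new_p95 = tracker.p95(new_name)
    if old_p95 is None or new_p95 is None:
        return False  # Not enough data to decide
    return new_p95 <= old_p95 * (1 + tolerance)


# Hypothetical samples: both providers see latencies of 1..100 ms
tracker = LatencyTracker()
for ms in range(1, 101):
    tracker.record("current-provider", ms)
    tracker.record("hex-proxies", ms)
print(latency_within_tolerance(tracker, "current-provider", "hex-proxies"))
```

A bounded window keeps memory use fixed and lets the check reflect recent traffic rather than the whole migration.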
Day 3-4: Expand to 25%
validation = validate_canary(router)
if validation["proceed"]:
    router.set_split(25, reason="expand-25pct")
At 25%, you are testing the new provider under meaningful load. This exposes concurrency issues, rate limit behavior, and session management differences.
Day 5-6: Expand to 50%
router.set_split(50, reason="expand-50pct")
Equal split. Both providers handle production load. Compare performance metrics side by side. This is the critical validation stage.
Day 7-8: Expand to 90%
router.set_split(90, reason="expand-90pct")
The new provider handles almost all traffic, and the old provider serves as a fallback. This tests the new provider at close to full production scale.
Day 9-10: Complete Migration (100%)
router.set_split(100, reason="migration-complete")
All traffic goes to the new provider. Keep the old provider configured (but unused) for 14 days as a rollback path.
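The stage progression in this timeline can be encoded so that each step is gated on a validation result. This is a sketch under the assumption that validation returns a dictionary with a boolean `proceed` key, as the canary check does; `STAGES`, `next_stage`, and `advance_if_healthy` are hypothetical names.

```python
STAGES = (5, 25, 50, 90, 100)  # Percentages from the timeline above


def next_stage(current_pct, stages=STAGES):
    """Return the next higher stage, or the current value at the last stage."""
    for pct in stages:
        if pct > current_pct:
            return pct
    return current_pct


def advance_if_healthy(current_pct, validation):
    """Move one stage forward only when validation allows it."""
    if not validation.get("proceed", False):
        return current_pct  # Hold at the current split
    return next_stage(current_pct)


# Hypothetical walk through the timeline with healthy validations
pct = 0
while pct < 100:
    pct = advance_if_healthy(pct, {"status": "healthy", "proceed": True})
    print(f"split -> {pct}%")
```

Each returned percentage would then be passed to the router's `set_split`, with the waiting period between stages handled by whatever scheduler you already use.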
Day 24+: Decommission Old Provider
After 14 days at 100% on the new provider with stable metrics:
- Remove old provider configuration
- Cancel old provider subscription
- Replace the DualProviderRouter with a direct reference to the new provider
- Clean up migration code
Rollback Strategies
Automatic Rollback
The DualProviderRouter includes automatic rollback when the new provider's error rate exceeds the threshold. This protects against:
- New provider outages during migration
- Authentication or configuration issues
- Rate limiting from the new provider
Manual Rollback
At any stage, revert to the old provider:
router.set_split(0, reason="manual-rollback")
Rollback Decision Matrix
| Signal | Severity | Action |
|---|---|---|
| New provider success rate 2+ percentage points below old | Warning | Investigate, hold at current split |
| New provider success rate 5+ percentage points below old | High | Roll back one stage (e.g., 50% → 25%) |
| New provider returning auth errors | Critical | Immediate rollback to 0% |
| New provider timeout rate doubles | High | Roll back one stage |
| New provider complete outage | Critical | Immediate rollback to 0% |
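The matrix can also be expressed as a small decision function. The sketch below makes several assumptions: signals arrive as a dictionary with the hypothetical keys `outage`, `auth_errors`, `success_gap` (old minus new success rate, as a fraction), and `timeout_ratio` (the new provider's current timeout rate divided by its rate at an earlier reference point, which is one reading of "doubles").

```python
STAGES = (0, 5, 25, 50, 90, 100)


def one_stage_back(current_pct):
    """Return the stage just below the current split, or 0 if none."""
    lower = [pct for pct in STAGES if pct < current_pct]
    return lower[-1] if lower else 0


def rollback_decision(current_pct, signals):
    """Map observed signals to a severity and a target split percentage."""
    # Critical rows: outage or auth errors mean immediate rollback to 0%
    if signals.get("outage") or signals.get("auth_errors"):
        return {"severity": "critical", "target_pct": 0}
    # High rows: large success gap or doubled timeout rate
    if signals.get("success_gap", 0.0) >= 0.05 or signals.get("timeout_ratio", 1.0) >= 2.0:
        return {"severity": "high", "target_pct": one_stage_back(current_pct)}
    # Warning row: investigate and hold at the current split
    if signals.get("success_gap", 0.0) >= 0.02:
        return {"severity": "warning", "target_pct": current_pct}
    return {"severity": "ok", "target_pct": current_pct}


# Hypothetical situation: at 50% split, new provider is 6 points worse
print(rollback_decision(50, {"success_gap": 0.06}))
```

Checking the critical rows first ensures the most severe action wins when several signals fire at once.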
Common Migration Pitfalls
1. Session Format Differences
Different providers use different session parameter formats. Verify your session IDs work with the new provider before starting traffic.
2. Authentication Format
Some providers use user:pass while others use user-session-xxx:pass or custom header-based auth. Test authentication independently.
3. Geo-Targeting Syntax
Geo-targeting parameters vary: country-us, cc-US, geo=us. Map your existing geo-targeting to the new provider's format.
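One way to handle this mapping is a small table of formatters keyed by syntax style, so the rest of your code passes a plain country code. The three styles below mirror the examples in the text; the style names and the `format_geo` function are hypothetical, and which style a given provider expects must come from its documentation.

```python
# Formatter per geo-targeting syntax; keys are illustrative style names
GEO_FORMATTERS = {
    "country-prefix": lambda cc: f"country-{cc.lower()}",  # country-us
    "cc-upper": lambda cc: f"cc-{cc.upper()}",             # cc-US
    "geo-equals": lambda cc: f"geo={cc.lower()}",          # geo=us
}


def format_geo(style, country_code):
    """Render a two-letter country code in the given provider syntax."""
    try:
        formatter = GEO_FORMATTERS[style]
    except KeyError:
        raise ValueError(f"unknown geo-targeting style: {style!r}") from None
    return formatter(country_code)


for style in GEO_FORMATTERS:
    print(style, "->", format_geo(style, "US"))
```

Failing loudly on an unknown style is deliberate: silently dropping the geo parameter would send requests from the wrong country.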
4. Rate Limit Differences
Your old provider may allow 100 concurrent connections while the new one limits to 50 (or vice versa). Check concurrency limits before shifting traffic.
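To keep each provider under its own limit while traffic is split, a per-provider semaphore is one option. This is a minimal sketch for threaded workloads; the `ConcurrencyLimiter` class is a hypothetical helper and the limits shown are the example numbers from the paragraph above.

```python
import threading
from contextlib import contextmanager


class ConcurrencyLimiter:
    """Caps in-flight requests per provider using bounded semaphores."""

    def __init__(self, limits):
        self._semaphores = {
            name: threading.BoundedSemaphore(limit)
            for name, limit in limits.items()
        }

    @contextmanager
    def slot(self, provider_name, timeout=None):
        """Hold one connection slot for the duration of the with-block."""
        semaphore = self._semaphores[provider_name]
        if not semaphore.acquire(timeout=timeout):
            raise TimeoutError(f"no free slot for {provider_name}")
        try:
            yield
        finally:
            semaphore.release()


# Hypothetical limits: old provider allows 100, new provider allows 50
limiter = ConcurrencyLimiter({"current-provider": 100, "hex-proxies": 50})

with limiter.slot("hex-proxies", timeout=5):
    pass  # Issue the proxied request here
```

Wrapping each request in `limiter.slot(provider_name)` means a higher split percentage cannot push the new provider past its connection limit; excess requests wait or time out instead.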
5. IP Pool Overlap
In rare cases, two providers may share IP pools (through upstream sources). This means IPs burned on the old provider are also burned on the new one. Test success rates against your hardest targets during the canary phase.
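If you want a rough signal of overlap, you could collect the exit IPs observed through each provider (for example, from an IP echo endpoint) and compare the two sets. The sketch below covers only the comparison step; `pool_overlap` is a hypothetical helper and the addresses are documentation-range placeholders, not real proxy IPs.

```python
def pool_overlap(old_ips, new_ips):
    """Compare exit IPs seen on each provider.

    overlap_ratio is the share of the smaller sample that also appears
    in the other sample (0.0 when either sample is empty).
    """
    old_set, new_set = set(old_ips), set(new_ips)
    shared = old_set & new_set
    smaller = min(len(old_set), len(new_set))
    return {
        "shared": sorted(shared),
        "overlap_ratio": len(shared) / smaller if smaller else 0.0,
    }


# Placeholder samples of observed exit IPs
old_sample = ["203.0.113.1", "203.0.113.2", "203.0.113.2"]
new_sample = ["203.0.113.2", "198.51.100.7"]
print(pool_overlap(old_sample, new_sample))
```

Small samples from large pools will understate overlap, so treat a low ratio as weak evidence and rely on success rates against your hardest targets.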
Frequently Asked Questions
How long should the full migration take?
Plan for 10-14 days of active migration plus 14 days of old-provider standby. Rushing the migration (e.g., jumping from 0% to 100% in one day) eliminates the validation stages that catch problems.
Can I migrate in one step if I am in a hurry?
You can, but you lose the safety net. If you must migrate quickly (e.g., old provider is being decommissioned), at minimum run a 500-request benchmark against your actual targets before switching. Keep the old provider as a rollback option for 48 hours.
Should I notify my current provider that I am migrating?
There is no obligation to. However, some providers offer retention pricing when notified of a migration, which can be useful as negotiating leverage. Wait until after you have validated the new provider before canceling.
What if the new provider is better for some targets and worse for others?
This is common. Some providers have better success rates against specific anti-bot systems. Consider a permanent dual-provider architecture where you route requests to the best provider per target domain, rather than migrating entirely.
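A permanent per-target setup can be as simple as a lookup from domain to provider name. The following is a sketch, assuming you maintain the routing table yourself from observed success rates; `provider_for` and the domains shown are hypothetical.

```python
from urllib.parse import urlsplit


def provider_for(url, domain_routes, default):
    """Pick a provider name for a URL.

    Matches the exact hostname first, then each parent domain, and
    falls back to the default provider when nothing matches.
    """
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    for i in range(len(parts)):
        candidate = ".".join(parts[i:])
        if candidate in domain_routes:
            return domain_routes[candidate]
    return default


# Hypothetical routing table built from per-target success rates
routes = {
    "example.com": "current-provider",
    "api.example.org": "hex-proxies",
}

print(provider_for("https://shop.example.com/items", routes, "hex-proxies"))
print(provider_for("https://unlisted.test/", routes, "hex-proxies"))
```

Matching parent domains lets one entry cover all subdomains of a target, while a more specific hostname entry still takes precedence.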
Do I need to re-architect my code for migration?
If your code currently hardcodes a proxy URL string, yes -- you need to introduce a routing layer. The DualProviderRouter pattern above is a minimal abstraction. For well-architected systems that already use a proxy configuration layer, migration is a configuration change, not a code change.
Zero-downtime proxy migration is an engineering discipline, not a flip-the-switch operation. The dual-provider approach with progressive traffic shifting gives you data-driven confidence at each stage. Hex Proxies supports session-based routing, HTTP CONNECT, and SOCKS5 -- compatible with any migration architecture. ISP proxies from $2.08/IP, residential from $4.25/GB. Start with a test plan or see our provider migration guides.