Proxy Migration Playbook: Switching Providers Without Downtime
Switching proxy providers is a high-stakes operational task. Your proxies are a live dependency for scraping pipelines, automation workflows, and business-critical data collection. A botched migration means downtime, failed jobs, and lost data. A well-executed migration means better performance, lower costs, and zero disruption.
This playbook gives you a repeatable, step-by-step process for migrating proxy providers with zero downtime. It covers when to migrate, how to audit your current setup, how to test the new provider in parallel, how to shift traffic gradually, and how to roll back if anything goes wrong.
---
Quick Answer
**Migrate proxy providers using a five-phase approach: (1) decide with a cost-benefit framework, (2) audit your current setup to establish baselines, (3) run parallel tests comparing old and new providers on your actual workload, (4) shift traffic gradually from 10% to 50% to 100% with monitoring at each stage, and (5) keep the old provider on standby for 2 weeks as a rollback safety net.** Never do a hard cutover. The gradual approach catches performance issues before they affect your entire operation.
---
Phase 1: The Migration Decision Framework
Not every frustration with your current provider justifies migration. Switching has real costs: engineering time, risk of disruption, and a learning curve with the new provider. Use this framework to make a rational decision.
When Migration Is Justified
| Signal | Severity | Justification |
|---|---|---|
| Success rate dropped 15%+ over 3 months | High | IP pool quality is degrading without recovery |
| Price increased 25%+ without performance improvement | High | The provider is raising prices to compensate for customer loss |
| Consistent SLA violations (3+ in 6 months) | High | Infrastructure reliability is inadequate |
| Support response time exceeds 24 hours consistently | Medium | Operational issues will take too long to resolve during incidents |
| Missing critical features (geo-targeting, API, rotation control) | Medium | Provider cannot support your evolving use case |
| Better provider available at 30%+ cost savings for equivalent performance | Medium | Market has moved; you are overpaying |
When Migration Is NOT Justified
| Signal | Better Action |
|---|---|
| Occasional bad IP (1--2% of pool) | Report to provider; request IP replacement |
| Temporary performance dip (1--2 weeks) | Monitor; may be target site changes, not provider |
| Missing minor feature | Request feature from current provider |
| Slightly cheaper alternative (< 15% savings) | Negotiate with current provider first |
| New provider has better marketing | Benchmark before deciding; marketing is not performance |
Cost-Benefit Calculation
Estimate the one-time migration cost:
- **Engineering time:** Hours x hourly rate for configuration, testing, and monitoring
- **Risk cost:** Potential revenue impact if migration causes downtime (even with rollback)
- **Opportunity cost:** What else could the engineering team build during migration time?

Then weigh it against the benefit:
- **Ongoing savings:** Monthly cost difference x 12 months
**Rule of thumb:** If annual savings exceed 3x the migration cost, proceed. Below that, the risk may not be worth it.
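As a rough illustration of the rule of thumb, the sketch below totals the one-time cost and compares it with annual savings. The function name, field names, and sample figures are hypothetical, and opportunity cost is left out because it is hard to price; treat it as a starting point rather than a formula.

```javascript
// Hypothetical helper: applies the 3x rule of thumb to illustrative inputs.
function shouldMigrate({ engineeringHours, hourlyRate, riskCost, monthlySavings }) {
  const migrationCost = engineeringHours * hourlyRate + riskCost;
  const annualSavings = monthlySavings * 12;
  return {
    migrationCost,
    annualSavings,
    ratio: annualSavings / migrationCost,
    proceed: annualSavings > 3 * migrationCost,
  };
}

// Example figures are made up for illustration only.
const decision = shouldMigrate({
  engineeringHours: 40,
  hourlyRate: 100,
  riskCost: 1000,
  monthlySavings: 1500,
});
console.log(decision);
```

A `ratio` just above 3 is a marginal case; the inputs are estimates, so it is worth rerunning the numbers with pessimistic values before committing.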
---
Phase 2: Pre-Migration Audit
Before touching any configuration, document everything about your current setup. This serves two purposes: it gives you a rollback target and it reveals requirements you might otherwise forget.
Configuration Inventory
Document every detail of your current proxy configuration:
| Item | Current Value | Notes |
|---|---|---|
| Provider name | -- | -- |
| Proxy type(s) used | Residential / ISP / Datacenter | List all types |
| Authentication method | User:pass / IP whitelist / Token | -- |
| Proxy endpoint(s) | Host:port | Include all endpoints (gateway, country-specific, etc.) |
| Protocol | HTTP / HTTPS / SOCKS5 | -- |
| Rotation method | Per-request / Sticky session / Manual | Session duration if sticky |
| Geo-targeting | Countries / Cities / ASNs | List all locations used |
| Concurrency | Max concurrent requests | Check provider plan limits |
| Monthly bandwidth/IP usage | GB or IP count | Last 3 months average |
| Monthly cost | $/month | Including overages |
| API integrations | Dashboard API, usage API, IP management | Endpoints and auth tokens |
Dependency Mapping
List every system that uses the proxy:
| System | Proxy Usage | Config Location | Update Method |
|---|---|---|---|
| Scraping pipeline | Residential rotation | `/config/proxy.yaml` | Restart service |
| Browser automation | ISP static | Environment variables | Redeploy |
| Price monitor | Residential geo-targeted | Database settings table | Hot reload |
| Account manager | ISP dedicated IPs | `.env` file | Restart PM2 process |
Performance Baselines
Collect performance data from the last 30 days:
- **Success rate** per target site
- **Latency p50, p95, p99** per target site
- **Daily bandwidth consumption**
- **Daily request volume**
- **Error rate breakdown** (timeout, auth failure, blocked, other)
- **Cost per successful request**
These baselines are your comparison benchmark for the new provider.
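If your request logs are available as structured records, the per-site baselines can be computed along the lines of the sketch below. The record shape (`site`, `success`, `latencyMs`) and the function names are assumptions for illustration, and the percentile uses the simple nearest-rank method.

```javascript
// Nearest-rank percentile over an ascending-sorted array of numbers.
function percentile(sorted, p) {
  if (sorted.length === 0) return null;
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Each record is assumed to look like { site, success, latencyMs }.
function computeBaselines(records) {
  const bySite = new Map();
  for (const rec of records) {
    if (!bySite.has(rec.site)) bySite.set(rec.site, []);
    bySite.get(rec.site).push(rec);
  }

  const baselines = {};
  for (const [site, recs] of bySite) {
    const latencies = recs.map((r) => r.latencyMs).sort((a, b) => a - b);
    const successes = recs.filter((r) => r.success).length;
    baselines[site] = {
      requests: recs.length,
      successRate: successes / recs.length,
      p50: percentile(latencies, 50),
      p95: percentile(latencies, 95),
      p99: percentile(latencies, 99),
    };
  }
  return baselines;
}

// Tiny made-up sample; real baselines need the full 30 days of logs.
console.log(
  computeBaselines([
    { site: 'shop.example.com', success: true, latencyMs: 420 },
    { site: 'shop.example.com', success: false, latencyMs: 3100 },
    { site: 'news.example.com', success: true, latencyMs: 250 },
  ])
);
```

Running the same function over the new provider's logs during parallel testing gives directly comparable numbers.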
---
Phase 3: Parallel Testing
Run the new provider alongside your current one without affecting production traffic. This is the most critical phase -- it tells you whether the new provider actually performs better for your specific workload.
Shadow Testing Method
Route a copy of your production requests to the new provider without using the responses:
```javascript
// shadow-test.mjs -- send same request through both providers
// fetchWithProxy and logComparison are helpers assumed to be defined elsewhere;
// fetchWithProxy should resolve (not throw) on failure so a shadow error
// never breaks the production path.
async function shadowTest(url) {
  const currentProxy = process.env.CURRENT_PROXY;
  const newProxy = process.env.NEW_PROXY;

  // Production request -- uses current provider
  const prodResult = await fetchWithProxy(url, currentProxy);

  // Shadow request -- uses new provider, result is logged but not used
  const shadowResult = await fetchWithProxy(url, newProxy);

  // Log comparison
  logComparison({
    url,
    current: {
      status: prodResult.status,
      latency: prodResult.latency,
      success: prodResult.success,
    },
    new: {
      status: shadowResult.status,
      latency: shadowResult.latency,
      success: shadowResult.success,
    },
  });

  // Only return the production result
  return prodResult;
}
```
A/B Testing Method
For workloads where duplicate requests are not appropriate (e.g., account actions), use random assignment:
```javascript
function selectProxy(abPercentage) {
  // abPercentage = 0.10 means 10% to new provider
  const useNew = Math.random() < abPercentage;
  return {
    proxy: useNew ? process.env.NEW_PROXY : process.env.CURRENT_PROXY,
    provider: useNew ? 'new' : 'current',
  };
}
```

Minimum Parallel Test Duration
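| Workload Volume | Minimum Test Duration | Minimum Requests Per Provider |
|---|---|---|
| < 10K requests/day | 7 days | 50,000 |
| 10K -- 100K requests/day | 5 days | 100,000 |
| > 100K requests/day | 3 days | 100,000 |

Treat both columns as floors: keep the test running until the minimum duration and the minimum request count are both met.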
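The table can be encoded as a small check so the test is not ended early. This is a minimal sketch that assumes both minimums must hold at once and that volumes of exactly 10K or 100K requests/day fall in the middle tier; the function names are hypothetical.

```javascript
// Thresholds copied from the table above.
function minimumTestRequirements(requestsPerDay) {
  if (requestsPerDay < 10_000) return { minDays: 7, minRequests: 50_000 };
  if (requestsPerDay <= 100_000) return { minDays: 5, minRequests: 100_000 };
  return { minDays: 3, minRequests: 100_000 };
}

// True only when both the duration and the per-provider request count are met.
function isTestSufficient(requestsPerDay, daysRun, requestsPerProvider) {
  const { minDays, minRequests } = minimumTestRequirements(requestsPerDay);
  return daysRun >= minDays && requestsPerProvider >= minRequests;
}

// A 20K/day workload after 5 days has the duration but may lack the volume.
console.log(isTestSufficient(20_000, 5, 80_000));
```

For low-volume workloads the request floor can take longer to reach than the day floor, so the duration column is often the smaller constraint.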
Evaluation Criteria
Compare the parallel test results on these metrics:
| Metric | Current Provider | New Provider | Delta | Acceptable? |
|---|---|---|---|---|
| Success rate (easy targets) | -- | -- | -- | New >= Current |
| Success rate (hard targets) | -- | -- | -- | New >= Current - 2% |
| Latency p50 | -- | -- | -- | New <= Current + 20% |
| Latency p99 | -- | -- | -- | New <= Current + 50% |
| Error rate | -- | -- | -- | New <= Current |
| Cost per 1K successful requests | -- | -- | -- | New <= Current |
If the new provider meets all criteria, proceed to traffic shifting. If it fails on any critical metric (success rate, error rate), investigate before proceeding.
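The acceptance rules can be applied mechanically once both sets of metrics are collected. The sketch below is one possible encoding: it assumes rates are fractions between 0 and 1, latencies are in milliseconds, and "Current - 2%" means two percentage points. The field and function names are hypothetical.

```javascript
// Compares candidate metrics against the current provider using the
// thresholds from the evaluation table.
function evaluateProvider(current, candidate) {
  const checks = {
    successRateEasy: candidate.successRateEasy >= current.successRateEasy,
    successRateHard: candidate.successRateHard >= current.successRateHard - 0.02,
    latencyP50: candidate.latencyP50 <= current.latencyP50 * 1.2,
    latencyP99: candidate.latencyP99 <= current.latencyP99 * 1.5,
    errorRate: candidate.errorRate <= current.errorRate,
    costPer1k: candidate.costPer1k <= current.costPer1k,
  };
  const failed = Object.keys(checks).filter((name) => !checks[name]);
  return { checks, failed, pass: failed.length === 0 };
}

// Illustrative numbers only.
const evaluation = evaluateProvider(
  { successRateEasy: 0.98, successRateHard: 0.85, latencyP50: 800, latencyP99: 3000, errorRate: 0.03, costPer1k: 1.2 },
  { successRateEasy: 0.99, successRateHard: 0.84, latencyP50: 900, latencyP99: 5000, errorRate: 0.02, costPer1k: 1.0 }
);
console.log(evaluation.pass, evaluation.failed);
```

Returning the list of failed checks, rather than a single boolean, makes it easier to decide whether a failure is on a critical metric or one worth investigating and retesting.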
---
Phase 4: Gradual Traffic Shifting
This is the core of the zero-downtime migration. Shift traffic from the old provider to the new one in controlled stages.
Stage 1: 10% Traffic (Days 1--3)
Route 10% of production traffic to the new provider:
```javascript
// traffic-router.mjs
const MIGRATION_PERCENTAGE = parseFloat(
  process.env.MIGRATION_PERCENTAGE || '0.10'
);

// Picks a provider per request; MIGRATION_PERCENTAGE is the share sent to the new one.
function getProxy() {
  const useNew = Math.random() < MIGRATION_PERCENTAGE;
  return useNew
    ? {
        host: process.env.NEW_PROXY_HOST,
        user: process.env.NEW_PROXY_USER,
        pass: process.env.NEW_PROXY_PASS,
        provider: 'new',
      }
    : {
        host: process.env.CURRENT_PROXY_HOST,
        user: process.env.CURRENT_PROXY_USER,
        pass: process.env.CURRENT_PROXY_PASS,
        provider: 'current',
      };
}
```
**Monitoring checklist at 10%:**
- [ ] Success rate on new provider matches or exceeds parallel test results
- [ ] No increase in overall error rate
- [ ] Latency within expected range
- [ ] No customer-facing impact
- [ ] Cost tracking aligns with projections
**Hold at 10% for 3 days minimum** before proceeding. This catches issues that only appear under sustained load.
Stage 2: 50% Traffic (Days 4--7)
Update `MIGRATION_PERCENTAGE` to `0.50`.
This is the highest-risk stage. At 50%, both providers handle significant load, and any issues with the new provider affect half your traffic.
**Monitoring checklist at 50%:**
- [ ] All Stage 1 checks pass
- [ ] IP pool diversity remains adequate (no IP reuse issues)
- [ ] Bandwidth consumption on new provider matches expectations
- [ ] Support responsiveness tested (file a test ticket)
- [ ] No degradation on specific target sites
**Hold at 50% for 4 days minimum.**
Stage 3: 100% Traffic (Days 8+)
Update `MIGRATION_PERCENTAGE` to `1.0`.
All production traffic now flows through the new provider. But do NOT cancel or disconnect the old provider yet.
**Monitoring checklist at 100%:**
- [ ] All Stage 2 checks pass for 72 hours
- [ ] Total cost within 10% of projection
- [ ] No IP quality degradation at full volume
- [ ] All dependent systems functioning correctly
- [ ] Performance metrics match or beat pre-migration baselines
Timeline Summary
| Stage | Traffic Split | Duration | Total Calendar Days |
|---|---|---|---|
| Parallel testing | 0% (shadow) | 3--7 days | Days 1--7 |
| Stage 1 | 10% new | 3 days minimum | Days 8--10 |
| Stage 2 | 50% new | 4 days minimum | Days 11--14 |
| Stage 3 | 100% new | 3 days minimum | Days 15--17 |
| Old provider standby | 0% (standby) | 14 days | Days 18--31 |
| **Total migration window** | -- | -- | **~4 weeks** |
---
Phase 5: Rollback Plan
Things go wrong. Have a rollback plan ready before you start the migration.
Rollback Triggers
Initiate rollback if any of these occur:
- Success rate drops more than 10% below baseline for 1 hour
- Error rate exceeds 2x baseline for 30 minutes
- Complete proxy outage lasting more than 5 minutes
- Authentication failures across all requests
- Customer-reported issues tied to proxy performance
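The first four triggers can be evaluated automatically from monitoring data. The sketch below assumes one metrics sample per minute with hypothetical fields (`successRate`, `errorRate`, `reachable`, `authFailureRate`), reads "10% below baseline" as a relative drop, and approximates each duration as that many consecutive one-minute samples. Customer-reported issues still need a human decision.

```javascript
// True when the most recent `minutes` samples all satisfy `predicate`.
function sustained(samples, minutes, predicate) {
  if (samples.length < minutes) return false;
  return samples.slice(-minutes).every(predicate);
}

// Returns the list of rollback triggers that are currently firing.
function rollbackReasons(samples, baseline) {
  const reasons = [];
  if (sustained(samples, 60, (s) => s.successRate < baseline.successRate * 0.9)) {
    reasons.push('success-rate-drop');
  }
  if (sustained(samples, 30, (s) => s.errorRate > baseline.errorRate * 2)) {
    reasons.push('error-rate-spike');
  }
  if (sustained(samples, 5, (s) => !s.reachable)) {
    reasons.push('outage');
  }
  const latest = samples[samples.length - 1];
  if (latest && latest.authFailureRate === 1) {
    reasons.push('auth-failure');
  }
  return reasons;
}

// Made-up data: an hour of degraded but reachable traffic.
const degraded = Array.from({ length: 60 }, () => ({
  successRate: 0.7,
  errorRate: 0.3,
  reachable: true,
  authFailureRate: 0,
}));
console.log(rollbackReasons(degraded, { successRate: 0.95, errorRate: 0.05 }));
```

A non-empty result would be the signal to start the rollback procedure; wiring it to an alert rather than an automatic switch keeps a person in the loop.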
Rollback Procedure
**Immediate rollback (< 2 minutes):**
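```bash
# Set migration percentage to 0 (all traffic to old provider)
# (assumes services read it from the environment; adapt to wherever your config lives)
export MIGRATION_PERCENTAGE=0.0

# Restart dependent services to pick up the change
# (or use hot-reload if supported; with PM2, add --update-env if the
# variable is set in the invoking shell)
pm2 restart scraping-service
pm2 restart automation-service
```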
**Full rollback checklist:**
- Set `MIGRATION_PERCENTAGE` to `0.0` across all services
- Verify all traffic is flowing through the old provider
- Confirm success rate returns to baseline within 10 minutes
- Notify stakeholders of the rollback and reason
- Document the failure mode for investigation
- Do NOT retry migration until the root cause is identified and resolved
Rollback Insurance
Keep these guarantees in place for 14 days after 100% migration:
- Old provider subscription remains active
- Old provider credentials remain valid and tested
- Old provider configuration files remain in version control
- Monitoring dashboards include old provider metrics
- Rollback can be executed by any team member (not just the person who set up the migration)
---
Post-Migration Validation
After running at 100% on the new provider for 14 days without rollback:
Final Validation Checklist
- [ ] Success rate meets or exceeds pre-migration baseline for 14 consecutive days
- [ ] Latency p95 is at or below pre-migration baseline
- [ ] Monthly cost is within 10% of projection
- [ ] All dependent systems stable (no proxy-related errors in logs)
- [ ] IP quality audit passed (blacklist check, subnet diversity check)
- [ ] Support ticket filed and resolved within SLA during the migration period
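For the subnet diversity part of the IP quality audit, one simple measure is how many distinct /24 subnets the observed exit IPs span and how concentrated they are. The sketch below assumes IPv4 addresses collected from your own request logs; the function names are hypothetical, and the blacklist check is not covered because it depends on an external lookup service.

```javascript
// Returns the /24 prefix of a dotted-quad IPv4 address, e.g. "203.0.113".
function subnet24(ip) {
  return ip.split('.').slice(0, 3).join('.');
}

// Summarizes how widely a list of exit IPs is spread across /24 subnets.
function subnetDiversity(ips) {
  const unique = new Set(ips);
  const counts = new Map();
  for (const ip of unique) {
    const key = subnet24(ip);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const largest = Math.max(0, ...counts.values());
  return {
    uniqueIps: unique.size,
    uniqueSubnets: counts.size,
    largestSubnetShare: unique.size === 0 ? 0 : largest / unique.size,
  };
}

// Documentation-range addresses used as placeholder data.
console.log(
  subnetDiversity(['203.0.113.5', '203.0.113.9', '198.51.100.7', '192.0.2.1', '203.0.113.5'])
);
```

What counts as adequate diversity depends on your targets; comparing the result with the same measurement taken on the old provider gives a reference point.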
Decommission Old Provider
Only after the validation checklist is complete:
- Cancel the old provider subscription
- Revoke old provider API keys and credentials
- Remove old provider configuration from all environments
- Update documentation to reference the new provider
- Archive migration logs and comparison data for future reference
---
Migration Configuration Template
Use this template to centralize your migration configuration. Store it as an environment variable or in your configuration management system:
```yaml
# proxy-migration-config.yaml
migration:
  status: active # active | paused | complete | rolled-back
  percentage: 0.50 # 0.0 to 1.0
  started: '2026-04-01'

  current_provider:
    name: 'Provider A'
    host: 'old-provider-gateway:8080'
    auth_method: 'user_pass'

  new_provider:
    name: 'Hex Proxies'
    host: 'gate.hexproxies.com:8080'
    auth_method: 'user_pass'

  rollback:
    trigger_success_rate_drop: 0.10 # 10% below baseline
    trigger_error_rate_multiple: 2.0 # 2x baseline
    trigger_outage_minutes: 5
    max_rollback_time_minutes: 2

  monitoring:
    dashboard_url: 'https://monitoring.example.com/proxy-migration'
    alert_channels:
      - '#proxy-alerts'
      - 'oncall@example.com'
```
---
Frequently Asked Questions
**How long does a proxy migration typically take?** Plan for about 4 weeks end-to-end: up to 1 week of parallel testing, roughly 10 days of gradual traffic shifting, and 2 weeks with the old provider on standby while you validate the new one. Rushing the process increases risk.
**Can I migrate without any downtime?** Yes, if you follow the gradual traffic shifting approach. At no point is 100% of your traffic dependent on an untested provider. The rollback mechanism ensures you can revert within 2 minutes if issues arise.
**Should I migrate all proxy types at once or one at a time?** One at a time. If you use both residential and ISP proxies, migrate the lower-risk workload first (usually the one with lower volume or less business impact). Apply learnings from the first migration to the second.
**What if the new provider's proxy format is different?** Abstract your proxy configuration behind a provider-agnostic interface. Instead of hardcoding `http://user:pass@host:port`, use a configuration layer that translates your standard format to each provider's specific format. This also makes future migrations easier.
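One way to build such a layer is a table of per-provider formatters behind a single function. In the sketch below, the provider keys and the username conventions for geo-targeting are invented for illustration and do not describe any real provider's format; check each provider's documentation for the actual syntax.

```javascript
// Hypothetical per-provider formatters. Each takes the same standard
// settings object and returns a proxy URL in that provider's format.
const formatters = {
  'provider-a': ({ user, pass, host, port, country }) => {
    const username = country ? `${user}-country-${country}` : user;
    return `http://${encodeURIComponent(username)}:${encodeURIComponent(pass)}@${host}:${port}`;
  },
  'provider-b': ({ user, pass, host, port, country }) => {
    const username = country ? `${user}_cc-${country}` : user;
    return `http://${encodeURIComponent(username)}:${encodeURIComponent(pass)}@${host}:${port}`;
  },
};

// Callers only ever use this function, so adding a provider means
// adding one formatter rather than touching every call site.
function buildProxyUrl(provider, settings) {
  const format = formatters[provider];
  if (!format) throw new Error(`Unknown proxy provider: ${provider}`);
  return format(settings);
}

console.log(
  buildProxyUrl('provider-a', {
    user: 'alice',
    pass: 'p@ss',
    host: 'gw.example.com',
    port: 8080,
    country: 'us',
  })
);
```

Credentials are URL-encoded so that characters such as `@` or `:` in a password do not corrupt the proxy URL.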
**How do I handle sticky session migration?** Sticky sessions (where the same IP must be used across multiple requests) are the hardest to migrate because mid-session proxy switches break workflows. Complete all active sessions on the old provider before migrating sticky-session workloads. Do not mix providers within a single session.
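To keep providers from mixing within a session, the provider can be chosen once when the session starts and pinned until it ends. This is a minimal in-memory sketch with hypothetical names; a multi-process deployment would need the pin table in shared storage.

```javascript
// Creates a router that assigns a provider when a session is first seen
// and returns the same provider for the rest of that session.
function createSessionRouter(getMigrationPercentage, random = Math.random) {
  const pins = new Map(); // session ID -> 'new' | 'current'
  return {
    providerFor(sessionId) {
      if (!pins.has(sessionId)) {
        const useNew = random() < getMigrationPercentage();
        pins.set(sessionId, useNew ? 'new' : 'current');
      }
      return pins.get(sessionId);
    },
    endSession(sessionId) {
      pins.delete(sessionId);
    },
    // Number of sessions still pinned to a provider.
    activeOn(provider) {
      return [...pins.values()].filter((p) => p === provider).length;
    },
  };
}

// Raising the percentage only affects sessions that start afterwards.
let percentage = 0;
const router = createSessionRouter(() => percentage);
router.providerFor('session-1'); // pinned to 'current'
percentage = 1;
console.log(router.providerFor('session-1'), router.providerFor('session-2'));
```

Once `activeOn('current')` reaches zero, no sticky sessions remain on the old provider and it is safe to treat that workload as fully migrated.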