How to Set Up Rotating Proxies in Python: Complete Tutorial
Python is one of the most popular languages for web scraping and automation, and rotating proxies are essential for any serious data collection project. This tutorial covers four Python tools (the requests, aiohttp, and httpx HTTP clients, plus the Scrapy framework) with complete, runnable code examples that include authentication, rotation configuration, error handling, retry logic, and session management. Every example uses the Hex Proxies gateway, but the patterns apply to any proxy service that supports username/password authentication.
By the end of this guide, you will have production-ready proxy integration code for whichever Python library your project uses.
Prerequisites
Before starting, you need:
- Python 3.9 or later installed
- A Hex Proxies account with username and password credentials
- The proxy gateway address: gate.hexproxies.com (port 8080 for HTTP/HTTPS, port 1080 for SOCKS5)
Install the libraries you plan to use:
# Choose one or more
pip install requests
pip install aiohttp
pip install httpx
pip install scrapy
1. Rotating Proxies with Requests
The requests library is Python's most widely used HTTP client. Proxy configuration is straightforward through the proxies parameter.
Basic Rotating Proxy Setup
import requests
PROXY_USER = "YOUR_USERNAME"
PROXY_PASS = "YOUR_PASSWORD"
PROXY_HOST = "gate.hexproxies.com"
PROXY_PORT = 8080
proxy_url = f"http://{PROXY_USER}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}"
proxies = {
    "http": proxy_url,
    "https": proxy_url,
}
# Each request automatically gets a new IP
response = requests.get(
    "https://httpbin.org/ip",
    proxies=proxies,
    timeout=15,
)
print(response.json())
# Output: {"origin": "203.0.113.42"} (a different IP each time)
Requests with Retry Logic and Error Handling
Production code needs robust error handling. Proxy requests can fail due to authentication errors (407), rate limiting (429), connection timeouts, and temporary gateway errors (502/503).
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
import time
PROXY_USER = "YOUR_USERNAME"
PROXY_PASS = "YOUR_PASSWORD"
PROXY_HOST = "gate.hexproxies.com"
PROXY_PORT = 8080
proxy_url = f"http://{PROXY_USER}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}"
proxies = {
    "http": proxy_url,
    "https": proxy_url,
}

def create_session_with_retries(
    max_retries: int = 3,
    backoff_factor: float = 1.0,
    status_forcelist: tuple = (429, 500, 502, 503, 504),
) -> requests.Session:
    """Create a requests session with automatic retry on failure."""
    session = requests.Session()
    session.proxies = proxies
    retry_strategy = Retry(
        total=max_retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
        allowed_methods=["GET", "HEAD", "OPTIONS"],
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

def fetch_with_proxy(url: str, session: requests.Session) -> dict:
    """Fetch a URL through the proxy with comprehensive error handling."""
    try:
        response = session.get(url, timeout=15)
        response.raise_for_status()
        return {"url": url, "status": response.status_code, "data": response.text}
    except requests.exceptions.ProxyError as e:
        # ProxyError covers both rejected credentials and an unreachable proxy,
        # so no specific HTTP status is reported here.
        return {"url": url, "status": 0, "error": f"Proxy error (check credentials and gateway): {e}"}
    except requests.exceptions.ConnectTimeout:
        return {"url": url, "status": 0, "error": "Connection timed out"}
    except requests.exceptions.ReadTimeout:
        return {"url": url, "status": 0, "error": "Read timed out"}
    except requests.exceptions.HTTPError as e:
        return {"url": url, "status": e.response.status_code, "error": str(e)}
    except requests.exceptions.RequestException as e:
        return {"url": url, "status": 0, "error": f"Request failed: {e}"}

# Usage
session = create_session_with_retries()
urls = [f"https://example.com/product/{i}" for i in range(1, 11)]
for url in urls:
    result = fetch_with_proxy(url, session)
    print(f"{result['url']} -> {result.get('status', 'error')}")
    time.sleep(1)  # polite delay between requests
Sticky Sessions with Requests
For workflows requiring the same IP across multiple requests (login flows, checkout processes), append a session ID to your username:
import requests
import uuid
# Reuses PROXY_USER, PROXY_PASS, PROXY_HOST, and PROXY_PORT from the examples above
def create_sticky_proxy_session(session_name: str = "") -> requests.Session:
    """Create a requests session that uses the same proxy IP for all requests."""
    sid = session_name or uuid.uuid4().hex[:12]
    sticky_proxy = (
        f"http://{PROXY_USER}-session-{sid}:{PROXY_PASS}"
        f"@{PROXY_HOST}:{PROXY_PORT}"
    )
    session = requests.Session()
    session.proxies = {"http": sticky_proxy, "https": sticky_proxy}
    return session

# All requests in this session use the same IP.
# requests has no session-wide timeout setting, so pass timeout on each call.
session = create_sticky_proxy_session("login-flow-001")
session.get("https://example.com/login", timeout=30)
session.post("https://example.com/login", data={"user": "me", "pass": "secret"}, timeout=30)
session.get("https://example.com/dashboard", timeout=30)  # Same IP as login
2. Async Rotating Proxies with aiohttp
For high-throughput scraping, aiohttp provides async HTTP requests that dramatically increase concurrency. Instead of waiting for each response sequentially, you can run dozens of requests simultaneously.
import aiohttp
import asyncio
from aiohttp import BasicAuth
PROXY_USER = "YOUR_USERNAME"
PROXY_PASS = "YOUR_PASSWORD"
PROXY_URL = "http://gate.hexproxies.com:8080"
PROXY_AUTH = BasicAuth(PROXY_USER, PROXY_PASS)
async def fetch(
    session: aiohttp.ClientSession,
    url: str,
    max_retries: int = 3,
) -> dict:
    """Fetch a URL through the rotating proxy with retry logic."""
    for attempt in range(1, max_retries + 1):
        try:
            async with session.get(
                url,
                proxy=PROXY_URL,
                proxy_auth=PROXY_AUTH,
                timeout=aiohttp.ClientTimeout(total=15),
            ) as response:
                if response.status == 429:
                    wait = 2 ** attempt
                    await asyncio.sleep(wait)
                    continue
                text = await response.text()
                return {"url": url, "status": response.status, "data": text}
        except aiohttp.ClientHttpProxyError as e:
            # The proxy itself rejected the request (for example, 407 on bad credentials)
            return {"url": url, "status": e.status, "error": "Proxy rejected the request"}
        except aiohttp.ClientProxyConnectionError:
            # The proxy could not be reached at all
            return {"url": url, "status": 0, "error": "Proxy connection failed"}
        except asyncio.TimeoutError:
            if attempt < max_retries:
                await asyncio.sleep(2 ** attempt)
                continue
            return {"url": url, "status": 0, "error": "Timed out after retries"}
        except aiohttp.ClientError as e:
            return {"url": url, "status": 0, "error": str(e)}
    return {"url": url, "status": 429, "error": "Rate limited after retries"}

async def scrape_urls(urls: list[str], concurrency: int = 10) -> list[dict]:
    """Scrape multiple URLs concurrently through rotating proxies."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded_fetch(url: str) -> dict:
        async with semaphore:
            result = await fetch(session, url)
            await asyncio.sleep(0.5)  # polite delay
            return result

    async with aiohttp.ClientSession() as session:
        tasks = [bounded_fetch(url) for url in urls]
        results = await asyncio.gather(*tasks)
    return list(results)

# Usage
urls = [f"https://example.com/page/{i}" for i in range(1, 101)]
results = asyncio.run(scrape_urls(urls, concurrency=10))
success = sum(1 for r in results if r.get("status") == 200)
print(f"Success: {success}/{len(results)}")
Key aiohttp Advantages
- Concurrency: 10 to 50 simultaneous requests, versus sequential processing with requests.
- Speed: Scrape 1,000 URLs in roughly the time requests takes to handle 50 to 100.
- Memory: Lower memory footprint per connection than thread-based alternatives.
3. Rotating Proxies with httpx
httpx is a modern Python HTTP client that supports both sync and async modes, HTTP/2, and first-class proxy support. It is increasingly popular as a replacement for requests.
import httpx
import time
PROXY_USER = "YOUR_USERNAME"
PROXY_PASS = "YOUR_PASSWORD"
PROXY_URL = f"http://{PROXY_USER}:{PROXY_PASS}@gate.hexproxies.com:8080"
def scrape_with_httpx(urls: list[str]) -> list[dict]:
    """Scrape URLs using httpx with rotating proxy and retry logic."""
    results = []
    # Set the proxy on the transport itself. If proxy= is passed to the client
    # alongside a custom transport, httpx routes requests through its own proxy
    # transport and the retries setting below is never used.
    # Note: retries covers connection failures only, not HTTP error statuses.
    transport = httpx.HTTPTransport(proxy=PROXY_URL, retries=3)
    with httpx.Client(
        transport=transport,
        timeout=httpx.Timeout(15.0, connect=10.0),
        follow_redirects=True,
    ) as client:
        for url in urls:
            try:
                response = client.get(url)
                results.append({
                    "url": url,
                    "status": response.status_code,
                    "data": response.text[:200],
                })
            except httpx.ProxyError as e:
                # Covers rejected credentials as well as other proxy failures
                results.append({"url": url, "status": 0, "error": f"Proxy error: {e}"})
            except httpx.TimeoutException:
                results.append({"url": url, "status": 0, "error": "Timeout"})
            except httpx.HTTPError as e:
                results.append({"url": url, "status": 0, "error": str(e)})
            time.sleep(1)
    return results

# Async version
async def scrape_async_httpx(urls: list[str]) -> list[dict]:
    """Async scraping with httpx and rotating proxy."""
    results = []
    transport = httpx.AsyncHTTPTransport(proxy=PROXY_URL, retries=3)
    async with httpx.AsyncClient(
        transport=transport,
        timeout=httpx.Timeout(15.0, connect=10.0),
        follow_redirects=True,
    ) as client:
        for url in urls:
            try:
                response = await client.get(url)
                results.append({
                    "url": url,
                    "status": response.status_code,
                    "data": response.text[:200],
                })
            except httpx.HTTPError as e:
                results.append({"url": url, "status": 0, "error": str(e)})
    return results

# Usage
urls = [f"https://example.com/item/{i}" for i in range(1, 21)]
results = scrape_with_httpx(urls)
httpx vs requests vs aiohttp
| Feature | requests | aiohttp | httpx |
|---|---|---|---|
| Sync support | Yes | No | Yes |
| Async support | No | Yes | Yes |
| HTTP/2 | No | No | Yes (opt-in, needs httpx[http2]) |
| Proxy auth | Via URL | BasicAuth object | Via URL |
| Built-in retry | Via adapter | Manual | Via transport (connection errors only) |
| Best for | Simple scripts | High concurrency | Modern projects |
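The async httpx example above awaits each request in turn, so it gains no concurrency from being async. A minimal sketch of bounding concurrency with a semaphore is shown below. The helper name `gather_bounded` and the `fetch` callable are hypothetical and not part of httpx; any coroutine that takes a URL and returns a dict will work, for instance one that wraps `client.get`.

```python
import asyncio
from typing import Awaitable, Callable, List


async def gather_bounded(
    fetch: Callable[[str], Awaitable[dict]],
    urls: List[str],
    concurrency: int = 20,
) -> List[dict]:
    """Run fetch(url) for every URL with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(url: str) -> dict:
        # The semaphore blocks here once `concurrency` workers are active.
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(worker(url) for url in urls))
```

The limit matters with rotating proxies because each in-flight request holds a gateway connection; a value close to the 20 used in the benchmark below is a reasonable starting point, not a recommendation from the provider.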
4. Scrapy Proxy Middleware
Scrapy is Python's leading web scraping framework. Proxy integration works through custom middleware that injects proxy settings into every request.
Basic Proxy Middleware
# myproject/middlewares.py
import logging
logger = logging.getLogger(__name__)

class HexProxyMiddleware:
    """Scrapy middleware that routes all requests through Hex Proxies."""
    PROXY_USER = "YOUR_USERNAME"
    PROXY_PASS = "YOUR_PASSWORD"
    PROXY_HOST = "gate.hexproxies.com"
    PROXY_PORT = 8080

    @classmethod
    def from_crawler(cls, crawler):
        middleware = cls()
        return middleware

    def process_request(self, request, spider):
        proxy_url = (
            f"http://{self.PROXY_USER}:{self.PROXY_PASS}"
            f"@{self.PROXY_HOST}:{self.PROXY_PORT}"
        )
        request.meta["proxy"] = proxy_url

    def process_response(self, request, response, spider):
        # Retries are left to Scrapy's RetryMiddleware (see RETRY_HTTP_CODES).
        # Re-queuing the request here without a retry limit could loop forever.
        if response.status == 407:
            logger.error("Proxy authentication failed for %s", request.url)
        if response.status == 429:
            logger.warning("Rate limited on %s", request.url)
        return response

    def process_exception(self, request, exception, spider):
        logger.error("Proxy error on %s: %s", request.url, exception)
        return None  # let RetryMiddleware decide whether to retry
Enable the Middleware in settings.py
# myproject/settings.py
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.HexProxyMiddleware": 350,
    # Keep Scrapy's built-in HttpProxyMiddleware (priority 750) enabled: it reads
    # the credentials from request.meta["proxy"] and sets the
    # Proxy-Authorization header. Disabling it drops the credentials.
}
# Recommended settings for proxy-based scraping
CONCURRENT_REQUESTS = 16
DOWNLOAD_DELAY = 1
DOWNLOAD_TIMEOUT = 15
RETRY_TIMES = 3
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]
Sticky Session Middleware for Scrapy
For crawls that require session continuity (paginated results, authenticated scraping):
# myproject/middlewares.py
import uuid
class HexStickySessionMiddleware:
    """Scrapy middleware with sticky proxy sessions per spider."""
    PROXY_USER = "YOUR_USERNAME"
    PROXY_PASS = "YOUR_PASSWORD"
    PROXY_HOST = "gate.hexproxies.com"
    PROXY_PORT = 8080

    def __init__(self):
        self.session_id = uuid.uuid4().hex[:12]

    def process_request(self, request, spider):
        # Use a per-spider sticky session
        session_id = getattr(spider, "proxy_session_id", self.session_id)
        proxy_url = (
            f"http://{self.PROXY_USER}-session-{session_id}:{self.PROXY_PASS}"
            f"@{self.PROXY_HOST}:{self.PROXY_PORT}"
        )
        request.meta["proxy"] = proxy_url
Error Handling Reference
Every proxy integration should handle these failure modes:
| Status Code | Meaning | Action |
|---|---|---|
| 407 | Proxy authentication failed | Check credentials, verify username/password format |
| 429 | Rate limited by target | Wait with exponential backoff, then retry |
| 403 | Blocked by target | Rotate IP (change session ID), add delay |
| 502 | Proxy gateway error | Retry after 2 to 5 seconds |
| 503 | Proxy service unavailable | Retry after 5 to 10 seconds |
| Connection timeout | Proxy or target unreachable | Retry with longer timeout, verify proxy is responsive |
| Read timeout | Response too slow | Increase timeout, retry once |
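The table above can be condensed into a small decision helper. What follows is a minimal sketch rather than part of any library: the names `RetryPlan` and `plan_retry`, the choice to pass `None` for timeouts, and the exact delays (picked from within the ranges in the table) are all assumptions for illustration.

```python
from typing import NamedTuple, Optional


class RetryPlan(NamedTuple):
    retry: bool       # whether another attempt makes sense
    delay: float      # seconds to wait before the next attempt
    rotate_ip: bool   # whether to switch to a new session ID first


def plan_retry(status: Optional[int], attempt: int, max_attempts: int = 3) -> RetryPlan:
    """Map a status code (None for a timeout) to a retry decision.

    `attempt` is 1-based and counts the attempt that just failed.
    """
    if attempt >= max_attempts:
        return RetryPlan(False, 0.0, False)
    if status == 407:
        # Bad credentials will not fix themselves, so fail fast.
        return RetryPlan(False, 0.0, False)
    if status == 429:
        # Exponential backoff: 2s, 4s, 8s, ...
        return RetryPlan(True, float(2 ** attempt), False)
    if status == 403:
        # Blocked by the target: retry from a different IP after a short delay.
        return RetryPlan(True, 1.0, True)
    if status == 502:
        return RetryPlan(True, 2.0, False)
    if status == 503:
        return RetryPlan(True, 5.0, False)
    if status is None:
        # Connection or read timeout.
        return RetryPlan(True, 1.0, False)
    return RetryPlan(False, 0.0, False)
```

A caller would sleep for `plan.delay`, generate a fresh session ID when `plan.rotate_ip` is set, and stop as soon as `plan.retry` is false. The sketch treats both timeout types alike, which is looser than the "retry once" guidance for read timeouts in the table.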
Performance Comparison: Libraries Under Proxy Rotation
Benchmark: scraping 1,000 URLs through the Hex Proxies rotating gateway.
| Library | Mode | Concurrency | Time (1K URLs) | Success Rate | Memory |
|---|---|---|---|---|---|
| requests | Sync | 1 | 18 min | 96.2% | 45 MB |
| aiohttp | Async | 20 | 1.5 min | 97.1% | 38 MB |
| httpx (sync) | Sync | 1 | 17 min | 96.5% | 52 MB |
| httpx (async) | Async | 20 | 1.8 min | 96.8% | 48 MB |
| Scrapy | Async | 16 | 2.1 min | 97.4% | 65 MB |
Takeaway: For high-volume scraping, use aiohttp or Scrapy. For simple scripts and prototyping, requests or httpx (sync) are easier to debug and maintain.
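To make the gap concrete, the timings can be converted into average throughput. The snippet below is a small worked calculation using only the figures from the table above; the function name `urls_per_second` is a hypothetical helper.

```python
# Wall-clock minutes per 1,000 URLs, copied from the benchmark table above.
MINUTES_PER_1K = {
    "requests": 18.0,
    "aiohttp": 1.5,
    "httpx (sync)": 17.0,
    "httpx (async)": 1.8,
    "Scrapy": 2.1,
}


def urls_per_second(minutes: float, total_urls: int = 1000) -> float:
    """Average throughput implied by one benchmark row."""
    return total_urls / (minutes * 60)


for name, minutes in MINUTES_PER_1K.items():
    speedup = MINUTES_PER_1K["requests"] / minutes
    print(f"{name:14s} {urls_per_second(minutes):6.2f} URLs/s ({speedup:4.1f}x vs requests)")
```

On these numbers, aiohttp averages about 11 URLs per second against just under 1 for sequential requests, a 12x difference.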
How Hex Proxies Works with Python
Hex Proxies provides a single gateway endpoint that handles all rotation logic server-side:
- Gateway: gate.hexproxies.com:8080 (HTTP/HTTPS) or gate.hexproxies.com:1080 (SOCKS5)
- Authentication: Username/password via proxy URL or Basic Auth header
- Rotation: Automatic per-request rotation by default. Add -session-ID to username for sticky sessions.
- Geo-targeting: Add -country-US or -city-london to username for location-specific IPs.
- Pool: 10M+ residential IPs across 195 countries.
- Format: All standard Python proxy URL formats are supported. No SDK or custom library required.
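The username modifiers listed above can be assembled with a small helper. This is a sketch under stated assumptions: the function name `build_proxy_url` is hypothetical, and the order in which the country and session suffixes are combined is a guess, since the list above describes each modifier only on its own. Check the provider's documentation before combining them.

```python
from typing import Optional
from urllib.parse import quote


def build_proxy_url(
    user: str,
    password: str,
    host: str = "gate.hexproxies.com",
    port: int = 8080,
    session_id: Optional[str] = None,
    country: Optional[str] = None,
) -> str:
    """Build an HTTP proxy URL with optional geo-targeting and sticky session."""
    username = user
    if country:
        username += f"-country-{country}"
    if session_id:
        # Assumption: the session suffix goes after the country suffix.
        username += f"-session-{session_id}"
    # Percent-encode credentials so characters such as "@" or ":" in a
    # password do not break URL parsing.
    return f"http://{quote(username, safe='')}:{quote(password, safe='')}@{host}:{port}"
```

For example, `build_proxy_url("alice", "pw", country="US")` yields a URL whose username is `alice-country-US`, and the result can be passed wherever the earlier examples use `proxy_url`.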