How to Set Up Rotating Proxies in Python: Complete Tutorial
Python is the most popular language for web scraping and automation, and rotating proxies are essential for any serious data collection project. This tutorial covers four Python HTTP libraries — requests, aiohttp, httpx, and Scrapy — with complete, runnable code examples that include authentication, rotation configuration, error handling, retry logic, and session management. Every example uses the Hex Proxies gateway, but the patterns apply to any proxy service that supports username/password authentication.
By the end of this guide, you will have production-ready proxy integration code for whichever Python library your project uses.
Prerequisites
Before starting, you need:
- Python 3.9 or later installed
- A Hex Proxies account with username and password credentials
- The proxy gateway address: `gate.hexproxies.com` (port 8080 for HTTP/HTTPS, port 1080 for SOCKS5)
Install the libraries you plan to use:
```bash
# Choose one or more
pip install requests
pip install aiohttp
pip install httpx
pip install scrapy
```

1. Rotating Proxies with Requests
The requests library is Python's most widely used HTTP client. Proxy configuration is straightforward through the proxies parameter.
Basic Rotating Proxy Setup
```python
import requests

PROXY_USER = "YOUR_USERNAME"
PROXY_PASS = "YOUR_PASSWORD"
PROXY_HOST = "gate.hexproxies.com"
PROXY_PORT = 8080

proxy_url = f"http://{PROXY_USER}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}"
proxies = {
    "http": proxy_url,
    "https": proxy_url,
}

# Each request automatically gets a new IP
response = requests.get(
    "https://httpbin.org/ip",
    proxies=proxies,
    timeout=15,
)
print(response.json())  # Output: {"origin": "203.0.113.42"} (a different IP each time)
```
Requests with Retry Logic and Error Handling
Production code needs robust error handling. Proxy requests can fail due to authentication errors (407), rate limiting (429), connection timeouts, and temporary gateway errors (502/503).
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
import time

PROXY_USER = "YOUR_USERNAME"
PROXY_PASS = "YOUR_PASSWORD"
PROXY_HOST = "gate.hexproxies.com"
PROXY_PORT = 8080

proxy_url = f"http://{PROXY_USER}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}"
proxies = {
    "http": proxy_url,
    "https": proxy_url,
}


def create_session_with_retries(
    max_retries: int = 3,
    backoff_factor: float = 1.0,
    status_forcelist: tuple = (429, 500, 502, 503, 504),
) -> requests.Session:
    """Create a requests session with automatic retry on failure."""
    session = requests.Session()
    session.proxies = proxies

    retry_strategy = Retry(
        total=max_retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
        allowed_methods=["GET", "HEAD", "OPTIONS"],
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


def fetch_with_proxy(url: str, session: requests.Session) -> dict:
    """Fetch a URL through the proxy with comprehensive error handling."""
    try:
        response = session.get(url, timeout=15)
        response.raise_for_status()
        return {"url": url, "status": response.status_code, "data": response.text}
    except requests.exceptions.ProxyError as e:
        return {"url": url, "status": 407, "error": f"Proxy auth failed: {e}"}
    except requests.exceptions.ConnectTimeout:
        return {"url": url, "status": 0, "error": "Connection timed out"}
    except requests.exceptions.ReadTimeout:
        return {"url": url, "status": 0, "error": "Read timed out"}
    except requests.exceptions.HTTPError as e:
        return {"url": url, "status": e.response.status_code, "error": str(e)}
    except requests.exceptions.RequestException as e:
        return {"url": url, "status": 0, "error": f"Request failed: {e}"}


# Usage
session = create_session_with_retries()
urls = [f"https://example.com/product/{i}" for i in range(1, 11)]

for url in urls:
    result = fetch_with_proxy(url, session)
    print(f"{result['url']} -> {result.get('status', 'error')}")
    time.sleep(1)  # polite delay between requests
```
Sticky Sessions with Requests
For workflows requiring the same IP across multiple requests (login flows, checkout processes), append a session ID to your username:
```python
import requests
import uuid

# PROXY_USER, PROXY_PASS, PROXY_HOST, PROXY_PORT as defined in the previous examples


def create_sticky_proxy_session(session_name: str = "") -> requests.Session:
    """Create a requests session that uses the same proxy IP for all requests."""
    sid = session_name or uuid.uuid4().hex[:12]
    sticky_proxy = (
        f"http://{PROXY_USER}-session-{sid}:{PROXY_PASS}"
        f"@{PROXY_HOST}:{PROXY_PORT}"
    )
    session = requests.Session()
    session.proxies = {"http": sticky_proxy, "https": sticky_proxy}
    return session


# All requests in this session use the same IP
# (note: pass timeout per request; requests ignores a timeout attribute on Session)
session = create_sticky_proxy_session("login-flow-001")
session.get("https://example.com/login", timeout=30)
session.post(
    "https://example.com/login",
    data={"user": "me", "pass": "secret"},
    timeout=30,
)
session.get("https://example.com/dashboard", timeout=30)  # Same IP as login
```
2. Async Rotating Proxies with aiohttp
For high-throughput scraping, aiohttp provides async HTTP requests that dramatically increase concurrency. Instead of waiting for each response sequentially, you can run dozens of requests simultaneously.
```python
import aiohttp
import asyncio
from aiohttp import BasicAuth

PROXY_USER = "YOUR_USERNAME"
PROXY_PASS = "YOUR_PASSWORD"
PROXY_URL = "http://gate.hexproxies.com:8080"
PROXY_AUTH = BasicAuth(PROXY_USER, PROXY_PASS)


async def fetch(
    session: aiohttp.ClientSession,
    url: str,
    max_retries: int = 3,
) -> dict:
    """Fetch a URL through the rotating proxy with retry logic."""
    for attempt in range(1, max_retries + 1):
        try:
            async with session.get(
                url,
                proxy=PROXY_URL,
                proxy_auth=PROXY_AUTH,
                timeout=aiohttp.ClientTimeout(total=15),
            ) as response:
                if response.status == 429:
                    wait = 2 ** attempt  # exponential backoff
                    await asyncio.sleep(wait)
                    continue
                text = await response.text()
                return {"url": url, "status": response.status, "data": text}
        except aiohttp.ClientProxyConnectionError:
            return {"url": url, "status": 407, "error": "Proxy connection failed"}
        except asyncio.TimeoutError:
            if attempt < max_retries:
                await asyncio.sleep(2 ** attempt)
                continue
            return {"url": url, "status": 0, "error": "Timed out after retries"}
        except aiohttp.ClientError as e:
            return {"url": url, "status": 0, "error": str(e)}
    return {"url": url, "status": 429, "error": "Rate limited after retries"}


async def scrape_urls(urls: list[str], concurrency: int = 10) -> list[dict]:
    """Scrape multiple URLs concurrently through rotating proxies."""
    semaphore = asyncio.Semaphore(concurrency)
    results = []

    async def bounded_fetch(url: str) -> dict:
        async with semaphore:
            result = await fetch(session, url)
            await asyncio.sleep(0.5)  # polite delay
            return result

    async with aiohttp.ClientSession() as session:
        tasks = [bounded_fetch(url) for url in urls]
        results = await asyncio.gather(*tasks)
    return list(results)


# Usage
urls = [f"https://example.com/page/{i}" for i in range(1, 101)]
results = asyncio.run(scrape_urls(urls, concurrency=10))

success = sum(1 for r in results if r.get("status") == 200)
print(f"Success: {success}/{len(results)}")
```
Key aiohttp Advantages
- Concurrency: 10--50 simultaneous requests vs sequential processing with requests.
- Speed: Scrape 1,000 URLs in the time requests handles 50--100.
- Memory: Lower memory footprint per connection than threading alternatives.
3. Rotating Proxies with httpx
httpx is a modern Python HTTP client that supports both sync and async modes, HTTP/2, and first-class proxy support. It is increasingly popular as a replacement for requests.
```python
import httpx
import time

PROXY_USER = "YOUR_USERNAME"
PROXY_PASS = "YOUR_PASSWORD"
PROXY_URL = f"http://{PROXY_USER}:{PROXY_PASS}@gate.hexproxies.com:8080"


def scrape_with_httpx(urls: list[str]) -> list[dict]:
    """Scrape URLs using httpx with rotating proxy and retry logic."""
    results = []
    transport = httpx.HTTPTransport(retries=3)

    with httpx.Client(
        proxy=PROXY_URL,
        transport=transport,
        timeout=httpx.Timeout(15.0, connect=10.0),
        follow_redirects=True,
    ) as client:
        for url in urls:
            try:
                response = client.get(url)
                results.append({
                    "url": url,
                    "status": response.status_code,
                    "data": response.text[:200],
                })
            except httpx.ProxyError as e:
                results.append({"url": url, "status": 407, "error": str(e)})
            except httpx.TimeoutException:
                results.append({"url": url, "status": 0, "error": "Timeout"})
            except httpx.HTTPError as e:
                results.append({"url": url, "status": 0, "error": str(e)})
            time.sleep(1)

    return results


# Async version
async def scrape_async_httpx(urls: list[str]) -> list[dict]:
    """Async scraping with httpx and rotating proxy."""
    results = []
    transport = httpx.AsyncHTTPTransport(retries=3)

    async with httpx.AsyncClient(
        proxy=PROXY_URL,
        transport=transport,
        timeout=httpx.Timeout(15.0, connect=10.0),
        follow_redirects=True,
    ) as client:
        for url in urls:
            try:
                response = await client.get(url)
                results.append({
                    "url": url,
                    "status": response.status_code,
                    "data": response.text[:200],
                })
            except httpx.HTTPError as e:
                results.append({"url": url, "status": 0, "error": str(e)})

    return results


# Usage
urls = [f"https://example.com/item/{i}" for i in range(1, 21)]
results = scrape_with_httpx(urls)
```
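httpx's HTTP/2 support, mentioned above, works alongside the same proxy configuration. Below is a minimal sketch, assuming the optional `h2` extra is installed; the target URL is a placeholder.

```python
# HTTP/2 requires the optional extra: pip install "httpx[http2]"
import httpx

# PROXY_URL as defined in the previous example
with httpx.Client(
    proxy=PROXY_URL,
    http2=True,  # negotiate HTTP/2 where the target server supports it
    timeout=httpx.Timeout(15.0, connect=10.0),
) as client:
    response = client.get("https://example.com")
    print(response.http_version)  # "HTTP/2" if negotiated, otherwise "HTTP/1.1"
```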
httpx vs requests vs aiohttp
| Feature | requests | aiohttp | httpx |
|---|---|---|---|
| Sync support | Yes | No | Yes |
| Async support | No | Yes | Yes |
| HTTP/2 | No | No | Yes |
| Proxy auth | Via URL | BasicAuth object | Via URL |
| Built-in retry | Via adapter | Manual | Via transport |
| Best for | Simple scripts | High concurrency | Modern projects |
4. Scrapy Proxy Middleware
Scrapy is Python's leading web scraping framework. Proxy integration works through custom middleware that injects proxy settings into every request.
Basic Proxy Middleware
```python
# myproject/middlewares.py
import logging

from scrapy import signals

logger = logging.getLogger(__name__)


class HexProxyMiddleware:
    """Scrapy middleware that routes all requests through Hex Proxies."""

    PROXY_USER = "YOUR_USERNAME"
    PROXY_PASS = "YOUR_PASSWORD"
    PROXY_HOST = "gate.hexproxies.com"
    PROXY_PORT = 8080

    @classmethod
    def from_crawler(cls, crawler):
        middleware = cls()
        return middleware

    def process_request(self, request, spider):
        proxy_url = (
            f"http://{self.PROXY_USER}:{self.PROXY_PASS}"
            f"@{self.PROXY_HOST}:{self.PROXY_PORT}"
        )
        request.meta["proxy"] = proxy_url

    def process_response(self, request, response, spider):
        if response.status == 407:
            logger.error("Proxy authentication failed for %s", request.url)
        if response.status == 429:
            logger.warning("Rate limited on %s, will retry", request.url)
            return request.replace(dont_filter=True)
        return response

    def process_exception(self, request, exception, spider):
        logger.error("Proxy error on %s: %s", request.url, exception)
        return request.replace(dont_filter=True)
```
Enable the Middleware in settings.py
```python
# myproject/settings.py
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.HexProxyMiddleware": 350,
    # Leave Scrapy's built-in HttpProxyMiddleware (default priority 750) enabled:
    # it runs after this middleware, reads request.meta["proxy"], and converts the
    # credentials embedded in the proxy URL into a Proxy-Authorization header.
}

# Recommended settings for proxy-based scraping
CONCURRENT_REQUESTS = 16
DOWNLOAD_DELAY = 1
DOWNLOAD_TIMEOUT = 15
RETRY_TIMES = 3
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]
```
Sticky Session Middleware for Scrapy
For crawls that require session continuity (paginated results, authenticated scraping):
```python
# myproject/middlewares.py
import uuid


class HexStickySessionMiddleware:
    """Scrapy middleware with sticky proxy sessions per spider."""

    PROXY_USER = "YOUR_USERNAME"
    PROXY_PASS = "YOUR_PASSWORD"
    PROXY_HOST = "gate.hexproxies.com"
    PROXY_PORT = 8080

    def __init__(self):
        self.session_id = uuid.uuid4().hex[:12]

    def process_request(self, request, spider):
        # Use a per-spider sticky session
        session_id = getattr(spider, "proxy_session_id", self.session_id)
        proxy_url = (
            f"http://{self.PROXY_USER}-session-{session_id}:{self.PROXY_PASS}"
            f"@{self.PROXY_HOST}:{self.PROXY_PORT}"
        )
        request.meta["proxy"] = proxy_url
```
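To pin an entire crawl to one exit IP, set a `proxy_session_id` attribute on the spider, which the middleware above reads via `getattr`. A minimal sketch; the spider name, session ID, and URL are placeholders:

```python
# myproject/spiders/checkout.py (illustrative sketch; names and URLs are placeholders)
import scrapy


class CheckoutSpider(scrapy.Spider):
    name = "checkout"
    # Read by HexStickySessionMiddleware via getattr(spider, "proxy_session_id", ...)
    proxy_session_id = "checkout-flow-001"
    start_urls = ["https://example.com/cart"]

    def parse(self, response):
        # Every request issued by this spider reuses the same proxy IP
        yield {"url": response.url, "status": response.status}
```

Register `HexStickySessionMiddleware` in `DOWNLOADER_MIDDLEWARES` the same way as the basic middleware, and use it in place of `HexProxyMiddleware` rather than alongside it, since both set `request.meta["proxy"]`.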
Error Handling Reference
Every proxy integration should handle these failure modes:
| Status Code / Error | Meaning | Action |
|---|---|---|
| 407 | Proxy authentication failed | Check credentials, verify username/password format |
| 429 | Rate limited by target | Wait with exponential backoff, then retry |
| 403 | Blocked by target | Rotate IP (change session ID), add delay |
| 502 | Proxy gateway error | Retry after 2--5 seconds |
| 503 | Proxy service unavailable | Retry after 5--10 seconds |
| Connection timeout | Proxy or target unreachable | Retry with longer timeout, verify proxy is responsive |
| Read timeout | Response too slow | Increase timeout, retry once |
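As a rough illustration of how these actions can be encoded, the sketch below maps status codes to a backoff decision. The function name, delay values, and retry caps are arbitrary choices for illustration, not part of any library API.

```python
import random
from typing import Optional

# Illustrative mapping of the table above to a retry decision.
# Status codes worth retrying, with a base delay in seconds (values are arbitrary).
RETRYABLE_BASE_DELAY = {429: 5.0, 502: 3.0, 503: 8.0}


def retry_delay(status: int, attempt: int, max_attempts: int = 4) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the request should not be retried."""
    if attempt >= max_attempts:
        return None
    if status == 407:
        return None  # credentials problem: retrying will not help, fix the username/password
    if status in RETRYABLE_BASE_DELAY:
        # exponential backoff with a little jitter
        return RETRYABLE_BASE_DELAY[status] * (2 ** (attempt - 1)) + random.uniform(0, 1)
    if status == 403:
        return 2.0  # switch to a fresh session ID (new IP) before retrying
    return None
```

A caller would sleep for the returned delay, rotate the session ID on 403, and give up when `None` comes back.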
Performance Comparison: Libraries Under Proxy Rotation
Benchmarked scraping 1,000 URLs through the Hex Proxies rotating gateway:
| Library | Mode | Concurrency | Time (1K URLs) | Success Rate | Memory |
|---|---|---|---|---|---|
| requests | Sync | 1 | 18 min | 96.2% | 45 MB |
| aiohttp | Async | 20 | 1.5 min | 97.1% | 38 MB |
| httpx (sync) | Sync | 1 | 17 min | 96.5% | 52 MB |
| httpx (async) | Async | 20 | 1.8 min | 96.8% | 48 MB |
| Scrapy | Async | 16 | 2.1 min | 97.4% | 65 MB |
Takeaway: For high-volume scraping, use aiohttp or Scrapy. For simple scripts and prototyping, requests or httpx (sync) are easier to debug and maintain.
How Hex Proxies Works with Python
Hex Proxies provides a single gateway endpoint that handles all rotation logic server-side:
- Gateway: `gate.hexproxies.com:8080` (HTTP/HTTPS) or `gate.hexproxies.com:1080` (SOCKS5)
- Authentication: Username/password via proxy URL or Basic Auth header
- Rotation: Automatic per-request rotation by default. Add `-session-ID` to the username for sticky sessions.
- Geo-targeting: Add `-country-US` or `-city-london` to the username for location-specific IPs.
- Pool: 10M+ residential IPs across 195 countries.
- Format: All standard Python proxy URL formats are supported. No SDK or custom library required.
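All of these options are expressed through the proxy URL. The sketch below assembles one URL per mode using the username suffixes listed above; the session ID is a placeholder, and the SOCKS5 line assumes the `requests[socks]` extra (PySocks) is installed if you use it with requests.

```python
USER = "YOUR_USERNAME"
PASS = "YOUR_PASSWORD"

rotating = f"http://{USER}:{PASS}@gate.hexproxies.com:8080"                # new IP per request
sticky = f"http://{USER}-session-abc123:{PASS}@gate.hexproxies.com:8080"   # same IP for the whole session
geo_country = f"http://{USER}-country-US:{PASS}@gate.hexproxies.com:8080"  # US residential IPs
geo_city = f"http://{USER}-city-london:{PASS}@gate.hexproxies.com:8080"    # city-level targeting
socks5 = f"socks5://{USER}:{PASS}@gate.hexproxies.com:1080"                # SOCKS5 endpoint (port 1080)
```

Any of the HTTP URLs can be dropped into the `proxies` dict, the `proxy=` argument, or `request.meta["proxy"]` shown earlier.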