Scrapy Proxy Setup
Scrapy is Python's most popular web scraping framework, built for large-scale data extraction with built-in support for request scheduling, middleware pipelines, and data export. Scrapy's middleware architecture makes proxy integration clean and flexible — you can set a proxy per request, rotate automatically, or build custom rotation logic.
Why Use Proxies with Scrapy?
Large-scale scraping without proxies leads to rapid IP bans. Scrapy's default behavior sends all requests from a single IP, which anti-bot systems detect within minutes on protected targets. Hex Proxies' residential pool provides millions of IPs with automatic rotation, keeping your Scrapy spiders running with high success rates.
Basic Per-Request Proxy Setup
```python
import scrapy

# In your spider, set the proxy in request meta
class MySpider(scrapy.Spider):
    name = 'my_spider'

    def start_requests(self):
        yield scrapy.Request(
            url='https://example.com',
            meta={'proxy': 'http://user:pass@gate.hexproxies.com:8080'},
        )

    def parse(self, response):
        self.logger.info(f'Status: {response.status}')
```
Global Proxy via Settings
To route all requests through Hex Proxies, configure middleware in `settings.py`:
```python
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}

# Set a default proxy for all requests
HTTP_PROXY = 'http://user:pass@gate.hexproxies.com:8080'
```
Then in a custom middleware or spider, assign the proxy:
```python
class ProxyMiddleware:
    def process_request(self, request, spider):
        request.meta['proxy'] = 'http://user:pass@gate.hexproxies.com:8080'
```
IP Whitelist Authentication
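A custom middleware like `ProxyMiddleware` only takes effect once it is registered. A minimal sketch, assuming the class lives at `myproject.middlewares.ProxyMiddleware` (adjust the module path to your project; the priority value 350 is an arbitrary choice for illustration):

```python
# settings.py -- register the custom proxy middleware; the module path
# and the priority 350 are illustrative assumptions, not fixed values
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.ProxyMiddleware': 350,
}
```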
Whitelist your server IP in the Hex Proxies dashboard and use the proxy URL without credentials:
```python
request.meta['proxy'] = 'http://gate.hexproxies.com:8080'
```
Geo-Targeting
Append country codes to your username for geographic routing:
```python
request.meta['proxy'] = 'http://user-country-de:pass@gate.hexproxies.com:8080'
```
Best Practices
- **Rotate IPs per request** for large-scale scraping — Hex Proxies' rotating residential pool assigns a new IP per connection by default.
- **Implement retries with exponential backoff** using Scrapy's built-in `RETRY_TIMES` and `RETRY_HTTP_CODES` settings.
- **Respect `DOWNLOAD_DELAY`** to avoid triggering rate limits. A delay of 1-2 seconds per request is usually sufficient with residential proxies.
- **Use `CONCURRENT_REQUESTS` wisely** — start with 8-16 concurrent requests and increase as you monitor success rates.
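If you want rotation logic of your own on top of the gateway's automatic rotation, a random-choice middleware is a common pattern. A minimal sketch — the session-style usernames in the proxy list are illustrative placeholders, not a documented Hex Proxies format:

```python
import random

class RandomProxyMiddleware:
    # Placeholder proxy URLs -- substitute your own gateway or session
    # endpoints here
    PROXIES = [
        'http://user-session-1:pass@gate.hexproxies.com:8080',
        'http://user-session-2:pass@gate.hexproxies.com:8080',
        'http://user-session-3:pass@gate.hexproxies.com:8080',
    ]

    def process_request(self, request, spider):
        # Pick a random proxy before each request is downloaded
        request.meta['proxy'] = random.choice(self.PROXIES)
```

Register it in `DOWNLOADER_MIDDLEWARES` like any other downloader middleware.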
Troubleshooting
- **407 Proxy Authentication Required**: Double-check your username and password. Ensure credentials are URL-encoded if they contain special characters.
- **Repeated 403 or 503 responses**: The target site is blocking your requests. Reduce concurrency, add delays, and rotate user agents via Scrapy's `USER_AGENT` setting or a user-agent middleware.
- **Connection timeouts**: Residential proxies have higher latency than direct connections. Increase `DOWNLOAD_TIMEOUT` to 30-60 seconds.
- **SSL errors**: Ensure your Scrapy environment has up-to-date SSL certificates installed.
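For the 407 case, credentials containing reserved characters can be percent-encoded with the standard library. A short sketch — `p@ss:w0rd` is a made-up example password:

```python
from urllib.parse import quote

username = 'user'
password = 'p@ss:w0rd'  # '@' and ':' would break the proxy URL if left raw

# Percent-encode both parts so the URL parses unambiguously
proxy_url = (
    f'http://{quote(username, safe="")}:{quote(password, safe="")}'
    '@gate.hexproxies.com:8080'
)
print(proxy_url)  # → http://user:p%40ss%3Aw0rd@gate.hexproxies.com:8080
```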