Scrapy Proxy Setup
Scrapy is Python's most popular web scraping framework, built for large-scale data extraction with built-in support for request scheduling, middleware pipelines, and data export. Scrapy's middleware architecture makes proxy integration clean and flexible — you can set a proxy per request, rotate automatically, or build custom rotation logic.
Why Use Proxies with Scrapy?
Large-scale scraping without proxies leads to rapid IP bans. Scrapy's default behavior sends all requests from a single IP, which anti-bot systems detect within minutes on protected targets. Hex Proxies' residential pool provides millions of IPs with automatic rotation, keeping your Scrapy spiders running with high success rates.
Basic Per-Request Proxy Setup
# In your spider, set the proxy in the request meta
import scrapy

class MySpider(scrapy.Spider):
    name = 'my_spider'

    def start_requests(self):
        yield scrapy.Request(
            url='https://example.com',
            meta={'proxy': 'http://user:pass@gate.hexproxies.com:8080'},
        )

    def parse(self, response):
        self.logger.info(f'Status: {response.status}')

Global Proxy via Settings
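To route all requests through Hex Proxies, register a small custom downloader middleware in `settings.py`. Scrapy's built-in `HttpProxyMiddleware` is enabled by default and honors `request.meta['proxy']`, but Scrapy itself does not read an `HTTP_PROXY` setting, so the value below is a project-level setting that the custom middleware reads. The module path `myproject.middlewares` is a placeholder for wherever the class lives in your project:
DOWNLOADER_MIDDLEWARES = {
    # Placeholder path; the custom middleware must run before HttpProxyMiddleware
    'myproject.middlewares.ProxyMiddleware': 100,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}
# Default proxy for all requests (custom setting, read by ProxyMiddleware)
HTTP_PROXY = 'http://user:pass@gate.hexproxies.com:8080'

Then, in the custom middleware, assign the proxy to each request:
class ProxyMiddleware:
    def process_request(self, request, spider):
        # Read the proxy URL from the project settings instead of hardcoding it
        request.meta['proxy'] = spider.settings.get('HTTP_PROXY')

IP Whitelist Authentication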
Whitelist your server IP in the Hex Proxies dashboard and use the proxy URL without credentials:
request.meta['proxy'] = 'http://gate.hexproxies.com:8080'

Geo-Targeting
Append country codes to your username for geographic routing:
request.meta['proxy'] = 'http://user-country-de:pass@gate.hexproxies.com:8080'

Best Practices
- Rotate IPs per request for large-scale scraping — Hex Proxies' rotating residential pool assigns a new IP per connection by default.
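- Configure retries with Scrapy's built-in `RETRY_TIMES` and `RETRY_HTTP_CODES` settings. The built-in retry middleware retries without backoff, so enable AutoThrottle or add a custom middleware if you need exponential backoff.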
- Set `DOWNLOAD_DELAY` to avoid triggering rate limits. A delay of 1-2 seconds per request is usually sufficient with residential proxies.
- Tune `CONCURRENT_REQUESTS` gradually: start with 8-16 concurrent requests and increase as you monitor success rates.
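As a rough starting point, the retry, delay, and concurrency advice in the list above could be expressed in `settings.py` as follows. The setting names are standard Scrapy settings, but the specific values are illustrative assumptions rather than Hex Proxies recommendations, and should be tuned against your own success rates.

```python
# settings.py (illustrative starting values, not official recommendations)

# Retry failed requests a few times. Scrapy's retry middleware does not
# wait between attempts; it simply reschedules the request.
RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [403, 408, 429, 500, 502, 503, 504]

# Base delay between requests to the same site, in seconds.
DOWNLOAD_DELAY = 1.5
# Scrapy varies the actual delay between 0.5x and 1.5x of DOWNLOAD_DELAY.
RANDOMIZE_DOWNLOAD_DELAY = True

# Start conservatively and raise this as success rates allow.
CONCURRENT_REQUESTS = 8

# AutoThrottle adapts the delay to observed latency, which approximates
# backing off when the target or the proxy slows down.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0
```

Including 403 in `RETRY_HTTP_CODES` is a judgment call: it only helps when each retry is likely to leave through a different IP, as with a rotating pool.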
Troubleshooting
- 407 Proxy Authentication Required: Double-check your username and password. Ensure credentials are URL-encoded if they contain special characters.
- Repeated 403 or 503 responses: The target site is blocking your requests. Reduce concurrency, add delays, and set a realistic user agent via Scrapy's `USER_AGENT` setting, or rotate user agents with a user-agent middleware.
- Connection timeouts: Residential proxies have higher latency than direct connections. Scrapy's default `DOWNLOAD_TIMEOUT` is 180 seconds; if you have lowered it, allow at least 30-60 seconds per request.
- SSL errors: Ensure your Scrapy environment has up-to-date SSL certificates installed.
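For the 407 case, one way to avoid malformed proxy URLs is to percent-encode the credentials before building the URL. The sketch below assumes the gateway host and port used throughout this guide; `build_proxy_url` is a hypothetical helper, not part of Scrapy or any Hex Proxies SDK, and the optional `country` argument follows the username format shown under Geo-Targeting.

```python
from urllib.parse import quote

# Gateway host and port as used in the examples in this guide.
PROXY_HOST = 'gate.hexproxies.com'
PROXY_PORT = 8080


def build_proxy_url(user, password, country=None):
    """Return a proxy URL with percent-encoded credentials.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    if country:
        # Username suffix format taken from the Geo-Targeting section.
        user = f'{user}-country-{country}'
    # safe='' also encodes '/', ':' and '@', which would otherwise
    # be misread as URL delimiters.
    encoded_user = quote(user, safe='')
    encoded_password = quote(password, safe='')
    return f'http://{encoded_user}:{encoded_password}@{PROXY_HOST}:{PROXY_PORT}'
```

A spider or middleware could then set `request.meta['proxy'] = build_proxy_url('user', 'p@ss:word/1')`, which yields `http://user:p%40ss%3Aword%2F1@gate.hexproxies.com:8080`.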