Scrapy Proxy Integration

Integrate Hex Proxies with Scrapy to reduce blocks and improve scraping success.

Scrapy Proxy Setup

Scrapy is one of Python's most widely used web scraping frameworks, built for large-scale data extraction with built-in request scheduling, middleware pipelines, and data export. Its middleware architecture makes proxy integration clean and flexible: you can set a proxy per request, rotate automatically, or build custom rotation logic.

Why Use Proxies with Scrapy?

Large-scale scraping without proxies leads to rapid IP bans. By default, Scrapy sends all requests from a single IP, which anti-bot systems on protected targets can detect within minutes. Hex Proxies' residential pool provides millions of IPs with automatic rotation, which helps keep your Scrapy spiders running with high success rates.

Basic Per-Request Proxy Setup

```python
import scrapy


class MySpider(scrapy.Spider):
    name = 'my_spider'

    def start_requests(self):
        # In your spider, set the proxy in the request meta
        yield scrapy.Request(
            url='https://example.com',
            meta={'proxy': 'http://user:pass@gate.hexproxies.com:8080'},
        )

    def parse(self, response):
        self.logger.info(f'Status: {response.status}')
```

Global Proxy via Settings

To route all requests through Hex Proxies, configure middleware in `settings.py`:

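```python
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}

# Set a default proxy for all requests. Note: HTTP_PROXY is not a built-in
# Scrapy setting; your own middleware or spider code must read it.
HTTP_PROXY = 'http://user:pass@gate.hexproxies.com:8080'
```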

Then assign the proxy in a custom downloader middleware (registered in `DOWNLOADER_MIDDLEWARES`) or in the spider itself:

```python
class ProxyMiddleware:
    def process_request(self, request, spider):
        # Route every request through Hex Proxies
        request.meta['proxy'] = 'http://user:pass@gate.hexproxies.com:8080'
```
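If you would rather keep the proxy URL in `settings.py` than hard-code it in the middleware, the middleware can read it when Scrapy builds it. The following is a minimal sketch, assuming the custom `HTTP_PROXY` setting shown earlier; the class name `SettingsProxyMiddleware` is hypothetical and not part of Scrapy or Hex Proxies.

```python
class SettingsProxyMiddleware:
    """Hypothetical downloader middleware that applies the HTTP_PROXY setting."""

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls from_crawler when it instantiates the middleware.
        # HTTP_PROXY is the custom setting defined in settings.py.
        return cls(crawler.settings.get('HTTP_PROXY'))

    def process_request(self, request, spider):
        # Keep any proxy that a spider already set on this request.
        if self.proxy_url and 'proxy' not in request.meta:
            request.meta['proxy'] = self.proxy_url
        return None  # let Scrapy continue processing the request
```

To use it, register the class in `DOWNLOADER_MIDDLEWARES` under your own module path (for example, a hypothetical `myproject.middlewares.SettingsProxyMiddleware`) with a lower order number than `HttpProxyMiddleware`, so the proxy is already set when the built-in middleware extracts the credentials from the URL.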

IP Whitelist Authentication

Whitelist your server IP in the Hex Proxies dashboard and use the proxy URL without credentials:

```python
request.meta['proxy'] = 'http://gate.hexproxies.com:8080'
```

Geo-Targeting

Append country codes to your username for geographic routing:

```python
request.meta['proxy'] = 'http://user-country-de:pass@gate.hexproxies.com:8080'
```
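Building the proxy URL in one place keeps the geo-targeting suffix and credential encoding consistent across spiders. The helper below is a sketch, not a Hex Proxies API: `build_proxy_url` is a hypothetical name, and the `-country-<code>` username suffix is assumed from the example above. It also percent-encodes the credentials, which matters when they contain characters such as `@` or `:`.

```python
from urllib.parse import quote


def build_proxy_url(username, password, country=None,
                    host='gate.hexproxies.com', port=8080):
    """Return a proxy URL, optionally geo-targeted by country code."""
    if country:
        # Suffix format assumed from the example above: user-country-de
        username = f'{username}-country-{country}'
    # Percent-encode credentials so reserved characters do not break the URL
    user = quote(username, safe='')
    pwd = quote(password, safe='')
    return f'http://{user}:{pwd}@{host}:{port}'
```

In a spider or middleware this could be used as `request.meta['proxy'] = build_proxy_url('user', 'pass', country='de')`.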

Best Practices

  • **Rotate IPs per request** for large-scale scraping. Hex Proxies' rotating residential pool assigns a new IP per connection by default.
  • **Retry transient failures** using Scrapy's built-in `RETRY_TIMES` and `RETRY_HTTP_CODES` settings. The built-in retry middleware re-queues failed requests without exponential backoff, so add a custom middleware if you need backoff, or enable AutoThrottle to adapt delays to server load.
  • **Set `DOWNLOAD_DELAY`** to avoid triggering rate limits. A delay of 1-2 seconds per request is usually sufficient with residential proxies.
  • **Use `CONCURRENT_REQUESTS` wisely**: start with 8-16 concurrent requests and increase as you monitor success rates.
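The practices above map onto a handful of Scrapy settings. The fragment below is an illustrative starting point for `settings.py`; the specific values are assumptions chosen from the ranges mentioned above, not tuned recommendations, and should be adjusted per target.

```python
# settings.py (illustrative starting values; tune per target)
RETRY_ENABLED = True
RETRY_TIMES = 3  # retries in addition to the first attempt
RETRY_HTTP_CODES = [403, 429, 500, 502, 503, 504]

DOWNLOAD_DELAY = 1.5      # seconds between requests to the same site
CONCURRENT_REQUESTS = 8   # raise gradually while monitoring success rates

# AutoThrottle adjusts the delay based on observed response latency
AUTOTHROTTLE_ENABLED = True
```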

Troubleshooting

  • **407 Proxy Authentication Required**: Double-check your username and password. Ensure credentials are URL-encoded if they contain special characters.
  • **Repeated 403 or 503 responses**: The target site is blocking your requests. Reduce concurrency, add delays, and send realistic user agents. Scrapy's `USER_AGENT` setting defines a single static value, so rotating user agents requires a custom user-agent middleware.
  • **Connection timeouts**: Residential proxies have higher latency than direct connections. Increase `DOWNLOAD_TIMEOUT` to 30-60 seconds.
  • **SSL errors**: Ensure your Scrapy environment has up-to-date SSL certificates installed.
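For the 403/503 case above, user-agent rotation can be done with a small downloader middleware. This is a sketch under stated assumptions: `RandomUserAgentMiddleware` and the `USER_AGENTS` list are hypothetical names, and the sample strings are placeholders for a pool you maintain yourself.

```python
import random

# Hypothetical pool; replace with user-agent strings you maintain.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 '
    '(KHTML, like Gecko) Version/17.4 Safari/605.1.15',
]


class RandomUserAgentMiddleware:
    """Hypothetical downloader middleware that picks a user agent per request."""

    def __init__(self, user_agents=USER_AGENTS):
        self.user_agents = list(user_agents)

    def process_request(self, request, spider):
        # Overwrite the User-Agent header with a random entry from the pool
        request.headers['User-Agent'] = random.choice(self.user_agents)
        return None  # let Scrapy continue processing the request
```

As with any custom downloader middleware, it takes effect only once it is listed in `DOWNLOADER_MIDDLEWARES`.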

Integration Steps

1. **Enable proxy middleware**: Use Scrapy's built-in `HttpProxyMiddleware` or a custom rotating middleware.
2. **Add credentials**: Pass the username and password in the proxy URL.
3. **Rotate intelligently**: Rotate per request or per domain, depending on the target.
4. **Monitor results**: Track the success rate and tune the delay.
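One way to approach the monitoring step is to tally response status codes as they pass through the downloader. The middleware below is a minimal sketch; `StatusTrackerMiddleware` and its `success_rate` method are hypothetical names, and counting only 2xx responses as successes is an assumption you may want to change.

```python
from collections import Counter


class StatusTrackerMiddleware:
    """Hypothetical downloader middleware that tallies response status codes."""

    def __init__(self):
        self.statuses = Counter()

    def process_response(self, request, response, spider):
        self.statuses[response.status] += 1
        return response  # pass the response on unchanged

    def success_rate(self):
        """Return the fraction of responses with a 2xx status (0.0 if none seen)."""
        total = sum(self.statuses.values())
        if total == 0:
            return 0.0
        ok = sum(n for code, n in self.statuses.items() if 200 <= code < 300)
        return ok / total
```

If the rate drops, lowering `CONCURRENT_REQUESTS` or raising `DOWNLOAD_DELAY` is the usual first adjustment. Scrapy's own crawl stats, printed at the end of a run, also include per-status response counts, so a custom tracker is mainly useful when you want to react during the crawl.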

Operational Tips

Keep sessions stable for workflows that depend on consistent identity. For high-volume collection, rotate IPs and reduce concurrency if you see timeouts or 403 responses.

  • Prefer sticky sessions for multi-step flows (auth, checkout, forms).
  • Rotate per request for scale and broad coverage.
  • Use timeouts and retries to handle transient failures.

Frequently Asked Questions

How often should I rotate in Scrapy?

Per-request rotation is best for large-scale scraping. For multi-step flows that depend on a consistent identity, use sticky sessions instead.
