Why You Need Proxies for Crawlee
Crawlee is a next-generation web scraping framework for Node.js that combines HTTP crawling (CheerioCrawler, which parses HTML with Cheerio over got-scraping) with browser crawling (PlaywrightCrawler and PuppeteerCrawler) behind a unified API. It handles request queuing, automatic retries, session management, and proxy rotation, which makes it one of the most popular frameworks for building production web crawlers.
However, Crawlee's effectiveness depends heavily on the quality of its proxy infrastructure. Without clean proxies, even Crawlee's anti-detection features, such as generated browser fingerprints and realistic headers, cannot overcome IP-based blocking: many websites block known datacenter IP ranges regardless of browser fingerprint, request headers, or JavaScript execution.
Hex Proxies' residential network provides the foundation Crawlee needs for high success rates, even on well-protected targets. With 10M+ residential IPs across 100+ countries, each Crawlee request routes through a genuine ISP-assigned address that anti-bot systems generally treat as ordinary consumer traffic.
Crawlee's built-in ProxyConfiguration class makes integration straightforward. Point it at the Hex Proxies gateway, and Crawlee automatically handles proxy rotation, session management, and error-based IP switching using your residential proxy pool.
Best Proxy Type for Crawlee
Crawlee supports multiple crawler types, each with optimal proxy configurations:
**CheerioCrawler (HTTP-only)**: Residential proxies with per-request rotation. Each HTTP request gets a fresh IP from the 10M+ pool. This is the fastest and most bandwidth-efficient Crawlee mode, ideal for scraping static HTML content at high volume.
**PlaywrightCrawler (browser-based)**: Residential proxies with sticky sessions. Browser-based crawling often requires multiple requests per page (HTML + assets + JavaScript). Sticky sessions maintain the same IP throughout a page load cycle, preventing mixed-IP requests that some anti-bot systems detect.
**AdaptivePlaywrightCrawler**: Crawlee's adaptive crawler decides page by page whether a plain HTTP fetch is enough or whether full browser rendering is needed, based on whether HTTP-only processing reproduces the results of a rendered page. Hex Proxies handles both modes transparently through the same ProxyConfiguration.
For all Crawlee configurations, residential proxies from Hex Proxies deliver the highest success rates due to their genuine ISP trust scores and massive IP diversity.
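For PlaywrightCrawler, one way to get the sticky behavior described above is Crawlee's `newUrlFunction` option on `ProxyConfiguration`, which receives the Crawlee session ID and returns the proxy URL to use for it. The sketch below maps each Crawlee session to its own sticky proxy session. It assumes Hex Proxies selects a sticky IP from a `-session-<id>` suffix on the username; that syntax, the `user`/`pass` placeholders, and the `stickyProxyUrl` helper are all hypothetical and should be checked against the Hex Proxies documentation.

```javascript
// Assumed sticky-session username format: `<user>-session-<id>`.
// This syntax is hypothetical; confirm it in the Hex Proxies documentation.
const PROXY_USER = 'user';
const PROXY_PASS = 'pass';
const GATEWAY = 'gate.hexproxies.com:8080';

// Return a proxy URL bound to one Crawlee session, so every request made
// under that session keeps the same exit IP. Without a session ID, fall
// back to the plain rotating gateway URL.
function stickyProxyUrl(sessionId) {
    const pass = encodeURIComponent(PROXY_PASS);
    if (sessionId === undefined || sessionId === null) {
        return `http://${encodeURIComponent(PROXY_USER)}:${pass}@${GATEWAY}`;
    }
    // Crawlee session IDs can contain characters such as '_'; keep only
    // alphanumerics so the ID fits the assumed username format.
    const safeId = String(sessionId).replace(/[^a-zA-Z0-9]/g, '');
    const user = encodeURIComponent(`${PROXY_USER}-session-${safeId}`);
    return `http://${user}:${pass}@${GATEWAY}`;
}

// Wiring into Crawlee (requires `npm install crawlee playwright`):
//   import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';
//   const proxyConfiguration = new ProxyConfiguration({
//       newUrlFunction: (sessionId) => stickyProxyUrl(sessionId),
//   });
//   const crawler = new PlaywrightCrawler({ proxyConfiguration, requestHandler });

console.log(stickyProxyUrl('session_AbC123'));
```

Because the URL is derived deterministically from the session ID, a session keeps its IP for as long as Crawlee keeps the session alive; when Crawlee retires a session, the replacement session gets a new ID and therefore a new sticky IP.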
How to Use Hex Proxies with Crawlee
Crawlee's ProxyConfiguration class integrates directly with Hex Proxies:
```javascript
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

// Route every request through the Hex Proxies residential gateway.
const proxyConfig = new ProxyConfiguration({
    proxyUrls: [
        'http://user:pass@gate.hexproxies.com:8080',
    ],
});

const crawler = new CheerioCrawler({
    proxyConfiguration: proxyConfig,
    maxConcurrency: 50,
    requestHandler: async ({ request, $, enqueueLinks }) => {
        const title = $('title').text();
        console.log(title);
        // Add links found on the page to the request queue.
        await enqueueLinks();
    },
});

// Top-level await requires an ES module (e.g. "type": "module" in package.json).
await crawler.run(['https://example.com']);
```
For geo-targeted crawling, use multiple proxy URLs with different country codes:
```javascript
const proxyConfig = new ProxyConfiguration({
    proxyUrls: [
        'http://user-country-us:pass@gate.hexproxies.com:8080',
        'http://user-country-gb:pass@gate.hexproxies.com:8080',
        'http://user-country-de:pass@gate.hexproxies.com:8080',
    ],
});
```
Crawlee cycles through the listed proxy URLs in round-robin order, pinning each session to one URL when the session pool is enabled, so requests are distributed across the three geographic regions.
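Writing one URL per country by hand becomes error-prone once credentials contain characters such as `@` or `:`, which must be percent-encoded inside a URL. A small helper can generate the list instead. It follows the `user-country-<code>` username convention from the example above; `buildCountryProxyUrls` is a hypothetical helper, not part of Crawlee.

```javascript
// Build one Hex Proxies gateway URL per country code, following the
// `<user>-country-<code>` username convention used in the example above.
// Credentials are percent-encoded so characters like '@' or ':' in a
// password cannot break URL parsing.
function buildCountryProxyUrls(user, pass, countryCodes, gateway = 'gate.hexproxies.com:8080') {
    return countryCodes.map((code) => {
        const username = encodeURIComponent(`${user}-country-${code.toLowerCase()}`);
        return `http://${username}:${encodeURIComponent(pass)}@${gateway}`;
    });
}

const proxyUrls = buildCountryProxyUrls('user', 'p@ss:word', ['US', 'GB', 'DE']);
console.log(proxyUrls);
// Pass the result to Crawlee: new ProxyConfiguration({ proxyUrls })
```

Adding or removing a region then becomes a one-element change to the country list rather than another hand-edited URL.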
Setup Guide
- Create a Hex Proxies account and fund your wallet for instant proxy access.
- Install Crawlee in your project: npm install crawlee (for PlaywrightCrawler, install the browser library too: npm install crawlee playwright)
- Configure ProxyConfiguration with your Hex Proxies gateway credentials.
- Set maxConcurrency based on your target site's tolerance: start with 10-20 concurrent requests and scale up.
- Keep Crawlee's session pool enabled (useSessionPool is on by default) so sessions that receive blocked responses are retired and their requests retried under a fresh session and proxy IP.
- Test against a known target to verify proxy integration and measure success rates.
- Monitor bandwidth consumption through the Hex Proxies dashboard to optimize cost efficiency.
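The concurrency, session, and testing steps above map onto a few crawler options. The sketch below collects them in a plain options object that could be spread into `new CheerioCrawler({ proxyConfiguration, requestHandler, ...crawlerOptions })`, plus a helper that turns the statistics object returned by `crawler.run()` into a success rate. The specific numbers are starting assumptions rather than tuned values, and `successRate` is a hypothetical helper that reads only the `requestsFinished` and `requestsFailed` fields.

```javascript
// Starting-point crawler options for a proxied Crawlee crawl.
// Values are assumptions to tune per target, not recommendations.
const crawlerOptions = {
    maxConcurrency: 10,             // start low (10-20) and scale up
    useSessionPool: true,           // Crawlee's default; kept explicit here
    persistCookiesPerSession: true, // keep cookies tied to the session that received them
    sessionPoolOptions: {
        // Responses with these status codes retire the session, so the
        // retry runs under a different session (and a different IP).
        blockedStatusCodes: [401, 403, 429],
        sessionOptions: {
            maxUsageCount: 50,      // retire each session after 50 uses
        },
    },
};

// Turn the statistics resolved by `crawler.run()` into a success rate.
function successRate({ requestsFinished, requestsFailed }) {
    const total = requestsFinished + requestsFailed;
    return total === 0 ? 0 : requestsFinished / total;
}

// Usage with Crawlee:
//   const stats = await crawler.run(['https://example.com']);
//   console.log(`success rate: ${(successRate(stats) * 100).toFixed(1)}%`);
console.log(successRate({ requestsFinished: 95, requestsFailed: 5 }));
```

Running the same small test crawl before and after a configuration change gives a simple before/after comparison of success rates.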
Pricing for Crawlee Proxies
Residential proxy pricing at $4.25/GB suits Crawlee's HTTP-first design. CheerioCrawler downloads only the HTML document (typically 50-200 KB per page), making residential proxies very cost-effective for HTML-only scraping. A crawl of 100,000 pages at 100 KB average uses approximately 10 GB, or about $42.50 at that rate.
PlaywrightCrawler consumes more bandwidth because the browser also downloads JavaScript, stylesheets, images, and other assets (1-5 MB per page). A 10,000-page browser crawl at 3 MB average uses approximately 30 GB, or about $127.50 at list price. Volume discounts reduce costs at these levels.
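The arithmetic in the two paragraphs above generalizes to a quick estimate: pages times average page size gives bandwidth, and bandwidth times the per-GB rate gives cost. The sketch below assumes decimal units (1 GB = 1,000,000 KB), as the figures above do, and the $4.25/GB list price; it does not model volume discounts, retries, or variation in page weight, and `estimateCrawlCost` is a hypothetical helper.

```javascript
// Rough bandwidth and cost estimate for a crawl, using decimal units
// (1 GB = 1,000,000 KB) and the $4.25/GB list price quoted above.
// Volume discounts and retries are not modeled.
const PRICE_PER_GB = 4.25;

function estimateCrawlCost(pages, avgKbPerPage, pricePerGb = PRICE_PER_GB) {
    const gigabytes = (pages * avgKbPerPage) / 1_000_000;
    return { gigabytes, usd: gigabytes * pricePerGb };
}

// CheerioCrawler: 100,000 pages at ~100 KB each.
console.log(estimateCrawlCost(100_000, 100));  // { gigabytes: 10, usd: 42.5 }
// PlaywrightCrawler: 10,000 pages at ~3 MB (3,000 KB) each.
console.log(estimateCrawlCost(10_000, 3_000)); // { gigabytes: 30, usd: 127.5 }
```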
No minimum commitments. Pay only for the bandwidth consumed by your Crawlee operations.