Advanced Proxy Patterns for Puppeteer
Basic Puppeteer proxy setup is straightforward: pass `--proxy-server` in the browser launch args. Advanced patterns cover per-page proxy rotation, stealth integration, connection pooling, and production-grade error handling, the pieces that turn a Puppeteer script into a reliable scraping system.
Per-Page Proxy Rotation
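Puppeteer applies `--proxy-server` at the browser level, so every page in a browser shares the same proxy endpoint. To rotate per page, the pool below keeps several browsers and gives each one its own session ID, which is sent to the gateway as part of the proxy username: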
```javascript
const puppeteer = require('puppeteer');

class ProxyBrowserPool {
  constructor(username, password, poolSize = 5) {
    this.username = username;
    this.password = password;
    this.poolSize = poolSize;
    this.browsers = [];
  }

  // Launch one browser per slot; each slot gets a unique session ID.
  async initialize() {
    for (let i = 0; i < this.poolSize; i++) {
      const sessionId = `pool-${i}-${Date.now()}`;
      const browser = await puppeteer.launch({
        headless: 'new',
        args: [
          '--proxy-server=http://gate.hexproxies.com:8080',
          '--no-sandbox',
          '--disable-setuid-sandbox',
        ],
      });
      this.browsers.push({ browser, sessionId, inUse: false });
    }
  }

  // Reserve a free browser and open a page authenticated with that slot's session.
  async getPage() {
    const slot = this.browsers.find(b => !b.inUse);
    if (!slot) throw new Error('No available browser in pool');
    slot.inUse = true;

    try {
      const page = await slot.browser.newPage();
      await page.authenticate({
        username: `${this.username}-session-${slot.sessionId}`,
        password: this.password,
      });

      // release() closes the page so it does not leak, then frees the slot.
      return {
        page,
        release: async () => {
          await page.close();
          slot.inUse = false;
        },
      };
    } catch (error) {
      slot.inUse = false; // do not strand the slot if page setup fails
      throw error;
    }
  }

  async close() {
    await Promise.all(this.browsers.map(b => b.browser.close()));
  }
}
```
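The pool above throws as soon as every browser is busy. If callers would rather wait for a free slot, one option is a small FIFO queue in front of the pool. The sketch below is a minimal illustration, not part of Puppeteer: `SlotQueue`, `acquire`, and `release` are hypothetical names, and a slot can be any object (for example the `{ browser, sessionId }` entries).

```javascript
// Hypothetical helper: hands out slots and queues callers when none are free.
class SlotQueue {
  constructor(slots) {
    this.free = [...slots]; // slots nobody is using right now
    this.waiters = [];      // resolve callbacks of callers waiting for a slot
  }

  // Resolves immediately if a slot is free, otherwise when one is released.
  acquire() {
    if (this.free.length > 0) {
      return Promise.resolve(this.free.shift());
    }
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  // Hand the slot straight to the longest-waiting caller, or mark it free.
  release(slot) {
    const next = this.waiters.shift();
    if (next) {
      next(slot);
    } else {
      this.free.push(slot);
    }
  }
}
```

A caller would `await queue.acquire()`, open a page on the returned slot, and call `queue.release(slot)` in a `finally` block so a failed scrape still returns the slot.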
Stealth Integration
Combine proxies with the `puppeteer-extra` stealth plugin to reduce the chance of bot detection:
```javascript
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

async function stealthScrape(url, username, password) {
  const browser = await puppeteer.launch({
    headless: 'new',
    args: ['--proxy-server=http://gate.hexproxies.com:8080'],
  });

  try {
    const page = await browser.newPage();
    await page.authenticate({ username, password });

    // Set realistic viewport
    await page.setViewport({ width: 1920, height: 1080 });

    // Navigate and wait until the network is mostly idle
    await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });

    // Random 1-3 second delay to mimic human behavior.
    // A plain timer is used because page.waitForTimeout is not available
    // in newer Puppeteer releases.
    await new Promise(resolve => setTimeout(resolve, 1000 + Math.random() * 2000));

    return await page.content();
  } finally {
    // Close the browser even if navigation throws
    await browser.close();
  }
}
```
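The fixed viewport and the 1-3 second pause above can be factored into small helpers so that each visit varies a little. This is only a sketch: `randomDelayMs`, `pickViewport`, `sleep`, and the viewport list are illustrative assumptions, and varying these values is no guarantee against detection.

```javascript
// Common desktop resolutions; the list itself is an illustrative assumption.
const VIEWPORTS = [
  { width: 1920, height: 1080 },
  { width: 1536, height: 864 },
  { width: 1366, height: 768 },
];

// Returns a whole number of milliseconds in [minMs, maxMs).
// `rng` is injectable so the function can be tested deterministically.
function randomDelayMs(minMs, maxMs, rng = Math.random) {
  return Math.floor(minMs + rng() * (maxMs - minMs));
}

// Picks one entry from VIEWPORTS using the same injectable rng.
function pickViewport(rng = Math.random) {
  return VIEWPORTS[Math.floor(rng() * VIEWPORTS.length)];
}

// Promise-based pause that does not depend on any Puppeteer API.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

Inside `stealthScrape`, these would be used as `await page.setViewport(pickViewport())` and `await sleep(randomDelayMs(1000, 3000))`.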
Request Interception for Performance
Block unnecessary resources to speed up proxied page loads:
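```javascript
async function fastScrape(url, page) {
  const blockedTypes = new Set(['image', 'stylesheet', 'font', 'media']);
  const blockedDomains = [
    'google-analytics.com',
    'googletagmanager.com',
    'facebook.net',
    'doubleclick.net',
  ];

  // Interception must be enabled before abort()/continue() can be called.
  await page.setRequestInterception(true);

  page.on('request', (req) => {
    const { hostname } = new URL(req.url());
    // Match the domain itself and any subdomain (e.g. www.google-analytics.com).
    const isBlockedDomain = blockedDomains.some(
      d => hostname === d || hostname.endsWith(`.${d}`)
    );

    if (blockedTypes.has(req.resourceType()) || isBlockedDomain) {
      req.abort();
    } else {
      req.continue();
    }
  });

  await page.goto(url, { waitUntil: 'domcontentloaded' });
  return await page.content();
}
```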
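To check that blocking actually reduces traffic through the proxy, it can help to count what the handler aborts. The sketch below wraps the blocking decision in a factory that keeps counters. `createRequestBlocker` and its `stats` object are hypothetical names, and the handler relies only on the `url()`, `resourceType()`, `abort()`, and `continue()` request methods used above.

```javascript
// Hypothetical factory: returns a request handler plus running counters.
function createRequestBlocker(blockedTypes, blockedDomains) {
  const types = new Set(blockedTypes);
  const stats = { blocked: 0, allowed: 0 };

  // True for the listed domain itself or any of its subdomains.
  function isBlockedHost(hostname) {
    return blockedDomains.some(
      (domain) => hostname === domain || hostname.endsWith(`.${domain}`)
    );
  }

  function handler(req) {
    let hostname = '';
    try {
      hostname = new URL(req.url()).hostname;
    } catch {
      // Unparseable URL: decide on resource type alone.
    }

    if (types.has(req.resourceType()) || isBlockedHost(hostname)) {
      stats.blocked += 1;
      req.abort();
    } else {
      stats.allowed += 1;
      req.continue();
    }
  }

  return { handler, stats };
}
```

After `await page.setRequestInterception(true)`, the handler would be attached with `page.on('request', blocker.handler)`, and `blocker.stats` can be logged once the page has loaded.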
Error Handling and Retry Logic
```javascript
const puppeteer = require('puppeteer');

async function scrapeWithRetry(url, username, password, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    // A fresh session ID per attempt asks the gateway for a new proxy session.
    const sessionId = `retry-${attempt}-${Date.now()}`;
    let browser;

    try {
      browser = await puppeteer.launch({
        headless: 'new',
        args: ['--proxy-server=http://gate.hexproxies.com:8080'],
      });

      const page = await browser.newPage();
      await page.authenticate({
        username: `${username}-session-${sessionId}`,
        password,
      });
      page.setDefaultTimeout(30000);

      // goto() resolves with the main document's response
      const response = await page.goto(url);

      if (response && response.status() === 200) {
        const content = await page.content();
        return { success: true, content, attempts: attempt + 1 };
      }

      // 403 likely means the session is blocked. Any non-200 status falls
      // through to the next attempt, which uses a fresh session ID.
    } catch (error) {
      if (attempt === maxRetries - 1) {
        return { success: false, error: error.message, attempts: maxRetries };
      }
    } finally {
      // Runs on every path, including the early returns above
      if (browser) await browser.close();
    }
  }

  return { success: false, error: 'Max retries exceeded', attempts: maxRetries };
}
```
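The retry loop above starts the next attempt immediately. A common refinement is to pause between attempts with exponential backoff and jitter, so repeated failures do not hammer the target or the gateway. The helper below is a sketch under assumed defaults (500 ms base, 10 s cap); `backoffDelayMs` is a hypothetical name, not a Puppeteer or Hex Proxies API.

```javascript
// "Full jitter" backoff: a random delay between 0 and
// min(capMs, baseMs * 2^attempt), rounded down to whole milliseconds.
// `rng` is injectable so the function can be tested deterministically.
function backoffDelayMs(attempt, baseMs = 500, capMs = 10000, rng = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(rng() * ceiling);
}
```

In `scrapeWithRetry`, a failed attempt could end with `await new Promise(r => setTimeout(r, backoffDelayMs(attempt)))` before the loop moves on to the next session.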
Production Architecture
For production scraping systems, combine browser pooling, proxy rotation, and error handling into a managed scraping service. The proxy layer is the critical component: Hex Proxies ISP proxies deliver sub-50ms latency that keeps page loads fast, while residential proxies provide the IP diversity needed for heavily protected targets.
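One way to act on the ISP versus residential split is to start each target on the faster ISP pool and escalate to residential only after repeated blocks. The sketch below is an illustration built on assumptions: the class name, the tier labels `'isp'` and `'residential'`, and the default threshold of three consecutive blocks are all hypothetical, and how a tier maps to gateway credentials depends on your account setup.

```javascript
// Hypothetical helper: decides which proxy tier to use for each hostname.
class ProxyTierSelector {
  constructor(blockThreshold = 3) {
    this.blockThreshold = blockThreshold;
    this.consecutiveBlocks = new Map(); // hostname -> consecutive block count
    this.escalated = new Set();         // hostnames moved to residential
  }

  // Record the outcome of one request (for example, HTTP 403 counts as blocked).
  record(hostname, wasBlocked) {
    if (!wasBlocked) {
      this.consecutiveBlocks.set(hostname, 0);
      return;
    }
    const count = (this.consecutiveBlocks.get(hostname) || 0) + 1;
    this.consecutiveBlocks.set(hostname, count);
    if (count >= this.blockThreshold) {
      // Escalation is sticky: a later success does not move the host back.
      this.escalated.add(hostname);
    }
  }

  tierFor(hostname) {
    return this.escalated.has(hostname) ? 'residential' : 'isp';
  }
}
```

A scraping service would call `record()` after each response and consult `tierFor()` when choosing credentials for the next request to that host.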
With 50 billion requests per week processed across our network, Hex Proxies infrastructure scales with your Puppeteer automation needs.