Advanced Proxy Patterns for Puppeteer
Basic Puppeteer proxy setup is straightforward: pass --proxy-server in the browser launch args. Advanced patterns cover per-page proxy rotation, stealth integration, connection pooling, and production-grade error handling, which together turn Puppeteer from a one-off script into a reliable scraping system.
Per-Page Proxy Rotation
Puppeteer sets the proxy at the browser level, so the pool below gets per-page rotation by running several browser instances, each authenticating with its own session ID:
```javascript
const puppeteer = require('puppeteer');

class ProxyBrowserPool {
  constructor(username, password, poolSize = 5) {
    this.username = username;
    this.password = password;
    this.poolSize = poolSize;
    this.browsers = [];
  }

  // Launch one browser per slot, each tagged with its own session ID.
  async initialize() {
    for (let i = 0; i < this.poolSize; i++) {
      const sessionId = `pool-${i}-${Date.now()}`;
      const browser = await puppeteer.launch({
        headless: 'new',
        args: [
          '--proxy-server=http://gate.hexproxies.com:8080',
          '--no-sandbox',
          '--disable-setuid-sandbox',
        ],
      });
      this.browsers.push({ browser, sessionId, inUse: false });
    }
  }

  // Claim a free browser and open a page authenticated under that slot's session.
  async getPage() {
    const slot = this.browsers.find((b) => !b.inUse);
    if (!slot) throw new Error('No available browser in pool');
    slot.inUse = true;

    try {
      const page = await slot.browser.newPage();
      await page.authenticate({
        username: `${this.username}-session-${slot.sessionId}`,
        password: this.password,
      });

      return {
        page,
        // Close the page so it does not leak, then free the slot.
        release: async () => {
          await page.close();
          slot.inUse = false;
        },
      };
    } catch (error) {
      // Free the slot if the page could not be set up.
      slot.inUse = false;
      throw error;
    }
  }

  async close() {
    await Promise.all(this.browsers.map((b) => b.browser.close()));
  }
}
```
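In the pool above, getPage throws as soon as every browser is busy. One possible refinement is to make callers wait for the next free slot instead. The following is a minimal sketch of that idea; `SlotQueue` and its method names are hypothetical and are not part of Puppeteer or of the pool class shown earlier.

```javascript
// Hypothetical helper: hands out pool slots and queues callers when all are busy.
class SlotQueue {
  constructor(slots) {
    this.free = [...slots]; // slots currently available
    this.waiters = [];      // resolve callbacks of callers waiting for a slot
  }

  // Resolves with a slot right away if one is free, otherwise once one is released.
  acquire() {
    if (this.free.length > 0) {
      return Promise.resolve(this.free.pop());
    }
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  // Hands the slot to the longest-waiting caller, or marks it free again.
  release(slot) {
    const next = this.waiters.shift();
    if (next) {
      next(slot);
    } else {
      this.free.push(slot);
    }
  }
}
```

A pool could construct this queue from its browser slots after launching them, call `acquire()` where it currently searches for an idle browser, and call `release(slot)` from the page's release callback.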
Stealth Integration
Combine proxies with the stealth plugin for puppeteer-extra to reduce the chance of bot detection:
```javascript
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

async function stealthScrape(url, username, password) {
  const browser = await puppeteer.launch({
    headless: 'new',
    args: ['--proxy-server=http://gate.hexproxies.com:8080'],
  });

  try {
    const page = await browser.newPage();
    await page.authenticate({ username, password });

    // Set a realistic viewport
    await page.setViewport({ width: 1920, height: 1080 });

    // Navigate and wait for the network to settle (30 s limit)
    await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });

    // Random 1-3 s delay to mimic human behavior.
    // page.waitForTimeout is gone from recent Puppeteer releases, so use a plain timer.
    await new Promise((resolve) => setTimeout(resolve, 1000 + Math.random() * 2000));

    return await page.content();
  } finally {
    // Close the browser even if navigation fails
    await browser.close();
  }
}
```
Request Interception for Performance
Block unnecessary resources to speed up proxied page loads:
```javascript
async function fastScrape(url, page) {
  await page.setRequestInterception(true);

  const blockedTypes = new Set(['image', 'stylesheet', 'font', 'media']);
  const blockedDomains = [
    'google-analytics.com',
    'googletagmanager.com',
    'facebook.net',
    'doubleclick.net',
  ];

  page.on('request', (req) => {
    const { hostname } = new URL(req.url());
    // Match the domain itself and any subdomain (e.g. www.google-analytics.com)
    const isBlockedDomain = blockedDomains.some(
      (domain) => hostname === domain || hostname.endsWith(`.${domain}`)
    );

    if (blockedTypes.has(req.resourceType()) || isBlockedDomain) {
      req.abort();
    } else {
      req.continue();
    }
  });

  await page.goto(url, { waitUntil: 'domcontentloaded' });
  return await page.content();
}
```
Error Handling and Retry Logic
Retry failed loads with a fresh session ID on each attempt, and always close the browser before moving on:

```javascript
const puppeteer = require('puppeteer');

async function scrapeWithRetry(url, username, password, maxRetries = 3) {
  let lastError = 'Max retries exceeded';

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    // A fresh session ID per attempt requests a different proxy session
    const sessionId = `retry-${attempt}-${Date.now()}`;
    let browser;
    try {
      browser = await puppeteer.launch({
        headless: 'new',
        args: ['--proxy-server=http://gate.hexproxies.com:8080'],
      });
      const page = await browser.newPage();
      await page.authenticate({
        username: `${username}-session-${sessionId}`,
        password,
      });
      page.setDefaultTimeout(30000);

      const response = await page.goto(url, { waitUntil: 'networkidle2' });
      const status = response ? response.status() : null;

      if (status === 200) {
        const content = await page.content();
        return { success: true, content, attempts: attempt + 1 };
      }
      // 403 likely means the IP is blocked; any other non-200 status is retried too
      lastError = `Unexpected status: ${status}`;
    } catch (error) {
      lastError = error.message;
    } finally {
      // Always close the browser so failed attempts do not leak processes
      if (browser) await browser.close();
    }
  }

  return { success: false, error: lastError, attempts: maxRetries };
}
```
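The retry loop above starts each new attempt immediately. A common refinement is to wait between attempts using exponential backoff with random jitter, so that parallel workers do not hammer the target in lockstep. The sketch below shows one way to do this; `backoffDelay`, `withRetry`, and the default timings are illustrative assumptions, not part of Puppeteer or of any proxy provider's API.

```javascript
// Hypothetical helper: delay in ms before the retry that follows `attempt` (0-based).
// `random` is injectable so the function can be tested deterministically.
function backoffDelay(attempt, baseMs = 500, maxMs = 10000, random = Math.random) {
  // The ceiling doubles each attempt and is capped at maxMs.
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  // "Full jitter": pick uniformly in [0, ceiling).
  return Math.floor(random() * ceiling);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `task(attempt)` until it resolves or `maxRetries` attempts have failed.
async function withRetry(task, { maxRetries = 3, baseMs = 500, maxMs = 10000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      // Back off before every retry, but not after the final failure.
      if (attempt < maxRetries - 1) {
        await sleep(backoffDelay(attempt, baseMs, maxMs));
      }
    }
  }
  throw lastError;
}
```

Here `task` would be a function that performs a single scrape attempt and throws on a block or timeout. The caller decides what counts as a failure.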
Production Architecture
For production scraping systems, combine browser pooling, proxy rotation, and error handling into a managed scraping service. The proxy layer is the critical component: Hex Proxies ISP proxies deliver sub-50ms latency that keeps page load times fast, while residential proxies provide the IP diversity needed for heavily protected targets.
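One piece such a service usually needs is a scheduler that caps how many scrapes run at once, so the job list can be much longer than the browser pool. The following is a rough sketch under that assumption; `runWithConcurrency` and the shape of its result objects are hypothetical choices made for illustration.

```javascript
// Hypothetical scheduler: runs `worker(item)` over `items` with at most `limit` in flight.
// Returns one result object per item, in input order; failures are captured, not thrown.
async function runWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each lane pulls the next unclaimed item until the list is exhausted.
  async function lane() {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      try {
        results[index] = { success: true, value: await worker(items[index]) };
      } catch (error) {
        results[index] = { success: false, error: error.message };
      }
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

With a list of URLs as `items`, the worker could wrap a retrying scrape function, and `limit` would typically match the number of browsers in the pool.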
Hex Proxies' multi-Gbps capacity scales with your Puppeteer automation needs.