# How to Scrape Amazon with Proxies
Amazon is one of the most heavily protected e-commerce platforms. Their anti-bot system combines IP fingerprinting, behavioral analysis, CAPTCHA challenges, and rate limiting. Successful Amazon scraping at scale requires intelligent proxy rotation, realistic request patterns, and robust error handling.
**Disclaimer**: Always review Amazon's Terms of Service before scraping. This guide covers technical implementation only. Ensure your data collection practices comply with applicable laws and platform policies.
## Why Amazon Scraping Needs Proxies
Amazon blocks scraping aggressively:

- Single-IP scrapers get CAPTCHAs after 20-50 requests
- Datacenter IPs are identified and blocked within minutes
- Rate limits are enforced per IP, per session, and per account
- Geographic pricing varies by marketplace (US, UK, DE, JP)
## Proxy Strategy for Amazon
| Amazon Task | Proxy Type | Session | Why |
|---|---|---|---|
| Product listings | Residential rotating | Per request | IP diversity avoids blocks |
| Price monitoring | ISP sticky | Per product | Consistent identity per check |
| Review scraping | Residential rotating | Per page | Volume requires rotation |
| Seller data | Residential rotating | Per request | Broad access needed |
| Marketplace comparison | Residential geo-targeted | Per country | Different data per region |
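The session modes in the table map directly onto proxy URL formats. A minimal sketch of helpers for building them, assuming the `gate.hexproxies.com:8080` gateway and the `-session-`/`-country-` username suffixes used in the examples later in this guide; adjust to your provider's actual syntax.

```python
# Sketch: proxy URLs for the three session modes in the table above.
# Gateway host and username-suffix format are assumptions based on the
# examples later in this guide.

def rotating_proxy(username: str, password: str) -> str:
    """New exit IP on every request (default gateway behavior)."""
    return f"http://{username}:{password}@gate.hexproxies.com:8080"

def sticky_proxy(username: str, password: str, session_id: str) -> str:
    """Same exit IP for the lifetime of the named session."""
    return f"http://{username}-session-{session_id}:{password}@gate.hexproxies.com:8080"

def geo_proxy(username: str, password: str, country: str) -> str:
    """Exit IP in a specific country (lowercase ISO code, e.g. 'de')."""
    return f"http://{username}-country-{country}:{password}@gate.hexproxies.com:8080"
```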
## Basic Amazon Scraper
```python
import random
import time
from dataclasses import dataclass

import httpx
from bs4 import BeautifulSoup


@dataclass(frozen=True)
class AmazonProduct:
    asin: str
    title: str
    price: str
    rating: str
    review_count: str
    url: str


USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0",
]


def scrape_product(asin: str, proxy: str) -> AmazonProduct:
    url = f"https://www.amazon.com/dp/{asin}"
    headers = {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    }

    # Randomized delay so requests don't arrive on a detectable cadence
    time.sleep(random.uniform(2.0, 5.0))

    with httpx.Client(proxy=proxy, timeout=30, follow_redirects=True) as client:
        resp = client.get(url, headers=headers)
        if resp.status_code != 200:
            # Return an empty record rather than raising; callers can retry
            return AmazonProduct(asin=asin, title="", price="", rating="", review_count="", url=url)

    soup = BeautifulSoup(resp.text, "html.parser")
    title_el = soup.select_one("#productTitle")
    price_el = soup.select_one(".a-price .a-offscreen")
    rating_el = soup.select_one("#acrPopover")
    review_el = soup.select_one("#acrCustomerReviewText")

    return AmazonProduct(
        asin=asin,
        title=title_el.text.strip() if title_el else "",
        price=price_el.text.strip() if price_el else "",
        rating=rating_el.get("title", "").strip() if rating_el else "",
        review_count=review_el.text.strip() if review_el else "",
        url=url,
    )
```
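Before spending a proxied request, it is worth validating the ASIN locally: ASINs are 10-character alphanumeric identifiers, so malformed IDs can be rejected without touching the network. A small sketch; the helper name and regex are illustrative, not part of any Amazon API.

```python
import re

# ASINs are 10-character alphanumeric identifiers (e.g. "B08N5WRWNW").
ASIN_RE = re.compile(r"^[A-Z0-9]{10}$")

def is_valid_asin(asin: str) -> bool:
    """Cheap local check that avoids burning a proxy request on a bad ID."""
    return bool(ASIN_RE.match(asin.upper()))
```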
## Multi-Marketplace Price Comparison
```python
MARKETPLACES = {
    "US": {"domain": "amazon.com", "country": "us"},
    "UK": {"domain": "amazon.co.uk", "country": "gb"},
    "DE": {"domain": "amazon.de", "country": "de"},
    "JP": {"domain": "amazon.co.jp", "country": "jp"},
}


def compare_prices(asin: str, username: str, password: str) -> list[dict]:
    results = []
    for market, config in MARKETPLACES.items():
        # Geo-targeted exit: one country per marketplace
        proxy = f"http://{username}-country-{config['country']}:{password}@gate.hexproxies.com:8080"
        url = f"https://www.{config['domain']}/dp/{asin}"

        time.sleep(random.uniform(3.0, 6.0))

        try:
            with httpx.Client(proxy=proxy, timeout=30, follow_redirects=True) as client:
                resp = client.get(url, headers={
                    "User-Agent": random.choice(USER_AGENTS),
                    "Accept": "text/html,application/xhtml+xml",
                    "Accept-Encoding": "gzip, deflate, br",
                })
            soup = BeautifulSoup(resp.text, "html.parser")
            price_el = soup.select_one(".a-price .a-offscreen")
            results.append({
                "marketplace": market,
                "price": price_el.text.strip() if price_el else "N/A",
                "status": resp.status_code,
            })
        except Exception:
            results.append({"marketplace": market, "price": "error", "status": 0})
    return results
```
## CAPTCHA Detection and Handling
```python
def is_captcha_page(html: str) -> bool:
    """Detect if Amazon returned a CAPTCHA challenge."""
    captcha_signals = [
        "type the characters you see",
        "enter the characters you see below",
        "/captcha/",
        "robot check",
    ]
    html_lower = html.lower()
    return any(signal in html_lower for signal in captcha_signals)


def scrape_with_captcha_handling(url: str, username: str, password: str, max_retries: int = 5) -> str:
    """Retry on a fresh sticky session whenever a CAPTCHA page comes back."""
    for attempt in range(max_retries):
        # A new session ID per attempt forces a new exit IP
        session_id = f"amz-{attempt}-{int(time.time())}"
        proxy = f"http://{username}-session-{session_id}:{password}@gate.hexproxies.com:8080"

        time.sleep(random.uniform(2.0, 5.0))

        with httpx.Client(proxy=proxy, timeout=30) as client:
            resp = client.get(url, headers={
                "User-Agent": random.choice(USER_AGENTS),
                "Accept": "text/html,application/xhtml+xml",
                "Accept-Encoding": "gzip, deflate, br",
            })
        if resp.status_code == 200 and not is_captcha_page(resp.text):
            return resp.text
    return ""
```
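The retry loop above sleeps a flat 2-5 seconds between attempts. When CAPTCHAs persist, spacing retries out exponentially is gentler on the target and on your proxy pool; a hypothetical helper using exponential backoff with full jitter:

```python
import random

def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)].
    Call time.sleep(backoff_delay(attempt)) inside the retry loop."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

Full jitter (rather than a fixed doubling schedule) also desynchronizes parallel workers, so retries from a fleet of scrapers do not arrive in waves.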
## Rate Limiting Best Practices for Amazon
- **Never exceed 1 request per 3 seconds** per proxy session to Amazon
- **Rotate IPs per request** for catalog scraping
- **Use sticky sessions** for consecutive pages of the same product
- **Vary request patterns** — do not fetch products in alphabetical or sequential order
- **Include Referer headers** that mimic natural navigation
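The pacing rules above can be enforced in code rather than remembered. A sketch under stated assumptions: `RequestPacer` guarantees a minimum jittered gap between requests on one proxy session, and `natural_referer` is a hypothetical example of a plausible Referer (here, an Amazon search-results URL); both names are illustrative.

```python
import random
import time

class RequestPacer:
    """Enforce a minimum gap (plus random jitter) between requests on one session."""
    def __init__(self, min_gap: float = 3.0, jitter: float = 2.0):
        self.min_gap = min_gap
        self.jitter = jitter
        self._last = 0.0  # monotonic timestamp of the previous request

    def wait(self) -> None:
        """Block until at least min_gap (+ jitter) has passed since the last call."""
        gap = self.min_gap + random.uniform(0, self.jitter)
        elapsed = time.monotonic() - self._last
        if elapsed < gap:
            time.sleep(gap - elapsed)
        self._last = time.monotonic()

def natural_referer(domain: str = "www.amazon.com") -> str:
    """Hypothetical Referer mimicking arrival from a search-results page."""
    return f"https://{domain}/s?k=product"
```

Calling `pacer.wait()` immediately before each `client.get(...)` keeps every session under the 1-request-per-3-seconds ceiling without scattering `time.sleep` calls through the code.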
Hex Proxies' residential network provides the IP diversity needed for Amazon scraping at scale. With country-level targeting across 195+ countries, you can monitor every Amazon marketplace from locally appropriate IP addresses.