
Proxies for Amazon Scraping

Last updated: April 2026

By Hex Proxies Engineering Team

A comprehensive guide to scraping Amazon product listings, pricing, reviews, and seller data at scale using proxy infrastructure with anti-detection and compliance strategies.

Advanced · 25 minutes · Platform-specific

Prerequisites

  • Python 3.10+
  • Hex Proxies residential plan
  • Understanding of web scraping

Steps

1

Configure residential proxies

Set up Hex Proxies residential credentials with country targeting for each Amazon marketplace.

2

Build the product scraper

Implement a scraper with realistic headers, random delays, and CSS selector extraction.

3

Add CAPTCHA detection

Detect CAPTCHA pages and auto-rotate to new proxy sessions on detection.

4

Implement multi-marketplace monitoring

Create geo-targeted price comparison across US, UK, DE, and JP Amazon marketplaces.

5

Schedule and alert

Automate collection runs and set up alerts for significant price changes.

How to Scrape Amazon with Proxies

Amazon is one of the most heavily protected e-commerce platforms. Their anti-bot system combines IP fingerprinting, behavioral analysis, CAPTCHA challenges, and rate limiting. Successful Amazon scraping at scale requires intelligent proxy rotation, realistic request patterns, and robust error handling.

**Disclaimer**: Always review Amazon's Terms of Service before scraping. This guide covers technical implementation only. Ensure your data collection practices comply with applicable laws and platform policies.

Why Amazon Scraping Needs Proxies

Amazon blocks scraping aggressively:

  • Single-IP scrapers get CAPTCHAs after 20-50 requests
  • Datacenter IPs are identified and blocked within minutes
  • Rate limits are enforced per IP, per session, and per account
  • Geographic pricing varies by marketplace (US, UK, DE, JP)

Proxy Strategy for Amazon

| Amazon Task | Proxy Type | Session | Why |
|---|---|---|---|
| Product listings | Residential rotating | Per request | IP diversity avoids blocks |
| Price monitoring | ISP sticky | Per product | Consistent identity per check |
| Review scraping | Residential rotating | Per page | Volume requires rotation |
| Seller data | Residential rotating | Per request | Broad access needed |
| Marketplace comparison | Residential geo-targeted | Per country | Different data per region |
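Each row of the table maps to a different gateway credential format. A minimal sketch of the three proxy URL shapes used throughout this guide — the `username-session-id` and `username-country-xx` credential conventions and the `gate.hexproxies.com:8080` endpoint follow the examples later in this guide; check your plan's dashboard for the exact format:

```python
import uuid

GATEWAY = "gate.hexproxies.com:8080"  # gateway endpoint as used in this guide

def rotating_proxy(username: str, password: str) -> str:
    """Rotating: plain credentials, a new exit IP on every request."""
    return f"http://{username}:{password}@{GATEWAY}"

def sticky_proxy(username: str, password: str, session_id: str) -> str:
    """Sticky: requests sharing a session ID keep the same exit IP."""
    return f"http://{username}-session-{session_id}:{password}@{GATEWAY}"

def geo_proxy(username: str, password: str, country: str) -> str:
    """Geo-targeted: exit IP in a specific country, e.g. 'de' for amazon.de."""
    return f"http://{username}-country-{country}:{password}@{GATEWAY}"

# Example: a sticky session for paging through one product's reviews
proxy = sticky_proxy("user123", "pass", f"reviews-{uuid.uuid4().hex[:8]}")
```

The session ID can be any string; reusing it pins the IP, and generating a fresh one (as in the example) forces rotation.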

Basic Amazon Scraper

```python
import httpx
import random
import time
from dataclasses import dataclass

from bs4 import BeautifulSoup

@dataclass(frozen=True)
class AmazonProduct:
    asin: str
    title: str
    price: str
    rating: str
    review_count: str
    url: str

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0",
]

def scrape_product(asin: str, proxy: str) -> AmazonProduct:
    url = f"https://www.amazon.com/dp/{asin}"
    headers = {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    }

    # Random delay between requests mimics human browsing
    time.sleep(random.uniform(2.0, 5.0))

    with httpx.Client(proxy=proxy, timeout=30, follow_redirects=True) as client:
        resp = client.get(url, headers=headers)
    if resp.status_code != 200:
        return AmazonProduct(asin=asin, title="", price="", rating="", review_count="", url=url)

    soup = BeautifulSoup(resp.text, "html.parser")
    title_el = soup.select_one("#productTitle")
    price_el = soup.select_one(".a-price .a-offscreen")
    rating_el = soup.select_one("#acrPopover")
    review_el = soup.select_one("#acrCustomerReviewText")

    return AmazonProduct(
        asin=asin,
        title=title_el.text.strip() if title_el else "",
        price=price_el.text.strip() if price_el else "",
        rating=rating_el.get("title", "").strip() if rating_el else "",
        review_count=review_el.text.strip() if review_el else "",
        url=url,
    )
```

Multi-Marketplace Price Comparison

```python
MARKETPLACES = {
    "US": {"domain": "amazon.com", "country": "us"},
    "UK": {"domain": "amazon.co.uk", "country": "gb"},
    "DE": {"domain": "amazon.de", "country": "de"},
    "JP": {"domain": "amazon.co.jp", "country": "jp"},
}

def compare_prices(asin: str, username: str, password: str) -> list[dict]:
    results = []
    for market, config in MARKETPLACES.items():
        # Country-targeted proxy routes each request through a local IP
        proxy = f"http://{username}-country-{config['country']}:{password}@gate.hexproxies.com:8080"
        url = f"https://www.{config['domain']}/dp/{asin}"

        time.sleep(random.uniform(3.0, 6.0))

        try:
            with httpx.Client(proxy=proxy, timeout=30, follow_redirects=True) as client:
                resp = client.get(url, headers={
                    "User-Agent": random.choice(USER_AGENTS),
                    "Accept": "text/html,application/xhtml+xml",
                    "Accept-Encoding": "gzip, deflate, br",
                })
            soup = BeautifulSoup(resp.text, "html.parser")
            price_el = soup.select_one(".a-price .a-offscreen")
            results = [*results, {
                "marketplace": market,
                "price": price_el.text.strip() if price_el else "N/A",
                "status": resp.status_code,
            }]
        except Exception:
            results = [*results, {"marketplace": market, "price": "error", "status": 0}]
    return results
```

CAPTCHA Detection and Handling

```python
def is_captcha_page(html: str) -> bool:
    """Detect if Amazon returned a CAPTCHA challenge."""
    # Signals are lowercase because they are matched against lowercased HTML
    captcha_signals = [
        "type the characters you see",
        "enter the characters you see below",
        "/captcha/",
        "robot check",
    ]
    html_lower = html.lower()
    return any(signal in html_lower for signal in captcha_signals)

def scrape_with_captcha_handling(url: str, username: str, password: str, max_retries: int = 5) -> str:
    for attempt in range(max_retries):
        # A fresh session ID forces the gateway to assign a new exit IP
        session_id = f"amz-{attempt}-{int(time.time())}"
        proxy = f"http://{username}-session-{session_id}:{password}@gate.hexproxies.com:8080"
        time.sleep(random.uniform(2.0, 5.0))

        with httpx.Client(proxy=proxy, timeout=30) as client:
            resp = client.get(url, headers={
                "User-Agent": random.choice(USER_AGENTS),
                "Accept": "text/html,application/xhtml+xml",
                "Accept-Encoding": "gzip, deflate, br",
            })
        if not is_captcha_page(resp.text) and resp.status_code == 200:
            return resp.text
    return ""
```

Rate Limiting Best Practices for Amazon

  • **Never exceed 1 request per 3 seconds** per proxy session to Amazon
  • **Rotate IPs per request** for catalog scraping
  • **Use sticky sessions** for consecutive pages of the same product
  • Vary request patterns — do not fetch products in alphabetical or sequential order
  • **Include Referer headers** that mimic natural navigation
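The first rule above can be enforced in code rather than by discipline. A minimal per-session pacer sketch — the class name and the jitter range are illustrative choices, not a Hex Proxies API:

```python
import random
import time

class SessionPacer:
    """Enforce a minimum interval between requests on each proxy session."""

    def __init__(self, min_interval: float = 3.0, jitter: float = 2.0):
        self.min_interval = min_interval
        self.jitter = jitter
        self._last_request: dict[str, float] = {}  # session_id -> timestamp

    def wait(self, session_id: str) -> float:
        """Sleep until this session may fire again; return the delay applied."""
        now = time.monotonic()
        earliest = self._last_request.get(session_id, 0.0) + self.min_interval
        # Add random jitter so request timing never looks machine-regular
        delay = max(0.0, earliest - now) + random.uniform(0, self.jitter)
        time.sleep(delay)
        self._last_request[session_id] = time.monotonic()
        return delay
```

Call `pacer.wait(session_id)` before each request; because delays are tracked per session, many sticky sessions can run concurrently while each individually stays under the 1-request-per-3-seconds ceiling.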

The Hex Proxies residential network provides the IP diversity needed to scrape Amazon at scale. With country-level targeting across 195+ countries, you can monitor every Amazon marketplace from locally appropriate IP addresses.

Tips

  • Use residential proxies for Amazon — datacenter IPs are blocked almost immediately.
  • Add 3-6 second random delays between requests to avoid triggering rate limits.
  • Rotate proxy sessions on CAPTCHA detection — same-IP retries will not solve the CAPTCHA.
  • Use country-targeted proxies to access region-specific Amazon marketplaces.
  • Monitor CAPTCHA rates — a rising CAPTCHA rate means you need to slow down or improve your request patterns.
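The last tip is easy to automate with a sliding-window counter. A sketch — the 10% threshold and 20-sample minimum are illustrative cutoffs, not Amazon-specific constants:

```python
from collections import deque

class CaptchaRateMonitor:
    """Track the CAPTCHA rate over the last N responses and flag back-off."""

    def __init__(self, window: int = 100, threshold: float = 0.10):
        self.results: deque = deque(maxlen=window)  # True = CAPTCHA seen
        self.threshold = threshold

    def record(self, was_captcha: bool) -> None:
        self.results.append(was_captcha)

    @property
    def rate(self) -> float:
        return sum(self.results) / len(self.results) if self.results else 0.0

    def should_back_off(self) -> bool:
        # Require a minimum sample size before acting on the rate
        return len(self.results) >= 20 and self.rate > self.threshold
```

Feed it the result of `is_captcha_page()` after each response; when `should_back_off()` turns true, widen your delays or reduce concurrency before resuming.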

Ready to Get Started?

Put this guide into practice with Hex Proxies.
