# How to Scrape Amazon with Proxies
Amazon is one of the most heavily protected e-commerce platforms. Their anti-bot system combines IP fingerprinting, behavioral analysis, CAPTCHA challenges, and rate limiting. Successful Amazon scraping at scale requires intelligent proxy rotation, realistic request patterns, and robust error handling.
**Disclaimer**: Always review Amazon's Terms of Service before scraping. This guide covers technical implementation only. Ensure your data collection practices comply with applicable laws and platform policies.
## Why Amazon Scraping Needs Proxies
Amazon blocks scraping aggressively:

- Single-IP scrapers get CAPTCHAs after 20-50 requests
- Datacenter IPs are identified and blocked within minutes
- Rate limits are enforced per IP, per session, and per account
- Geographic pricing varies by marketplace (US, UK, DE, JP)
## Proxy Strategy for Amazon
| Amazon Task | Proxy Type | Session | Why |
|---|---|---|---|
| Product listings | Residential rotating | Per request | IP diversity avoids blocks |
| Price monitoring | ISP sticky | Per product | Consistent identity per check |
| Review scraping | Residential rotating | Per page | Volume requires rotation |
| Seller data | Residential rotating | Per request | Broad access needed |
| Marketplace comparison | Residential geo-targeted | Per country | Different data per region |
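The session modes in the table map directly onto proxy URL formats. A minimal sketch of helpers for building them, assuming the `gate.hexproxies.com:8080` gateway and the `-session-`/`-country-` username suffixes used in the examples later in this guide; adjust to your provider's actual syntax.

```python
# Sketch: proxy URLs for the three session modes in the table above.
# Gateway host and username-suffix format are assumptions based on the
# examples later in this guide.

def rotating_proxy(username: str, password: str) -> str:
    """New exit IP on every request (default gateway behavior)."""
    return f"http://{username}:{password}@gate.hexproxies.com:8080"

def sticky_proxy(username: str, password: str, session_id: str) -> str:
    """Same exit IP for the lifetime of the named session."""
    return f"http://{username}-session-{session_id}:{password}@gate.hexproxies.com:8080"

def geo_proxy(username: str, password: str, country: str) -> str:
    """Exit IP in a specific country (lowercase ISO code, e.g. 'de')."""
    return f"http://{username}-country-{country}:{password}@gate.hexproxies.com:8080"
```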
## Basic Amazon Scraper
```python
import random
import time
from dataclasses import dataclass

import httpx
from bs4 import BeautifulSoup


@dataclass(frozen=True)
class AmazonProduct:
    asin: str
    title: str
    price: str
    rating: str
    review_count: str
    url: str


USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0",
]


def scrape_product(asin: str, proxy: str) -> AmazonProduct:
    url = f"https://www.amazon.com/dp/{asin}"
    headers = {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    }

    # Randomized delay so requests don't arrive on a detectable cadence
    time.sleep(random.uniform(2.0, 5.0))

    with httpx.Client(proxy=proxy, timeout=30, follow_redirects=True) as client:
        resp = client.get(url, headers=headers)
        if resp.status_code != 200:
            # Return an empty record rather than raising; callers can retry
            return AmazonProduct(asin=asin, title="", price="", rating="", review_count="", url=url)

    soup = BeautifulSoup(resp.text, "html.parser")
    title_el = soup.select_one("#productTitle")
    price_el = soup.select_one(".a-price .a-offscreen")
    rating_el = soup.select_one("#acrPopover")
    review_el = soup.select_one("#acrCustomerReviewText")

    return AmazonProduct(
        asin=asin,
        title=title_el.text.strip() if title_el else "",
        price=price_el.text.strip() if price_el else "",
        rating=rating_el.get("title", "").strip() if rating_el else "",
        review_count=review_el.text.strip() if review_el else "",
        url=url,
    )
```
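Before spending a proxied request, it is worth validating the ASIN locally: ASINs are 10-character alphanumeric identifiers, so malformed IDs can be rejected without touching the network. A small sketch; the helper name and regex are illustrative, not part of any Amazon API.

```python
import re

# ASINs are 10-character alphanumeric identifiers (e.g. "B08N5WRWNW").
ASIN_RE = re.compile(r"^[A-Z0-9]{10}$")

def is_valid_asin(asin: str) -> bool:
    """Cheap local check that avoids burning a proxy request on a bad ID."""
    return bool(ASIN_RE.match(asin.upper()))
```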
## Multi-Marketplace Price Comparison
```python
MARKETPLACES = {
    "US": {"domain": "amazon.com", "country": "us"},
    "UK": {"domain": "amazon.co.uk", "country": "gb"},
    "DE": {"domain": "amazon.de", "country": "de"},
    "JP": {"domain": "amazon.co.jp", "country": "jp"},
}


def compare_prices(asin: str, username: str, password: str) -> list[dict]:
    results = []
    for market, config in MARKETPLACES.items():
        # Geo-targeted exit: one country per marketplace
        proxy = f"http://{username}-country-{config['country']}:{password}@gate.hexproxies.com:8080"
        url = f"https://www.{config['domain']}/dp/{asin}"

        time.sleep(random.uniform(3.0, 6.0))

        try:
            with httpx.Client(proxy=proxy, timeout=30, follow_redirects=True) as client:
                resp = client.get(url, headers={
                    "User-Agent": random.choice(USER_AGENTS),
                    "Accept": "text/html,application/xhtml+xml",
                    "Accept-Encoding": "gzip, deflate, br",
                })
            soup = BeautifulSoup(resp.text, "html.parser")
            price_el = soup.select_one(".a-price .a-offscreen")
            results.append({
                "marketplace": market,
                "price": price_el.text.strip() if price_el else "N/A",
                "status": resp.status_code,
            })
        except Exception:
            results.append({"marketplace": market, "price": "error", "status": 0})
    return results
```
## CAPTCHA Detection and Handling
```python
def is_captcha_page(html: str) -> bool:
    """Detect if Amazon returned a CAPTCHA challenge."""
    captcha_signals = [
        "type the characters you see",
        "enter the characters you see below",
        "/captcha/",
        "robot check",
    ]
    html_lower = html.lower()
    return any(signal in html_lower for signal in captcha_signals)


def scrape_with_captcha_handling(url: str, username: str, password: str, max_retries: int = 5) -> str:
    """Retry on a fresh sticky session whenever a CAPTCHA page comes back."""
    for attempt in range(max_retries):
        # A new session ID per attempt forces a new exit IP
        session_id = f"amz-{attempt}-{int(time.time())}"
        proxy = f"http://{username}-session-{session_id}:{password}@gate.hexproxies.com:8080"

        time.sleep(random.uniform(2.0, 5.0))

        with httpx.Client(proxy=proxy, timeout=30) as client:
            resp = client.get(url, headers={
                "User-Agent": random.choice(USER_AGENTS),
                "Accept": "text/html,application/xhtml+xml",
                "Accept-Encoding": "gzip, deflate, br",
            })
        if resp.status_code == 200 and not is_captcha_page(resp.text):
            return resp.text
    return ""
```
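The retry loop above sleeps a flat 2-5 seconds between attempts. When CAPTCHAs persist, spacing retries out exponentially is gentler on the target and on your proxy pool; a hypothetical helper using exponential backoff with full jitter:

```python
import random

def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)].
    Call time.sleep(backoff_delay(attempt)) inside the retry loop."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

Full jitter (rather than a fixed doubling schedule) also desynchronizes parallel workers, so retries from a fleet of scrapers do not arrive in waves.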
## Rate Limiting Best Practices for Amazon
- **Never exceed 1 request per 3 seconds** per proxy session to Amazon
- **Rotate IPs per request** for catalog scraping
- **Use sticky sessions** for consecutive pages of the same product
- **Vary request patterns** — do not fetch products in alphabetical or sequential order
- **Include Referer headers** that mimic natural navigation
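The pacing rules above can be enforced in code rather than remembered. A sketch under stated assumptions: `RequestPacer` guarantees a minimum jittered gap between requests on one proxy session, and `natural_referer` is a hypothetical example of a plausible Referer (here, an Amazon search-results URL); both names are illustrative.

```python
import random
import time

class RequestPacer:
    """Enforce a minimum gap (plus random jitter) between requests on one session."""
    def __init__(self, min_gap: float = 3.0, jitter: float = 2.0):
        self.min_gap = min_gap
        self.jitter = jitter
        self._last = 0.0  # monotonic timestamp of the previous request

    def wait(self) -> None:
        """Block until at least min_gap (+ jitter) has passed since the last call."""
        gap = self.min_gap + random.uniform(0, self.jitter)
        elapsed = time.monotonic() - self._last
        if elapsed < gap:
            time.sleep(gap - elapsed)
        self._last = time.monotonic()

def natural_referer(domain: str = "www.amazon.com") -> str:
    """Hypothetical Referer mimicking arrival from a search-results page."""
    return f"https://{domain}/s?k=product"
```

Calling `pacer.wait()` immediately before each `client.get(...)` keeps every session under the 1-request-per-3-seconds ceiling without scattering `time.sleep` calls through the code.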
Hex Proxies' residential network provides the IP diversity needed for Amazon scraping at scale. With country-level targeting across 195+ countries, you can monitor every Amazon marketplace from locally appropriate IP addresses.