How to Collect Job Market Data with Proxies
Job market data drives salary benchmarking, talent strategy, competitive intelligence, and labor market research. Collecting data from Indeed, LinkedIn Jobs, Glassdoor, and other job boards at scale requires proxy infrastructure to manage rate limits and anti-bot defenses.
**Disclaimer**: Review each platform's Terms of Service. Use official APIs where available (Indeed Publisher API, LinkedIn Jobs API). This guide covers proxy configuration for legitimate data access.
Job Market Data Architecture
```python
from dataclasses import dataclass
import random
import time

import httpx


@dataclass(frozen=True)
class JobListing:
    title: str
    company: str
    location: str
    salary_range: str
    posted_date: str
    source: str
    url: str


def search_jobs(
    query: str,
    location: str,
    proxy: str,
    source: str = "indeed",
) -> list[JobListing]:
    """Search for job listings through proxy."""
    # Randomized delay before each request to stay under rate limits.
    time.sleep(random.uniform(5.0, 10.0))

    # `proxy=` requires httpx 0.26+; older releases use `proxies=`.
    with httpx.Client(proxy=proxy, timeout=30, follow_redirects=True) as client:
        resp = client.get(
            "https://www.indeed.com/jobs",
            # Passing params lets httpx URL-encode spaces and commas.
            params={"q": query, "l": location},
            headers={
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
                "Accept": "text/html,application/xhtml+xml",
                # "br" is omitted: httpx only decodes Brotli when the
                # optional brotli package is installed.
                "Accept-Encoding": "gzip, deflate",
            },
        )
        resp.raise_for_status()
        # Parse job listings from response
        return []
```
Multi-Market Salary Benchmarking
```python
MARKETS = [
    {"city": "San Francisco", "state": "CA"},
    {"city": "New York", "state": "NY"},
    {"city": "Austin", "state": "TX"},
    {"city": "Seattle", "state": "WA"},
    {"city": "Chicago", "state": "IL"},
]


def benchmark_salaries(
    job_title: str,
    username: str,
    password: str,
) -> dict[str, list[JobListing]]:
    """Compare salary ranges for a role across markets."""
    proxy = f"http://{username}-country-us:{password}@gate.hexproxies.com:8080"
    results: dict[str, list[JobListing]] = {}

    for market in MARKETS:
        location = f"{market['city']}, {market['state']}"
        listings = search_jobs(job_title, location, proxy)
        results = {**results, location: listings}
        time.sleep(random.uniform(10.0, 20.0))

    return results
```
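Comparing markets means turning each free-text `salary_range` into numbers first. The sketch below shows one way to do that, assuming salaries appear as dollar amounts such as `$120,000 - $150,000`. The helper names `parse_salary_range` and `median_midpoint` are illustrative, and real listings will likely need more formats handled (hourly rates, `k` suffixes, other currencies).

```python
import re
from statistics import median
from typing import Optional

# ASSUMPTION: amounts are written as "$" followed by digits with optional commas.
_AMOUNT = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)")


def parse_salary_range(text: str) -> Optional[tuple[float, float]]:
    """Return (low, high) in dollars, or None when no dollar amount is found."""
    amounts = [float(raw.replace(",", "")) for raw in _AMOUNT.findall(text)]
    if not amounts:
        return None
    # A single amount yields low == high.
    return min(amounts), max(amounts)


def median_midpoint(salary_ranges: list[str]) -> Optional[float]:
    """Median of the range midpoints, skipping strings that cannot be parsed."""
    midpoints = []
    for text in salary_ranges:
        parsed = parse_salary_range(text)
        if parsed is not None:
            low, high = parsed
            midpoints.append((low + high) / 2)
    return median(midpoints) if midpoints else None
```

Given the dictionary returned by `benchmark_salaries`, a per-market summary could then be built with `{loc: median_midpoint([j.salary_range for j in jobs]) for loc, jobs in results.items()}`.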
Competitive Hiring Intelligence
```python
from datetime import datetime, timezone


@dataclass(frozen=True)
class HiringTrend:
    company: str
    open_positions: int
    top_roles: list[str]
    locations: list[str]
    collected_at: str  # ISO 8601 timestamp of when the data was gathered


def track_competitor_hiring(
    companies: list[str],
    proxy: str,
) -> list[HiringTrend]:
    """Monitor competitor hiring activity."""
    trends: list[HiringTrend] = []
    for company in companies:
        time.sleep(random.uniform(8.0, 15.0))
        # Fetch company jobs page and count listings
        trends = [*trends, HiringTrend(
            company=company,
            open_positions=0,
            top_roles=[],
            locations=[],
            # Timezone-aware UTC; datetime.utcnow() is deprecated in Python 3.12.
            collected_at=datetime.now(timezone.utc).isoformat(),
        )]
    return trends
```
Best Practices
- **Residential proxies** for job boards, which commonly block datacenter IP ranges
- **5-15 second delays** between searches
- **Use official APIs** (Indeed Publisher, LinkedIn Jobs API) when possible
- **Rotate sessions** per search to avoid tracking
- **US country targeting** for US job market data
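Several of these practices can be combined in a small helper. The sketch below is not a documented Hex Proxies interface. The `-country-us` username suffix and the `gate.hexproxies.com:8080` gateway come from the examples in this guide, while the `-session-<id>` suffix used for rotation is an assumption to verify against the provider's documentation. All function names are illustrative.

```python
import random
import secrets
from typing import Optional
from urllib.parse import quote


def build_proxy_url(
    username: str,
    password: str,
    country: str = "us",
    session_id: Optional[str] = None,
) -> str:
    """Build a proxy URL with country targeting and an optional session tag."""
    user = f"{username}-country-{country}"
    if session_id is not None:
        # ASSUMPTION: the "-session-<id>" suffix is hypothetical; check the
        # provider's documentation for the real session syntax.
        user = f"{user}-session-{session_id}"
    # Percent-encode credentials so characters like "@" or ":" keep the URL valid.
    return (
        f"http://{quote(user, safe='')}:{quote(password, safe='')}"
        "@gate.hexproxies.com:8080"
    )


def new_session_id() -> str:
    """Random 8-character hex tag so each search can use a fresh session."""
    return secrets.token_hex(4)


def search_delay(low: float = 5.0, high: float = 15.0) -> float:
    """Pick a randomized delay in seconds within the 5-15 second guideline."""
    return random.uniform(low, high)
```

A caller would generate a new session tag per search, pass the resulting URL as the `proxy` argument, and sleep for `search_delay()` seconds between requests.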
Hex Proxies residential network provides the IP diversity and geographic targeting needed for comprehensive job market intelligence across all major platforms.