How to Collect Job Market Data with Proxies

Job market data drives salary benchmarking, talent strategy, competitive intelligence, and labor market research. Collecting data from Indeed, LinkedIn Jobs, Glassdoor, and other job boards at scale requires proxy infrastructure to manage rate limits and anti-bot defenses.

Disclaimer: Review each platform's Terms of Service. Use official APIs where available (Indeed Publisher API, LinkedIn Jobs API). This guide covers proxy configuration for legitimate data access.

Job Market Data Architecture

import httpx
import time
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class JobListing:
    title: str
    company: str
    location: str
    salary_range: str
    posted_date: str
    source: str
    url: str

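def search_jobs(
    query: str,
    location: str,
    proxy: str,
    source: str = "indeed",
) -> list[JobListing]:
    """Search for job listings through a proxy."""
    if source != "indeed":
        # Only the Indeed search URL is wired up in this example.
        raise NotImplementedError(f"unsupported source: {source}")

    # Randomized delay so requests are not evenly spaced.
    time.sleep(random.uniform(5.0, 10.0))

    with httpx.Client(proxy=proxy, timeout=30, follow_redirects=True) as client:
        resp = client.get(
            "https://www.indeed.com/jobs",
            # httpx URL-encodes params, so "San Francisco, CA" is sent safely.
            params={"q": query, "l": location},
            headers={
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
                "Accept": "text/html,application/xhtml+xml",
                # Accept-Encoding is left to httpx, which only advertises
                # encodings it is able to decode.
            },
        )
        # Fail loudly on 4xx/5xx (for example a 403 or 429 from anti-bot checks).
        resp.raise_for_status()
        # Parse job listings from resp.text (parser not shown here)
        return []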
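
The parsing step above is left as a placeholder. As a rough illustration of what it could look like, here is a minimal sketch using only the standard library. The markup it expects (a div with class "job-card", a data-url attribute, and span elements named title, company, location, salary, posted) is entirely hypothetical; real job boards use different and frequently changing HTML, so the selectors would need to be adapted. The names JobCardParser and parse_listings are likewise made up for this example.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional


@dataclass(frozen=True)
class JobListing:
    title: str
    company: str
    location: str
    salary_range: str
    posted_date: str
    source: str
    url: str


class JobCardParser(HTMLParser):
    """Collects span text inside <div class="job-card"> elements.

    Assumes hypothetical markup and that a card contains no nested <div>.
    """

    FIELDS = {"title", "company", "location", "salary", "posted"}

    def __init__(self) -> None:
        super().__init__()
        self.cards: list[dict[str, str]] = []
        self._card: Optional[dict[str, str]] = None
        self._field: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        cls = attr.get("class") or ""
        if tag == "div" and cls == "job-card":
            self._card = {"url": attr.get("data-url") or ""}
        elif tag == "span" and self._card is not None and cls in self.FIELDS:
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside a recognized field of an open card.
        if self._card is not None and self._field is not None:
            self._card[self._field] = self._card.get(self._field, "") + data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._card is not None:
            self.cards.append(self._card)
            self._card = None


def parse_listings(html: str, source: str) -> list[JobListing]:
    """Turn the hypothetical job-card markup into JobListing records."""
    parser = JobCardParser()
    parser.feed(html)
    parser.close()
    return [
        JobListing(
            title=card.get("title", ""),
            company=card.get("company", ""),
            location=card.get("location", ""),
            salary_range=card.get("salary", ""),
            posted_date=card.get("posted", ""),
            source=source,
            url=card["url"],
        )
        for card in parser.cards
    ]
```

With something like this in place, search_jobs could return parse_listings(resp.text, source) instead of an empty list. Missing fields become empty strings rather than errors, which keeps one malformed card from discarding a whole page of results.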

Multi-Market Salary Benchmarking

from urllib.parse import quote

MARKETS = [
    {"city": "San Francisco", "state": "CA"},
    {"city": "New York", "state": "NY"},
    {"city": "Austin", "state": "TX"},
    {"city": "Seattle", "state": "WA"},
    {"city": "Chicago", "state": "IL"},
]

def benchmark_salaries(
    job_title: str,
    username: str,
    password: str,
) -> dict[str, list[JobListing]]:
    """Compare salary ranges for a role across markets."""
    # Percent-encode credentials so characters such as "@" or ":" in a
    # password cannot break the proxy URL.
    user = quote(f"{username}-country-us", safe="")
    proxy = f"http://{user}:{quote(password, safe='')}@gate.hexproxies.com:8080"
    results: dict[str, list[JobListing]] = {}

    for market in MARKETS:
        location = f"{market['city']}, {market['state']}"
        listings = search_jobs(job_title, location, proxy)
        results[location] = listings
        # Longer pause between markets, on top of the per-search delay.
        time.sleep(random.uniform(10.0, 20.0))

    return results
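
The function above returns raw listings, with salary_range still a free-text string. To actually compare markets, those strings need to become numbers. The following is a minimal sketch of one way to do that; parse_salary_range and median_midpoints are hypothetical helper names, and the sketch assumes salary text such as "$120,000 - $150,000 a year" or "$95k". It does not distinguish hourly from annual pay, which a real pipeline would have to handle.

```python
import re
from statistics import median
from typing import Optional

# A number with optional thousands separators and an optional "k" suffix.
_AMOUNT = re.compile(r"(\d[\d,]*(?:\.\d+)?)([kK]\b)?")


def parse_salary_range(text: str) -> Optional[tuple[float, float]]:
    """Return (low, high) from a salary string, or None if no amount is found."""
    amounts = []
    for digits, k_suffix in _AMOUNT.findall(text):
        value = float(digits.replace(",", ""))
        amounts.append(value * 1000 if k_suffix else value)
    if not amounts:
        return None
    return min(amounts), max(amounts)


def median_midpoints(salaries_by_market: dict[str, list[str]]) -> dict[str, float]:
    """Median of range midpoints per market, skipping markets with no usable data."""
    summary: dict[str, float] = {}
    for market, salary_texts in salaries_by_market.items():
        midpoints = []
        for text in salary_texts:
            parsed = parse_salary_range(text)
            if parsed is not None:
                low, high = parsed
                midpoints.append((low + high) / 2)
        if midpoints:
            summary[market] = median(midpoints)
    return summary
```

The output of benchmark_salaries could be fed in as {location: [job.salary_range for job in listings] for location, listings in results.items()}. The median is used rather than the mean so that a single outlier posting does not skew a market's figure.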

Competitive Hiring Intelligence

from datetime import datetime, timezone

@dataclass(frozen=True)
class HiringTrend:
    company: str
    open_positions: int
    top_roles: list[str]
    locations: list[str]
    collected_at: str

def track_competitor_hiring(
    companies: list[str],
    proxy: str,
) -> list[HiringTrend]:
    """Monitor competitor hiring activity."""
    trends: list[HiringTrend] = []
    for company in companies:
        time.sleep(random.uniform(8.0, 15.0))
        # Fetch company jobs page and count listings (not implemented here,
        # so the values below are placeholders and `proxy` is not yet used).
        trends.append(HiringTrend(
            company=company,
            open_positions=0,
            top_roles=[],
            locations=[],
            # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated
            # as of Python 3.12.
            collected_at=datetime.now(timezone.utc).isoformat(),
        ))
    return trends
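
The listing above fills HiringTrend with placeholder values. Once postings for a company have been collected, the aggregation itself is straightforward. Here is a minimal sketch, assuming each posting has already been reduced to a (title, location) pair; summarize_hiring and its top_n parameter are hypothetical names introduced for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class HiringTrend:
    company: str
    open_positions: int
    top_roles: list[str]
    locations: list[str]
    collected_at: str


def summarize_hiring(
    company: str,
    postings: list[tuple[str, str]],
    top_n: int = 3,
) -> HiringTrend:
    """Aggregate (title, location) pairs into a single HiringTrend record."""
    role_counts = Counter(title for title, _ in postings)
    return HiringTrend(
        company=company,
        open_positions=len(postings),
        # Most frequently posted titles first.
        top_roles=[title for title, _ in role_counts.most_common(top_n)],
        # Distinct locations, sorted for stable output.
        locations=sorted({location for _, location in postings}),
        collected_at=datetime.now(timezone.utc).isoformat(),
    )
```

Storing one such record per company per collection run makes it possible to compare open_positions over time, which is where a hiring trend actually becomes visible.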

Best Practices

Use residential proxies for job boards, since many of them block or throttle datacenter IP ranges
Add randomized delays of 5-15 seconds between searches
Use official APIs (Indeed Publisher, LinkedIn Jobs API) when possible
Rotate proxy sessions per search so that consecutive searches are not linked to one IP
Use US country targeting for US job market data
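
The delay guidance above can be sketched as a small helper. The 5-15 second base range comes from the list; the doubling after each consecutive failure (for example an HTTP 429 response) and the 120-second cap are assumptions added for illustration, not something any particular job board documents. The name next_delay is hypothetical.

```python
import random
from typing import Optional


def next_delay(
    attempt: int,
    base_range: tuple[float, float] = (5.0, 15.0),
    cap: float = 120.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before the next search.

    attempt=0 gives a random delay within base_range; each consecutive
    failure doubles the delay, up to `cap` seconds.
    """
    if rng is None:
        rng = random.Random()
    low, high = base_range
    return min(cap, rng.uniform(low, high) * (2 ** attempt))
```

A collector would call time.sleep(next_delay(failures)) before each request, resetting failures to 0 after a successful response. Passing a seeded random.Random makes the delays reproducible in tests.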

The Hex Proxies residential network provides the IP diversity and geographic targeting needed to collect job market data across the major job platforms.

Proxies for Job Market Data

Steps

Configure residential proxies
Build job search collector
Add salary benchmarking
Implement competitor monitoring
Schedule collection