How Travel Companies Use Proxies for Fare Aggregation and Parity Checks

11 min read

By Hex Proxies Engineering Team

The travel industry runs on data asymmetry. An airline's revenue management system adjusts prices hundreds of times per day based on demand signals, competitor pricing, fuel costs, and seat inventory. A hotel chain's rates shift based on occupancy forecasts, event calendars, and distribution channel strategy. The companies that can observe these price movements in real time -- across every competitor, every market, and every booking channel -- hold a structural advantage.

Fare aggregation and rate parity monitoring are the two primary proxy-dependent workflows in travel technology. Aggregators like Kayak, Google Flights, and Skyscanner built their businesses on the ability to collect pricing data from hundreds of sources simultaneously. Smaller OTAs, corporate travel platforms, and hotel revenue teams depend on the same capability, implemented through proxy infrastructure.

This guide covers the engineering behind these systems, including how anti-bot protection on airline and hotel sites affects proxy selection, how geographic routing produces accurate localized pricing, and how to architect a fare collection pipeline that scales. For proxy fundamentals, see our travel industry page and fare aggregation use case.

The Travel Pricing Landscape: Why Proxies Are Non-Negotiable

Airlines: Dynamic Pricing at Extreme Scale

Major airlines make on the order of millions of fare changes per day across their route networks. Prices vary by:

  • Point of sale (POS): The same flight from New York to London is priced differently for a customer searching from the US vs the UK vs India. This is legal geographic price discrimination based on market willingness to pay.
  • Distribution channel: Prices on the airline's direct website may differ from prices on OTAs, metasearch engines, and GDS systems.
  • Device type: Some airlines serve different prices to mobile vs desktop users.
  • Search history: Repeated searches for the same route are often reported to trigger price increases (the "I see you looking" strategy), though airlines publicly deny the practice.
  • Time of day: Revenue management systems adjust prices in response to real-time demand signals.

For a fare aggregator monitoring 50 airlines across 10,000 popular routes, this creates a matrix of millions of price points that change continuously. The only way to capture this data is through automated requests, and airlines deploy aggressive anti-bot systems to prevent exactly that.

Hotels: Rate Parity and Channel Management

Hotel pricing adds a distribution complexity layer. A hotel room might be available through:

  • The hotel's direct website
  • Booking.com, Expedia, Hotels.com
  • Metasearch engines (Trivago, Google Hotels)
  • Wholesale and opaque channels
  • Corporate and negotiated rate programs

Rate parity is the contractual obligation (in many markets) for hotels to offer the same rate across all public distribution channels. In practice, rate parity violations are widespread. A hotel might offer a lower rate on Booking.com than on its own website, or vice versa. Revenue managers and OTAs both monitor for these violations.

Detecting rate parity violations requires checking the same room type on multiple channels from the same geographic location simultaneously. Proxies make this geographic consistency possible.

Anti-Bot Protection in Travel

Travel websites are among the most aggressively protected sites on the internet. The reason is economic: every automated price query costs the site real money (server resources, GDS query fees) without generating revenue.

Site Category | Typical Protection | Detection Sophistication
Major airlines (United, Lufthansa, Emirates) | Akamai, PerimeterX, custom solutions | Very high
Major OTAs (Booking.com, Expedia) | DataDome, Cloudflare Enterprise | Very high
Metasearch engines (Kayak, Skyscanner) | Moderate Cloudflare, rate limiting | High
Regional airlines (low-cost carriers) | Basic Cloudflare, simple rate limiting | Moderate
Independent hotels | Minimal to none | Low

The protection level directly determines proxy requirements. Major airlines and OTAs require residential proxies because they actively filter ISP and datacenter IP ranges. Regional carriers and independent hotels can often be scraped with ISP proxies for better speed and lower cost.
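The routing rule in the paragraph above can be sketched as a simple lookup. This is only an illustration: the pool names (`residential`, `isp`), the level keys, and the `choose_proxy_pool` function are hypothetical, and sending the "high" tier to residential proxies is an assumption, since the text only states the rule for the very high and the moderate/low tiers.

```python
# Hypothetical mapping from a site's detection sophistication to a proxy pool.
# Pool names and level keys are illustrative, not a provider API.
PROXY_POOL_BY_PROTECTION = {
    "very_high": "residential",  # major airlines, major OTAs
    "high": "residential",       # assumption: treat metasearch like the top tier
    "moderate": "isp",           # regional and low-cost carriers
    "low": "isp",                # independent hotels
}


def choose_proxy_pool(protection_level: str) -> str:
    """Return the proxy pool for a protection level.

    Unknown levels fall back to residential, the safer (but costlier) choice.
    """
    return PROXY_POOL_BY_PROTECTION.get(protection_level, "residential")
```

Defaulting unknown sites to the residential pool trades cost for reliability; a new source can be moved to a cheaper pool once its protection level is known.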

Case Study: Multi-Market Flight Fare Monitoring

The Business Problem

A mid-size OTA needs to monitor round-trip flight prices on 50 airlines across 500 popular route pairs, checking prices from 8 geographic markets (US, UK, Germany, France, Japan, Australia, Brazil, India) to capture point-of-sale pricing variation. Prices need to be refreshed every 2 hours during peak booking windows (6 AM - 11 PM local time).

The Data Volume

Routes × Airlines × Markets × Checks/day = Daily queries
500 × 50 × 8 × 9 = 1,800,000 queries/day

Not every airline serves every route, so the actual query volume is approximately 400,000 per day after filtering to airline-route combinations that exist.
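The arithmetic above can be reproduced in a few lines. The sketch assumes the 9 checks per day come from rounding up a 17-hour window (6 AM to 11 PM) at a 2-hour interval; the source does not spell out that derivation, so treat it as one plausible reading. The function names are illustrative.

```python
import math


def daily_checks(window_hours: float, interval_hours: float) -> int:
    """Number of checks needed to cover a monitoring window, rounded up."""
    return math.ceil(window_hours / interval_hours)


def max_daily_queries(routes: int, airlines: int, markets: int, checks: int) -> int:
    """Upper bound on queries per day if every airline served every route."""
    return routes * airlines * markets * checks


checks = daily_checks(window_hours=17, interval_hours=2)  # 6 AM to 11 PM
upper_bound = max_daily_queries(routes=500, airlines=50, markets=8, checks=checks)
# Share of the theoretical matrix that remains after filtering to real
# airline-route combinations (400,000 is the figure quoted in the text).
share_served = 400_000 / upper_bound
```

With these inputs the filtered volume is roughly 22% of the theoretical maximum, which is the kind of ratio to sanity-check against your own route coverage data.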

The Proxy Architecture

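import random
from datetime import datetime, timezone

import requests  # used by the per-airline query implementations

# Market-specific proxy configuration
MARKET_PROXIES = {
    "us": "http://USER-country-us:PASS@gate.hexproxies.com:8080",
    "gb": "http://USER-country-gb:PASS@gate.hexproxies.com:8080",
    "de": "http://USER-country-de:PASS@gate.hexproxies.com:8080",
    "fr": "http://USER-country-fr:PASS@gate.hexproxies.com:8080",
    "jp": "http://USER-country-jp:PASS@gate.hexproxies.com:8080",
    "au": "http://USER-country-au:PASS@gate.hexproxies.com:8080",
    "br": "http://USER-country-br:PASS@gate.hexproxies.com:8080",
    "in": "http://USER-country-in:PASS@gate.hexproxies.com:8080",
}

# Airline protection levels. This example sends every airline through
# residential proxies, so the map is informational here; it can be used
# to route lower-protection carriers to a cheaper proxy tier.
AIRLINE_PROTECTION = {
    "united": "high",
    "delta": "high",
    "lufthansa": "high",
    "emirates": "high",
    "ryanair": "medium",
    "easyjet": "medium",
    "southwest": "high",
    "jetblue": "medium",
}

# Placeholder values for illustration; supply your own in production.
BROWSER_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]
MARKET_LOCALES = {
    "us": ("en-US", "USD"), "gb": ("en-GB", "GBP"),
    "de": ("de-DE", "EUR"), "fr": ("fr-FR", "EUR"),
    "jp": ("ja-JP", "JPY"), "au": ("en-AU", "AUD"),
    "br": ("pt-BR", "BRL"), "in": ("en-IN", "INR"),
}

def get_locale_header(market: str) -> str:
    """Accept-Language value matching the market."""
    return MARKET_LOCALES[market][0]

def get_currency(market: str) -> str:
    """Local currency code for the market."""
    return MARKET_LOCALES[market][1]

def execute_airline_query(
    airline, origin, destination,
    departure_date, return_date,
    proxy, headers,
) -> dict:
    """Per-airline fetch logic (API call or browser rendering); not shown."""
    raise NotImplementedError(f"No query implementation for {airline}")

def query_flight_fare(
    airline: str,
    origin: str,
    destination: str,
    departure_date: str,
    return_date: str,
    market: str,
) -> dict:
    """
    Query a specific airline for fare data from a specific market.
    Uses residential proxies for all travel sites due to aggressive protection.
    """
    proxy_url = MARKET_PROXIES[market]
    proxy = {"http": proxy_url, "https": proxy_url}
    
    headers = {
        "User-Agent": random.choice(BROWSER_USER_AGENTS),
        "Accept-Language": get_locale_header(market),
        # Not a standard HTTP header; sites that accept a currency hint
        # usually take it from a cookie or query parameter instead.
        "Accept-Currency": get_currency(market),
    }
    
    # The actual fare query implementation varies by airline
    # Some use APIs, some require browser rendering
    fare_data = execute_airline_query(
        airline, origin, destination, 
        departure_date, return_date,
        proxy, headers,
    )
    
    return {
        "airline": airline,
        "route": f"{origin}-{destination}",
        "market": market,
        "fare": fare_data.get("price"),
        "currency": fare_data.get("currency"),
        "cabin_class": fare_data.get("cabin"),
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }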

Cost Analysis

400,000 daily queries at an average response size of 200 KB = 80 GB per day, or 2,400 GB per month.

At Hex Proxies residential pricing of $4.25-$4.75/GB: $10,200-$11,400/month in proxy costs.
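The cost figures above follow from a short calculation, sketched here so it can be rerun with other volumes or response sizes. The function name and the 30-day month are assumptions; sizes use decimal units (1 GB = 1,000,000 KB), which matches the numbers in the text.

```python
from typing import Tuple


def monthly_proxy_cost(
    daily_queries: int,
    avg_response_kb: float,
    price_per_gb: float,
    days_per_month: int = 30,
) -> Tuple[float, float]:
    """Return (monthly bandwidth in GB, monthly cost in USD)."""
    gb_per_day = daily_queries * avg_response_kb / 1_000_000  # KB -> GB (decimal)
    monthly_gb = gb_per_day * days_per_month
    return monthly_gb, monthly_gb * price_per_gb


low_gb, low_cost = monthly_proxy_cost(400_000, 200, 4.25)
high_gb, high_cost = monthly_proxy_cost(400_000, 200, 4.75)
```

Because cost scales linearly with response size, the average payload per query is the most effective lever for reducing the bill.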

This seems high until you consider the business context: the OTA's fare data feeds its pricing engine, which generates millions in booking revenue. The proxy cost is a fraction of a percent of the revenue it enables.

Optimization: Reducing Bandwidth by 60-70%

The raw bandwidth number above assumes full HTML page loads for every query. Several optimization techniques dramatically reduce bandwidth:

1. API endpoints over page loads. Many airline websites use internal APIs to fetch pricing data. These API responses are 5-20 KB (JSON) vs 200+ KB (full HTML page). Identifying and targeting these API endpoints reduces bandwidth by 90% or more per query.

2. Compression. Ensure requests include Accept-Encoding: gzip, deflate, br. Most airline APIs return compressed responses, reducing transfer by 60-80%.

3. Differential monitoring. Cache the last known fare for each route-market combination. Only process full responses when the fare has changed. This does not reduce proxy bandwidth but reduces downstream processing costs.

4. Smart scheduling. Not all routes need equal monitoring frequency. Popular routes (JFK-LHR, LAX-NRT) change prices more frequently than niche routes. Monitor high-traffic routes every 2 hours and niche routes every 6-12 hours.
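Techniques 3 and 4 can be combined in one small in-memory tracker, sketched below. Everything here is illustrative: the `FareTracker` class, the tier names, and the choice of 6 hours for niche routes (the low end of the 6-12 hour range) are assumptions, and a production system would likely persist this state in a database or cache.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Assumed tiers: 2 h for popular routes, 6 h for niche routes.
CHECK_INTERVAL_HOURS = {"popular": 2.0, "niche": 6.0}

Key = Tuple[str, str]  # (route, market)


@dataclass
class FareTracker:
    """Remembers the last fare and check time per route-market pair."""

    last_fare: Dict[Key, float] = field(default_factory=dict)
    last_checked: Dict[Key, float] = field(default_factory=dict)

    def is_due(self, route: str, market: str, tier: str, now_hours: float) -> bool:
        """True if the pair was never checked or its tier interval has elapsed."""
        last = self.last_checked.get((route, market))
        return last is None or now_hours - last >= CHECK_INTERVAL_HOURS[tier]

    def record(self, route: str, market: str, fare: float, now_hours: float) -> bool:
        """Store an observation; return True if the fare is new or changed."""
        key = (route, market)
        changed = self.last_fare.get(key) != fare
        self.last_fare[key] = fare
        self.last_checked[key] = now_hours
        return changed
```

A collector would call `is_due` before spending proxy bandwidth on a query and pass the response downstream only when `record` returns `True`. Comparing fares as floats is a simplification; integer minor units (cents) avoid rounding surprises.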

With these optimizations, the effective bandwidth drops to approximately 700-900 GB/month, reducing proxy costs to $2,975-$4,275/month.

Case Study: Hotel Rate Parity Monitoring

The Business Problem

A hotel chain with 500 properties across 30 countries needs to monitor its own rates across 5 distribution channels (own website, Booking.com, Expedia, Hotels.com, Agoda) to detect rate parity violations. Each property has an average of 4 room types. Rates need to be checked from the local market (matching the hotel's geographic location).

Architecture: Simultaneous Multi-Channel Checks

Rate parity detection requires checking the same room, same dates, same occupancy across all channels within a tight time window. If Channel A is checked at 9:00 AM and Channel B at 9:30 AM, a price change between those times creates a false parity violation.

import concurrent.futures
import time

def check_rate_parity(
    hotel_id: str,
    room_type: str,
    check_in: str,
    check_out: str,
    market: str,
) -> dict:
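    """
    Check rates for the same room across all channels simultaneously.
    Concurrent execution minimizes the time window between checks.

    Assumes MARKET_PROXIES from the fare example and a fetch_rate helper
    (not shown) that returns a dict such as {"rate": 189.0}.
    """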
    channels = {
        "direct": f"https://www.hotelchain.com/book/{hotel_id}",
        "booking": f"https://www.booking.com/hotel/{hotel_id}",
        "expedia": f"https://www.expedia.com/hotel/{hotel_id}",
        "hotels_com": f"https://www.hotels.com/hotel/{hotel_id}",
        "agoda": f"https://www.agoda.com/hotel/{hotel_id}",
    }
    
    proxy_url = MARKET_PROXIES[market]
    proxy = {"http": proxy_url, "https": proxy_url}
    
    results = {}
    
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        futures = {
            executor.submit(
                fetch_rate, channel_url, room_type, 
                check_in, check_out, proxy
            ): channel_name
            for channel_name, channel_url in channels.items()
        }
        
        for future in concurrent.futures.as_completed(futures):
            channel = futures[future]
            try:
                rate = future.result()
                results[channel] = rate
            except Exception as e:
                results[channel] = {"error": str(e)}
    
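    # Detect parity violations. Rates must already be in the same currency,
    # and at least two channels must have returned a rate to compare.
    valid_rates = {
        k: v["rate"] for k, v in results.items()
        if isinstance(v, dict) and v.get("rate") is not None
    }
    
    if len(valid_rates) >= 2 and min(valid_rates.values()) > 0:
        min_rate = min(valid_rates.values())
        max_rate = max(valid_rates.values())
        parity_violation = (max_rate - min_rate) / min_rate > 0.02  # 2% threshold
    else:
        parity_violation = None  # not enough data to compare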
    
    return {
        "hotel_id": hotel_id,
        "room_type": room_type,
        "market": market,
        "rates": results,
        "parity_violation": parity_violation,
        "timestamp": time.time(),
    }

The proxy requirement: All five channel checks must originate from the same geographic market. This ensures the price comparison is valid. A rate from Booking.com's US view vs Expedia's UK view is not a valid parity comparison because the rates legitimately differ by market.

Cost Analysis

500 properties × 4 room types × 5 channels × 3 checks/day = 30,000 queries/day

At 150 KB per query (OTA pages are heavy with images and JavaScript): 4.5 GB/day, or 135 GB/month.

At $4.25-$4.75/GB: $573.75-$641.25/month for comprehensive rate parity monitoring across 500 properties.

Handling Travel-Specific Anti-Bot Challenges

Challenge 1: CAPTCHA Walls on Airlines

Major airlines frequently serve CAPTCHAs on fare search pages. The solution is a combination of:

  • Residential proxies to minimize CAPTCHA frequency (ISP/datacenter IPs trigger CAPTCHAs at 3-5x the rate)
  • Human-like request patterns including randomized delays, realistic header sets, and cookie persistence
  • CAPTCHA solving fallback for the small percentage of requests that still trigger challenges

With properly configured residential proxies, Hex Proxies internal testing shows CAPTCHA rates below 3% on major airline sites. Without proxies, the CAPTCHA rate exceeds 40%.
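The "randomized delays" part of a human-like request pattern can be sketched as a jittered pause between queries. The function name, the 50% default jitter, and the 2-second base used in the example are assumptions chosen for illustration, not tuned values.

```python
import random
from typing import Optional


def jittered_delay(
    base_seconds: float,
    jitter_fraction: float = 0.5,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a delay drawn uniformly from base * (1 +/- jitter_fraction).

    Passing a seeded random.Random makes the sequence reproducible in tests.
    """
    source = rng if rng is not None else random
    low = base_seconds * (1 - jitter_fraction)
    high = base_seconds * (1 + jitter_fraction)
    return source.uniform(low, high)


# Example: pauses between 1 and 3 seconds around a 2-second base.
delays = [jittered_delay(2.0, rng=random.Random(seed)) for seed in range(5)]
```

In a collector loop the returned value would be passed to `time.sleep` before each request, so the interval between queries never settles into a fixed rhythm.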

Challenge 2: JavaScript-Rendered Pricing

Modern travel sites load pricing data asynchronously via JavaScript. A simple HTTP request returns a page skeleton without prices. You need either:

Option A: Headless browser. Use Playwright or Puppeteer with proxy configuration to render the page fully. This produces accurate results but consumes 3-8 MB per page load.

Option B: API reverse engineering. Identify the XHR/fetch requests that the JavaScript makes to retrieve pricing data, and replicate those API calls directly. This is more complex to set up but uses 90% less bandwidth.

Most production fare aggregation systems use Option B for established airlines (where the API endpoints are known and stable) and Option A as a fallback for new sources or when APIs change.
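The "Option B first, Option A as fallback" pattern can be sketched as a small dispatcher. The two fetchers are passed in as callables because their real implementations are site-specific; `fetch_with_fallback` and the `source` field are hypothetical names for this illustration.

```python
from typing import Callable, Optional

# A fetcher takes a query string and returns fare data, or None if nothing was found.
FareFetcher = Callable[[str], Optional[dict]]


def fetch_with_fallback(
    query: str,
    api_fetch: FareFetcher,
    browser_fetch: FareFetcher,
) -> dict:
    """Try the lightweight API path first; fall back to browser rendering."""
    try:
        result = api_fetch(query)
    except Exception:
        # Endpoint moved, schema changed, request blocked, etc.
        result = None

    if result is not None and result.get("price") is not None:
        return {**result, "source": "api"}

    rendered = browser_fetch(query)
    if rendered is None:
        raise LookupError(f"No fare found for {query!r} via API or browser")
    return {**rendered, "source": "browser"}
```

Recording which path produced each fare makes it easy to spot an airline whose API calls have started failing, since its share of browser-sourced results (and bandwidth) will jump.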

Challenge 3: Session-Based Pricing Manipulation

Some travel sites increase prices for repeated searches from the same session. The solution: use per-request IP rotation and clear cookies between queries. With Hex Proxies residential proxies, per-request rotation is the default -- each query automatically uses a fresh IP with no session history.
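On the client side, "clear cookies between queries" amounts to never reusing session state. A minimal standard-library sketch is shown below: each query gets a new opener with no cookie handler attached, so nothing carries over. The `fresh_opener` name is an assumption, and the IP rotation itself happens at the proxy gateway, not in this code.

```python
import urllib.request


def fresh_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build a one-off opener that routes through the proxy and stores no cookies.

    build_opener() does not add an HTTPCookieProcessor unless asked to,
    so cookies set by one response are not sent on any later request.
    """
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(proxy_handler)


# Usage sketch (placeholder credentials, as in the examples above):
# opener = fresh_opener("http://USER-country-us:PASS@gate.hexproxies.com:8080")
# body = opener.open("https://example.com/fares", timeout=30).read()
```

With the `requests` library the equivalent is to avoid sharing a `Session` object across queries, since a session persists cookies between calls.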

Frequently Asked Questions

Why can't I just use GDS APIs (Amadeus, Sabre) instead of scraping?

GDS APIs have limited coverage (not all carriers, not all fare classes), significant per-query costs ($0.01-$0.05 per query), and do not capture web-exclusive fares or dynamic pricing variations by point-of-sale. Web collection through proxies captures the full picture that consumers see.

How do I handle currency conversion in multi-market monitoring?

Collect fares in the local currency as displayed on the site, then normalize to a base currency (usually USD) using the same exchange rate source. Do not rely on the site's currency converter, as this adds noise. The proxy's geographic targeting ensures you see the natural local currency presentation.
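The normalization step can be sketched with a single rate snapshot applied to every market. The rates below are made-up placeholders, not real exchange rates, and the `to_usd` name is illustrative; in practice the snapshot would come from one exchange rate source and be stored alongside the fares it was applied to.

```python
from decimal import Decimal
from typing import Mapping

# PLACEHOLDER rates for illustration only (USD per one unit of currency).
USD_PER_UNIT = {
    "USD": Decimal("1"),
    "GBP": Decimal("1.25"),
    "EUR": Decimal("1.10"),
}


def to_usd(amount: str, currency: str, rates: Mapping[str, Decimal]) -> Decimal:
    """Convert a locally displayed fare to USD, rounded to cents.

    Raises KeyError for a currency missing from the snapshot instead of
    guessing, so gaps in the rate table are visible.
    """
    return (Decimal(amount) * rates[currency]).quantize(Decimal("0.01"))


gbp_fare_in_usd = to_usd("400.00", "GBP", USD_PER_UNIT)
```

Using `Decimal` rather than floats keeps the converted fares exact to the cent, which matters when small cross-market differences are the signal being measured.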

What success rates should I expect on major airline sites?

With residential proxies and well-configured requests: 92-97% success rate on major airlines. The remaining 3-8% consists of CAPTCHAs (1-3%), temporary blocks (1-2%), and timeout/server errors (1-3%). Build retry logic for these failures. See our regional access testing page for more on handling geographic restrictions.
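The retry logic mentioned above can be sketched as a wrapper with exponential backoff. `RetryableError`, `with_retries`, and the defaults of 3 attempts and a 1-second base delay are assumptions for illustration; the caller is expected to raise `RetryableError` for CAPTCHAs, temporary blocks, and timeouts.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Raised by a query for failures worth retrying (CAPTCHA, block, timeout)."""


def with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on RetryableError with exponential backoff.

    Delays grow as base_delay, 2 * base_delay, 4 * base_delay, and so on.
    The final failure is re-raised so the caller can log or requeue it.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RetryableError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Because rotation is per-request, each retry would normally go out through a different IP, which is often enough to clear a temporary block. The `sleep` parameter exists so tests can substitute a recorder for real waiting.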

Is fare scraping legal?

This varies by jurisdiction and by the specific terms of service of each site. In the US, hiQ v. LinkedIn indicates that scraping publicly available data is unlikely to violate the Computer Fraud and Abuse Act, but breach-of-contract claims based on a site's terms of service can still apply. EU data protection regulations add further considerations. Consult legal counsel for your specific use case. Our compliance page covers the legal landscape in more detail.


Build your fare aggregation infrastructure on Hex Proxies residential proxies. With IPs in 195+ countries, per-request rotation, and $4.25-$4.75/GB pricing, you can monitor airline and hotel pricing across every market that matters. Explore our travel industry page for more use cases, or visit the fare aggregation page for setup guides.
