# Proxies for LangChain Applications

LangChain is one of the most widely used frameworks for building LLM-powered applications. Many LangChain workflows — document loaders, web research agents, retrieval chains — need to fetch data from external websites. Without proxy infrastructure, those requests run into rate limits, geographic blocks, and anti-bot defenses.
## LangChain Components That Need Proxies
- **WebBaseLoader**: Fetches HTML from URLs for document ingestion
- **RecursiveUrlLoader**: Crawls entire sites for knowledge base construction
- **WebResearchRetriever**: Searches the web and fetches results in real-time
- **Custom Tools**: Agent tools that call external APIs or scrape data
## Configuring WebBaseLoader with Proxies
LangChain's WebBaseLoader uses `requests` under the hood. Pass proxy configuration through the session:
```python
import requests
from langchain_community.document_loaders import WebBaseLoader


def create_proxied_session(username: str, password: str) -> requests.Session:
    """Create a requests session configured with Hex Proxies."""
    session = requests.Session()
    proxy_url = f"http://{username}:{password}@gate.hexproxies.com:8080"
    session.proxies = {"http": proxy_url, "https": proxy_url}
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"
    })
    return session


# Use with WebBaseLoader
session = create_proxied_session("YOUR_USER", "YOUR_PASS")
loader = WebBaseLoader(
    web_paths=["https://example.com/page1", "https://example.com/page2"],
    requests_kwargs={"proxies": session.proxies},
)
docs = loader.load()
```
## Custom Proxy-Aware Tools for Agents
Build a LangChain tool that routes all web requests through proxies:
```python
import httpx
from langchain.tools import tool

PROXY_URL = "http://YOUR_USER:YOUR_PASS@gate.hexproxies.com:8080"


@tool
def fetch_webpage(url: str) -> str:
    """Fetch a webpage through proxy infrastructure. Use for any URL that needs scraping."""
    with httpx.Client(proxy=PROXY_URL, timeout=30, follow_redirects=True) as client:
        resp = client.get(url, headers={
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
            "Accept": "text/html,application/xhtml+xml",
        })
        resp.raise_for_status()
        return resp.text[:10000]  # Limit context size for the LLM


@tool
def fetch_api_data(url: str) -> str:
    """Fetch JSON data from an API through proxy. Use for structured data endpoints."""
    with httpx.Client(proxy=PROXY_URL, timeout=30) as client:
        resp = client.get(url, headers={"Accept": "application/json"})
        resp.raise_for_status()
        return resp.text[:5000]
```
## Geo-Targeted Research Agent
Build an agent that can research topics from specific geographic perspectives:
```python
import httpx
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.tools import tool
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI


def build_geo_proxy(country: str) -> str:
    return f"http://YOUR_USER-country-{country.lower()}:YOUR_PASS@gate.hexproxies.com:8080"


@tool
def fetch_geo_content(url: str, country: str = "US") -> str:
    """Fetch content as seen from a specific country. Useful for regional pricing or localized content."""
    proxy = build_geo_proxy(country)
    with httpx.Client(proxy=proxy, timeout=30) as client:
        resp = client.get(url)
        return resp.text[:8000]


llm = ChatOpenAI(model="gpt-4o")
tools = [fetch_webpage, fetch_api_data, fetch_geo_content]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a research assistant with web access through proxies."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
```
## RecursiveUrlLoader for Knowledge Bases
When building a RAG knowledge base that requires crawling entire documentation sites:
```python
from bs4 import BeautifulSoup
from langchain_community.document_loaders import RecursiveUrlLoader

PROXY_URL = "http://YOUR_USER:YOUR_PASS@gate.hexproxies.com:8080"


def bs4_extractor(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text(separator="\n", strip=True)


loader = RecursiveUrlLoader(
    url="https://docs.example.com",
    max_depth=3,
    extractor=bs4_extractor,
    requests_kwargs={
        "proxies": {
            "http": PROXY_URL,
            "https": PROXY_URL,
        }
    },
)
docs = loader.load()
```
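Recursive crawls often reach the same page through different link paths, so it is worth deduplicating before indexing. Documents from these loaders carry their URL in `metadata["source"]`; the sketch below uses plain dicts of the same shape as stand-ins for LangChain `Document` objects to stay library-free:

```python
def dedupe_by_source(docs: list[dict]) -> list[dict]:
    """Keep only the first document seen for each source URL.

    Each doc is a dict like {"metadata": {"source": ...}, "page_content": ...},
    mirroring the metadata layout of LangChain Document objects.
    """
    seen: set[str] = set()
    unique = []
    for doc in docs:
        source = doc["metadata"]["source"]
        if source not in seen:
            seen.add(source)
            unique.append(doc)
    return unique
```

Running this before chunking and embedding avoids paying for duplicate vectors in the knowledge base.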
## Performance Tips for LangChain + Proxies
LangChain agents can make many sequential web requests. Use ISP proxies for the lowest latency — their sub-50ms response time keeps agent chains fast. For broad web research that hits many different domains, residential rotating proxies provide the IP diversity needed to avoid blocks.
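Because a rotating proxy typically exits from a fresh IP on each new connection, a simple retry loop often clears transient blocks (403s or 429s) without any extra logic. A library-agnostic sketch — `fetch` is any callable that raises on failure, such as a wrapper around an `httpx` GET with `raise_for_status()`:

```python
import time


def fetch_with_retries(fetch, url: str, max_attempts: int = 3, backoff: float = 1.0):
    """Call ``fetch(url)`` up to ``max_attempts`` times with exponential backoff.

    With a rotating proxy, each retry usually goes out through a new IP,
    so a request blocked on one exit node often succeeds on the next.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Exhausted all attempts; surface the last error
            time.sleep(backoff * (2 ** attempt))
```

Wrapping agent tools in a helper like this keeps retry policy out of the individual tool bodies.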
Hex Proxies' infrastructure handles 50 billion requests per week, so proxy capacity won't be the bottleneck for your LangChain agents.