Proxies for LangChain Applications
LangChain is a widely used framework for building LLM-powered applications. Many LangChain workflows, such as document loaders, web research agents, and retrieval chains, need to fetch data from external websites. Without proxy infrastructure, these requests can run into rate limits, geographic blocks, and anti-bot defenses.
LangChain Components That Need Proxies
- WebBaseLoader: Fetches HTML from URLs for document ingestion
- RecursiveUrlLoader: Crawls entire sites for knowledge base construction
- WebResearchRetriever: Searches the web and fetches results in real time
- Custom Tools: Agent tools that call external APIs or scrape data
Configuring WebBaseLoader with Proxies
LangChain's WebBaseLoader uses requests under the hood. Build the proxy settings on a requests session, then hand the proxy mapping to the loader through requests_kwargs:
```python
import requests
from langchain_community.document_loaders import WebBaseLoader


def create_proxied_session(username: str, password: str) -> requests.Session:
    """Create a requests session configured with Hex Proxies."""
    session = requests.Session()
    proxy_url = f"http://{username}:{password}@gate.hexproxies.com:8080"
    session.proxies = {"http": proxy_url, "https": proxy_url}
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"
    })
    return session


# Use with WebBaseLoader
session = create_proxied_session("YOUR_USER", "YOUR_PASS")
loader = WebBaseLoader(
    web_paths=["https://example.com/page1", "https://example.com/page2"],
    # Only the proxy mapping is forwarded here; the session's headers are not.
    requests_kwargs={"proxies": session.proxies},
)
docs = loader.load()
```
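Credentials that contain characters such as `@`, `:`, or `/` will break a proxy URL if they are interpolated directly. The following is a minimal sketch of percent-encoding them first, assuming the same gate.hexproxies.com:8080 gateway used above; the helper name `build_proxy_url` is hypothetical and not part of LangChain or any proxy SDK.

```python
from urllib.parse import quote


def build_proxy_url(username: str, password: str,
                    host: str = "gate.hexproxies.com", port: int = 8080) -> str:
    """Return an HTTP proxy URL with percent-encoded credentials (hypothetical helper)."""
    # safe="" forces reserved characters such as "@", ":" and "/" to be encoded.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"http://{user}:{pwd}@{host}:{port}"


proxy_url = build_proxy_url("YOUR_USER", "p@ss:word/1")
proxies = {"http": proxy_url, "https": proxy_url}
```

The resulting `proxies` mapping can be passed wherever the examples in this article use a hand-built proxy URL.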
Custom Proxy-Aware Tool for Agents
Build a LangChain tool that routes all web requests through proxies:
```python
from langchain.tools import tool
import httpx

PROXY_URL = "http://YOUR_USER:YOUR_PASS@gate.hexproxies.com:8080"


@tool
def fetch_webpage(url: str) -> str:
    """Fetch a webpage through proxy infrastructure. Use for any URL that needs scraping."""
    with httpx.Client(proxy=PROXY_URL, timeout=30, follow_redirects=True) as client:
        resp = client.get(url, headers={
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
            "Accept": "text/html,application/xhtml+xml",
        })
        resp.raise_for_status()
        return resp.text[:10000]  # Limit context size for LLM


@tool
def fetch_api_data(url: str) -> str:
    """Fetch JSON data from an API through proxy. Use for structured data endpoints."""
    with httpx.Client(proxy=PROXY_URL, timeout=30) as client:
        resp = client.get(url, headers={"Accept": "application/json"})
        resp.raise_for_status()
        return resp.text[:5000]
```
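Truncating raw HTML at a fixed length can spend most of the character budget on markup, scripts, and styles. One possible refinement, sketched here with only the standard library, is to strip the markup first and truncate the readable text afterwards. The names `_TextExtractor` and `html_to_text` are illustrative assumptions, not LangChain APIs.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return "\n".join(self._chunks)


def html_to_text(html: str, max_chars: int = 10_000) -> str:
    """Strip markup, then truncate so the cap applies to readable text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()[:max_chars]
```

A tool such as `fetch_webpage` could return `html_to_text(resp.text)` instead of a raw slice, at the cost of losing links and other structure the agent might otherwise use.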
Geo-Targeted Research Agent
Build an agent that can research topics from specific geographic perspectives:
```python
import httpx
from langchain.tools import tool
from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate


def build_geo_proxy(country: str) -> str:
    return f"http://YOUR_USER-country-{country.lower()}:YOUR_PASS@gate.hexproxies.com:8080"


@tool
def fetch_geo_content(url: str, country: str = "US") -> str:
    """Fetch content as seen from a specific country. Useful for regional pricing or localized content."""
    proxy = build_geo_proxy(country)
    with httpx.Client(proxy=proxy, timeout=30) as client:
        resp = client.get(url)
        resp.raise_for_status()  # Surface HTTP errors instead of returning an error page
        return resp.text[:8000]


llm = ChatOpenAI(model="gpt-4.1-mini")
# fetch_webpage and fetch_api_data are the tools defined in the previous section.
tools = [fetch_webpage, fetch_api_data, fetch_geo_content]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a research assistant with web access through proxies."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
```
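Because the `country` argument is filled in by the model, it may arrive as "Germany" or "us " rather than a clean two-letter code. The sketch below validates the value before it is placed in the proxy username. It assumes the gateway expects a two-letter country code in the `-country-xx` suffix shown above; the function name `build_geo_proxy_checked` and its parameters are hypothetical.

```python
import re

# Assumption: the gateway accepts two-letter country codes only.
_COUNTRY_RE = re.compile(r"[A-Za-z]{2}")


def build_geo_proxy_checked(country: str,
                            user: str = "YOUR_USER",
                            password: str = "YOUR_PASS") -> str:
    """Build a geo-targeted proxy URL, rejecting malformed country codes."""
    code = country.strip()
    if not _COUNTRY_RE.fullmatch(code):
        raise ValueError(f"Expected a two-letter country code, got {country!r}")
    return f"http://{user}-country-{code.lower()}:{password}@gate.hexproxies.com:8080"
```

Raising a `ValueError` with a clear message gives the agent something it can react to, for example by retrying with a corrected code.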
RecursiveUrlLoader for Knowledge Bases
When building a RAG knowledge base that requires crawling entire documentation sites:
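```python
from langchain_community.document_loaders import RecursiveUrlLoader
from bs4 import BeautifulSoup

PROXY_URL = "http://YOUR_USER:YOUR_PASS@gate.hexproxies.com:8080"


def bs4_extractor(html: str) -> str:
    """Reduce a fetched page to plain text for indexing."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text(separator="\n", strip=True)


loader = RecursiveUrlLoader(
    url="https://docs.example.com",
    max_depth=3,
    extractor=bs4_extractor,
    # Recent langchain_community releases take a requests-style proxies
    # mapping directly; check the signature for your installed version.
    proxies={
        "http": PROXY_URL,
        "https": PROXY_URL,
    },
)
docs = loader.load()
```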
Performance Tips for LangChain + Proxies
LangChain agents can make many sequential web requests, so per-request proxy latency adds up across a chain. Use ISP proxies for the lowest latency: their sub-50ms response time keeps agent chains fast. For broad web research that hits many different domains, residential rotating proxies provide the IP diversity needed to avoid blocks.
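The guidance above can be turned into a simple routing rule. This is only an illustrative sketch: the function name `choose_proxy_type`, the returned labels, and the threshold of three domains are assumptions made for the example, not values recommended by any provider.

```python
from urllib.parse import urlparse


def choose_proxy_type(urls: list, max_domains_for_isp: int = 3) -> str:
    """Return "isp" when few domains are targeted, otherwise "residential"."""
    # Count distinct hostnames; URLs without a hostname are ignored.
    domains = {urlparse(u).hostname for u in urls}
    domains.discard(None)
    return "isp" if len(domains) <= max_domains_for_isp else "residential"
```

An agent setup could call this once per task and pick the matching proxy endpoint before any requests are made.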
Hex Proxies' multi-Gbps capacity ensures your LangChain agents are never bottlenecked by proxy throughput.