# Proxies for LangChain Applications

LangChain is one of the most widely used frameworks for building LLM-powered applications. Many LangChain workflows — document loaders, web research agents, retrieval chains — need to fetch data from external websites. Without proxy infrastructure, those requests run into rate limits, geographic blocks, and anti-bot defenses.
## LangChain Components That Need Proxies
- **WebBaseLoader**: Fetches HTML from URLs for document ingestion
- **RecursiveUrlLoader**: Crawls entire sites for knowledge base construction
- **WebResearchRetriever**: Searches the web and fetches results in real-time
- **Custom Tools**: Agent tools that call external APIs or scrape data
## Configuring WebBaseLoader with Proxies
LangChain's WebBaseLoader uses `requests` under the hood. Pass proxy configuration through the session:
```python
import requests
from langchain_community.document_loaders import WebBaseLoader


def create_proxied_session(username: str, password: str) -> requests.Session:
    """Create a requests session configured with Hex Proxies."""
    session = requests.Session()
    proxy_url = f"http://{username}:{password}@gate.hexproxies.com:8080"
    session.proxies = {"http": proxy_url, "https": proxy_url}
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"
    })
    return session


# Use with WebBaseLoader
session = create_proxied_session("YOUR_USER", "YOUR_PASS")
loader = WebBaseLoader(
    web_paths=["https://example.com/page1", "https://example.com/page2"],
    requests_kwargs={"proxies": session.proxies},
)
docs = loader.load()
```
## Custom Proxy-Aware Tools for Agents
Build a LangChain tool that routes all web requests through proxies:
```python
import httpx
from langchain.tools import tool

PROXY_URL = "http://YOUR_USER:YOUR_PASS@gate.hexproxies.com:8080"


@tool
def fetch_webpage(url: str) -> str:
    """Fetch a webpage through proxy infrastructure. Use for any URL that needs scraping."""
    with httpx.Client(proxy=PROXY_URL, timeout=30, follow_redirects=True) as client:
        resp = client.get(url, headers={
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
            "Accept": "text/html,application/xhtml+xml",
        })
        resp.raise_for_status()
        return resp.text[:10000]  # Limit context size for the LLM


@tool
def fetch_api_data(url: str) -> str:
    """Fetch JSON data from an API through proxy. Use for structured data endpoints."""
    with httpx.Client(proxy=PROXY_URL, timeout=30) as client:
        resp = client.get(url, headers={"Accept": "application/json"})
        resp.raise_for_status()
        return resp.text[:5000]
```
## Geo-Targeted Research Agent
Build an agent that can research topics from specific geographic perspectives:
```python
import httpx
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.tools import tool
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI


def build_geo_proxy(country: str) -> str:
    return f"http://YOUR_USER-country-{country.lower()}:YOUR_PASS@gate.hexproxies.com:8080"


@tool
def fetch_geo_content(url: str, country: str = "US") -> str:
    """Fetch content as seen from a specific country. Useful for regional pricing or localized content."""
    proxy = build_geo_proxy(country)
    with httpx.Client(proxy=proxy, timeout=30) as client:
        resp = client.get(url)
        return resp.text[:8000]


llm = ChatOpenAI(model="gpt-4o")
tools = [fetch_webpage, fetch_api_data, fetch_geo_content]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a research assistant with web access through proxies."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
```
## RecursiveUrlLoader for Knowledge Bases
When building a RAG knowledge base that requires crawling entire documentation sites:
```python
from bs4 import BeautifulSoup
from langchain_community.document_loaders import RecursiveUrlLoader

PROXY_URL = "http://YOUR_USER:YOUR_PASS@gate.hexproxies.com:8080"


def bs4_extractor(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text(separator="\n", strip=True)


loader = RecursiveUrlLoader(
    url="https://docs.example.com",
    max_depth=3,
    extractor=bs4_extractor,
    requests_kwargs={
        "proxies": {
            "http": PROXY_URL,
            "https": PROXY_URL,
        }
    },
)
docs = loader.load()
```
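Recursive crawls often reach the same page through different link paths, so it is worth deduplicating before indexing. Documents from these loaders carry their URL in `metadata["source"]`; the sketch below uses plain dicts of the same shape as stand-ins for LangChain `Document` objects to stay library-free:

```python
def dedupe_by_source(docs: list[dict]) -> list[dict]:
    """Keep only the first document seen for each source URL.

    Each doc is a dict like {"metadata": {"source": ...}, "page_content": ...},
    mirroring the metadata layout of LangChain Document objects.
    """
    seen: set[str] = set()
    unique = []
    for doc in docs:
        source = doc["metadata"]["source"]
        if source not in seen:
            seen.add(source)
            unique.append(doc)
    return unique
```

Running this before chunking and embedding avoids paying for duplicate vectors in the knowledge base.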
## Performance Tips for LangChain + Proxies
LangChain agents can make many sequential web requests. Use ISP proxies for the lowest latency — their sub-50ms response time keeps agent chains fast. For broad web research that hits many different domains, residential rotating proxies provide the IP diversity needed to avoid blocks.
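Because a rotating proxy typically exits from a fresh IP on each new connection, a simple retry loop often clears transient blocks (403s or 429s) without any extra logic. A library-agnostic sketch — `fetch` is any callable that raises on failure, such as a wrapper around an `httpx` GET with `raise_for_status()`:

```python
import time


def fetch_with_retries(fetch, url: str, max_attempts: int = 3, backoff: float = 1.0):
    """Call ``fetch(url)`` up to ``max_attempts`` times with exponential backoff.

    With a rotating proxy, each retry usually goes out through a new IP,
    so a request blocked on one exit node often succeeds on the next.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Exhausted all attempts; surface the last error
            time.sleep(backoff * (2 ** attempt))
```

Wrapping agent tools in a helper like this keeps retry policy out of the individual tool bodies.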
Hex Proxies' infrastructure handles 50 billion requests per week, so proxy capacity won't be the bottleneck for your LangChain agents.