## CrewAI Proxy Configuration
CrewAI is a multi-agent orchestration framework where specialized AI agents collaborate to complete complex tasks. When agents need web access -- for research, data collection, or verification -- they require proxy infrastructure to avoid IP-based blocking on target websites.
### CrewAI Agent Types That Need Proxies
In a typical CrewAI crew, several agent roles require web access:
- **Research agents** that search the web and gather information
- **Verification agents** that cross-check facts against live sources
- **Monitoring agents** that track website changes and updates
- **Data collection agents** that scrape structured data from web pages
These agents use tools like WebSearchTool, ScrapeWebsiteTool, and custom browsing tools that make HTTP requests to external websites. Without proxies, every request originates from your server's IP address, which target sites are likely to rate-limit or block as the crew runs multiple tasks.
### Configuring Proxy for CrewAI Tools
CrewAI tools that make web requests can be configured with proxy settings. The approach depends on which tools your agents use:
#### HTTP-Based Tools (requests/httpx)
```python
import os

from crewai import Agent, Task, Crew
from crewai_tools import ScrapeWebsiteTool

# Set proxy environment variables before any tool makes a request.
# HTTP clients such as requests and httpx read these variables by default.
os.environ["HTTP_PROXY"] = "http://user:pass@gate.hexproxies.com:8080"
os.environ["HTTPS_PROXY"] = "http://user:pass@gate.hexproxies.com:8080"

# Create a research agent with a web scraping tool
researcher = Agent(
    role="Senior Research Analyst",
    goal="Gather comprehensive data from web sources",
    # Agent expects a backstory; this is placeholder text
    backstory="You are an analyst who researches topics using live web sources.",
    tools=[ScrapeWebsiteTool()],
    verbose=True,
)

research_task = Task(
    description="Research current pricing for cloud hosting providers",
    agent=researcher,
    expected_output="A comparison table of pricing across major providers",
)

crew = Crew(
    agents=[researcher],
    tasks=[research_task],
)

result = crew.kickoff()
```
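Proxy passwords often contain characters such as `@` or `:` that break a proxy URL if they are pasted in raw. The following is a minimal sketch, not part of CrewAI, of building the URL with percent-encoded credentials before exporting it. The helper names `build_proxy_url` and `set_proxy_env` are hypothetical, and the credentials are placeholders.

```python
import os
from urllib.parse import quote


def build_proxy_url(username: str, password: str, host: str, port: int) -> str:
    """Return an http:// proxy URL with percent-encoded credentials."""
    # safe="" forces characters like "@", ":" and "/" to be encoded too
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"http://{user}:{secret}@{host}:{port}"


def set_proxy_env(proxy_url: str) -> None:
    """Export the proxy for libraries that honor the standard variables."""
    for name in ("HTTP_PROXY", "HTTPS_PROXY"):
        os.environ[name] = proxy_url


# Placeholder credentials; the password deliberately contains "@" and ":"
proxy_url = build_proxy_url("user", "p@ss:word", "gate.hexproxies.com", 8080)
set_proxy_env(proxy_url)
print(proxy_url)  # http://user:p%40ss%3Aword@gate.hexproxies.com:8080
```

Call `set_proxy_env` before the crew starts so that every tool created afterwards sees the same settings.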
#### Browser-Based Tools
For agents that need browser automation (JavaScript rendering, interaction):
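```python
from crewai import Agent

# BrowserTool stands in for whichever browser-automation tool class you use;
# import or define it before this snippet.


class ProxiedBrowserTool(BrowserTool):
    """Custom browser tool with proxy configuration."""

    def __init__(self, country: str = "us"):
        super().__init__(
            browser_config={
                "proxy": {
                    "server": "http://gate.hexproxies.com:8080",
                    # Geo-targeting is encoded in the proxy username
                    "username": f"user-country-{country}",
                    "password": "your-password",
                }
            }
        )


researcher = Agent(
    role="Web Research Specialist",
    goal="Extract data from JavaScript-heavy websites",
    # Agent expects a backstory; this is placeholder text
    backstory="You specialize in sites that only render content in a browser.",
    tools=[ProxiedBrowserTool()],
)
```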
### Multi-Agent Proxy Strategy
Different agents in a CrewAI crew may need different proxy configurations:
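```python
# Research agent: rotating residential proxies for broad web access
researcher = Agent(
    role="Researcher",
    goal="Gather information from diverse web sources",
    backstory="You collect information from a wide range of public websites.",
    tools=[ScrapeWebsiteTool()],
)

# Verification agent: geo-targeted proxies for location-specific data
# (assumes ProxiedBrowserTool accepts a country argument)
verifier = Agent(
    role="Data Verifier",
    goal="Verify pricing data from specific geographic markets",
    backstory="You confirm figures by viewing sites as a local visitor would.",
    tools=[ProxiedBrowserTool(country="gb")],
)

# Writer agent: no proxy needed (no web access)
writer = Agent(
    role="Report Writer",
    goal="Compile findings into a structured report",
    backstory="You turn verified research notes into clear reports.",
    tools=[],
)
```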
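One way to keep these per-agent choices in a single place is a small lookup table keyed by agent role. The sketch below is an illustration only: `ProxyProfile`, `PROFILES`, and `proxy_username` are hypothetical names, and the `-country-<code>` username suffix follows the `user-country-us` pattern used in the browser example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProxyProfile:
    """Describes what kind of proxy access an agent role needs."""
    needs_web: bool
    country: Optional[str] = None  # None means plain rotating access


# Hypothetical mapping from agent role to proxy requirements
PROFILES = {
    "Researcher": ProxyProfile(needs_web=True),
    "Data Verifier": ProxyProfile(needs_web=True, country="gb"),
    "Report Writer": ProxyProfile(needs_web=False),
}


def proxy_username(base_user: str, profile: ProxyProfile) -> Optional[str]:
    """Return the proxy username for a profile, or None if no proxy is needed."""
    if not profile.needs_web:
        return None
    if profile.country:
        return f"{base_user}-country-{profile.country}"
    return base_user


for role, profile in PROFILES.items():
    print(role, "->", proxy_username("user", profile))
```

Keeping the mapping separate from the agent definitions makes it easier to change a role's proxy type without touching the crew itself.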
### Scaling CrewAI with Proxies
For production CrewAI deployments running multiple crews simultaneously:
1. **Isolate proxy sessions per crew**: Each crew should use a unique session to prevent IP contamination between unrelated research tasks.
2. **Implement bandwidth limits**: Research agents can follow links extensively. Set per-crew bandwidth budgets to prevent runaway costs.
3. **Use rotating proxies by default**: Unless an agent needs session persistence, per-request rotation provides the highest success rates across diverse websites.
4. **Monitor per-agent proxy usage**: Track which agents consume the most bandwidth and optimize their tools accordingly.
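The session isolation, bandwidth budget, and per-agent tracking points above can be sketched as one small bookkeeping object per crew. This is an assumption-heavy illustration: `CrewProxySession` and `BandwidthExceeded` are hypothetical names, the `-session-<id>` username suffix is an assumed syntax (check your provider's documentation for the real one), and the caller is expected to report response sizes itself.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


class BandwidthExceeded(RuntimeError):
    """Raised when a crew would go over its bandwidth budget."""


@dataclass
class CrewProxySession:
    base_user: str
    budget_bytes: int
    # A short random ID so that each crew gets its own proxy session
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    used_bytes: int = 0
    per_agent: Dict[str, int] = field(default_factory=dict)

    @property
    def username(self) -> str:
        # Assumed session syntax; providers differ
        return f"{self.base_user}-session-{self.session_id}"

    def record(self, agent_role: str, num_bytes: int) -> None:
        """Account for one response, refusing it if the budget would be exceeded."""
        if self.used_bytes + num_bytes > self.budget_bytes:
            raise BandwidthExceeded(
                f"{agent_role} would exceed the {self.budget_bytes}-byte budget"
            )
        self.used_bytes += num_bytes
        self.per_agent[agent_role] = self.per_agent.get(agent_role, 0) + num_bytes


session = CrewProxySession(base_user="user", budget_bytes=50_000_000)
session.record("Researcher", 1_200_000)
session.record("Data Verifier", 300_000)
print(session.username, session.per_agent)
```

The `per_agent` totals show which roles consume the most bandwidth, which is the information needed to decide where tool optimization will pay off.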
### Error Handling in CrewAI
Configure agents to handle proxy-related failures:
```python
researcher = Agent(
    role="Resilient Researcher",
    goal="Gather data with fallback strategies",
    backstory="""You are an expert researcher. If a website blocks your access,
    try an alternative source or report the block. Never retry the same
    blocked URL more than twice.""",
    tools=[ScrapeWebsiteTool()],
    max_retry_limit=3,  # how many times the agent retries after an error
)
```

With these instructions in its backstory, the agent can adapt its browsing strategy when access fails, choosing an alternative source when a target site blocks requests.
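Prompt instructions are only a soft guarantee, so the same retry cap can also be enforced in the tool code itself. The following is a minimal sketch of that idea, independent of CrewAI: `fetch_with_fallback` and `BlockedError` are hypothetical names, and `fetch` stands for whatever function your tool uses to download a page.

```python
from collections import Counter
from typing import Callable, Sequence


class BlockedError(Exception):
    """Raised by the fetch callable when a site refuses the request."""


def fetch_with_fallback(
    urls: Sequence[str],
    fetch: Callable[[str], str],
    max_attempts_per_url: int = 2,
) -> str:
    """Try each URL in order, giving up on a URL after the attempt cap."""
    attempts: Counter = Counter()
    for url in urls:
        while attempts[url] < max_attempts_per_url:
            attempts[url] += 1
            try:
                return fetch(url)
            except BlockedError:
                continue  # retry this URL until its cap is reached
    # Return a readable message so the agent can report the block
    tried = ", ".join(f"{u} x{n}" for u, n in attempts.items())
    return f"BLOCKED: no source was reachable ({tried})"


# Stand-in fetcher for demonstration; a real one would make an HTTP request
def fake_fetch(url: str) -> str:
    if "blocked" in url:
        raise BlockedError(url)
    return f"content from {url}"


print(fetch_with_fallback(
    ["https://blocked.example", "https://mirror.example"], fake_fetch
))  # content from https://mirror.example
```

Returning a message instead of raising on total failure lets the agent include the block in its final answer rather than aborting the task.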