Why Typhoeus for Proxy Work
Typhoeus is built on libcurl and its multi interface, which handles parallel HTTP connections at the C level. Where HTTParty and Net::HTTP are single-threaded, Typhoeus's Hydra runs up to 200 concurrent requests through proxies with minimal Ruby thread overhead.
For proxy scraping at scale in Ruby, Typhoeus is the clear winner. A single Hydra can process thousands of proxied requests per minute, with libcurl handling connection pooling, DNS caching, and keep-alive transparently.
Typhoeus-Specific Proxy Features
Single request proxy: `Typhoeus.get(url, proxy: 'http://user:pass@host:port')`. The `proxy` option accepts full URLs with embedded credentials. Typhoeus handles CONNECT tunneling for HTTPS targets automatically.
Hydra parallel requests all share the proxy configuration. Queue requests with `hydra.queue(request)` and run them with `hydra.run`. libcurl multiplexes connections to the proxy, reusing TCP sockets across requests.
Typhoeus exposes libcurl options directly. Set `proxytype: :http` or `proxytype: :socks5` for protocol selection. Use `ssl_verifypeer: true` and `ssl_verifyhost: 2` for secure proxy tunnels.
Common Pitfalls with Typhoeus
Typhoeus requires the libcurl development headers to compile its native extension. On Docker Alpine, install `curl-dev` before bundling. Missing headers cause gem installation failures that look unrelated to Typhoeus.
Hydra's default max_concurrency is 200. Setting it higher than your proxy plan's concurrency limit causes connection failures. Match Hydra concurrency to your plan: `Typhoeus::Hydra.new(max_concurrency: 50)`.
Response callbacks run synchronously between batches. If your callback does heavy processing, it blocks the next batch of proxy requests. Keep callbacks lightweight and defer processing to a background queue.
Advanced Configuration
Use `on_complete` callbacks with status code checking for proxy-aware error handling: `request.on_complete { |response| retry_queue << url if response.code == 429 }`. Build a retry queue that re-queues failed URLs with different proxy sessions.
Set `connecttimeout: 10` and `timeout: 30` per request. Typhoeus passes these to libcurl, which enforces them at the socket level. This is more reliable than Ruby-level timeouts for proxy connections.
Performance Tuning for Typhoeus
libcurl's DNS cache is per-handle. When using Hydra, DNS results are shared across queued requests, reducing DNS lookups through the proxy. For even better performance, set `dns_cache_timeout: 300` to cache DNS results for 5 minutes.
Use `followlocation: false` and handle redirects in your callback if you need to track which proxied redirects occur. Following redirects automatically can mask proxy routing issues.