A Rust Proxy Pool Built on reqwest and tokio

By Hex Proxies Engineering Team

Rust is the right language when you want a scraper that saturates a 1 Gbps link on a $5 VPS without garbage-collection pauses. reqwest handles the HTTP layer, tokio provides the async runtime, and Arc<Mutex<T>> gives you the shared-mutable-state pattern that everyone who has written a Rust scraper eventually needs.

This guide builds a small proxy pool around those three primitives: shared pool state behind a tokio Mutex, per-proxy reqwest clients, retry with exponential backoff, circuit breaking, and graceful shutdown on Ctrl-C.

Cargo.toml

# Cargo.toml
[package]
name = "hex-proxy-pool"
version = "0.1.0"
edition = "2021"

[dependencies]
tokio = { version = "1.40", features = ["rt-multi-thread", "macros", "sync", "time", "signal"] }
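# "http2" must be listed explicitly: default-features = false turns it off,
# and ClientBuilder::http2_adaptive_window is only available with it.
reqwest = { version = "0.12", features = ["json", "gzip", "brotli", "rustls-tls", "http2"], default-features = false }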
anyhow = "1.0"
thiserror = "1.0"
tracing = "0.1"
tracing-subscriber = "0.3"
rand = "0.8"

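A couple of notes on the dependency choices. rustls-tls instead of native-tls means no OpenSSL at build time, cross-compiles cleanly, and gives deterministic TLS behavior across Linux distros. http2_adaptive_window(true), which the client builder in the next section sets, lets reqwest tune HTTP/2 flow control on the fly, which matters when you're multiplexing through a gateway. That builder method is only compiled in when reqwest's http2 feature is enabled, and default-features = false turns that feature off unless you list it explicitly.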

Endpoint type

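Each ProxyEntry owns its own reqwest::Client. Do not share one client across all proxies: reqwest fixes the proxy when the Client is built, so a single client cannot rotate across endpoints. A Client also holds a connection pool, and sharing means your 10 proxies fight over one pool.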

use std::sync::Arc;
use std::time::{Duration, Instant};

use anyhow::{Context, Result};
use rand::seq::SliceRandom;
use reqwest::{Client, Proxy, Response};
use tokio::sync::Mutex;
use tokio::time::sleep;

#[derive(Debug, Clone)]
pub struct ProxyEndpoint {
    pub url: String,   // "http://user:pass@gate.hexproxies.com:7777"
    pub label: String,
}

#[derive(Debug)]
struct ProxyEntry {
    endpoint: ProxyEndpoint,
    client: Client,
    failures: u32,
    circuit_open_until: Option<Instant>,
}

impl ProxyEntry {
    fn new(ep: ProxyEndpoint) -> Result<Self> {
        let proxy = Proxy::all(&ep.url)
            .with_context(|| format!("invalid proxy url: {}", ep.url))?;
        let client = Client::builder()
            .proxy(proxy)
            .timeout(Duration::from_secs(20))
            .connect_timeout(Duration::from_secs(5))
            .pool_idle_timeout(Duration::from_secs(90))
            .pool_max_idle_per_host(25)
            .http2_adaptive_window(true)
            .build()?;
        Ok(Self { endpoint: ep, client, failures: 0, circuit_open_until: None })
    }

    fn available(&self, now: Instant) -> bool {
        self.circuit_open_until.map_or(true, |t| now >= t)
    }

    fn trip(&mut self, cooldown: Duration) {
        self.circuit_open_until = Some(Instant::now() + cooldown);
    }
}
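
The breaker behavior is easier to see with the HTTP client stripped away. The following is a minimal std-only sketch, not the pool's actual type: Breaker is a hypothetical stand-in for the two breaker fields of ProxyEntry, the threshold of 5 and the 30-second cooldown are the values the pool uses further down, and `now` is passed in explicitly (an assumption made here so the timeline can be checked without sleeping).

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the breaker fields of `ProxyEntry`.
#[derive(Debug, Default)]
struct Breaker {
    failures: u32,
    open_until: Option<Instant>,
}

impl Breaker {
    /// True when the circuit is closed or its cooldown has expired.
    fn available(&self, now: Instant) -> bool {
        self.open_until.map_or(true, |t| now >= t)
    }

    /// Count a failure and open the circuit once `threshold` is reached.
    fn record_failure(&mut self, now: Instant, threshold: u32, cooldown: Duration) {
        self.failures += 1;
        if self.failures >= threshold {
            self.open_until = Some(now + cooldown);
        }
    }

    /// A success closes the circuit and clears the failure count.
    fn record_success(&mut self) {
        self.failures = 0;
        self.open_until = None;
    }
}

fn main() {
    let t0 = Instant::now();
    let cooldown = Duration::from_secs(30);
    let mut b = Breaker::default();

    for _ in 0..4 {
        b.record_failure(t0, 5, cooldown);
    }
    assert!(b.available(t0)); // four failures: still selectable

    b.record_failure(t0, 5, cooldown);
    assert!(!b.available(t0)); // fifth failure opens the circuit
    assert!(b.available(t0 + cooldown)); // cooldown elapsed: selectable again

    // The count was not reset by the cooldown, so one more failure re-opens it.
    b.record_failure(t0 + cooldown, 5, cooldown);
    assert!(!b.available(t0 + cooldown));

    b.record_success();
    assert!(b.available(t0 + cooldown));
    println!("breaker timeline ok");
}
```

Note the half-open behavior this implies: after a cooldown the proxy gets traffic again, but a single failure trips it straight back, because only a success resets the counter.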

The pool itself

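The pool wraps Vec<ProxyEntry> in an Arc<Mutex<...>> so many tokio tasks can share it. The code uses a tokio Mutex rather than std::sync::Mutex. Strictly speaking, nothing below awaits while holding the lock, so std::sync::Mutex would also work, and tokio's own documentation suggests it for short critical sections that contain no await. The tokio Mutex is the safer default if you expect to add an await inside the critical section later: a std::sync::MutexGuard is not Send, so a future that holds one across an await point cannot be passed to tokio::spawn.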

#[derive(Debug)]
pub struct ProxyPool {
    entries: Arc<Mutex<Vec<ProxyEntry>>>,
}

impl ProxyPool {
    pub fn new(endpoints: Vec<ProxyEndpoint>) -> Result<Self> {
        if endpoints.is_empty() {
            anyhow::bail!("at least one endpoint required");
        }
        let entries = endpoints
            .into_iter()
            .map(ProxyEntry::new)
            .collect::<Result<Vec<_>>>()?;
        Ok(Self { entries: Arc::new(Mutex::new(entries)) })
    }

    async fn pick_client(&self) -> Option<(usize, Client)> {
        let entries = self.entries.lock().await;
        let now = Instant::now();
        let candidates: Vec<usize> = entries
            .iter()
            .enumerate()
            .filter(|(_, e)| e.available(now))
            .map(|(i, _)| i)
            .collect();
        let chosen = *candidates.choose(&mut rand::thread_rng())?;
        Some((chosen, entries[chosen].client.clone()))
    }

    pub async fn get(&self, url: &str, max_attempts: u32) -> Result<Response> {
        let mut last_err: Option<anyhow::Error> = None;
        for attempt in 1..=max_attempts {
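            let Some((idx, client)) = self.pick_client().await else {
                // Every circuit is open: note why, wait, and use up this attempt.
                if last_err.is_none() {
                    last_err = Some(anyhow::anyhow!("no proxy available: all circuits open"));
                }
                sleep(Duration::from_secs(1)).await;
                continue;
            };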
            match client.get(url).send().await {
                Ok(resp) if resp.status().is_success() => {
                    self.record_success(idx).await;
                    return Ok(resp);
                }
                Ok(resp) if resp.status().as_u16() == 429
                    || resp.status().is_server_error() =>
                {
                    let status = resp.status();
                    self.record_failure(idx).await;
                    last_err = Some(anyhow::anyhow!("retryable status {status}"));
                }
                Ok(resp) => {
                    // Non-retryable client error.
                    return Ok(resp);
                }
                Err(err) => {
                    self.record_failure(idx).await;
                    last_err = Some(err.into());
                }
            }
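            // Exponential backoff: 400 ms, 800 ms, ... capped at 12.8 s.
            // Skip the sleep after the final attempt; there is nothing left to wait for.
            if attempt < max_attempts {
                let backoff = Duration::from_millis(200 * (1u64 << attempt.min(6)));
                sleep(backoff).await;
            }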
        }
        Err(last_err.unwrap_or_else(|| anyhow::anyhow!("no attempts made")))
    }

    async fn record_success(&self, idx: usize) {
        let mut entries = self.entries.lock().await;
        if let Some(e) = entries.get_mut(idx) {
            e.failures = 0;
            e.circuit_open_until = None;
        }
    }

    async fn record_failure(&self, idx: usize) {
        let mut entries = self.entries.lock().await;
        if let Some(e) = entries.get_mut(idx) {
            e.failures += 1;
            if e.failures >= 5 {
                e.trip(Duration::from_secs(30));
                tracing::warn!(label = %e.endpoint.label, "circuit tripped");
            }
        }
    }
}
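
The backoff arithmetic in get can be checked in isolation. This is a small std-only sketch, assuming the same expression as the loop above; backoff and total_backoff are hypothetical helper names, and total_backoff additionally assumes the loop does not sleep after its last attempt.

```rust
use std::time::Duration;

/// Delay after a failed `attempt` (1-based), using the same arithmetic as
/// `get`: 200 ms * 2^min(attempt, 6).
fn backoff(attempt: u32) -> Duration {
    Duration::from_millis(200 * (1u64 << attempt.min(6)))
}

/// Worst-case time spent sleeping across `max_attempts` tries, assuming no
/// sleep follows the final attempt.
fn total_backoff(max_attempts: u32) -> Duration {
    (1..max_attempts).map(backoff).sum()
}

fn main() {
    for attempt in 1..=7 {
        println!("attempt {attempt}: wait {:?}", backoff(attempt));
    }
    println!("4 attempts sleep at most {:?}", total_backoff(4));
}
```

Because attempt starts at 1, the first wait is 400 ms rather than 200 ms, and the cap at a shift of 6 holds every later wait at 12.8 s. With max_attempts = 4, as in the driver below, a request that fails every time spends at most 2.8 s sleeping on top of its request timeouts.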

One subtle thing: client.clone() on a reqwest::Client is cheap. The client is internally Arc-wrapped, so cloning only bumps a refcount. This lets us drop the pool mutex before making the actual HTTP call, which keeps the critical section tiny.
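
The lock, clone, release pattern can be sketched without reqwest or tokio. In this illustration Handle is a hypothetical stand-in for reqwest::Client (an Arc around shared state), std::sync::Mutex replaces the tokio Mutex, and OS threads stand in for tokio tasks; all three are assumptions made so the sketch needs only the standard library.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for `reqwest::Client`: a cheap-to-clone handle
/// around shared state.
#[derive(Clone)]
struct Handle(Arc<String>);

/// Lock the pool, clone one handle, and release the lock before returning.
fn pick(pool: &Mutex<Vec<Handle>>, idx: usize) -> Handle {
    let guard = pool.lock().unwrap();
    guard[idx].clone()
    // `guard` is dropped here, so callers never hold the lock during slow work.
}

fn main() {
    let pool = Arc::new(Mutex::new(vec![
        Handle(Arc::new("hex-us".to_string())),
        Handle(Arc::new("hex-eu".to_string())),
    ]));

    let workers: Vec<_> = (0..4)
        .map(|i| {
            let pool = Arc::clone(&pool);
            thread::spawn(move || {
                let handle = pick(&pool, i % 2);
                // Simulated request: the pool lock is free for other workers.
                thread::sleep(Duration::from_millis(50));
                handle.0.len()
            })
        })
        .collect();

    for w in workers {
        println!("label length: {}", w.join().unwrap());
    }

    // Every worker has finished and dropped its clone by now.
    let guard = pool.lock().unwrap();
    println!("refcount: {}", Arc::strong_count(&guard[0].0));
}
```

The lock is held only for the index lookup and the refcount bump, so the four workers overlap their simulated requests instead of queueing behind each other.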

Driving the pool

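The main function spawns 200 tasks and races them against a Ctrl-C signal. When the signal wins, tokio::select! drops the work future. Dropping a JoinHandle only detaches its task, so the outstanding requests are actually cancelled a moment later, when main returns and the runtime shuts down, dropping every task at its next await point. That is an abrupt stop rather than a drain: in-flight requests are abandoned, which is usually what you want for a scraper. If you need cancellation tied to the work future itself, tokio::task::JoinSet aborts its tasks when it is dropped.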

#[tokio::main]
async fn main() -> Result<()> {
    tracing_subscriber::fmt::init();

    let pool = Arc::new(ProxyPool::new(vec![
        ProxyEndpoint {
            url: "http://USER:PASS@gate.hexproxies.com:7777".into(),
            label: "hex-us".into(),
        },
        ProxyEndpoint {
            url: "http://USER:PASS@gate-eu.hexproxies.com:7777".into(),
            label: "hex-eu".into(),
        },
    ])?);

    // Graceful shutdown on Ctrl-C.
    let shutdown = tokio::signal::ctrl_c();
    let work = async {
        let mut handles = Vec::new();
        for i in 0..200 {
            let pool = Arc::clone(&pool);
            handles.push(tokio::spawn(async move {
                let url = format!("https://httpbin.org/ip?n={i}");
                match pool.get(&url, 4).await {
                    Ok(resp) => tracing::info!(i, status = %resp.status(), "ok"),
                    Err(err) => tracing::error!(i, %err, "failed"),
                }
            }));
        }
        for h in handles {
            let _ = h.await;
        }
    };

    tokio::select! {
        _ = work => {},
        _ = shutdown => {
            tracing::info!("shutdown signal received");
        }
    }
    Ok(())
}

When Rust is worth it

A Python httpx pool can do 2,000-5,000 req/s. A Go pool can do 15,000-25,000 req/s. A tuned Rust pool like this can do 50,000+ req/s on the same hardware, with 10x less memory. That only matters if your workload is actually that large. For most scrapers, Python is fine — reach for Rust when you have a multi-day job, a fleet of cloud VMs, or a strict latency budget.

The other Rust strength is deployment: a single static binary with no runtime. You drop it on a Hetzner box or into a distroless Docker image and it runs. See our distributed scraping pipeline guide for how this fits into a larger architecture.

Error handling

anyhow is used for internal error aggregation. For library code, replace it with thiserror-backed enums so callers can match on specific failure modes (timeout vs DNS vs auth). The pool in this guide is application-level code, so anyhow is appropriate.
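
For a sense of what the library-style alternative looks like, here is a hand-written sketch using only the standard library. PoolError, its variants, and should_retry_later are hypothetical names chosen for illustration; with thiserror, the Display and Error impls below would be generated by its derive macro instead of written out.

```rust
use std::error::Error;
use std::fmt;
use std::time::Duration;

/// Hypothetical error type for a library version of the pool.
#[derive(Debug)]
enum PoolError {
    /// Every proxy's circuit was open.
    NoProxyAvailable,
    /// The request exceeded its deadline.
    Timeout(Duration),
    /// The proxy rejected our credentials (HTTP 407).
    ProxyAuth { label: String },
    /// The target kept returning a retryable status.
    Upstream { status: u16, attempts: u32 },
}

impl fmt::Display for PoolError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NoProxyAvailable => write!(f, "no proxy available"),
            Self::Timeout(d) => write!(f, "request timed out after {d:?}"),
            Self::ProxyAuth { label } => write!(f, "proxy {label} rejected credentials"),
            Self::Upstream { status, attempts } => {
                write!(f, "status {status} after {attempts} attempts")
            }
        }
    }
}

impl Error for PoolError {}

/// Callers branch on the variant instead of parsing a message string.
fn should_retry_later(err: &PoolError) -> bool {
    match err {
        PoolError::NoProxyAvailable | PoolError::Timeout(_) => true,
        PoolError::Upstream { status, .. } => *status == 429 || *status >= 500,
        PoolError::ProxyAuth { .. } => false,
    }
}

fn main() {
    let errors = [
        PoolError::NoProxyAvailable,
        PoolError::Timeout(Duration::from_secs(20)),
        PoolError::ProxyAuth { label: "hex-us".into() },
        PoolError::Upstream { status: 503, attempts: 4 },
    ];
    for err in &errors {
        println!("{err} -> retry later: {}", should_retry_later(err));
    }
}
```

The payoff is in should_retry_later: a caller can treat a credentials failure as fatal and a timeout as transient without inspecting error text, which an opaque anyhow::Error does not allow without downcasting.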

Hex Proxies residential and ISP gateways work with reqwest unchanged: the Proxy::all() constructor accepts the gateway URL verbatim.