
Go Colly Proxy

Complete Colly proxy integration example for Go with Hex Proxies. Includes authentication, rotation, and error handling.

Install: go get github.com/gocolly/colly/v2

Go / Colly
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector()

	err := c.SetProxy("http://user:pass@gate.hexproxies.com:8080")
	if err != nil {
		log.Fatal("Failed to set proxy:", err)
	}

	c.SetRequestTimeout(30 * time.Second)

	c.OnResponse(func(r *colly.Response) {
		fmt.Println("Status:", r.StatusCode)
		fmt.Println("Body:", string(r.Body))
	})

	c.OnError(func(r *colly.Response, err error) {
		log.Println("Request failed:", err)
	})

	if err := c.Visit("https://httpbin.org/ip"); err != nil {
		log.Fatal("Visit failed:", err)
	}
}

Why Colly for Proxy Work

Colly is the most popular web scraping framework in the Go ecosystem, combining Go's raw performance with a developer-friendly callback-based API inspired by jQuery and Scrapy. For proxy-driven scraping, Colly provides built-in proxy rotation, request queuing, rate limiting, and robots.txt compliance, features that would otherwise require significant custom code with raw `net/http`. Colly is also fast: a single process can crawl thousands of pages per minute through gate.hexproxies.com:8080 while keeping navigation logic, data extraction, and error handling cleanly separated.

Colly's architecture is event-driven through a collector pattern. You register callbacks for HTML elements, response events, and errors, and Colly manages the request lifecycle. This design is ideal for proxy scraping because proxy failures, retries, and rotation can be handled in dedicated callbacks without cluttering your parsing logic. The collector also supports recursive crawling with automatic deduplication, so you can point it at a website root and let it discover and proxy-route all linked pages automatically.

Configuration Patterns

Colly offers two proxy configuration approaches. The simple method uses `c.SetProxy()` for a single static proxy, which is sufficient when Hex Proxies handles rotation at the gateway level. The advanced method uses `c.SetProxyFunc()` with a custom function that returns a different proxy URL for each request, enabling client-side rotation logic or geographic proxy selection based on the target URL.

Colly's `LimitRule` system controls per-domain request rates with `DomainGlob`, `Parallelism`, `Delay`, and `RandomDelay` fields. Set `Parallelism` to the number of concurrent requests per domain and `Delay` to the minimum gap between requests. These rules apply after proxy routing, so they control how aggressively your scraper hits each target domain through the proxy, not how many proxy connections you open overall.

Common Pitfalls

Colly's collector cloning (`c.Clone()`) creates a new collector that inherits the parent's configuration, including proxy settings. However, cloned collectors do not share visited URL state, which means the same URL can be scraped multiple times across clones. When using cloned collectors for parallel domain scraping through proxies, implement external deduplication using a shared set or database to avoid wasting proxy requests on duplicate pages.

A subtle issue arises when Colly's built-in async mode interacts with proxy rate limits. Enabling `c.Async = true` allows Visit calls to return immediately and execute in parallel, but without a `LimitRule`, this can fire hundreds of simultaneous requests through the proxy. Always define limit rules before enabling async mode, and use `c.Wait()` to block until all queued requests complete before exiting the program.

Performance Optimization

Colly's request queue can be backed by an in-memory store (default) or a persistent store using the `queue` package. For large-scale proxy crawls spanning millions of URLs, switch to a Redis-backed or SQLite-backed queue so your crawl survives process restarts without re-fetching already-visited pages. This saves proxy bandwidth and prevents redundant requests when a crawl takes hours or days.

Enable Colly's caching layer with `colly.CacheDir()` to store raw HTML responses on disk. When developing parsing logic, you can iterate on your extraction code using cached responses without routing new requests through the proxy. This development workflow saves significant proxy usage during the extraction tuning phase. Once your parser is stable, disable caching for production crawls. Profile your Colly scraper with Go's `pprof` tool to identify whether bottlenecks are in network I/O (proxy latency), HTML parsing (CPU-bound), or pipeline processing (data handling).

Tips

  1. Colly supports proxy rotation via a custom ProxyFunc for round-robin selection.
  2. Use LimitRule to throttle requests and stay within proxy rate limits.
  3. Set OnError callbacks to log and handle proxy failures gracefully.

Ready to Integrate?

Get proxy credentials and start coding in minutes.