The State of Anti-Bot Detection in 2026: What Changed and What Works
Anti-bot detection in 2026 operates on fundamentals that would be unrecognizable to someone last configuring Selenium in 2023. The arms race between scrapers and protection vendors has moved past IP-level blocking into protocol fingerprinting, hardware attestation, and behavioral modeling. Understanding what changed -- and what actually works against modern systems -- requires examining the detection stack layer by layer.
This analysis is based on Hex Proxies internal testing across 200+ protected sites from January through April 2026, cross-referenced with published research from major anti-bot vendors and academic papers on bot detection.
The 2026 Detection Stack
Modern anti-bot systems operate as a layered pipeline. A request must pass every layer to reach the origin server. Failing at any layer triggers a block or challenge.
Request arrives
│
▼
┌─────────────────────────┐
│ Layer 1: Network │ IP reputation, ASN classification,
│ Intelligence │ geo-consistency, connection metadata
│ │
│ Blocks: ~15% of bots │
└────────────┬────────────┘
│ Pass
▼
┌─────────────────────────┐
│ Layer 2: Protocol │ TLS fingerprint (JA4+), HTTP/2 frame
│ Fingerprinting │ ordering, ALPN negotiation, cipher
│ │ suite analysis
│ Blocks: ~25% of bots │
└────────────┬────────────┘
│ Pass
▼
┌─────────────────────────┐
│ Layer 3: Browser │ JavaScript execution environment,
│ Environment Analysis │ Canvas/WebGL fingerprint, API
│ │ presence, DOM property consistency
│ Blocks: ~30% of bots │
└────────────┬────────────┘
│ Pass
▼
┌─────────────────────────┐
│ Layer 4: Behavioral │ Mouse movement patterns, scroll
│ Biometrics │ velocity, keystroke timing, session
│ │ navigation patterns
│ Blocks: ~20% of bots │
└────────────┬────────────┘
│ Pass
▼
┌─────────────────────────┐
│ Layer 5: Hardware │ Device attestation tokens (Apple
│ Attestation (Emerging) │ Private Access Tokens, Android
│ │ integrity checks)
│ Blocks: ~10% of bots │
└────────────┬────────────┘
│ Pass
▼
Origin server reached
The percentages represent the share of bot traffic reaching each layer that the layer catches, after the previous layers have filtered (source: Hex Proxies internal testing against Cloudflare Bot Management, Akamai Bot Manager, and PerimeterX/HUMAN, January-April 2026). The cumulative effect means a scraper failing at even one layer gets blocked.
Layer 1: Network Intelligence in 2026
What Changed
IP reputation databases became dramatically more granular in 2025-2026. The major shift: anti-bot vendors now classify IPs not just by whether they are "residential" or "datacenter," but by behavioral history at the individual IP level.
ASN-level scoring. Every Autonomous System Number (the identifier of the network an IP belongs to) now carries a bot probability score. An IP from a clean residential ISP ASN starts with a trust score of 90+. The same request from a known hosting ASN starts at 20. This is not new, but the granularity increased -- anti-bot vendors now track subnet-level (/24) reputation, not just ASN-level.
Cross-site correlation. Cloudflare, Akamai, and HUMAN (formerly PerimeterX) all operate as reverse proxies for millions of sites. They share threat intelligence across their customer base. If an IP scrapes aggressively on one Cloudflare site, every other Cloudflare site sees that IP's reputation drop within minutes. This is the single biggest change from 2024 -- IP reputation is now effectively global and real-time.
Connection metadata analysis. Beyond the IP itself, detection systems examine TCP characteristics: initial window size, MSS (Maximum Segment Size), TTL (Time to Live) values, and TCP option ordering. These vary by operating system and network stack. A connection claiming to be from a Windows Chrome browser but showing Linux TCP characteristics triggers an anomaly signal.
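A detector's OS cross-check can be sketched as follows. This is an illustrative reconstruction, not a vendor algorithm: the profile values are typical OS defaults, and this sketch checks only the TTL field (a real system would also compare MSS, window scaling, and TCP option ordering).

```python
# Typical initial TCP values by OS -- illustrative, not an authoritative database.
TCP_PROFILES = {
    "windows": {"ttl": 128, "mss": 1460, "window_scale": 8},
    "linux":   {"ttl": 64,  "mss": 1460, "window_scale": 7},
    "macos":   {"ttl": 64,  "mss": 1460, "window_scale": 6},
}

def tcp_os_mismatch(claimed_os: str, observed: dict) -> bool:
    """Return True if observed TCP metadata contradicts the OS the
    User-Agent claims. Only TTL is checked in this sketch."""
    profile = TCP_PROFILES.get(claimed_os.lower())
    if profile is None:
        return False  # unknown OS: no basis for a mismatch signal
    # TTL is decremented once per network hop, so infer the nearest
    # common initial value (32, 64, 128, 255) from the observed TTL.
    initial_ttl = min((t for t in (32, 64, 128, 255) if t >= observed["ttl"]),
                      default=255)
    return initial_ttl != profile["ttl"]
```

A connection claiming Windows Chrome but arriving with an observed TTL around 52 (initial TTL 64, i.e. a Linux stack) would trip this check.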
What Works
Clean residential IPs with low request volume. The most effective strategy against network-layer detection remains using genuine residential IPs with disciplined request rates. ISP proxies (static IPs from residential ASNs) are particularly effective because they maintain consistent reputation -- the IP is yours for the duration, so its history is your history.
IP diversity across subnets. Using 100 IPs from the same /24 subnet is almost as detectable as using one IP. Modern detection flags when multiple IPs from the same subnet exhibit scraping behavior simultaneously. Distribute requests across diverse subnets and ASNs. See our IP pool diversity page for how Hex Proxies sources IPs across 1,400+ ASNs.
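Auditing a pool for subnet concentration takes a few lines with Python's standard library. A minimal sketch (the `max_per_subnet` threshold is an invented illustration, not a known detection cutoff):

```python
import ipaddress
from collections import defaultdict

def subnet_diversity_report(ips, max_per_subnet=2):
    """Group a proxy pool by /24 subnet and flag over-concentrated blocks.

    Detection systems correlate scraping behavior across a /24, so a pool
    clustered in a few subnets loses much of its apparent diversity.
    """
    by_subnet = defaultdict(list)
    for ip in ips:
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
        by_subnet[str(net)].append(ip)
    flagged = {s: addrs for s, addrs in by_subnet.items()
               if len(addrs) > max_per_subnet}
    return {"subnets": len(by_subnet), "flagged": flagged}
```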
Geo-consistency. If your request claims to come from London (via Accept-Language and timezone headers) but the IP geolocates to Brazil, detection systems flag the mismatch. Always match your proxy location to the locale your scraper presents.
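A pre-flight consistency check on your own scraper can catch these mismatches before the target does. The mappings below are a tiny illustrative sample, not a complete locale database:

```python
# Hypothetical pre-flight check: does the locale the scraper presents
# agree with the proxy's geolocation? Mappings are illustrative only.
LOCALE_COUNTRIES = {
    "en-GB": "GB", "en-US": "US", "pt-BR": "BR", "de-DE": "DE",
}
TIMEZONE_COUNTRIES = {
    "Europe/London": "GB", "America/New_York": "US",
    "America/Sao_Paulo": "BR", "Europe/Berlin": "DE",
}

def geo_consistent(accept_language: str, timezone: str, proxy_country: str) -> bool:
    """True only if Accept-Language, timezone, and proxy geo all agree."""
    primary_locale = accept_language.split(",")[0].strip()
    lang_country = LOCALE_COUNTRIES.get(primary_locale)
    tz_country = TIMEZONE_COUNTRIES.get(timezone)
    return lang_country == tz_country == proxy_country
```

For example, `en-GB` headers with a `Europe/London` timezone pass only when the proxy exit is in GB.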
Layer 2: Protocol Fingerprinting
TLS Fingerprinting: JA3 Is Dead, JA4+ Is Standard
TLS fingerprinting identifies the client software based on how it negotiates the TLS handshake. The original JA3 fingerprint (introduced in 2017) hashed the cipher suites, extensions, and supported groups from the Client Hello message. By 2025, JA3 was effectively obsolete:
- JA4+ (2024): A modular fingerprinting system that generates separate hashes for TLS, HTTP, and TCP characteristics. JA4 is now the standard across Cloudflare, Akamai, and Fastly.
- Extension ordering matters. JA3 sorted extensions, losing ordering information. JA4 preserves extension order, which differs between browser versions and HTTP client libraries.
- GREASE values. Modern browsers inject random GREASE (Generate Random Extensions And Sustain Extensibility) values into their TLS handshakes. These values change between connections but follow browser-specific patterns. Python's requests library does not generate GREASE values at all -- an immediate signal that the client is not a browser.
Chrome 124 (genuine):
JA4: t13d1517h2_8daaf6152771_b0da82dd1658
- TLS 1.3, 15 ciphers, 17 extensions, HTTP/2
- GREASE values in cipher list and extensions
- ALPS extension present
- Compressed certificate support
Python requests (urllib3/OpenSSL):
JA4: t13d0912h1_fcb2b523e794_3c3857d7b627
- TLS 1.3, 9 ciphers, 12 extensions, HTTP/1.1
- No GREASE values
- No ALPS extension
- Different extension ordering
Detection systems maintain a database of known JA4 fingerprints mapped to client types. A request claiming User-Agent: Chrome/124 but presenting a Python JA4 fingerprint is immediately flagged.
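The server-side cross-check can be sketched as a lookup plus a User-Agent comparison. The two fingerprint strings come from the examples above; a real database maps many thousands of JA4 values to client families:

```python
# Sketch of a JA4-to-client lookup. Fingerprints taken from the
# examples in this article; real databases are far larger.
KNOWN_JA4 = {
    "t13d1517h2_8daaf6152771_b0da82dd1658": "chrome",
    "t13d0912h1_fcb2b523e794_3c3857d7b627": "python-requests",
}

def ja4_ua_mismatch(ja4: str, user_agent: str) -> bool:
    """Flag a request whose JA4 fingerprint contradicts its User-Agent."""
    client = KNOWN_JA4.get(ja4)
    if client is None:
        return False  # unknown fingerprint: left to other detection layers
    claims_chrome = "Chrome/" in user_agent
    return claims_chrome and client != "chrome"
```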
HTTP/2 Frame Analysis
HTTP/2 introduced binary framing, and different HTTP clients construct their frames differently. In 2026, anti-bot systems analyze:
SETTINGS frame parameters. When an HTTP/2 connection opens, the client sends a SETTINGS frame declaring its preferences. Different clients declare different values:
| Parameter | Chrome 124 | Firefox 125 | curl | Python httpx |
|---|---|---|---|---|
| HEADER_TABLE_SIZE | 65536 | 65536 | 4096 | 4096 |
| MAX_CONCURRENT_STREAMS | 1000 | (not sent) | 100 | 100 |
| INITIAL_WINDOW_SIZE | 6291456 | 131072 | 65535 | 65535 |
| MAX_HEADER_LIST_SIZE | 262144 | (not sent) | (not sent) | (not sent) |
| ENABLE_PUSH | (not sent) | 0 | (not sent) | 0 |
Pseudo-header ordering. The HTTP/2 pseudo-headers (:method, :authority, :scheme, :path) can appear in any order. Browsers use a consistent order that differs from most HTTP client libraries: Chrome sends :method, :authority, :scheme, :path, while Python's httpx sends :method, :path, :scheme, :authority.
Priority frames (HTTP/2) and priority signals (HTTP/3). Chrome sends PRIORITY frames for resource prioritization; most scraping tools do not.
What Works
Use browser automation with real browser TLS stacks. Playwright, Puppeteer, and Selenium drive actual Chrome or Firefox processes, producing genuine TLS and HTTP/2 fingerprints. For high-value targets with protocol fingerprinting, headless browsers are now a necessity, not an optimization.
TLS fingerprint impersonation libraries. Libraries like curl-impersonate, tls-client (Go), and cycletls (Node.js) modify the TLS handshake to match specific browser fingerprints. These are effective against JA4-only detection but fail against systems that cross-reference JA4 with HTTP/2 frame analysis.
Proxy protocol has no impact. Whether you use HTTP CONNECT or SOCKS5, the TLS fingerprint is generated by your client, not the proxy. Switching proxy protocols does not help with TLS detection. See our protocol comparison post for details on what proxy protocols actually affect.
Layer 3: Browser Environment Analysis
JavaScript Execution Environment
Anti-bot systems inject JavaScript into the page (typically via a first-party script or a script served from the anti-bot vendor's domain) that interrogates the browser environment. In 2026, the checks go far beyond navigator.webdriver:
Execution environment integrity checks:
// Simplified version of what detection scripts check
// (based on deobfuscated Cloudflare Turnstile and HUMAN scripts)
// 1. WebDriver detection (basic -- hidden by stealth tooling since 2022)
navigator.webdriver === true
// 2. Chrome DevTools Protocol detection
window.cdc_adoQpoasnfa76pfcZLmcfl_Array // CDP signature
window.cdc_adoQpoasnfa76pfcZLmcfl_Promise
// 3. Automation framework artifacts
window.__selenium_unwrapped !== undefined
window.__webdriver_evaluate !== undefined
window.__driver_evaluate !== undefined
document.__webdriver_script_fn !== undefined
// 4. Browser API consistency (2026 focus)
// Real Chrome has thousands of native API implementations.
// Detection scripts check that API prototypes have not been
// modified and that toString() returns "[native code]"
Notification.permission // headless browsers often differ
navigator.permissions.query({name: "notifications"})
// 5. Plugin and media device enumeration
navigator.plugins.length > 0 // headless Chrome returns 0
navigator.mediaDevices.enumerateDevices() // empty in headless
// 6. Canvas and WebGL fingerprinting
// Render a specific scene via Canvas 2D and WebGL,
// hash the pixel output. Headless browsers produce
// different renders than headed browsers due to GPU
// differences or software rendering.
The 2026 problem with headless browsers: Playwright and Puppeteer can now pass most individual checks above. But detection systems do not check them individually -- they build a composite "environment consistency score." A browser that passes navigator.webdriver but has 0 plugins, no media devices, and software-rendered Canvas is obviously automated, even if no single check fails.
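A composite score of this kind can be sketched as a weighted sum over individual signals. The weights and the 0.5 threshold below are invented for illustration; real vendors tune these against labeled traffic:

```python
# Illustrative composite "environment consistency score".
# Weights and threshold are assumptions, not vendor values.
def environment_score(env: dict) -> float:
    """Score from 0.0 (clean) to 1.0 (almost certainly automated)."""
    signals = [
        (env.get("webdriver", False), 0.4),       # explicit automation flag
        (env.get("plugins", 0) == 0, 0.2),        # headless Chrome: 0 plugins
        (env.get("media_devices", 0) == 0, 0.2),  # empty device enumeration
        (env.get("software_rendered_canvas", False), 0.2),
    ]
    return round(sum(weight for hit, weight in signals if hit), 2)

def is_automated(env: dict, threshold: float = 0.5) -> bool:
    # No single signal blocks on its own; the combination does.
    return environment_score(env) >= threshold
```

A patched headless browser that hides `navigator.webdriver` but still reports zero plugins, zero media devices, and a software-rendered canvas scores 0.6 and is flagged anyway.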
What Works
Headed browser automation. Running Playwright or Puppeteer in headed mode (with a visible browser window) on a real desktop or VPS with a GPU produces an environment that is significantly harder to distinguish from a real user. The --headless=new flag in Chrome 124+ is better than old headless mode but still detectable.
Browser profile management. Maintain persistent browser profiles with cookies, local storage, and browsing history across sessions. Detection systems check for "blank slate" indicators -- a browser with no cookies, no history, and no cached data visiting a complex web application is suspicious.
Patched browsers. Projects like undetected-chromedriver and playwright-stealth patch known detection vectors. These work against basic detection but require constant updates as anti-bot vendors discover and fingerprint the patches themselves.
Layer 4: Behavioral Biometrics
The Biggest Shift in 2026
Behavioral analysis became the primary detection layer for sophisticated anti-bot systems in 2026. The logic: even if a bot perfectly impersonates a browser's technical fingerprint, it cannot perfectly impersonate a human's behavior.
Mouse movement analysis. Detection scripts track mouse cursor position at 60+ samples per second and analyze:
- Movement velocity and acceleration curves (humans produce Bezier-like curves; bots produce linear movements)
- Micro-movements and tremor (humans cannot hold a cursor perfectly still)
- Movement-to-click timing (humans decelerate before clicking)
- Hover patterns over interactive elements
Scroll behavior. Human scrolling exhibits variable velocity, momentum, and occasional reversals. Programmatic scrolling is uniform.
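From the scraper's side, scroll deltas with these properties can be generated programmatically. A minimal sketch (step sizes, reversal probability, and variance are illustrative assumptions):

```python
import random

def human_scroll_deltas(total_px: int, avg_step: int = 120) -> list[int]:
    """Generate scroll-wheel deltas with variable velocity and occasional
    small upward reversals, summing to at least total_px.

    A sketch of the behavior described above; parameters are invented.
    """
    deltas, scrolled = [], 0
    while scrolled < total_px:
        if deltas and random.random() < 0.08:
            # Occasional reversal: scroll back up a little
            step = -random.randint(20, 60)
        else:
            # Variable downward step drawn from a Gaussian
            step = max(20, int(random.gauss(avg_step, avg_step * 0.35)))
        deltas.append(step)
        scrolled += step
    return deltas
```

Feed each delta to your automation tool's wheel API with a small randomized pause between steps, rather than jumping to the target offset in one call.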
Session navigation patterns. Detection systems build a model of expected user behavior:
- Time on page follows a log-normal distribution for real users
- Real users visit multiple pages per session in predictable patterns
- Bots tend to access deep URLs directly without visiting the homepage or category pages first
Keystroke dynamics (for sites with forms). Typing speed, inter-key intervals, and key-press duration vary in characteristic patterns for humans.
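Inter-key delays with human-like structure can likewise be generated rather than typing at a constant rate. A sketch with invented parameters (log-normal base rhythm, word-boundary pauses, occasional hesitations):

```python
import random

def human_keystroke_delays(text: str) -> list[float]:
    """Inter-key delays (seconds) between consecutive characters of text.

    Log-normal base rhythm (~0.11 s median), longer pauses after spaces,
    and occasional hesitations. Parameters are illustrative assumptions.
    """
    delays = []
    for prev, _ch in zip(text, text[1:]):
        d = random.lognormvariate(-2.2, 0.35)
        if prev == " ":
            d += random.uniform(0.05, 0.15)   # word-boundary pause
        if random.random() < 0.05:
            d += random.uniform(0.3, 0.9)     # occasional hesitation
        delays.append(round(d, 3))
    return delays
```

Apply each delay between individual key presses instead of using your automation tool's fixed-interval typing option.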
What Works
Realistic behavior injection. The most effective approach is injecting human-like behavior into automated sessions:
import random
import time

def human_like_mouse_move(page, target_x, target_y, steps=25):
    """Move the mouse along a curved path with human-like characteristics.

    Uses a cubic Bezier curve with randomized control points, ease-in-out
    pacing (decelerating near the target), and Gaussian micro-jitter.
    Assumes the cursor starts at (0, 0); Playwright does not expose the
    current pointer position, so track it between calls if you need a
    different starting point.
    """
    start_x, start_y = 0.0, 0.0
    # Generate Bezier control points with randomness
    ctrl1_x = start_x + (target_x - start_x) * random.uniform(0.2, 0.5)
    ctrl1_y = start_y + (target_y - start_y) * random.uniform(-0.3, 0.3)
    ctrl2_x = start_x + (target_x - start_x) * random.uniform(0.5, 0.8)
    ctrl2_y = target_y + (target_y - start_y) * random.uniform(-0.3, 0.3)
    points = []
    for i in range(steps + 1):
        t = i / steps
        # Ease-in-out timing (slow start, fast middle, slow end)
        t_eased = t * t * (3 - 2 * t)
        # Cubic Bezier interpolation
        x = (
            (1 - t_eased) ** 3 * start_x
            + 3 * (1 - t_eased) ** 2 * t_eased * ctrl1_x
            + 3 * (1 - t_eased) * t_eased ** 2 * ctrl2_x
            + t_eased ** 3 * target_x
        )
        y = (
            (1 - t_eased) ** 3 * start_y
            + 3 * (1 - t_eased) ** 2 * t_eased * ctrl1_y
            + 3 * (1 - t_eased) * t_eased ** 2 * ctrl2_y
            + t_eased ** 3 * target_y
        )
        # Add micro-jitter (human hand tremor), except on the final
        # point so the cursor lands exactly on the target
        if i < steps:
            x += random.gauss(0, 0.5)
            y += random.gauss(0, 0.5)
        points.append((x, y))
    for px, py in points:
        page.mouse.move(px, py)
        # Small randomized pause between movement steps
        time.sleep(random.uniform(0.005, 0.02))
    return points
Rate discipline over speed. The single most effective behavioral strategy is slowing down. A scraper making 1 request every 3-5 seconds with realistic session patterns achieves higher long-term success rates than one making 10 requests per second that gets blocked after 50 requests. This is where proxy cost matters -- using premium residential IPs at disciplined rates costs less per successful request than burning through cheap IPs at aggressive rates.
Session warmup. Before scraping target pages, visit the homepage, accept cookies, and browse a few category pages. This establishes a "normal" session pattern that behavioral models expect.
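A warmup can be planned as an ordered list of (URL, dwell-time) visits. A sketch with invented parameters; the log-normal dwell times echo the time-on-page distribution behavioral models expect, and the URLs are whatever your scraper targets:

```python
import random

def build_warmup_plan(homepage: str, category_urls: list[str],
                      target_url: str, n_categories: int = 2) -> list[tuple[str, float]]:
    """Return an ordered (url, dwell_seconds) visit plan: homepage first,
    a few random category pages, then the target page.

    Dwell times are drawn from a log-normal distribution (~10 s median);
    all parameters here are illustrative assumptions.
    """
    pages = [homepage] + random.sample(category_urls, n_categories) + [target_url]
    return [(url, round(random.lognormvariate(2.3, 0.5), 1)) for url in pages]
```

Drive your browser automation through the plan in order, pausing for each dwell time (and handling the cookie banner on the first page) before touching the target URL.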
Layer 5: Hardware Attestation (Emerging)
Private Access Tokens
Apple introduced Private Access Tokens in iOS 16/macOS Ventura, and Cloudflare adopted them for bot detection. The mechanism:
- A website requests a token from the client
- The client's operating system generates a cryptographic token signed by the device manufacturer (Apple, Google)
- The token proves the request comes from a genuine device without revealing the user's identity
- The website verifies the token's signature against the manufacturer's public key
Current Impact
As of April 2026, hardware attestation is used sparingly:
- Cloudflare offers it as an option; few sites require it
- Apple's Safari browser passes tokens automatically
- Chrome on Android is beginning to support a similar mechanism
- No website we tested required hardware tokens for all traffic
Our assessment: Hardware attestation will become a significant factor by 2027-2028 but is not yet a blocking issue for most scraping workloads. Monitor Cloudflare's deployment pace.
Success Rates by Protection Level: April 2026
We tested Hex Proxies residential and ISP products against sites grouped by protection level (source: Hex Proxies internal testing, 10,000+ requests per category, April 2026):
| Protection Level | Example Systems | Residential Success Rate | ISP Success Rate |
|---|---|---|---|
| None / Basic WAF | Simple rate limiting, IP blocking | 99.2% | 99.5% |
| Cloudflare Free | JS challenge, basic bot score | 96.8% | 97.3% |
| Cloudflare Pro/Business | Managed challenge, bot score threshold | 91.4% | 93.1% |
| Cloudflare Enterprise + Bot Management | Full behavioral analysis | 82.7% | 85.2% |
| Akamai Bot Manager | Sensor data, behavioral modeling | 80.3% | 83.0% |
| HUMAN (PerimeterX) | Advanced behavioral biometrics | 78.9% | 81.5% |
| DataDome | ML-based real-time detection | 79.5% | 82.1% |
Practical Strategy Recommendations
For Targets with Basic Protection (Cloudflare Free, Simple WAFs)
- Use rotating residential proxies with per-request rotation
- Standard HTTP client libraries (requests, axios) are sufficient
- Rate limit to 1-2 requests per second per IP
- Success rate expectation: 95%+
For Targets with Advanced Protection (Cloudflare Enterprise, Akamai, HUMAN)
- Use ISP proxies with sticky sessions (same IP for the session)
- Use headless browser automation (Playwright or Puppeteer)
- Apply TLS fingerprint impersonation matching the browser
- Inject human-like mouse movement and scroll behavior
- Rate limit to 1 request every 3-5 seconds
- Warm up sessions before accessing target pages
- Success rate expectation: 80-90%
For Maximum-Security Targets
- Use headed browser automation on real desktop/VPS with GPU
- Maintain persistent browser profiles across sessions
- Use ISP proxies with long sticky sessions (30+ minutes)
- Implement full behavioral simulation (mouse, scroll, navigation)
- Accept lower throughput (1 request per 5-10 seconds)
- Success rate expectation: 70-85%
What to Expect in Late 2026 and Beyond
Based on current trajectories, we expect:
- HTTP/3 fingerprinting will mature. As QUIC adoption increases, anti-bot vendors will build detection around QUIC transport parameters, just as they did for TLS and HTTP/2.
- Behavioral models will use longer observation windows. Current systems analyze single sessions. The next generation will correlate behavior across sessions, days, and sites.
- Hardware attestation will expand beyond Apple. Google's Android integrity API and potential desktop attestation will narrow the options for server-side scraping.
- AI-generated behavior will improve. Scraping tools will use generative models to produce more realistic human-like interactions, and detection systems will use adversarial models to detect synthetic behavior.
Frequently Asked Questions
Does using SOCKS5 instead of HTTP proxies help avoid detection?
No. The proxy protocol is invisible to the target site's anti-bot system. Detection operates on the IP reputation, TLS fingerprint, browser environment, and behavior -- none of which are affected by the proxy protocol. See our protocol comparison for what each protocol actually affects.
Can residential proxies bypass all anti-bot systems?
No proxy type bypasses all detection. Residential proxies provide clean IP reputation (Layer 1), but modern detection operates across five layers. You still need appropriate TLS fingerprints, browser environments, and behavioral patterns. Residential proxies are necessary but not sufficient.
How often do anti-bot systems update their detection?
Cloudflare updates its bot detection models continuously -- some rule updates deploy multiple times per day. Major detection logic changes (new fingerprinting techniques, new behavioral models) typically roll out quarterly. This is why scraping solutions require ongoing maintenance.
Is web scraping legal in 2026?
Web scraping of publicly available data is legal in most jurisdictions, but the legal landscape varies. The 2022 hiQ v. LinkedIn decision held that scraping publicly available data likely does not violate the CFAA in the US. However, scraping personal data may implicate GDPR in the EU. See our compliance guide for detailed legal analysis.
Understanding the detection stack is the first step to building scraping infrastructure that works reliably. Hex Proxies provides the IP layer -- clean residential and ISP proxies across 1,400+ ASNs with anti-detection technology built in. Residential proxies start at $4.25/GB; ISP proxies start at $2.08/IP. Explore proxy plans.