Enterprise Reliability Without Enterprise Complexity
Uptime is not a feature. It is a prerequisite. Every minute of proxy downtime translates directly to missed data, failed automations, and lost revenue. Hex Proxies treats reliability as the foundational layer upon which every other feature is built, and we back that commitment with a 99.9% uptime SLA that includes financial guarantees.
What 99.9% Actually Means
A 99.9% monthly uptime target allows for a maximum of approximately 43 minutes of downtime per month. In practice, our average monthly uptime over the past 12 months has been 99.97%, which translates to less than 13 minutes of unplanned downtime per month. We achieve this through redundancy at every layer of the stack.
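The arithmetic behind those figures is straightforward. A minimal sketch, using a 30-day (43,200-minute) month:

```python
# Downtime permitted by an uptime percentage over a 30-day month.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes

def allowed_downtime_minutes(uptime_pct: float) -> float:
    """Maximum downtime (in minutes) permitted by an uptime target."""
    return MINUTES_PER_MONTH * (1 - uptime_pct / 100)

print(round(allowed_downtime_minutes(99.9), 1))   # 43.2 -> "approximately 43 minutes"
print(round(allowed_downtime_minutes(99.97), 2))  # 12.96 -> "less than 13 minutes"
```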
Infrastructure Redundancy
Our gateway servers run in active-active pairs in geographically separated data centers. Each pair shares session state through a distributed cache, so if one gateway fails, the surviving gateway serves all traffic without session loss. There is no primary or secondary distinction; both gateways handle production traffic simultaneously.
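The active-active pattern can be illustrated with a toy model, where a shared dict stands in for the distributed cache and the gateway names are invented for the example:

```python
# Illustrative sketch only: a plain dict plays the role of the distributed
# session cache shared by both gateways in a pair.
class Gateway:
    def __init__(self, name: str, cache: dict[str, str]):
        self.name = name
        self.cache = cache  # shared state, not local to either gateway

    def handle(self, session_id: str) -> str:
        # Either gateway can serve any session because state lives in the cache.
        self.cache.setdefault(session_id, "sticky-egress-ip")
        return f"{self.name} served {session_id} via {self.cache[session_id]}"

session_cache: dict[str, str] = {}
gw_a = Gateway("gw-a", session_cache)
gw_b = Gateway("gw-b", session_cache)

gw_a.handle("s1")        # session established through gw-a
print(gw_b.handle("s1")) # gw-a fails: gw-b serves the same session, no state loss
```

Because neither gateway owns the session, a failover is just the load balancer sending the next request to the survivor.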
The routing layer uses a consensus-based distributed system. Routing decisions are made by a quorum of nodes, and the system tolerates the failure of any single routing node without degradation. Session affinity data is replicated across at least three routing nodes at all times.
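Why a quorum tolerates a single node failure is simple majority arithmetic, sketched here (the actual consensus protocol is not specified in this document):

```python
def has_quorum(responding: int, total: int) -> bool:
    """A strict majority of routing nodes must agree before a decision commits."""
    return responding >= total // 2 + 1

# A three-node quorum survives any single node failure:
assert has_quorum(2, 3)      # 2 of 3 is still a majority
assert not has_quorum(1, 3)  # 1 of 3 cannot commit a decision
```

With session affinity replicated to at least three nodes, losing one node leaves two copies, which is still a majority of a three-node replica set.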
Egress nodes (the IPs that carry your requests to their destination) are monitored individually. Each egress node is health-checked every 10 seconds with a synthetic request that validates connectivity, latency, and response correctness. Nodes that fail health checks are removed from the active pool immediately and only re-added after passing five consecutive checks.
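The pool-membership rule described above (immediate removal on one failure, re-admission after five consecutive passes) amounts to a small state machine, sketched here:

```python
from dataclasses import dataclass

REQUIRED_PASSES = 5  # consecutive passes before a node rejoins the pool

@dataclass
class EgressNode:
    active: bool = True
    consecutive_passes: int = 0

    def record_check(self, passed: bool) -> None:
        if not passed:
            # A single failed check removes the node immediately.
            self.active = False
            self.consecutive_passes = 0
        elif not self.active:
            self.consecutive_passes += 1
            if self.consecutive_passes >= REQUIRED_PASSES:
                self.active = True
                self.consecutive_passes = 0

node = EgressNode()
node.record_check(False)      # failed check: removed from the active pool
for _ in range(5):
    node.record_check(True)   # five consecutive passes: re-admitted
assert node.active
```

The asymmetry (one failure out, five passes back in) biases the pool toward stability: a flapping node stays out until it proves itself.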
Incident Response
Our network operations center operates 24/7 with dedicated on-call engineers. Automated alerts fire when any metric crosses predefined thresholds, including latency spikes, error rate increases, capacity utilization above 70%, and individual node failures. The mean time to detection for infrastructure issues is under 60 seconds, and the mean time to mitigation is under 5 minutes.
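A threshold-based alerting rule like the ones listed can be sketched as follows; the 70% capacity figure comes from the text, while the latency and error-rate limits are assumed values for illustration:

```python
# Only capacity_utilization_pct (70%) is stated in the text; the other
# thresholds are hypothetical placeholders.
THRESHOLDS = {
    "p99_latency_ms": 500,           # assumed value
    "error_rate_pct": 1.0,           # assumed value
    "capacity_utilization_pct": 70,  # stated in the text
}

def alerts(metrics: dict[str, float]) -> list[str]:
    """Return the names of all metrics that crossed their thresholds."""
    return [name for name, limit in THRESHOLDS.items()
            if metrics.get(name, 0) > limit]

print(alerts({"capacity_utilization_pct": 82, "error_rate_pct": 0.2}))
# -> ['capacity_utilization_pct']
```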
Maintenance Windows
We do not have scheduled maintenance windows. All infrastructure changes are deployed using blue-green or canary deployment patterns that eliminate downtime. When we need to replace hardware, we provision new capacity first, migrate traffic gradually, and then decommission the old hardware. Customers never need to plan around our maintenance schedule because there is none.
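The gradual traffic migration can be modeled as weighted routing between old and new capacity. A minimal simulation, with the ramp steps chosen arbitrarily for illustration:

```python
import random

def route_request(new_weight: float) -> str:
    """Weighted routing between old and new capacity during a migration."""
    return "new" if random.random() < new_weight else "old"

random.seed(0)
# Each ramp step would be held until metrics confirm the new capacity is healthy.
for weight in (0.0, 0.1, 0.5, 1.0):
    hits = sum(route_request(weight) == "new" for _ in range(10_000))
    print(f"weight={weight:.1f} -> {hits / 10_000:.0%} of traffic on new capacity")
```

At weight 0.0 the new capacity receives no traffic; at 1.0 the old hardware is fully drained and safe to decommission.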
SLA Credits
If our uptime falls below 99.9% in any calendar month (as measured by our internal monitoring, not by customer reports), affected customers receive automatic service credits. Credits are applied to the next billing cycle without requiring a support ticket. We publish monthly uptime reports on our status page so customers can verify our performance independently.
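An automatic credit lookup might work as follows. Only the 99.9% trigger comes from the SLA above; the credit tiers themselves are hypothetical numbers for the sketch:

```python
# Hypothetical credit schedule: the 99.9% threshold is from the SLA,
# the credit percentages are invented for illustration.
CREDIT_TIERS = [  # (minimum measured uptime %, credit % of monthly bill)
    (99.9, 0),   # SLA met: no credit due
    (99.0, 10),
    (95.0, 25),
    (0.0, 50),
]

def service_credit_pct(measured_uptime: float) -> int:
    """Return the credit percentage owed for a month's measured uptime."""
    for floor, credit in CREDIT_TIERS:
        if measured_uptime >= floor:
            return credit
    return 0

assert service_credit_pct(99.95) == 0   # SLA met
assert service_credit_pct(99.5) == 10   # first breach tier
```

Because the lookup is pure arithmetic on a published metric, it can run at billing time with no human in the loop, which is what makes ticket-free credits possible.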
Disaster Recovery
Our disaster recovery plan covers scenarios from individual server failures to complete data center outages. For single-server failures, failover is automatic and completes in under 500 milliseconds. For data center-level events, we maintain warm standby capacity in alternative facilities that can absorb full production traffic within 15 minutes. We test disaster recovery procedures quarterly with full-scale failover drills.
Monitoring Stack
We use a multi-layer monitoring approach. Network-level monitoring tracks BGP announcements, link utilization, and packet loss. Application-level monitoring tracks request success rates, latency distributions, and error classifications. Business-level monitoring tracks customer impact metrics like active sessions, requests per second, and SLA compliance in real time. All monitoring data is retained for 90 days and is available to customers through our dashboard.
Historical Performance
Over the past 24 months, we have experienced zero complete outages. The longest single incident was a partial degradation in one geographic region caused by a carrier routing anomaly. Our systems detected the issue within 30 seconds and began rerouting traffic within 2 minutes; full resolution, including carrier coordination, took 18 minutes.