Redundancy and Failover: Surviving the Outages That Eventually Happen
Every system fails eventually. Trading systems that survive failures keep operating; ones that don't survive lose money on every outage. Designing for failure is what separates the two.
Every system fails eventually. Internet drops. Exchanges go down. Power fails. APIs throw errors. Bots crash. The question isn't whether failures will happen; it's whether your trading setup is designed to handle them. The traders who lose the least to outages are the ones who build redundancy and failover in advance. The traders who lose the most are the ones who learned the hard way.
What "redundancy" actually means in trading
Redundancy: having backup systems that can take over when primary systems fail. The goal is no single point of failure: every critical function has at least one backup.
For trading specifically:
1. Network redundancy. If your primary internet fails, you have a backup connection. Mobile hotspot from a different carrier is the cheapest version.
2. Power redundancy. A UPS for short outages; a backup generator for extended ones. Most retail traders don't need extensive power redundancy unless they live in an area prone to outages.
3. Device redundancy. If your laptop fails, you can access positions from your phone. Multiple devices with exchange access provide redundancy.
4. Exchange redundancy. For traders concerned about exchange-specific failures, distributing positions across multiple exchanges reduces single-exchange risk.
5. Bot/system redundancy. For automated systems, backup bots that can take over if the primary fails. More complex; not needed by most retail traders.
The level of redundancy should match the consequences of failure for your specific situation. Active leveraged traders need more redundancy than swing traders.
Common failure modes
Real failures that happen in crypto trading:
Internet outage. Your home internet goes down for a few hours. If you have active leveraged positions, they're moving without your ability to manage. Mobile hotspot fixes most of these.
Power outage. Your area loses power for hours or days. Your devices (laptop, router) are off. Mobile phone (with cell signal) is your only access.
Device failure. Your laptop fails (battery dies, hardware fails, OS issues). Backup device takes over.
Exchange outage. The exchange itself becomes unresponsive. You can't manage positions on that exchange even if your devices work fine.
API outage. The exchange's API stops responding even if the web UI works. Affects automated systems; manual fallback through web UI may work.
Network routing issues. Some path between you and the exchange has issues; web UI might work but with high latency. Trading is degraded but not blocked.
Wallet / browser issues. For DEX trading, wallet or browser extension can have issues that prevent trades.
Each requires different mitigation. Most can be handled with modest pre-planning.
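Telling these failure modes apart is often the first step, because the fix differs: a hotspot helps with an internet outage but not with an exchange outage. Below is a minimal sketch of that triage logic. It assumes you already have three yes/no probes: a site unrelated to the exchange, the exchange's web UI, and the exchange's API. How you run those probes is up to you. The `Failure` names and the 2000 ms "slow" threshold are illustrative placeholders, not values from any exchange.

```python
from enum import Enum
from typing import Optional


class Failure(Enum):
    NONE = "no failure detected"
    LOCAL_NETWORK = "your internet connection is down"
    EXCHANGE = "exchange outage"
    API_ONLY = "API outage; web UI still works"
    WEB_ONLY = "web UI down; API still works"
    DEGRADED = "reachable but slow (likely a routing issue)"


def classify(internet_ok: bool, exchange_web_ok: bool, exchange_api_ok: bool,
             api_latency_ms: Optional[float] = None,
             slow_threshold_ms: float = 2000.0) -> Failure:
    """Map three reachability probes (plus optional API latency) to a failure mode."""
    if not internet_ok:
        # A site unrelated to the exchange is unreachable: the problem is on your side.
        return Failure.LOCAL_NETWORK
    if not exchange_web_ok and not exchange_api_ok:
        # The wider internet works, but neither exchange surface answers.
        return Failure.EXCHANGE
    if not exchange_api_ok:
        # Bots are blind here, but a manual fallback through the web UI may work.
        return Failure.API_ONLY
    if not exchange_web_ok:
        return Failure.WEB_ONLY
    if api_latency_ms is not None and api_latency_ms > slow_threshold_ms:
        # Everything answers, just too slowly to trade comfortably.
        return Failure.DEGRADED
    return Failure.NONE


if __name__ == "__main__":
    print(classify(True, True, False).value)  # API outage; web UI still works
```

Checking the unrelated site first matters. Without that check, a dead home router looks exactly like an exchange outage.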
Redundancy for manual traders
For traders who execute manually:
1. Backup access device. Phone with exchange app, separate from primary laptop/desktop. Verify it works before you need it.
2. Backup network connection. Mobile hotspot from a different carrier than your home internet. Test it works.
3. Account access redundancy. 2FA backup codes printed and stored safely. If your phone is lost, you can still recover account access via the codes.
4. Pre-defined emergency procedures. "If I lose access during volatile conditions, my default is to..." Pre-decided. The decision is made in calm conditions, not in panic.
These are the basics. Total cost: low (around $30/month for a backup connection, free for the rest). Total benefit: prevents most manual-trading outage disasters.
Redundancy for automated systems
For traders running bots:
1. Backup VPS / hosting. Primary VPS hosts the bot; backup VPS in a different data center is ready to take over. Failover can be automatic (load balancer) or manual (you start the backup if primary fails).
2. Multi-region deployment. Critical bots deployed across multiple geographic regions. Outage in one region doesn't kill the bot.
3. Database redundancy. If the bot stores state (open positions, recent trades), the storage should be backed up regularly. Restore procedures should be tested.
4. Code repository backup. Source code in version control with backups (GitHub, GitLab, etc.). If your local copy is lost, you can restore.
5. Automatic failover logic. If the bot detects it can't reach the exchange, it should fail safely (stop opening new positions, alert you, and shut down, closing positions if it can still get orders through) rather than continuing in a degraded state.
6. Exchange-side fallbacks. For positions the bot manages, having pre-set stop orders on the exchange itself (not just in bot logic) means even if the bot fails, the exchange will still execute stops. Defense in depth.
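The fail-safe behaviour in item 5 can be sketched as a small guard object that the bot consults before opening anything new. This is a minimal sketch under stated assumptions. The class name, the three-consecutive-failures threshold, and the `alert` hook (SMS, chat message, email) are all hypothetical and do not come from any particular bot framework. The guard trips after several failed checks in a row rather than one, so a single dropped request doesn't halt the bot.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SafeModeGuard:
    alert: Callable[[str], None]   # hypothetical notification hook
    max_failures: int = 3          # consecutive failed checks before tripping
    consecutive_failures: int = 0
    safe_mode: bool = False

    def record_check(self, exchange_reachable: bool) -> None:
        """Feed in the result of each periodic connectivity check."""
        if exchange_reachable:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if not self.safe_mode and self.consecutive_failures >= self.max_failures:
            self.safe_mode = True
            self.alert(f"Exchange unreachable {self.consecutive_failures} checks in a row; "
                       "safe mode on, no new positions")

    def may_open_positions(self) -> bool:
        # Only new risk is gated here; closing orders should bypass this check.
        return not self.safe_mode

    def reset(self) -> None:
        """Deliberately manual: a human confirms the exchange is healthy first."""
        self.safe_mode = False
        self.consecutive_failures = 0
```

In this sketch, the guard stays in safe mode even after connectivity returns until someone calls `reset()`. That is a design choice, not a requirement. While the bot can't act, the exchange-side stops from item 6 are what protect the open positions.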
For serious algo trading, this level of redundancy is necessary. For casual automation, simpler redundancy is fine.
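For the automatic option in item 1, a common pattern is a heartbeat. The primary sends a "still alive" signal at a regular interval, and the backup takes over when those signals stop. Below is a minimal sketch of the backup side. It assumes a 30-second timeout and a way to deliver heartbeats, both of which are placeholders. Time is passed in explicitly so the logic can be tested; in production you would pass `time.monotonic()` recorded when each heartbeat arrives.

```python
from dataclasses import dataclass


@dataclass
class StandbyWatchdog:
    """Runs on the backup host and decides when to take over."""
    last_heartbeat: float          # local receive time of the primary's latest heartbeat
    timeout_s: float = 30.0        # silence tolerated before promoting the backup
    active: bool = False           # True once this host has taken over

    def on_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def tick(self, now: float) -> bool:
        """Call periodically. Returns True only on the tick where takeover happens."""
        if not self.active and now - self.last_heartbeat > self.timeout_s:
            self.active = True
            return True
        return False
```

One caution this sketch does not handle is split-brain. If only the link between the two hosts fails, the primary is still trading while the backup promotes itself, and both bots act at once. Real setups usually add a way to make sure the primary has actually stopped before the backup starts trading.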
A common mistake: testing failover only when it matters
A trader has redundancy in theory but has never tested it. When the primary fails, they discover the failover doesn't actually work: wrong configuration, expired credentials, untested assumptions.
The fix: test failover regularly. Once a quarter, deliberately fail the primary system to verify the backup works. Test in low-stakes conditions. The test exposes problems while you can fix them calmly.
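A quarterly drill is easier to keep honest if it follows a fixed script and records the result. Here is a minimal sketch, assuming you can wrap your own setup in three hypothetical hooks: one to stop the primary, one to check whether the backup is serving, and one to restore the primary. These hooks are placeholders for whatever your setup needs (stopping a VPS service, unplugging the router, logging in from the backup device). They are not a real library's API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class DrillResult:
    day: date
    passed: bool
    note: str


def run_failover_drill(stop_primary: Callable[[], None],
                       backup_is_serving: Callable[[], bool],
                       restore_primary: Callable[[], None],
                       today: Optional[date] = None) -> DrillResult:
    """Deliberately fail the primary, check that the backup took over, then restore."""
    today = today or date.today()
    stop_primary()
    try:
        # This check should wait long enough for the backup to take over.
        passed = backup_is_serving()
        note = "backup took over" if passed else "backup did not take over"
    except Exception as exc:  # e.g. expired credentials on the backup host
        passed, note = False, f"backup check failed: {exc!r}"
    finally:
        # Bring the primary back whatever happened during the check.
        restore_primary()
    return DrillResult(today, passed, note)
```

Catching exceptions from the check is deliberate. A crash caused by expired credentials or a broken configuration is exactly the kind of problem the drill exists to surface, so it is recorded as a failed drill rather than aborting it.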
A common mistake: redundancy that shares a single point of failure
A trader has two laptops as backup. Both connect through the same router. The router fails. Both laptops are useless.
A trader has a primary and a backup VPS, both with the same hosting provider. The provider has an outage. Both VPSes go down.
The fix: redundancy must be across independent failure domains. Different routers, different hosting providers, different geographic regions. Otherwise you're paying for backup that fails together with primary.
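One way to catch shared failure domains is to write down what each redundant component depends on and compare the lists pairwise. This is a minimal sketch. The component and dependency names are hypothetical examples, and the check can only find dependencies you actually list. It compares every pair, so feed it only components that are meant to back each other up.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def shared_failure_domains(components: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Return every pair of components that shares at least one dependency."""
    shared = {}
    for (a, deps_a), (b, deps_b) in combinations(sorted(components.items()), 2):
        common = deps_a & deps_b
        if common:
            # One failure here takes out both "redundant" components together.
            shared[(a, b)] = common
    return shared


if __name__ == "__main__":
    setup = {
        "primary_vps": {"provider_a", "region_eu"},
        "backup_vps": {"provider_a", "region_us"},
    }
    print(shared_failure_domains(setup))  # {('backup_vps', 'primary_vps'): {'provider_a'}}
```

In this example, a different region is not enough: both machines still depend on `provider_a`, which is the second scenario above.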
A common mistake: too much redundancy for the actual risk
A casual swing trader builds enterprise-grade redundancy: multiple data center deployments, custom failover orchestration, etc. They never use any of it because their trading doesn't need it. The investment was wasted.
The fix: match redundancy to consequences. Casual swing trading needs basic redundancy (backup device, backup connection). Active automated trading at scale needs more. Don't over-engineer for a problem you don't have.
A common mistake: not having pre-defined procedures
The system fails. The trader didn't pre-decide what to do. Under stress, they make poor decisions. Maybe wrong device, maybe wrong exit prices, maybe panic action.
The fix: write down your procedures. "If X fails, my response is: 1. ..., 2. ..., 3. ..." The written procedure removes decision-making from the panic moment. You execute the pre-decided sequence rather than improvising.
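Written procedures can live in a plain text file. Keeping them as data also makes it easy to add a default for failures you didn't anticipate, the "my default is to..." from the manual-trader checklist. The failure keys and steps below are hypothetical examples pieced together from this chapter's own advice. Replace them with your own, written while calm.

```python
from typing import Dict, List

# Hypothetical example procedures; replace with your own.
RUNBOOK: Dict[str, List[str]] = {
    "internet_down": [
        "Switch to the mobile hotspot (different carrier)",
        "Check that exchange-side stops exist on every open position",
        "Open no new positions until the primary connection is back",
    ],
    "laptop_failed": [
        "Log in from the phone app",
        "Check that exchange-side stops exist on every open position",
    ],
}

# Pre-decided default for anything not listed above.
DEFAULT: List[str] = [
    "Rely on exchange-side stops",
    "Open no new positions",
    "Reassess once access is restored",
]


def procedure_for(failure: str) -> List[str]:
    """Return the numbered steps for a failure, falling back to the default."""
    steps = RUNBOOK.get(failure, DEFAULT)
    return [f"{i}. {step}" for i, step in enumerate(steps, start=1)]


if __name__ == "__main__":
    print("\n".join(procedure_for("internet_down")))
```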
A common mistake: ignoring exchange-side redundancy
A trader runs sophisticated infrastructure redundancy but has no exchange-side stops. The bot manages stops in its own logic. Bot fails. Position runs without protection.
The fix: every position should have exchange-side stops as a final safety net. Even sophisticated bots should set stops on the exchange itself. The exchange becomes the failover when your other systems fail.
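A periodic audit can confirm that the safety net actually exists. It compares open positions against resting stop orders and flags anything not fully covered. Below is a minimal sketch. The `Position` and `StopOrder` shapes are simplified, hypothetical stand-ins for what your exchange's positions and open-orders endpoints return; those formats differ by exchange. It also assumes one-way position mode (one net position per symbol).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Position:
    symbol: str
    side: str    # "long" or "short"
    size: float


@dataclass
class StopOrder:
    symbol: str
    side: str    # order side: "sell" closes a long, "buy" closes a short
    size: float


def unprotected(positions: List[Position], stops: List[StopOrder]) -> List[str]:
    """Return symbols whose position is not fully covered by exchange-side stops."""
    gaps = []
    for p in positions:
        closing_side = "sell" if p.side == "long" else "buy"
        covered = sum(s.size for s in stops
                      if s.symbol == p.symbol and s.side == closing_side)
        if covered < p.size:
            gaps.append(p.symbol)
    return gaps
```

Checking size as well as existence matters here. A stop that covers only half the position, or sits on the wrong side, still leaves exposure if the bot dies.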
A common mistake: redundancy without monitoring
A trader has backup systems but no monitoring of whether they're functional. The backups silently fail (expired credentials, broken configurations). When primary fails, backup also fails because no one was watching it.
The fix: monitor the redundancy. Backup systems should have their own health checks. If the backup is broken, you should know before primary fails.
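Monitoring the backups comes down to two questions asked on a schedule: has this backup passed a health check recently, and are its credentials about to expire? Here is a minimal sketch. The one-hour silence limit and 14-day expiry warning are placeholder values, and the `Backup` record is a hypothetical shape. How you run each health check depends on the backup.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Backup:
    name: str
    last_ok_check: datetime       # last time its health check passed
    credentials_expire: datetime  # e.g. API key or certificate expiry


def backup_problems(backups: List[Backup], now: datetime,
                    max_silence: timedelta = timedelta(hours=1),
                    expiry_warning: timedelta = timedelta(days=14)) -> List[str]:
    """List backups that look broken, so you find out before the primary fails."""
    problems = []
    for b in backups:
        if now - b.last_ok_check > max_silence:
            problems.append(f"{b.name}: no passing health check since "
                            f"{b.last_ok_check:%Y-%m-%d %H:%M}")
        if b.credentials_expire - now < expiry_warning:
            # Also catches credentials that have already expired.
            problems.append(f"{b.name}: credentials expire {b.credentials_expire:%Y-%m-%d}")
    return problems
```

Any non-empty result should go to the same alert channel as primary failures. A backup that is silently broken is an outage that simply hasn't happened yet.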
The pragmatic redundancy stack
For most retail active traders:
Tier 1 (essential):
- Phone with exchange app (backup access)
- Mobile hotspot capability (backup connection)
- 2FA backup codes printed and stored
- Exchange-side stops on all positions
Tier 2 (active traders):
- Wired primary connection (more reliable than wifi)
- UPS for trading machine
- Pre-written emergency procedures
- Periodic failover testing
Tier 3 (algo traders):
- Backup VPS in different region
- Multi-region deployment for critical bots
- Database backups with tested restore
- Comprehensive monitoring of all redundancy
- Automatic safe-mode if primary detects failures
Each tier costs more but addresses larger risks. Don't jump tiers; add as your trading scale and sophistication justify.
Mental model: redundancy as insurance for operational failure
Insurance is something you hope you never need. You pay for it constantly. When you need it, you're glad you had it. The cost-benefit is favorable across many years even though most of those years the insurance isn't used.
Redundancy in trading is operational insurance. You pay for it in setup time and ongoing complexity. Most days, it's not used. The day you need it (outage, failure, disaster), it's the difference between bounded damage and catastrophic damage.
The cost-benefit is favorable for most active traders. Don't skip the basics.
Why this matters for trading
Failures in trading don't politely wait for you to be ready. They happen during active trades, often during volatile conditions, often when you're least prepared. Pre-built redundancy keeps you operational during failures. Hex37's infrastructure handles platform-side reliability; your client-side redundancy (devices, connections, procedures) is what keeps you operational from your end.
Takeaway
Failures happen: internet drops, power fails, exchanges go down, bots crash. Redundancy is backup systems that take over when primary systems fail. Match redundancy to consequences: casual traders need basic redundancy (backup device, backup connection); active traders need more (UPS, wired network, procedures); algo traders need sophisticated multi-region failover. Test redundancy regularly; untested redundancy fails when you need it. Avoid single points of failure shared across "redundant" systems. Always keep exchange-side stops as a final safety net. Pre-write procedures for failure scenarios.